MCP Server Integration
Rhubarb includes a dedicated pyrhubarb-mcp package that provides a FastMCP server exposing all document and video understanding capabilities through the Model Context Protocol (MCP). This allows seamless integration with MCP-compatible AI assistants like Cline, Claude Desktop, and other MCP clients.
What is MCP?
The Model Context Protocol (MCP) is an open standard for connecting AI assistants to external tools and data sources. MCP enables:
Universal Tool Access: AI assistants can use external tools consistently
Secure Integration: Controlled access to resources and capabilities
Extensible Architecture: Easy addition of new tools and resources
Rhubarb’s MCP server provides native access to all document processing capabilities without requiring separate API integration.
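Under the hood, an MCP client and server exchange JSON-RPC 2.0 messages. With the stdio transport that clients such as Cline and Claude Desktop use for servers they launch from a command, each message is written as a single line of JSON. The sketch below only illustrates that framing for the standard tools/list method; the helper names are hypothetical, it is not part of pyrhubarb-mcp, and MCP clients do this for you.

```python
from __future__ import annotations

import json


def frame_request(request_id: int, method: str, params: dict | None = None) -> str:
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the only raw newline
    # in the output is the delimiter appended here.
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_line(line: str) -> dict:
    """Decode one newline-delimited JSON-RPC message."""
    return json.loads(line)


if __name__ == "__main__":
    line = frame_request(1, "tools/list")
    print(line, end="")
    print(parse_line(line)["method"])
```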
Features
Tools (8 Available)
| Tool Name | Description | Capabilities |
|---|---|---|
| analyze_document | Multi-modal document analysis | Q&A, summarization, structured extraction |
| | Streaming conversations | Chat with conversation history |
| extract_entities | Entity recognition | 50+ built-in entities + custom entities + PII |
| generate_extraction_schema | AI schema generation | Custom JSON schemas for extraction |
| create_classification_samples | Vector sample creation | Document classification training |
| | Document classification | Classify using pre-trained samples |
| | Sample inspection | View classification details |
| analyze_video | Video understanding | Nova model video analysis |
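Each tool in the table is invoked by the client with the MCP tools/call method, which carries the tool name and a JSON object of arguments. The sketch below builds such a request for analyze_document. The argument keys file_path and message are hypothetical placeholders; the real input schema is whatever the server reports from tools/list.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "analyze_document",
    {
        # HYPOTHETICAL argument keys; check the tool's input schema
        "file_path": "s3://my-bucket/report.pdf",
        "message": "What are the key findings in this report?",
    },
)
print(json.dumps(request, indent=2))
```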
Resources (4 Available)
| Resource URI | Description |
|---|---|
| rhubarb://entities/built-in | List of 50+ built-in entity types for Named Entity Recognition |
| rhubarb://models/supported | Supported Bedrock models and their capabilities |
| | Built-in schemas for common document processing use cases |
| | Classification sample details including classes and sample counts |
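Resources are fetched with the MCP resources/read method, whose params carry the resource URI. This sketch builds that request for the two rhubarb:// URIs quoted in the usage examples and rejects URIs with another scheme; the helper name and the scheme check are illustrative assumptions, not server behaviour.

```python
from urllib.parse import urlparse

# Only the two resource URIs quoted in the usage examples are listed here.
KNOWN_RESOURCES = ["rhubarb://entities/built-in", "rhubarb://models/supported"]


def build_resource_read(request_id: int, uri: str) -> dict:
    """Build the JSON-RPC request an MCP client sends to read a resource."""
    if urlparse(uri).scheme != "rhubarb":
        raise ValueError(f"not a rhubarb:// resource: {uri}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


for i, uri in enumerate(KNOWN_RESOURCES, start=1):
    req = build_resource_read(i, uri)
    # urlparse splits "rhubarb://entities/built-in" into netloc and path
    print(req["params"]["uri"], "->", urlparse(uri).netloc)
```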
Installation & Setup
Prerequisites
Python 3.11+ - Required for the MCP server
AWS Credentials - Required for Amazon Bedrock and S3 access
pipx or uvx - For auto-installing the MCP server
Quick Start
No installation is required: the MCP server auto-installs the first time it is launched through uvx or pipx.
Test the server (optional):
uvx pyrhubarb-mcp@latest --check-deps
Configure AWS credentials via environment variables:
# With an AWS profile
AWS_PROFILE=your-profile uvx pyrhubarb-mcp@latest

# With access keys
AWS_ACCESS_KEY_ID=AKIA... AWS_SECRET_ACCESS_KEY=secret... uvx pyrhubarb-mcp@latest
In normal use you do not start the server yourself: your MCP client launches it with the command in its configuration.
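To script the optional dependency check (for example in CI), you can wrap the same uvx command from Python. This is a sketch under the assumption that uvx is on PATH; the helper names are hypothetical, and only the command line itself comes from this page.

```python
from __future__ import annotations

import os
import shlex
import subprocess


def build_check_command(profile: str | None = None) -> tuple[list[str], dict[str, str]]:
    """Return argv and environment for `uvx pyrhubarb-mcp@latest --check-deps`."""
    argv = ["uvx", "pyrhubarb-mcp@latest", "--check-deps"]
    env = dict(os.environ)  # inherit PATH, HOME, etc. so uvx can be found
    if profile:
        env["AWS_PROFILE"] = profile
    return argv, env


def run_check(profile: str | None = None) -> int:
    """Run the dependency check and return its exit code (needs uvx installed)."""
    argv, env = build_check_command(profile)
    return subprocess.run(argv, env=env, check=False).returncode


if __name__ == "__main__":
    argv, _ = build_check_command("your-profile")
    print("would run:", shlex.join(argv))
```

Calling run_check("your-profile") actually executes the check; it is kept separate so the command can be inspected without running it.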
Configuration
Environment Variables
The MCP server is configured through environment variables:
| Variable | Description | Default |
|---|---|---|
| | AWS region for Bedrock and S3 access | |
| AWS_PROFILE | AWS profile name to use | None |
| AWS_ACCESS_KEY_ID | AWS access key ID | None |
| AWS_SECRET_ACCESS_KEY | AWS secret access key | None |
| | Enable cross-region inference | |
| | Default model to use | |
| | Default S3 bucket for classification samples | None |
Command-Line Arguments
The server supports additional configuration through command-line arguments:
pyrhubarb-mcp --help
Available arguments:
--aws-region - AWS region (overrides the environment)
--enable-cri - Enable cross-region inference
--default-model - Default model selection, with validation
--default-bucket - Default S3 bucket for classification samples
--check-deps - Check all dependencies and AWS credentials
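The --aws-region flag is described as overriding the environment. The usual way to get that precedence is to use the environment value as the argparse default, so an explicit flag wins. This is a sketch of the rule only, not the server's parser: the AWS_REGION variable name and the us-east-1 fallback are assumptions.

```python
from __future__ import annotations

import argparse
import os
from collections.abc import Mapping


def parse_args(argv: list[str] | None = None,
               env: Mapping[str, str] | None = None) -> argparse.Namespace:
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="pyrhubarb-mcp")
    # ASSUMPTION: env var name and fallback region are illustrative only.
    # The env value is the default, so an explicit --aws-region overrides it.
    parser.add_argument("--aws-region", default=env.get("AWS_REGION", "us-east-1"))
    parser.add_argument("--enable-cri", action="store_true")
    parser.add_argument("--default-model")
    parser.add_argument("--default-bucket")
    parser.add_argument("--check-deps", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    ns = parse_args(["--aws-region", "us-west-2"], {"AWS_REGION": "eu-west-1"})
    print(ns.aws_region)
```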
MCP Client Configuration
The MCP server integrates with various MCP-compatible clients. Here are configuration examples for popular clients:
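Cline Integration
Add one of the following entries inside the "mcpServers" object of your Cline MCP settings file (cline_mcp_settings.json). Each uses uvx, so no pre-installation is required: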
Option 1: Using AWS Profile (Recommended)
{
"rhubarb": {
"command": "uvx",
"args": [
"pyrhubarb-mcp@latest",
"--aws-region", "us-east-1",
"--default-model", "claude-sonnet"
],
"env": {
"AWS_PROFILE": "your-aws-profile"
}
}
}
Option 2: Using AWS Access Keys
{
"rhubarb": {
"command": "uvx",
"args": [
"pyrhubarb-mcp@latest",
"--aws-region", "us-west-2",
"--default-model", "claude-sonnet"
],
"env": {
"AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
"AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}
}
}
Option 3: Advanced Configuration
{
"rhubarb": {
"command": "uvx",
"args": [
"pyrhubarb-mcp@latest",
"--aws-region", "us-west-2",
"--default-model", "nova-pro",
"--enable-cri",
"--default-bucket", "my-classification-bucket"
],
"env": {
"AWS_PROFILE": "production"
}
}
}
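If you manage several machines, you may prefer to merge the entry into the settings file programmatically rather than by hand. A minimal sketch, assuming the file holds a JSON object with an "mcpServers" key as described above; the helper names are hypothetical, and the file's location depends on your client and OS, so the path is passed in explicitly.

```python
import json
from pathlib import Path


def add_rhubarb_server(settings: dict, region: str, model: str, profile: str) -> dict:
    """Return a copy of an MCP settings dict with a "rhubarb" entry under "mcpServers"."""
    updated = json.loads(json.dumps(settings))  # deep copy of plain JSON data
    servers = updated.setdefault("mcpServers", {})
    servers["rhubarb"] = {
        "command": "uvx",
        "args": ["pyrhubarb-mcp@latest", "--aws-region", region, "--default-model", model],
        "env": {"AWS_PROFILE": profile},
    }
    return updated


def update_settings_file(path: Path, region: str, model: str, profile: str) -> None:
    """Merge the entry into a settings file, creating the file if it is missing."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_rhubarb_server(settings, region, model, profile), indent=2))
```

Existing servers under "mcpServers" are preserved; only the "rhubarb" key is added or replaced.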
Claude Desktop Integration
Add to your Claude Desktop configuration file (claude_desktop_config.json):
{
"mcpServers": {
"rhubarb": {
"command": "uvx",
"args": [
"pyrhubarb-mcp@latest",
"--default-model", "claude-sonnet"
],
"env": {
"AWS_PROFILE": "your-aws-profile"
}
}
}
}
Alternative with pipx
You can also use pipx instead of uvx:
{
"rhubarb": {
"command": "pipx",
"args": [
"run", "pyrhubarb-mcp@latest",
"--default-model", "claude-sonnet"
],
"env": {
"AWS_PROFILE": "your-profile"
}
}
}
Usage Examples
Once configured, you can use Rhubarb’s capabilities directly through your MCP client:
Document Analysis
Ask your AI assistant to analyze a document:
"Use the analyze_document tool to analyze this PDF: s3://my-bucket/report.pdf
Ask: 'What are the key findings in this report?'"
The tool will return structured analysis results from the document.
Entity Extraction
Extract entities from documents:
"Use the extract_entities tool on ./invoice.pdf to find all ORGANIZATION,
MONEY, and DATE entities, plus any PII information"
Video Analysis
Analyze video content:
"Use the analyze_video tool on s3://my-bucket/presentation.mp4 to summarize
the key points from this presentation using the nova-pro model"
Document Classification
Classify documents using pre-trained samples:
"First, use create_classification_samples with ./training_manifest.csv
and bucket my-classification-bucket. Then classify ./unknown_document.pdf
using those samples."
Advanced Features
Conversation Memory
The server maintains conversation history for document chat sessions:
"Start a streaming chat with ./document.pdf about the financial data"
# Follow-up questions will maintain context
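Conversation memory of this kind is typically a per-session message list held in process memory, which is also why history disappears when the server restarts. The class below is a hypothetical sketch of that idea, not the server's implementation; the max_turns cap is a design choice of the sketch to keep memory bounded, not documented server behaviour.

```python
from collections import defaultdict


class ConversationStore:
    """Hypothetical in-process chat history keyed by session id."""

    def __init__(self, max_turns: int = 20) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.max_turns = max_turns
        self._history: dict[str, list[dict[str, str]]] = defaultdict(list)

    def add(self, session_id: str, role: str, content: str) -> None:
        turns = self._history[session_id]
        turns.append({"role": role, "content": content})
        # drop the oldest messages beyond the cap
        del turns[:-self.max_turns]

    def history(self, session_id: str) -> list[dict[str, str]]:
        # .get avoids creating an empty entry for unknown sessions
        return list(self._history.get(session_id, []))

    def clear(self, session_id: str) -> None:
        self._history.pop(session_id, None)
```

Each follow-up question would be answered with history(session_id) passed back to the model, which is what lets it keep context.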
Schema Generation
Generate custom extraction schemas:
"Use generate_extraction_schema on ./sample_contract.pdf to create a schema
for extracting: parties involved, contract dates, payment terms, and obligations"
Resource Discovery
Access built-in resources for information:
"Show me the rhubarb://entities/built-in resource to see all available entity types"
"Check rhubarb://models/supported for available models and their capabilities"
Supported Models
The MCP server supports all Rhubarb-compatible models:
| Model | Documents | Video | Use Cases |
|---|---|---|---|
| | ✅ | ❌ | Complex reasoning, detailed analysis |
| | ✅ | ❌ | Balanced performance and cost |
| | ✅ | ❌ | Legacy compatibility |
| | ✅ | ❌ | Latest Claude features |
| | ✅ | ❌ | Enhanced capabilities |
| | ✅ | ❌ | Fast, lightweight tasks |
| | ✅ | ✅ | High-quality video analysis |
| | ✅ | ✅ | Cost-effective video processing |
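The --default-model option is described as performing validation. A minimal sketch of capability-based validation following the Documents/Video columns above might look like this. It covers only the two identifiers used in the configuration examples (claude-sonnet and nova-pro); the mapping and function name are illustrative, not the server's model registry.

```python
# Illustrative subset: only the identifiers used in this page's examples.
MODEL_CAPABILITIES: dict[str, frozenset[str]] = {
    "claude-sonnet": frozenset({"documents"}),
    "nova-pro": frozenset({"documents", "video"}),
}


def validate_model(model: str, task: str) -> str:
    """Return the model name if it supports the task, else raise ValueError."""
    caps = MODEL_CAPABILITIES.get(model)
    if caps is None:
        raise ValueError(f"unknown model {model!r}; known: {sorted(MODEL_CAPABILITIES)}")
    if task not in caps:
        raise ValueError(f"{model!r} does not support {task!r}")
    return model
```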
Architecture
The MCP server provides a clean integration layer:
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    MCP Client    │─────│  FastMCP Server  │─────│   Rhubarb Core   │
│  (Cline, etc.)   │     │     (Python)     │     │     (Python)     │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                   │                        │
                                   │                        │
                         ┌──────────────────┐     ┌──────────────────┐
                         │   Conversation   │     │  Amazon Bedrock  │
                         │      Memory      │     │  (Claude, Nova)  │
                         └──────────────────┘     └──────────────────┘
                                                            │
                                                  ┌──────────────────┐
                                                  │    Amazon S3     │
                                                  │ (Documents, etc.)│
                                                  └──────────────────┘
Key Benefits:
Native Python: No external bridge, direct Rhubarb integration
Full Feature Parity: All Rhubarb capabilities exposed through MCP
Conversation Memory: Maintains chat history across interactions
Resource Discovery: Built-in resources for entities, models, schemas
Error Handling: Comprehensive error reporting and validation
Troubleshooting
Common Issues
“FastMCP not installed”:
poetry install # Reinstall dependencies (when running from a source checkout)
“AWS credentials not configured”:
export AWS_PROFILE=your-profile
# or
export AWS_ACCESS_KEY_ID=your-key AWS_SECRET_ACCESS_KEY=your-secret
“Bedrock access denied”
Ensure your AWS credentials have Amazon Bedrock permissions
Verify the region supports the requested models
Check that access to the requested Bedrock models is enabled for your AWS account
“Video analysis requires S3”
Video files must be stored in Amazon S3
Use the s3://bucket/video.mp4 URI format
Ensure S3 bucket permissions allow access
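Because video must come from S3, it can help to validate the URI before handing it to the tool. A small sketch with a hypothetical helper name: it splits on the first "/" after s3:// rather than using urlparse, because S3 object keys may legally contain characters such as "#" or "?" that URL parsing would treat as a fragment or query.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key), raising ValueError otherwise."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or object key in {uri!r}")
    return bucket, key


if __name__ == "__main__":
    print(split_s3_uri("s3://my-bucket/presentation.mp4"))
```

A local path such as ./presentation.mp4 is rejected, which matches the requirement that video files live in S3.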
Debug Mode
Run with verbose output for troubleshooting:
DEBUG=1 pyrhubarb-mcp
This will provide detailed logging for diagnosis.
Dependency Checking
Verify your setup with the dependency checker:
pyrhubarb-mcp --check-deps
This checks:
FastMCP installation
Rhubarb core modules
AWS credential configuration
Boto3 availability
AWS account access
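For a rough local equivalent of the first four checks, you can look for the packages with importlib and inspect the credential environment variables. This is a sketch, not the server's checker: the import name rhubarb for the core library is an assumption, and the credential test only looks at environment variables, whereas boto3 also reads ~/.aws files, SSO and instance roles (boto3.Session().get_credentials() consults that full chain).

```python
from __future__ import annotations

import importlib.util
import os


def check_dependencies(env: dict[str, str] | None = None) -> dict[str, bool]:
    """Report which dependencies are importable and whether env credentials exist."""
    env = dict(os.environ) if env is None else env
    return {
        "fastmcp": importlib.util.find_spec("fastmcp") is not None,
        "rhubarb": importlib.util.find_spec("rhubarb") is not None,  # assumed import name
        "boto3": importlib.util.find_spec("boto3") is not None,
        # either a named profile or a complete access-key pair
        "aws_credentials": bool(env.get("AWS_PROFILE"))
        or bool(env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```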
Performance Considerations
Large Documents: Use the sliding_window_overlap parameter for documents longer than 20 pages
Video Processing: Nova models have frame limits (default: 20 frames)
Classification: Vector samples are cached in S3 for fast retrieval
Memory Management: Conversations are kept in memory (restart server to clear)
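The general idea behind a sliding window is to split a long document into page ranges that share a few pages, so content near a boundary is seen in context twice. This sketch assumes 20-page windows and a 2-page overlap purely for illustration; how Rhubarb interprets its own sliding_window_overlap parameter is defined by Rhubarb, not by this hypothetical helper.

```python
def page_windows(num_pages: int, window: int = 20, overlap: int = 2) -> list[tuple[int, int]]:
    """Split pages 1..num_pages into inclusive (start, end) windows that overlap."""
    if window < 1 or not 0 <= overlap < window:
        raise ValueError("need window >= 1 and 0 <= overlap < window")
    if num_pages < 1:
        return []
    step = window - overlap  # how far each new window advances
    windows = []
    start = 1
    while True:
        end = min(start + window - 1, num_pages)
        windows.append((start, end))
        if end == num_pages:
            break
        start += step
    return windows


if __name__ == "__main__":
    print(page_windows(45))  # [(1, 20), (19, 38), (37, 45)]
```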
Security
Credential Handling: AWS credentials are never logged or exposed
S3 Access: Ensure proper bucket permissions and access policies
Input Validation: All tool inputs are validated before processing
Error Handling: Sensitive information is filtered from error messages
For production deployments, follow AWS security best practices for credential management and access control.
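The exact rules the server uses to filter sensitive information from error messages are not documented here. As one hypothetical approach, you can scrub AWS key material from a string before logging it: access key IDs have a recognisable AKIA (long-term) or ASIA (temporary) prefix followed by 16 uppercase letters or digits, while secret keys are only caught here when they appear in an aws_secret_access_key assignment.

```python
import re

# Access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 chars.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
# Secret keys have no fixed prefix, so only redact explicit assignments.
SECRET_ASSIGN_RE = re.compile(r"(aws_secret_access_key\s*[=:]\s*)\S+", re.IGNORECASE)


def redact(message: str) -> str:
    """Replace AWS key material in a message with placeholders."""
    message = ACCESS_KEY_RE.sub("[REDACTED_ACCESS_KEY]", message)
    return SECRET_ASSIGN_RE.sub(r"\1[REDACTED]", message)


if __name__ == "__main__":
    print(redact("auth failed for AKIAIOSFODNN7EXAMPLE"))
```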