MCP Server Integration¶

Rhubarb includes a dedicated pyrhubarb-mcp package that provides a FastMCP server exposing all document and video understanding capabilities through the Model Context Protocol (MCP). This allows seamless integration with MCP-compatible AI assistants like Cline, Claude Desktop, and other MCP clients.

What is MCP?¶

The Model Context Protocol (MCP) is an open standard for connecting AI assistants to external tools and data sources. MCP enables:

Universal Tool Access: AI assistants can use external tools consistently
Secure Integration: Controlled access to resources and capabilities
Extensible Architecture: Easy addition of new tools and resources

Rhubarb’s MCP server provides native access to all document processing capabilities without requiring separate API integration.

Features¶

Tools (8 Available)¶

Available MCP Tools¶
Tool Name	Description	Capabilities
`analyze_document`	Multi-modal document analysis	Q&A, summarization, structured extraction
`stream_document_chat`	Streaming conversations	Chat with conversation history
`extract_entities`	Entity recognition	50+ built-in entities + custom entities + PII
`generate_extraction_schema`	AI schema generation	Custom JSON schemas for extraction
`create_classification_samples`	Vector sample creation	Document classification training
`classify_document`	Document classification	Classify using pre-trained samples
`view_classification_sample`	Sample inspection	View classification details
`analyze_video`	Video understanding	Nova model video analysis

Resources (4 Available)¶

Available MCP Resources¶
Resource URI	Description
`rhubarb://entities/built-in`	List of 50+ built-in entity types for Named Entity Recognition
`rhubarb://models/supported`	Supported Bedrock models and their capabilities
`rhubarb://schemas/built-in/{type}`	Built-in schemas for common document processing use cases
`rhubarb://classification-samples/{bucket}/{id}`	Classification sample details including classes and sample counts

Installation & Setup¶

Prerequisites¶

Python 3.11+ - Required for the MCP server
AWS Credentials - Required for Amazon Bedrock and S3 access
pipx or uvx - For auto-installing the MCP server

No manual installation is required - the MCP server auto-installs when first used.

Quick Start¶

No installation required - The MCP server auto-installs when first used through uvx or pipx
Test the server (optional):
```
uvx pyrhubarb-mcp@latest --check-deps
```

Configure AWS credentials via environment variables:

# With AWS Profile
AWS_PROFILE=your-profile uvx pyrhubarb-mcp@latest

# With Access Keys
AWS_ACCESS_KEY_ID=AKIA... AWS_SECRET_ACCESS_KEY=secret... uvx pyrhubarb-mcp@latest

The server will start automatically when your MCP client connects to it.

Configuration¶

Environment Variables¶

The MCP server is configured through environment variables:

Environment Variables¶
Variable	Description	Default
`AWS_REGION`	AWS region for Bedrock and S3 access	`us-east-1`
`AWS_PROFILE`	AWS profile name to use	None
`AWS_ACCESS_KEY_ID`	AWS access key ID	None
`AWS_SECRET_ACCESS_KEY`	AWS secret access key	None
`RHUBARB_ENABLE_CRI`	Enable cross-region inference	`false`
`RHUBARB_DEFAULT_MODEL`	Default model to use	`claude-sonnet`
`RHUBARB_DEFAULT_BUCKET`	Default S3 bucket for classification samples	None

Command-Line Arguments¶

The server supports additional configuration through command-line arguments:

pyrhubarb-mcp --help

Available arguments:

--aws-region - AWS region (overrides environment)
--enable-cri - Enable cross-region inference
--default-model - Default model selection with validation
--default-bucket - Default S3 bucket for classification samples
--check-deps - Check all dependencies and AWS credentials

MCP Client Configuration¶

The MCP server integrates with various MCP-compatible clients. Here are configuration examples for popular clients:

Cline Integration¶

Add to your Cline MCP settings using uvx (no pre-installation required):

Option 1: Using AWS Profile (Recommended)

{
  "rhubarb": {
    "command": "uvx",
    "args": [
      "pyrhubarb-mcp@latest",
      "--aws-region", "us-east-1",
      "--default-model", "claude-sonnet"
    ],
    "env": {
      "AWS_PROFILE": "your-aws-profile"
    }
  }
}

Option 2: Using AWS Access Keys

{
  "rhubarb": {
    "command": "uvx",
    "args": [
      "pyrhubarb-mcp@latest",
      "--aws-region", "us-west-2",
      "--default-model", "claude-sonnet"
    ],
    "env": {
      "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
      "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    }
  }
}

Option 3: Advanced Configuration

{
  "rhubarb": {
    "command": "uvx",
    "args": [
      "pyrhubarb-mcp@latest",
      "--aws-region", "us-west-2",
      "--default-model", "nova-pro",
      "--enable-cri",
      "--default-bucket", "my-classification-bucket"
    ],
    "env": {
      "AWS_PROFILE": "production"
    }
  }
}

Claude Desktop Integration¶

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "rhubarb": {
      "command": "uvx",
      "args": [
        "pyrhubarb-mcp@latest",
        "--default-model", "claude-sonnet"
      ],
      "env": {
        "AWS_PROFILE": "your-aws-profile"
      }
    }
  }
}

Alternative with pipx¶

You can also use pipx instead of uvx:

{
  "rhubarb": {
    "command": "pipx",
    "args": [
      "run", "pyrhubarb-mcp@latest",
      "--default-model", "claude-sonnet"
    ],
    "env": {
      "AWS_PROFILE": "your-profile"
    }
  }
}

Usage Examples¶

Once configured, you can use Rhubarb’s capabilities directly through your MCP client:

Document Analysis¶

Ask your AI assistant to analyze a document:

"Use the analyze_document tool to analyze this PDF: s3://my-bucket/report.pdf
Ask: 'What are the key findings in this report?'"

The tool will return structured analysis results from the document.

Entity Extraction¶

Extract entities from documents:

"Use the extract_entities tool on ./invoice.pdf to find all ORGANIZATION,
MONEY, and DATE entities, plus any PII information"

Video Analysis¶

Analyze video content:

"Use the analyze_video tool on s3://my-bucket/presentation.mp4 to summarize
the key points from this presentation using the nova-pro model"

Document Classification¶

Classify documents using pre-trained samples:

"First, use create_classification_samples with ./training_manifest.csv
and bucket my-classification-bucket. Then classify ./unknown_document.pdf
using those samples."

Advanced Features¶

Conversation Memory¶

The server maintains conversation history for document chat sessions:

"Start a streaming chat with ./document.pdf about the financial data"
# Follow-up questions will maintain context

Schema Generation¶

Generate custom extraction schemas:

"Use generate_extraction_schema on ./sample_contract.pdf to create a schema
for extracting: parties involved, contract dates, payment terms, and obligations"

Resource Discovery¶

Access built-in resources for information:

"Show me the rhubarb://entities/built-in resource to see all available entity types"
"Check rhubarb://models/supported for available models and their capabilities"

Supported Models¶

The MCP server supports all Rhubarb-compatible models:

Model Support¶
Model	Documents	Video	Use Cases
`claude-opus`	✅	❌	Complex reasoning, detailed analysis
`claude-sonnet`	✅	❌	Balanced performance and cost
`claude-sonnet-v1`	✅	❌	Legacy compatibility
`claude-sonnet-v2`	✅	❌	Latest Claude features
`claude-sonnet-37`	✅	❌	Enhanced capabilities
`claude-haiku`	✅	❌	Fast, lightweight tasks
`nova-pro`	✅	✅	High-quality video analysis
`nova-lite`	✅	✅	Cost-effective video processing

Architecture¶

The MCP server provides a clean integration layer:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │────│  FastMCP Server  │────│  Rhubarb Core   │
│  (Cline, etc.)  │    │   (Python)       │    │   (Python)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                               │                         │
                               │                         │
                       ┌──────────────────┐    ┌─────────────────┐
                       │   Conversation   │    │  Amazon Bedrock │
                       │     Memory       │    │   (Claude, Nova) │
                       └──────────────────┘    └─────────────────┘
                                                        │
                                               ┌─────────────────┐
                                               │   Amazon S3     │
                                               │ (Documents, etc.)│
                                               └─────────────────┘

Key Benefits:

Native Python: No external bridge, direct Rhubarb integration
Full Feature Parity: All Rhubarb capabilities exposed through MCP
Conversation Memory: Maintains chat history across interactions
Resource Discovery: Built-in resources for entities, models, schemas
Error Handling: Comprehensive error reporting and validation

Troubleshooting¶

Common Issues¶

“FastMCP not installed”:

poetry install  # Reinstall dependencies

“AWS credentials not configured”:

export AWS_PROFILE=your-profile
# or
export AWS_ACCESS_KEY_ID=your-key AWS_SECRET_ACCESS_KEY=your-secret

“Bedrock access denied”

Ensure your AWS credentials have Amazon Bedrock permissions
Verify the region supports the requested models
Check your AWS account has access to Bedrock models

“Video analysis requires S3”

Video files must be stored in Amazon S3
Use s3://bucket/video.mp4 format
Ensure S3 bucket permissions allow access

Debug Mode¶

Run with verbose output for troubleshooting:

DEBUG=1 pyrhubarb-mcp

This will provide detailed logging for diagnosis.

Dependency Checking¶

Verify your setup with the dependency checker:

pyrhubarb-mcp --check-deps

This checks: - FastMCP installation - Rhubarb core modules - AWS credential configuration - Boto3 availability - AWS account access

Performance Considerations¶

Large Documents: Use sliding_window_overlap parameter for documents >20 pages
Video Processing: Nova models have frame limits (default: 20 frames)
Classification: Vector samples are cached in S3 for fast retrieval
Memory Management: Conversations are kept in memory (restart server to clear)

Security¶

Credential Handling: AWS credentials are never logged or exposed
S3 Access: Ensure proper bucket permissions and access policies
Input Validation: All tool inputs are validated before processing
Error Handling: Sensitive information is filtered from error messages

For production deployments, follow AWS security best practices for credential management and access control.