Using Cross-Region Inference

In on-demand mode, inference requests can be throttled when they hit service quotas or arrive during peak usage times. Cross-region inference helps absorb unplanned traffic bursts by drawing on compute in other AWS Regions. Distributing traffic across multiple AWS Regions in this way allows higher throughput.
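
To make the routing idea concrete, here is a minimal sketch that assumes the underlying service is Amazon Bedrock, where cross-region requests are addressed to an inference profile whose ID is the model ID with a geography prefix (for example `us.` or `eu.`). The helper `inference_profile_id` and the `GEO_PREFIXES` table are hypothetical and deliberately incomplete; this illustrates the naming convention only and is not a description of how `enable_cri` is resolved internally.

```python
# Hypothetical helper, not part of Rhubarb or boto3.
# Assumption: inference profile IDs are "<geo-prefix>.<model-id>".
GEO_PREFIXES = {"us": "us", "eu": "eu", "ap": "apac"}  # illustrative subset only


def inference_profile_id(model_id: str, region: str) -> str:
    """Return the geography-prefixed profile ID for a model in a given Region."""
    if region.startswith("us-gov-"):
        # GovCloud uses a separate prefix that this sketch does not cover.
        raise ValueError(f"Region not covered by this sketch: {region}")
    geo = region.split("-", 1)[0]  # "eu-west-1" -> "eu"
    if geo not in GEO_PREFIXES:
        raise ValueError(f"No known geography prefix for Region: {region}")
    return f"{GEO_PREFIXES[geo]}.{model_id}"


print(inference_profile_id("anthropic.claude-3-5-sonnet-20240620-v1:0", "eu-west-1"))
# prints: eu.anthropic.claude-3-5-sonnet-20240620-v1:0
```

A request sent to such a profile can be served from any Region in that geography, which is what lets bursts spill over instead of being throttled in a single Region.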

Basic Usage

The following example enables cross-region inference:

 from rhubarb import DocAnalysis, SystemPrompts
 import boto3

 # Initialize a boto3 session
 session = boto3.Session()

 # Create a DocAnalysis instance with cross-region inference
 da = DocAnalysis(
     file_path="./test_docs/employee_enrollment.pdf",
     boto3_session=session,
     enable_cri=True,  # Enable cross-region inference
     system_prompt=SystemPrompts().SummarySysPrompt
 )

 # Run document analysis
 resp = da.run(message="Give me a brief summary of this document.")
 print(resp)

Configuration Options

DocAnalysis Parameters

  • file_path: Path to the document for analysis

  • boto3_session: AWS boto3 session for authentication

  • enable_cri: Boolean flag to enable cross-region inference (default: False)

  • system_prompt: System prompt to guide the analysis

  • target_region: (Optional) Target AWS Region for inference
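
The parameters above can be combined as in the following sketch. The helpers `cri_kwargs` and `summarize` are hypothetical names introduced for illustration. `target_region` is passed exactly as listed above; how it interacts with `enable_cri` is not specified here, so treat that part as an assumption and check it against the library before relying on it.

```python
from typing import Any, Dict, Optional


def cri_kwargs(file_path: str, session: Any,
               target_region: Optional[str] = None) -> Dict[str, Any]:
    """Build DocAnalysis keyword arguments with cross-region inference enabled."""
    kwargs: Dict[str, Any] = {
        "file_path": file_path,
        "boto3_session": session,
        "enable_cri": True,  # the default is False, so it must be set explicitly
    }
    if target_region is not None:
        # Optional parameter: only passed when the caller supplies a Region.
        kwargs["target_region"] = target_region
    return kwargs


def summarize(file_path: str, target_region: Optional[str] = None) -> Any:
    """Run a summary analysis; requires rhubarb, boto3, and AWS credentials."""
    # Imported here so cri_kwargs stays usable without the AWS dependencies.
    import boto3
    from rhubarb import DocAnalysis, SystemPrompts

    da = DocAnalysis(
        **cri_kwargs(file_path, boto3.Session(), target_region),
        system_prompt=SystemPrompts().SummarySysPrompt,
    )
    return da.run(message="Give me a brief summary of this document.")
```

Calling `summarize("./test_docs/employee_enrollment.pdf")` would then mirror the basic example, while passing a second argument such as `"us-west-2"` adds the optional target Region.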