
Troubleshooting Questions

Common issues and solutions when using the GenAI IDP Accelerator.

Permission Errors

"AccessDenied" when deploying with Terraform

Symptoms:

Error: AccessDenied: User is not authorized to perform action

Solutions:

  1. Check IAM permissions:

aws sts get-caller-identity
aws iam get-user

  2. Verify required permissions:

  • IAM: Create/manage roles and policies
  • Lambda: Create/update functions
  • S3: Create/manage buckets
  • DynamoDB: Create/manage tables
  • API Gateway: Create/manage APIs

  3. Use administrator access temporarily (remove it once deployment succeeds):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
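If you do grant broad access to unblock a deployment, it helps to audit policies for wildcards afterwards. A minimal sketch (pure stdlib, no AWS calls; the helper name is illustrative) that flags over-broad statements in a policy document like the one above:

```python
import json

def find_wildcard_statements(policy_json):
    """Return Allow statements that grant '*' actions or resources."""
    policy = json.loads(policy_json)
    flagged = []
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Policy documents allow a single string where a list is expected
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if stmt.get("Effect") == "Allow" and ("*" in actions or "*" in resources):
            flagged.append(stmt)
    return flagged

admin_policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "*", "Resource": "*"}
  ]
}"""
```

Running `find_wildcard_statements(admin_policy)` flags the single statement above, while a policy scoped to specific actions and ARNs passes clean.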

"User is not authorized to invoke Bedrock model"

Symptoms:

Error: AccessDeniedException: Your account is not authorized to invoke this model

Solutions:

  1. Request model access:

  • Go to the Amazon Bedrock console
  • Navigate to "Model access"
  • Request access for the required models

  2. Check region availability:

  • Bedrock models aren't available in all regions
  • Use supported regions such as us-east-1 or us-west-2

  3. Verify the model ID:

# Correct model ID format
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
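Typos in the model ID are a common cause of this error. A quick sanity check can catch them before a request is made (the regex below is a heuristic based on the `provider.model-name-vN` convention, not an official Bedrock specification):

```python
import re

# Rough shape: provider.model-name...-vN[:K]  (heuristic, not authoritative)
MODEL_ID_PATTERN = re.compile(r"^[a-z0-9]+\.[a-z0-9-]+-v\d+(:\d+)?$")

def looks_like_model_id(model_id):
    return bool(MODEL_ID_PATTERN.match(model_id))
```

This rejects obviously malformed IDs (missing provider prefix, missing version suffix) while accepting the documented format.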

Resource Limits

"LimitExceededException" errors

Symptoms:

Error: LimitExceededException: Account has reached the maximum number of functions

Solutions:

  1. Check service quotas:

aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384

  2. Request quota increases:

  • Go to the AWS Service Quotas console
  • Find the relevant service and quota
  • Submit an increase request

  3. Clean up unused resources:

# List Lambda functions not modified since 2024-01-01
aws lambda list-functions --query 'Functions[?LastModified<`2024-01-01`]'
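The same filtering can be done client-side on the `list-functions` output. A sketch that operates on the JSON structure the CLI returns (no AWS call here; the sample entries are made up):

```python
def stale_functions(functions, cutoff="2024-01-01"):
    """Filter list-functions entries whose LastModified predates cutoff.

    LastModified is an ISO-8601 string, so lexicographic comparison
    on the date prefix is sufficient.
    """
    return [f["FunctionName"] for f in functions if f["LastModified"] < cutoff]

sample = [
    {"FunctionName": "old-processor", "LastModified": "2023-06-15T10:00:00.000+0000"},
    {"FunctionName": "new-processor", "LastModified": "2024-03-01T10:00:00.000+0000"},
]
```

Here `stale_functions(sample)` returns only `old-processor`, giving you a candidate list for cleanup.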

"ThrottlingException" from AWS services

Symptoms:

Error: ThrottlingException: Rate exceeded

Solutions:

  1. Implement exponential backoff:

import time
import random

from botocore.exceptions import ClientError

def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except ClientError as e:
            # boto3 surfaces throttling as a ClientError with this error code
            if e.response["Error"]["Code"] != "ThrottlingException":
                raise
            if attempt == max_retries - 1:
                raise
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait_time)

  2. Reduce request rate:

  • Add delays between API calls
  • Use batch operations where possible
  • Implement queue-based processing
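The retry pattern can be exercised without AWS by raising a stand-in exception from a stub. A self-contained sketch (all names here are illustrative; the injectable `sleep` keeps the demo instant):

```python
import time
import random

class ThrottlingError(Exception):
    """Stand-in for a throttling failure from any API."""

def retry_with_backoff(func, max_retries=3, sleep=time.sleep):
    for attempt in range(max_retries):
        try:
            return func()
        except ThrottlingError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1-2s, 2-3s, 4-5s, ...
            sleep((2 ** attempt) + random.uniform(0, 1))

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottlingError()
    return "ok"

result = retry_with_backoff(flaky, max_retries=5, sleep=lambda s: None)
```

The stub fails twice, then succeeds on the third attempt, so the retry loop returns "ok" after two backoffs.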

Deployment Failures

Terraform state lock conflicts

Symptoms:

Error: Error acquiring the state lock

Solutions:

  1. Wait for lock release:

  • Another Terraform operation may be running
  • Wait 10-15 minutes for automatic release

  2. Check lock status:

aws dynamodb get-item \
  --table-name terraform-locks \
  --key '{"LockID":{"S":"your-state-file"}}'

  3. Force unlock (use carefully):

terraform force-unlock LOCK_ID

"Resource already exists" errors

Symptoms:

Error: ResourceAlreadyExistsException: Resource already exists

Solutions:

  1. Import existing resource:

terraform import aws_s3_bucket.documents existing-bucket-name

  2. Use different resource names:

resource "aws_s3_bucket" "documents" {
  bucket = "${var.environment}-idp-documents-${random_id.suffix.hex}"
}

  3. Check for naming conflicts:

  • S3 bucket names must be globally unique
  • Lambda function names must be unique per region/account
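When generating suffixed names as above, it's easy to violate S3's naming rules (3-63 characters; lowercase letters, digits, hyphens, and dots; must start and end with a letter or digit). A sketch that checks a candidate name before `terraform apply`:

```python
import re

# 3-63 chars, lowercase/digits/dots/hyphens, alphanumeric at both ends
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name):
    # Covers the length and character rules; not every edge case
    # (e.g. IP-address-shaped names are also disallowed by S3)
    return bool(BUCKET_NAME_RE.match(name))
```

Uppercase letters, underscores, and too-short names are all rejected, which catches the most common interpolation mistakes.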

Runtime Errors

Lambda function timeouts

Symptoms:

Task timed out after 15.00 seconds

Solutions:

  1. Increase timeout:

resource "aws_lambda_function" "processor" {
  timeout = 300  # 5 minutes
}

  2. Optimize function performance:

# Initialize clients outside the handler so they are reused across warm invocations
import boto3

s3_client = boto3.client('s3')
textract_client = boto3.client('textract')

def lambda_handler(event, context):
    # Use pre-initialized clients
    pass

  3. Increase memory allocation:

resource "aws_lambda_function" "processor" {
  memory_size = 1024  # More memory also means more CPU
}

"Document format not supported" errors

Symptoms:

Error: InvalidParameterException: Document format not supported

Solutions:

  1. Check supported formats:

  • PDF, PNG, JPEG, TIFF only
  • Maximum file size: 10MB (sync), 500MB (async)

  2. Validate file before processing:

import os

SUPPORTED_EXTENSIONS = {'.pdf', '.png', '.jpg', '.jpeg', '.tif', '.tiff'}

def validate_document(file_name):
    ext = os.path.splitext(file_name)[1].lower()
    return ext in SUPPORTED_EXTENSIONS

  3. Convert unsupported formats:

  • Use Lambda layers with image processing libraries
  • Convert to a supported format before processing
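Extension checks alone miss the size limits above. A combined pre-flight check (the 10MB figure is the synchronous limit quoted above; the `size` parameter is an assumption added so the logic can be tested without a real file):

```python
import os

SUPPORTED_EXTENSIONS = {'.pdf', '.png', '.jpg', '.jpeg', '.tif', '.tiff'}
MAX_SYNC_BYTES = 10 * 1024 * 1024  # 10MB synchronous limit

def preflight_check(path, size=None):
    """Return (ok, reason). size overrides os.path.getsize for testing."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return False, f"unsupported extension: {ext}"
    size = os.path.getsize(path) if size is None else size
    if size > MAX_SYNC_BYTES:
        return False, "file exceeds 10MB sync limit; use the async API"
    return True, "ok"
```

Rejecting oversized files up front avoids a round trip that would fail server-side anyway.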

Performance Issues

Slow document processing

Symptoms:

  • Long processing times
  • Frequent timeouts

Solutions:

  1. Optimize Lambda configuration:

resource "aws_lambda_function" "processor" {
  memory_size = 2048  # Higher memory for better performance
  timeout     = 900   # 15 minutes max
}

  2. Implement parallel processing:

import concurrent.futures

def process_documents_parallel(documents):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(process_document, doc) for doc in documents]
        results = [future.result() for future in futures]
    return results

  3. Use asynchronous processing:

  • Use Textract async APIs for large documents
  • Implement Step Functions for complex workflows
  • Use SQS for queue-based processing
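The thread-pool pattern can be tried end-to-end with a stub worker (`process_document` below is a placeholder, not part of the accelerator):

```python
import concurrent.futures

def process_document(doc):
    # Placeholder for real per-document work (OCR, extraction, ...)
    return {"name": doc, "pages": len(doc)}

def process_documents_parallel(documents, max_workers=5):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order in its results
        return list(executor.map(process_document, documents))

results = process_documents_parallel(["a.pdf", "invoice.pdf"])
```

Note that threads help with I/O-bound work (API calls); CPU-bound steps would need a process pool instead.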

High memory usage

Symptoms:

Runtime.OutOfMemoryError: JavaScript heap out of memory

Solutions:

  1. Increase Lambda memory:

memory_size = 10240  # Maximum available (10GB)

  2. Optimize memory usage:

# Process documents in chunks
def process_large_document(document_text):
    chunk_size = 10000  # characters
    chunks = [document_text[i:i+chunk_size]
              for i in range(0, len(document_text), chunk_size)]

    results = []
    for chunk in chunks:
        result = process_chunk(chunk)
        results.append(result)

    return combine_results(results)
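With stub implementations of `process_chunk` and `combine_results` (both hypothetical here), the chunking logic can be verified in isolation:

```python
def process_chunk(chunk):
    # Stand-in for real per-chunk work (e.g. a model call)
    return len(chunk)

def combine_results(results):
    return sum(results)

def process_large_document(document_text, chunk_size=10):
    chunks = [document_text[i:i+chunk_size]
              for i in range(0, len(document_text), chunk_size)]
    return combine_results(process_chunk(c) for c in chunks)
```

Because each chunk is processed and discarded, peak memory stays proportional to `chunk_size` rather than the full document.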

Network Issues

VPC connectivity problems

Symptoms:

  • Lambda functions can't reach AWS services
  • Timeout errors when calling APIs

Solutions:

  1. Check VPC configuration:

# Ensure a NAT Gateway for private subnets
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
}

  2. Use VPC endpoints:

resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.${var.region}.s3"
}

  3. Check security groups:

resource "aws_security_group" "lambda" {
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Data Issues

Inconsistent processing results

Symptoms:

  • Different results for same document
  • Missing or incorrect extracted data

Solutions:

  1. Improve prompt consistency:

CONSISTENT_PROMPT = """
Extract the following information from this document:
1. Document type
2. Date (format: YYYY-MM-DD)
3. Amount (format: $X.XX)
4. Parties involved

Return as JSON with these exact keys: type, date, amount, parties
"""

  2. Implement validation:

def validate_extraction_result(result):
    required_fields = ['type', 'date', 'amount', 'parties']
    return all(field in result for field in required_fields)

  3. Add error handling:

def process_with_fallback(document):
    try:
        result = primary_processing(document)
        if validate_extraction_result(result):
            return result
    except Exception as e:
        logger.warning(f"Primary processing failed: {e}")

    # Fall back to simpler processing
    return fallback_processing(document)
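Model responses usually arrive as JSON strings, so parsing and validation go together. A sketch tying the prompt's expected keys to the validator (the sample response below is fabricated):

```python
import json

REQUIRED_FIELDS = ['type', 'date', 'amount', 'parties']

def parse_and_validate(raw_response):
    """Return the parsed dict, or None if malformed or incomplete."""
    try:
        result = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not all(field in result for field in REQUIRED_FIELDS):
        return None
    return result

sample = '{"type": "invoice", "date": "2024-05-01", "amount": "$120.00", "parties": ["Acme", "Globex"]}'
```

Returning None on any failure gives the caller a single branch point for triggering fallback processing.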

Monitoring and Debugging

How to debug Lambda functions

  1. Enable detailed logging:

import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    logger.debug(f"Received event: {event}")
    result = None  # ... processing logic
    logger.debug(f"Processing result: {result}")

  2. Use X-Ray tracing:

resource "aws_lambda_function" "processor" {
  tracing_config {
    mode = "Active"
  }
}

  3. Monitor CloudWatch metrics:

  • Duration, Errors, Throttles
  • Memory utilization
  • Concurrent executions

How to trace API requests

  1. Enable API Gateway logging:

resource "aws_api_gateway_stage" "main" {
  xray_tracing_enabled = true

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.api_gateway.arn
    format = jsonencode({
      requestId      = "$context.requestId"
      ip             = "$context.identity.sourceIp"
      caller         = "$context.identity.caller"
      user           = "$context.identity.user"
      requestTime    = "$context.requestTime"
      httpMethod     = "$context.httpMethod"
      resourcePath   = "$context.resourcePath"
      status         = "$context.status"
      protocol       = "$context.protocol"
      responseLength = "$context.responseLength"
    })
  }
}

  2. Use correlation IDs:

import json
import logging
import uuid

logger = logging.getLogger()

def lambda_handler(event, context):
    correlation_id = str(uuid.uuid4())
    logger.info(f"Processing request {correlation_id}")

    # Pass the correlation ID through the processing chain
    result = process_document(event, correlation_id)

    return {
        'statusCode': 200,
        'headers': {'X-Correlation-ID': correlation_id},
        'body': json.dumps(result)
    }

For more troubleshooting help, see: