Troubleshooting Guide
This guide helps you diagnose and resolve common issues when deploying the GenAI IDP Accelerator with Terraform.
Common Deployment Issues
Permission Errors
Symptom: Access denied errors during Terraform deployment
Error: AccessDenied: User is not authorized to perform action
Solutions:
- Check IAM Permissions: Ensure your AWS credentials have the required permissions
- Verify Service Roles: Check that service-linked roles exist for required services
- Review Resource Policies: Ensure bucket policies and resource policies allow access
# Check current AWS identity
aws sts get-caller-identity
# Verify required permissions
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::ACCOUNT:user/USERNAME \
  --action-names s3:CreateBucket lambda:CreateFunction
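When many actions are simulated at once, it helps to reduce the output to the actions that were not allowed. The following is a minimal sketch rather than part of the accelerator: the helper name denied_actions and the PRINCIPAL_ARN variable are assumptions for illustration, and the filter relies on the EvalActionName and EvalDecision fields that simulate-principal-policy returns.

```shell
# Print the actions whose simulated decision is anything other than "allowed".
# Input: tab-separated "action<TAB>decision" lines on stdin.
denied_actions() {
  awk -F'\t' '$2 != "allowed" { print $1 }'
}

# Usage (PRINCIPAL_ARN is a placeholder you must set yourself):
# aws iam simulate-principal-policy \
#   --policy-source-arn "$PRINCIPAL_ARN" \
#   --action-names s3:CreateBucket lambda:CreateFunction \
#   --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
#   --output text | denied_actions
```

An empty result suggests every simulated action is allowed for that principal; resource policies and permission boundaries may still need a separate look.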
Resource Limits
Symptom: Service quotas exceeded
Error: LimitExceededException: Account has reached the maximum number of functions
Solutions:
- Check Service Quotas: Review current usage in AWS Console
- Request Quota Increases: Submit requests through the Service Quotas console or AWS Support
- Clean Up Unused Resources: Remove old deployments
# List Lambda functions in the current region
aws lambda list-functions --query 'Functions[].FunctionName' --output table
# List S3 buckets in the account
aws s3api list-buckets --query 'Buckets[].Name' --output table
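To turn a resource count into an early warning, the count can be compared against the applicable quota. This is a hedged sketch: the function name quota_status, the BUCKET_QUOTA variable, and the default 80% warning threshold are all assumptions, and the quota value itself has to be looked up in Service Quotas for your account.

```shell
# Report how much of a quota is in use.
# Arguments: used, limit, optional warning threshold in percent (default 80).
quota_status() {
  qs_used=$1
  qs_limit=$2
  qs_threshold=${3:-80}
  if [ "$qs_limit" -le 0 ]; then
    echo "invalid limit"
    return 2
  fi
  qs_pct=$(( qs_used * 100 / qs_limit ))
  if [ "$qs_pct" -ge "$qs_threshold" ]; then
    echo "WARN ${qs_pct}%"
  else
    echo "OK ${qs_pct}%"
  fi
}

# Usage (BUCKET_QUOTA is a placeholder for your account's actual quota):
# used=$(aws s3api list-buckets --query 'length(Buckets)' --output text)
# quota_status "$used" "$BUCKET_QUOTA"
```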
Deployment Failures
Symptom: Terraform apply fails with resource creation errors
Common Causes:
- Network Issues: VPC/subnet configuration problems
- Dependency Issues: Resources created in wrong order
- Configuration Errors: Invalid parameter values
Debugging Steps:
- Enable Detailed Logging:
export TF_LOG=DEBUG
terraform apply
- Check Resource Dependencies:
terraform graph | dot -Tpng > dependency-graph.png
- Validate Configuration:
terraform validate
terraform plan -detailed-exitcode
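With -detailed-exitcode, terraform plan reports its outcome through the exit status: 0 means no changes, 1 means an error, and 2 means the plan succeeded and changes are pending. A small sketch for using this in scripts follows; the helper name describe_plan_exit is an assumption for illustration.

```shell
# Translate the exit status of "terraform plan -detailed-exitcode" into text.
describe_plan_exit() {
  case "$1" in
    0) echo "no changes" ;;
    1) echo "error" ;;
    2) echo "changes pending" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage:
# terraform plan -detailed-exitcode -out=tfplan
# describe_plan_exit "$?"
```

Treating exit code 2 as a failure is a common scripting mistake, so branching on the code explicitly tends to be safer than a bare success check.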
Service-Specific Issues
Amazon Bedrock
Issue: Model access denied
Error: AccessDeniedException: Your account is not authorized to invoke this model
Solution: Request model access in Bedrock console
- Go to Amazon Bedrock console
- Navigate to Model access
- Request access for required models (Claude, Titan, etc.)
Amazon Textract
Issue: Document processing failures
Error: InvalidParameterException: Document format not supported
Solutions:
- Verify document format (PDF, PNG, JPEG, TIFF)
- Check document size limits (10MB for synchronous, 500MB for asynchronous)
- Ensure proper S3 permissions for document access
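The format and size checks above can be run locally before a document is uploaded. The sketch below is an assumption-laden illustration, not accelerator code: the function name textract_mode is hypothetical, the file extension is used as a rough stand-in for the real format, and the 10 MB and 500 MB limits are taken from the list above.

```shell
# Suggest a Textract API mode for a local file based on extension and size.
# Prints one of: sync, async, unsupported, missing, too-large.
textract_mode() {
  tm_file=$1
  # Extension check only; it does not inspect the file contents.
  case "$(printf '%s' "$tm_file" | tr '[:upper:]' '[:lower:]')" in
    *.pdf|*.png|*.jpg|*.jpeg|*.tif|*.tiff) ;;
    *) echo "unsupported"; return 1 ;;
  esac
  if [ ! -f "$tm_file" ]; then
    echo "missing"
    return 1
  fi
  tm_size=$(wc -c < "$tm_file" | tr -d ' ')
  if [ "$tm_size" -le $((10 * 1024 * 1024)) ]; then
    echo "sync"
  elif [ "$tm_size" -le $((500 * 1024 * 1024)) ]; then
    echo "async"
  else
    echo "too-large"
    return 1
  fi
}

# Usage:
# textract_mode ./invoice.pdf
```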
Lambda Functions
Issue: Function timeout or memory errors
Error: Task timed out after 15.00 seconds
Solutions:
- Increase Timeout:
resource "aws_lambda_function" "processor" {
  # ... other required arguments omitted
  timeout     = 300  # seconds (5 minutes); Lambda allows up to 900
  memory_size = 1024 # MB (1 GB)
}
- Optimize Code: Review function logic for efficiency
- Use Step Functions: For long-running processes
State Management Issues
State Lock Conflicts
Issue: Terraform state is locked
Error: Error acquiring the state lock
Solutions:
- Wait for Lock Release: Another operation may be in progress
- Force Unlock (use carefully):
terraform force-unlock LOCK_ID
- Check DynamoDB Table: Verify state lock table exists and is accessible
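The LOCK_ID needed for force-unlock is printed with the lock error itself. The sketch below assumes the failed command's output contains a Lock Info block with an ID: line, as Terraform normally prints; the helper name lock_id_from_output is hypothetical.

```shell
# Extract the lock ID from Terraform's "Error acquiring the state lock" output.
# Reads the command output on stdin and prints the first value after "ID:".
lock_id_from_output() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }'
}

# Usage:
# terraform plan 2>&1 | lock_id_from_output
```

Before forcing an unlock, confirm that no other operation is still running. If waiting is acceptable, the -lock-timeout option (for example, terraform plan -lock-timeout=5m) makes Terraform retry the lock for the given duration instead of failing immediately.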
State Corruption
Issue: State file corruption or inconsistency
Solutions:
- Import Existing Resources:
terraform import aws_s3_bucket.example bucket-name
- Refresh State:
terraform apply -refresh-only # replaces the deprecated terraform refresh (Terraform 0.15.4 and later)
- Restore from Backup: If the S3 backend bucket has versioning enabled, restore an earlier version of the state object
Network and Security Issues
VPC Configuration
Issue: Resources cannot communicate
Checklist:
- [ ] Subnets in correct AZs
- [ ] Route tables configured
- [ ] Security groups allow required traffic
- [ ] NACLs not blocking traffic
- [ ] NAT Gateway for private subnets
Security Group Rules
Issue: Connection timeouts
Debug Steps:
- Check Security Group Rules:
aws ec2 describe-security-groups --group-ids sg-12345678
- Test Connectivity:
# From EC2 instance
telnet target-host 443
- Review VPC Flow Logs: Check for rejected connections
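The security group output can also be checked mechanically for a given port. This is a rough sketch under stated assumptions: the helper name port_allowed is hypothetical, it looks only at inbound TCP rules and all-traffic rules (protocol -1), and it ignores the source CIDR or source group, so "allowed" means only that some inbound rule covers the port.

```shell
# Check whether any inbound rule covers a TCP port.
# Input: tab-separated "protocol<TAB>from_port<TAB>to_port" lines on stdin.
# Argument: the port number to test.
port_allowed() {
  awk -F'\t' -v p="$1" '
    $1 == "-1" || ($1 == "tcp" && $2 <= p && p <= $3) { found = 1 }
    END { print (found ? "allowed" : "blocked") }
  '
}

# Usage:
# aws ec2 describe-security-groups --group-ids sg-12345678 \
#   --query 'SecurityGroups[].IpPermissions[].[IpProtocol,FromPort,ToPort]' \
#   --output text | port_allowed 443
```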
Performance Issues
Slow Processing
Symptoms:
- Long document processing times
- Lambda function timeouts
- High costs
Optimization Strategies:
- Parallel Processing: Use Step Functions for concurrent execution
- Batch Processing: Process multiple documents together
- Caching: Store processed results to avoid reprocessing
- Right-sizing: Adjust Lambda memory and timeout settings
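For right-sizing, the REPORT line that Lambda writes to CloudWatch Logs after each invocation includes both the configured Memory Size and the Max Memory Used. The sketch below computes the ratio for each REPORT line it sees; the helper name memory_usage_pct and the LOG_GROUP variable are assumptions, and the usage line requires AWS CLI v2 for the logs tail command.

```shell
# For each Lambda REPORT log line on stdin, print max memory used as a
# percentage of the configured memory size (integer, truncated).
memory_usage_pct() {
  awk '/REPORT RequestId:/ {
    size = 0; used = 0
    for (i = 1; i < NF; i++) {
      if ($i == "Size:") size = $(i + 1)   # from "Memory Size: N MB"
      if ($i == "Used:") used = $(i + 1)   # from "Max Memory Used: N MB"
    }
    if (size > 0) printf "%d%%\n", used * 100 / size
  }'
}

# Usage (LOG_GROUP is a placeholder for your function's log group):
# aws logs tail "$LOG_GROUP" --since 1h | memory_usage_pct
```

Consistently low percentages suggest memory could be reduced, while values near 100% suggest the function is at risk of running out of memory. Because Lambda allocates CPU in proportion to memory, lowering memory can also lengthen execution time.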
Cost Optimization
High Cost Indicators:
- Excessive Lambda invocations
- Large S3 storage costs
- High Bedrock API usage
Cost Reduction Tips:
- Implement Caching: Avoid duplicate processing
- Use Lifecycle Policies: Archive old documents
- Monitor Usage: Set up billing alerts
- Optimize Models: Use smaller models when appropriate
Monitoring and Debugging
CloudWatch Logs
Key Log Groups to Monitor:
/aws/lambda/idp-processor-*
/aws/stepfunctions/idp-workflow
/aws/apigateway/idp-api
Useful Log Queries:
# Find recent errors (set the query time range to the last hour)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100
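The query itself does not restrict the time window; that comes from the time range supplied when the query is started. The following sketch runs the same query from the command line for the last hour. The helper names minutes_ago and start_error_query are assumptions, and the log group in the usage comment is a placeholder.

```shell
# Print the Unix timestamp (seconds) for N minutes before now.
minutes_ago() {
  echo $(( $(date +%s) - $1 * 60 ))
}

# Start a Logs Insights query for ERROR messages in the last hour
# and print the query ID. Argument: log group name.
start_error_query() {
  aws logs start-query \
    --log-group-name "$1" \
    --start-time "$(minutes_ago 60)" \
    --end-time "$(date +%s)" \
    --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 100' \
    --query 'queryId' --output text
}

# Usage (the log group name is a placeholder):
# id=$(start_error_query /aws/lambda/my-function)
# aws logs get-query-results --query-id "$id"
```

Queries run asynchronously, so get-query-results may need to be called again until the reported status is Complete.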
X-Ray Tracing
Enable Tracing:
resource "aws_lambda_function" "processor" {
  tracing_config {
    mode = "Active"
  }
}
Analyze Traces: Look for bottlenecks and errors in service map
Getting Help
AWS Support
For critical issues:
- Create Support Case: Include error messages and logs
- Provide Context: Terraform configuration and deployment details
- Include Diagnostics: CloudWatch logs and X-Ray traces
Community Resources
- AWS Forums: Search for similar issues
- GitHub Issues: Check project repository for known issues
- Documentation: Review AWS service documentation
Emergency Procedures
Critical System Issues:
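- Rollback: Revert to the last known-good Terraform configuration and re-apply; restore a previous state version only if the state itself is damaged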
- Scale Down: Reduce resource usage
- Enable Monitoring: Increase logging verbosity
- Contact Support: Open high-priority support case
Prevention Best Practices
Pre-Deployment Checks
- [ ] Run terraform plan and review changes
- [ ] Test in development environment first
- [ ] Verify IAM permissions
- [ ] Check service quotas
- [ ] Review security configurations
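Some of these checks can be scripted so that they run the same way before every deployment. The sketch below is one possible shape, not a prescribed workflow: the helper name run_check is an assumption, and the commands in the usage comment are examples that should be adapted to your environment.

```shell
# Run a command quietly and print PASS or FAIL with a label.
# Arguments: label, then the command and its arguments.
run_check() {
  rc_name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $rc_name"
  else
    echo "FAIL $rc_name"
    return 1
  fi
}

# Usage:
# run_check "format"      terraform fmt -check -recursive
# run_check "validate"    terraform validate
# run_check "credentials" aws sts get-caller-identity
# run_check "plan"        terraform plan -input=false
```

Output is suppressed to keep the summary short, so rerun any failing command directly to see its error message.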
Monitoring Setup
- [ ] CloudWatch alarms for key metrics
- [ ] Log aggregation and analysis
- [ ] Cost monitoring and alerts
- [ ] Performance baseline establishment
Documentation
- [ ] Document custom configurations
- [ ] Maintain runbooks for common issues
- [ ] Keep architecture diagrams updated
- [ ] Record lessons learned
For additional help, see our FAQ section or contact support.