# Test Scenarios
This section provides practical test scenarios for benchmarking LLM inference performance. Each scenario addresses specific testing objectives and use cases.
## Available Scenarios

### Choosing Between Synthetic and Real Dataset Testing

Understand when to use synthetic versus real-world data for benchmarking, along with best practices for dataset selection.
### Scenario 1: Baseline Performance
Establish your system's optimal performance with zero contention. Ideal for understanding the best-case performance without queueing or resource competition.
Use when:
- Just deployed a new endpoint
- Made infrastructure changes
- Need a clean reference point for optimization
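
As a concrete starting point, a baseline run can be launched through the Helm chart with a single request in flight. This is a minimal sketch: the chart name (`ai-on-eks/benchmark`) and every `benchmark.*` key below are illustrative assumptions, so verify them against the chart's actual `values.yaml` before use.

```bash
# Hypothetical baseline invocation: one concurrent request, no queueing.
# Chart name and all benchmark.* keys are assumptions -- check the
# AI on EKS Benchmark chart's values.yaml for the real schema.
helm upgrade --install baseline-test ai-on-eks/benchmark \
  --set benchmark.endpoint="http://vllm-server:8000/v1" \
  --set benchmark.concurrency=1 \
  --set benchmark.durationSeconds=300
```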
### Scenario 2: Saturation Testing

Use multi-stage load testing to determine the maximum sustainable throughput before performance degrades.
Use when:
- Planning capacity
- Setting autoscaling thresholds
- Validating before production launch
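
One way to express multi-stage load with the chart is a values file that steps QPS upward until latency degrades. The `stages` schema below is a sketch of the idea, not the chart's documented interface.

```bash
# Sketch of a staged load profile; the `stages` key layout is hypothetical.
cat <<'EOF' > saturation-values.yaml
benchmark:
  endpoint: http://vllm-server:8000/v1
  stages:                                  # step load up until latency degrades
    - { qps: 2, durationSeconds: 300 }
    - { qps: 4, durationSeconds: 300 }
    - { qps: 8, durationSeconds: 300 }
EOF
helm upgrade --install saturation-test ai-on-eks/benchmark -f saturation-values.yaml
```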
### Scenario 3: Automatic Saturation Detection
Use sweep mode for automated capacity discovery without manual QPS guessing.
Use when:
- Initial deployments
- CI/CD pipelines
- Quick capacity re-validation
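
A sweep-mode run might look like the following, where the harness probes QPS levels on its own instead of taking a fixed list of stages. The `mode` and `sweep.*` keys are assumptions about the chart's schema, included only to show the shape of the configuration.

```bash
# Hypothetical sweep-mode invocation: automated capacity discovery
# with no manual QPS guessing. All keys below are assumptions.
helm upgrade --install sweep-test ai-on-eks/benchmark \
  --set benchmark.mode=sweep \
  --set benchmark.sweep.startQps=1 \
  --set benchmark.sweep.maxQps=64
```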
### Scenario 4: Production Simulation
Replicate real-world traffic with variable request sizes and bursty (Poisson) arrivals.
Use when:
- Final validation before launch
- Setting SLA targets
- Validating realistic workload handling
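
A production-like profile combines Poisson arrivals with variable request sizes. The sketch below shows one way that could be expressed; the key names are placeholders for whatever your chart version accepts.

```bash
# Sketch of a production-simulation profile: bursty (Poisson) arrivals
# plus variable input/output sizes. All keys are hypothetical.
helm upgrade --install prod-sim-test ai-on-eks/benchmark \
  --set benchmark.arrivalPattern=poisson \
  --set benchmark.qps=10 \
  --set benchmark.inputTokens.mean=512 \
  --set benchmark.inputTokens.stddev=256 \
  --set benchmark.outputTokens.mean=128
```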
### Scenario 5: Real Dataset Testing
Validate production-ready performance using actual user prompts and query patterns.
Use when:
- Model fine-tuned for specific patterns
- Comparing model versions
- Need authentic performance guarantees
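
For a dataset-driven run, the chart would be pointed at a file of captured prompts rather than a synthetic generator. The dataset location, format, and key names below are placeholders, not the chart's documented schema.

```bash
# Hypothetical dataset-driven run; dataset location and key names are
# placeholders for your chart version's actual values schema.
cat <<'EOF' > dataset-values.yaml
benchmark:
  endpoint: http://vllm-server:8000/v1
  dataset:
    source: s3://my-bucket/user-prompts.jsonl   # your captured user prompts
    format: jsonl
EOF
helm upgrade --install dataset-test ai-on-eks/benchmark -f dataset-values.yaml
```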
## Prerequisites
All scenarios use the AI on EKS Benchmark Helm Chart for deployment. Before proceeding:
- Install Helm (version 3.x or later)
- Add the AI on EKS Helm repository:

  ```bash
  helm repo add ai-on-eks https://awslabs.github.io/ai-on-eks-charts/
  helm repo update
  ```

- Configure kubectl access to your EKS cluster
- Deploy your inference service (e.g., vLLM serving your model)
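
Before running any scenario, a quick sanity check helps confirm the prerequisites are in place. The `app=vllm` label below is an example; substitute the labels your deployment actually uses.

```bash
# Verify the chart repo is registered and the inference service is up.
helm search repo ai-on-eks        # the benchmark chart should appear here
kubectl get pods -l app=vllm      # example label; use your deployment's labels
kubectl get svc                   # note the service name/port for the endpoint
```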
## Implementation Notes
Each scenario below demonstrates deployment using the Helm chart as the recommended method; a sample install command follows this list. The chart provides:
- Consistent configuration across all test scenarios
- Values-driven customization for specific use cases
- Production-ready defaults with pod affinity and resource management
- Easy maintenance with centralized configuration
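
In practice, every scenario then reduces to the same install pattern with a scenario-specific values file. The release name, namespace, and values file name here are placeholders.

```bash
# Generic pattern used throughout: one chart, per-scenario values.
helm upgrade --install llm-benchmark ai-on-eks/benchmark \
  --namespace benchmarking --create-namespace \
  -f my-scenario-values.yaml
```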
For educational purposes or custom deployments, each scenario also includes a collapsible section with raw Kubernetes YAML showing the complete manifest structure. This alternative approach installs dependencies in the main container at startup rather than at image build time.
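
For orientation, a raw manifest along these lines is what those collapsible sections expand to: a Job whose container installs its benchmark client at startup. The image, package, and run command below are placeholders, not the actual manifests shipped with the scenarios.

```bash
# Minimal sketch of the raw-YAML alternative. Image, package name, and
# run command are placeholders -- see each scenario's collapsible
# section for the real manifest.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: llm-benchmark-raw
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: benchmark
          image: python:3.11-slim                    # placeholder image
          command: ["/bin/sh", "-c"]
          args:
            - |
              pip install --quiet <benchmark-client>  # runtime dependency install
              <run-benchmark-command> --endpoint http://vllm-server:8000/v1
EOF
```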