Inference on EKS
Deploy and run Large Language Models (LLMs) and other AI models on Amazon EKS.
What's in This Section
This section provides practical deployment guides and Helm charts for running inference workloads on EKS. Whether you're deploying open-source LLMs, diffusion models, or custom AI models, you'll find ready-to-use configurations and step-by-step instructions.
Inference Charts
Helm charts for deploying popular AI models on EKS, with pre-configured values files tuned for performance out of the box. A minimal install is sketched after the lists below.
What You Get:
- Ready-to-deploy Helm charts for vLLM, Ray-vLLM, Triton, and Diffusers
- Pre-configured values files for popular models (Llama, DeepSeek, Mistral, Stable Diffusion, and more)
- Support for both GPU (NVIDIA) and Neuron (AWS Inferentia/Trainium) deployments
- Configurations with health checks, autoscaling, and monitoring
Use Cases:
- Quick deployment of open-source LLMs
- Standardized deployment patterns across your organization
- Reference implementations for custom model deployments
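As a quick illustration, here is a minimal sketch of installing one of these charts with Helm. The release name, chart path, and values file are hypothetical placeholders; substitute the actual names from the charts' documentation in this repository.

```bash
# Minimal sketch of installing an inference chart with Helm.
# "./inference-charts" and "values/vllm-llama-gpu.yaml" are hypothetical
# placeholders -- use the chart path and values file from this repository.
helm install llama-inference ./inference-charts \
  --namespace inference --create-namespace \
  -f values/vllm-llama-gpu.yaml

# Watch the pods come up; model download and warm-up can take several minutes.
kubectl get pods -n inference -w
```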
Framework-Specific Deployment Guides
Detailed guides that dive deeper into specific inference frameworks on EKS, organized by hardware type.
GPU Deployments
Step-by-step guides for deploying models on NVIDIA GPUs (a quick GPU-capacity check follows the list):
- AIBrix DeepSeek Distill - Deploy DeepSeek R1 Distill Llama 8B with AIBrix optimization
- NVIDIA Dynamo - Deploy models with NVIDIA's Dynamo framework
- NVIDIA NIM Llama 3 - Deploy Llama 3 using NVIDIA NIM
- NVIDIA NIM Operator - Kubernetes operator for NVIDIA NIM deployments
- vLLM with NVIDIA Triton Server - Inference with Triton and vLLM
- vLLM with Ray Serve - Scalable inference with Ray Serve and vLLM
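Before following any of these guides, it helps to confirm that your nodes actually advertise NVIDIA GPUs to the Kubernetes scheduler, which assumes the NVIDIA device plugin is running on the cluster:

```bash
# List allocatable NVIDIA GPUs per node. "<none>" means the node has no GPUs
# or the NVIDIA device plugin is not advertising them.
kubectl get nodes \
  -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```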
Neuron Deployments
Step-by-step guides for deploying models on AWS Inferentia and Trainium (a matching capacity check follows the list):
- Mistral 7B on Inf2 - Deploy Mistral 7B on AWS Inferentia 2
- Llama 2 on Inf2 - Deploy Llama 2 13B on AWS Inferentia 2
- Llama 3 on Inf2 - Deploy Llama 3 on AWS Inferentia 2
- Ray Serve High Availability - Deploy highly available Ray Serve on Neuron
- Stable Diffusion on Inf2 - Deploy Stable Diffusion on AWS Inferentia 2
- vLLM Ray on Inf2 - Deploy vLLM with Ray on AWS Inferentia 2
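The Neuron device plugin advertises Inferentia and Trainium accelerators to the scheduler as the `aws.amazon.com/neuron` resource, so the same kind of capacity check applies here:

```bash
# List allocatable Neuron devices per node (requires the Neuron device plugin).
kubectl get nodes \
  -o custom-columns='NODE:.metadata.name,NEURON:.status.allocatable.aws\.amazon\.com/neuron'
```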
Getting Started
- Set up your infrastructure - Start with the Inference-Ready Cluster to provision an EKS cluster optimized for AI/ML workloads
- Choose your deployment method (a quick smoke test for the result is sketched below):
  - For quick deployments with popular models → Use Inference Charts
  - For specific frameworks or custom configurations → See Framework-Specific Guides above
- Optimize your deployment - Apply best practices from the Guidance section to improve performance and reduce costs
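Once a chart or guide has produced a running service, a quick smoke test is to port-forward to it and send a single request. vLLM and Ray-vLLM deployments expose an OpenAI-compatible HTTP API; the service name, namespace, port, and model ID below are hypothetical placeholders, so adjust them to match what your deployment created.

```bash
# Forward the (hypothetical) vLLM service locally; adjust the name, namespace,
# and port to match your deployment.
kubectl -n inference port-forward svc/vllm 8000:8000 &
sleep 3  # give the port-forward a moment to establish

# Send one chat completion through the OpenAI-compatible endpoint.
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```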
Need Help?
- Infrastructure Setup: See Inference Infrastructure for cluster setup and configuration
- Optimization: Check the Guidance section for performance tuning and best practices
- Issues: Report bugs or request features on GitHub Issues
- Community: Join discussions on GitHub Discussions