Agents on EKS

The Agents on EKS infrastructure deploys an environment that supports continuously building, deploying, and evaluating AI agents using open source tools in a secure, scalable, and reliable manner.

Why?

Building and operating AI agents at scale requires more than just inference infrastructure. Agents need:

Source control and CI/CD for versioning agent code and configurations
Observability for tracing agent behavior, evaluating performance, and debugging issues
Persistent memory for storing embeddings and enabling retrieval-augmented generation (RAG)
Tool orchestration for managing and discovering MCP (Model Context Protocol) servers

This infrastructure brings together these components into a cohesive platform, enabling teams to iterate quickly on agent development while maintaining reliability.

Use Cases

Agent Development: Build and test AI agents with integrated source control and CI/CD pipelines
Agent Evaluation: Use Langfuse to trace agent executions, evaluate outputs, and track performance over time
RAG Applications: Store and retrieve embeddings using Milvus for knowledge-augmented agents
MCP Tool Management: Discover and manage MCP servers through the gateway registry
Multi-Agent Systems: Deploy and orchestrate multiple agents with shared infrastructure

Architecture

This infrastructure creates:

Amazon VPC with public and private subnets across multiple availability zones
Amazon EKS Cluster with managed node groups for critical addons
Karpenter for intelligent node autoscaling based on workload demands
GitLab for source control, container registry, and CI/CD pipelines
Langfuse for agent observability, tracing, and evaluation
Milvus for vector storage and similarity search
MCP Gateway Registry for tool discovery and management

Core Components

Component	Purpose
GitLab	Source control, container registry, and CI/CD for agent code
Langfuse	LLM observability, tracing, prompt management, and evaluation
Milvus	Vector database for embeddings and similarity search
MCP Gateway Registry	Discovery and management of Model Context Protocol servers
Karpenter	Kubernetes node autoscaling
ArgoCD	GitOps continuous delivery

Prerequisites

Domain and Certificate Setup

GitLab requires a valid TLS certificate, which requires owning a domain. You can use a subdomain from an existing domain.

Create a Route53 Hosted Zone

Follow the AWS documentation to create a hosted zone. For a subdomain, name it following the pattern subdomain.domain.tld.
(Optional) Configure as Subdomain

If using a subdomain, add the hosted zone as a subdomain to your main domain.
Create an ACM Certificate

Follow the ACM documentation to create a certificate for your domain.

Tools Required

AWS CLI configured with appropriate permissions
Terraform >= 1.0
kubectl
Helm >= 3.0

Deployment

Step 1: Clone and Navigate

git clone https://github.com/awslabs/ai-on-eks.git
cd ai-on-eks/infra/solutions/agents-on-eks

Step 2: Configure Variables

Edit terraform/blueprint.tfvars to set your domain:

name                        = "aioeks-agents"
enable_langfuse             = true
enable_gitlab               = true
enable_external_dns         = true
enable_milvus               = true
enable_mcp_gateway_registry = true
max_user_namespaces         = 16384
acm_certificate_domain      = "agents.example.com"  # Update with your domain
allowed_inbound_cidrs       = "0.0.0.0/0"           # Restrict inbound IPs

Step 3: Deploy

./install.sh

Deployment takes approximately 20 minutes.

Step 4: Configure kubectl

After deployment, configure kubectl to access your cluster:

aws eks update-kubeconfig --name aioeks-agents --region us-west-2

Accessing Services

GitLab

GitLab will be available at https://gitlab.<your-domain>. Retrieve the root password:

kubectl get secret gitlab-gitlab-initial-root-password -n gitlab -o jsonpath='{.data.password}' | base64 -d

Langfuse

Access Langfuse through port-forwarding:

kubectl port-forward svc/langfuse 3000:3000 -n langfuse

Then open http://localhost:3000 in your browser.

Milvus

Connect to Milvus from within the cluster at milvus.milvus.svc.cluster.local:19530.

MCP Gateway Registry

The MCP Gateway Registry will be available at https://mcpregistry.<your-domain>.

Configuration Options

Variable	Description	Default
`name`	Cluster name	`aioeks-agents`
`region`	AWS region	`us-west-2`
`eks_cluster_version`	EKS version	`1.34`
`acm_certificate_domain`	Domain for TLS certificates	`""` (required)
`allowed_inbound_cidrs`	CIDR ranges allowed through load balancer	`0.0.0.0/0`
`enable_langfuse`	Deploy Langfuse	`true`
`enable_gitlab`	Deploy GitLab	`true`
`enable_milvus`	Deploy Milvus	`true`
`enable_mcp_gateway_registry`	Deploy MCP Gateway Registry	`true`
`enable_external_dns`	Enable External DNS for Route53	`true`

Restricting Inbound Access

The allowed_inbound_cidrs variable controls which IP ranges can access services through the load balancer. Restrict this to your organization's IP ranges:

allowed_inbound_cidrs = "10.0.0.0/8,192.168.1.0/24"

Ensure the CIDR includes your developer IPs and GitLab Runner node IPs for CI/CD pipelines.

Cleanup

To destroy the infrastructure:

cd terraform/_LOCAL
./cleanup.sh

Next Steps

Configure GitLab CI/CD pipelines for your agent code
Set up Langfuse projects and API keys for tracing
Create Milvus collections for your embedding storage
Register MCP servers in the gateway registry

Why?​

Use Cases​

Architecture​

Core Components​

Prerequisites​

Domain and Certificate Setup​

Tools Required​

Deployment​

Step 1: Clone and Navigate​

Step 2: Configure Variables​

Step 3: Deploy​

Step 4: Configure kubectl​

Accessing Services​

GitLab​

Langfuse​

Milvus​

MCP Gateway Registry​

Configuration Options​

Restricting Inbound Access​

Cleanup​

Next Steps​