StarRocks on EKS Infrastructure
Deploy a production-ready StarRocks cluster on Amazon EKS with Karpenter auto-scaling, EBS CSI driver, and optimized performance for analytical workloads.
Architecture
This stack deploys StarRocks on EKS with Karpenter for elastic node provisioning and EBS CSI driver for persistent storage. The architecture supports high-performance analytical queries with automatic scaling and fault tolerance.
Key Components:
- Frontend (FE) - Query planning, metadata management, and cluster coordination
- Compute Node (CN) - Stateless query execution for the shared-data architecture (fills the classic Backend/BE role)
- Karpenter - Provisions optimized EC2 instances based on workload demands
- EBS CSI Driver - Persistent storage with GP3 volumes for optimal I/O performance
- S3 Integration - External data lake connectivity for data ingestion
Prerequisites
Before deploying, ensure you have the following tools installed:
- AWS CLI - Install Guide
- Terraform (>= 1.5.0) - Install Guide
- kubectl - Install Guide
- Helm (>= 3.0) - Install Guide
- AWS credentials configured - Run aws configure or use IAM roles
Step 1: Clone Repository & Navigate
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/data-stacks/starrocks-on-eks
Step 2: Customize Stack Configuration
Edit the stack configuration file to customize deployment:
vi terraform/data-stack.tfvars
Review and customize the configuration:
name = "starrocks-on-eks"
region = "us-east-1"
# Core StarRocks component
enable_starrocks = true
# Optional: Additional data platform components
enable_ingress_nginx = true # Ingress controller
enable_jupyterhub = true # Notebooks for data exploration
enable_amazon_prometheus = true # Monitoring
enable_superset = true # BI dashboards
enable_airflow = true # Workflow orchestration
enable_pinot = true # Real-time analytics
enable_datahub = true # Metadata management
enable_celeborn = true # Remote shuffle service
enable_raydata = true # Ray for data processing
enable_emr_on_eks = true # EMR on EKS
enable_emr_spark_operator = true # EMR Spark Operator
enable_nvidia_device_plugin = true # GPU workloads
For a StarRocks-only deployment, set all optional components to false:
enable_starrocks = true
enable_ingress_nginx = false
enable_jupyterhub = false
enable_amazon_prometheus = false
enable_superset = false
enable_airflow = false
enable_pinot = false
enable_datahub = false
enable_celeborn = false
enable_raydata = false
enable_emr_on_eks = false
enable_emr_spark_operator = false
enable_nvidia_device_plugin = false
This reduces deployment time and costs by ~60%.
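To double-check which optional components are still enabled before deploying, you can scan the tfvars file. A small helper sketch (the enable_&lt;name&gt; = true layout matches the example above; the function name is illustrative):

```shell
# enabled_components: print every enable_* flag set to true in a tfvars file.
# Assumes the "enable_<name> = true" key layout shown above.
enabled_components() {
  awk '/^enable_[a-z_]+[[:space:]]*=[[:space:]]*true/ {print $1}' "$1"
}
```

For example, `enabled_components terraform/data-stack.tfvars` prints one enabled flag per line, so a StarRocks-only configuration should print only enable_starrocks.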
What Gets Deployed
When you deploy the stack with StarRocks enabled, the following components are provisioned:
StarRocks Components
| Component | Purpose | Instance Type |
|---|---|---|
| StarRocks Operator | Manages StarRocks cluster lifecycle via CRD | Runs on core node group |
| Frontend (FE) | Query planning, metadata management, cluster coordination | General-purpose Graviton (m-family, on-demand) |
| Compute Node (CN) | Stateless query execution for shared-data architecture | Memory-optimized Graviton (r-family, on-demand) |
| S3 Data Bucket | Shared-data storage for StarRocks tables | N/A |
| Pod Identity Role | IAM access for CN nodes to read/write S3 | N/A |
CN Autoscaling Configuration
Compute Nodes automatically scale based on resource utilization:
Scaling Triggers:
- CPU Utilization - Scale up when average > 60%
- Memory Utilization - Scale up when average > 60%
- Min Replicas - 1
- Max Replicas - 10 (configurable)
- Scale-down Policy - 1 pod per 60 seconds
- Scale-up Policy - 2 pods per 30 seconds
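In the kube-starrocks operator's StarRocksCluster CRD, these triggers correspond to an autoscaling block on the CN spec. A sketch with the values listed above (treat the exact field names as an assumption to verify against your operator version):

```yaml
# Sketch of CN autoscaling inside a StarRocksCluster manifest
# (verify field names against your kube-starrocks operator version).
starRocksCnSpec:
  autoScalingPolicy:
    minReplicas: 1
    maxReplicas: 10
    hpaPolicy:
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 60
```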
Storage Configuration
- FE Metadata - Stored in-cluster via StatefulSet PVCs (GP3 EBS)
- CN Cache - GP3 PVC (100Gi) for local query cache
- Table Data - S3 bucket (shared-data mode) with AES256 encryption
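The GP3 volumes above are provisioned through the EBS CSI driver. A typical StorageClass backing those PVCs might look like the following (the name and parameters are illustrative; check what the stack actually created with kubectl get storageclass):

```yaml
# Illustrative GP3 StorageClass consumed by the FE and CN PVCs.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```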
Step 3: Deploy Infrastructure
Run the deployment script:
./deploy.sh
Expected deployment time: 15-20 minutes
The deployment includes EKS cluster, all platform addons, and the StarRocks operator.
Step 4: Verify Deployment
The deployment script automatically configures kubectl. Verify the cluster and StarRocks operator:
# Set kubeconfig (done automatically by deploy.sh)
export KUBECONFIG=kubeconfig.yaml
# Verify StarRocks operator
kubectl get pods -n starrocks
Expected Output
NAME READY STATUS RESTARTS AGE
kube-starrocks-operator-5d558f7b8b-9s5m6 1/1 Running 0 66m
The EKS cluster is named starrocks-on-eks (from data-stack.tfvars).
To verify:
aws eks describe-cluster --name starrocks-on-eks --region us-east-1
Step 5: Deploy StarRocks Cluster
Update the S3 bucket and region in the manifest, then apply:
export STARROCKS_BUCKET=$(cd terraform/_local && terraform output -raw starrocks_s3_bucket_id)
export AWS_REGION=$(cd terraform/_local && terraform output -raw region)
sed -i '' "s|aws_s3_path = s3://<STARROCKS_S3_BUCKET_ID>|aws_s3_path = s3://${STARROCKS_BUCKET}|" examples/starrocks-shared-data.yaml
sed -i '' "s|aws_s3_region = <REGION>|aws_s3_region = ${AWS_REGION}|" examples/starrocks-shared-data.yaml
Note: sed -i '' is the BSD/macOS form of in-place editing; on GNU/Linux, drop the empty string and use sed -i.
kubectl apply -f examples/starrocks-shared-data.yaml
Wait for pods to be ready (Karpenter provisions new nodes — takes 3-5 minutes):
kubectl get pods -n starrocks -w
Step 6: Verify StarRocks Deployment
# Check StarRocks pods
kubectl get pods -n starrocks
Expected Output
NAME READY STATUS RESTARTS AGE
kube-starrocks-operator-5d558f7b8b-9s5m6 1/1 Running 0 82m
starrocks-shared-data-cn-0 1/1 Running 0 8m
starrocks-shared-data-fe-0 1/1 Running 0 10m
starrocks-shared-data-fe-1 1/1 Running 0 10m
starrocks-shared-data-fe-2 1/1 Running 0 10m
Step 7: Connect to StarRocks
Connect using a MySQL client pod inside the cluster:
kubectl run mysql-client -n starrocks --rm -i --restart=Never --image=mysql:8.0 -- \
mysql -hstarrocks-shared-data-fe-service -P9030 -uroot -e "SHOW DATABASES;"
Expected Output
Database
_statistics_
information_schema
sys
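Once connected, you can create a database and table. A minimal hypothetical example (names are illustrative; in shared-data mode the table data lands in the S3 bucket configured earlier):

```sql
-- Hypothetical demo schema; database and column names are illustrative.
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.events (
    event_time DATETIME NOT NULL,
    user_id    BIGINT   NOT NULL,
    payload    STRING
)
DUPLICATE KEY (event_time, user_id)
DISTRIBUTED BY HASH (user_id);
```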
Next Steps
With the infrastructure deployed, you can now:
- Run TPC-DS Benchmark - Validate query performance with 1TB dataset
- Load Your Data - Connect to external data sources via Stream Load
- Scale the Cluster - Adjust CN replicas based on workload
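As a starting point for Stream Load, the FE exposes an HTTP endpoint on port 8030. A hedged sketch of loading a local CSV from inside the cluster (the demo database, events table, and file name are assumptions carried over from the earlier example; adjust credentials and separators to your data):

```shell
# Stream Load sketch: PUT a CSV to the FE HTTP port (8030).
# Run from a pod inside the cluster; db/table names are illustrative.
curl -sS -XPUT -u root: \
  -H "Expect: 100-continue" \
  -H "label: demo_load_$(date +%s)" \
  -H "column_separator: ," \
  -T events.csv \
  http://starrocks-shared-data-fe-service:8030/api/demo/events/_stream_load
```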
Troubleshooting
Pods stuck in Pending state:
kubectl describe pods -n starrocks
kubectl logs -n karpenter deployment/karpenter
StarRocks connection issues:
kubectl logs starrocks-shared-data-fe-0 -n starrocks
kubectl get endpoints -n starrocks
Cleanup
To remove all resources:
cd terraform/_local
./cleanup.sh
This will delete all resources including:
- EKS cluster and all workloads
- S3 buckets (StarRocks data)
- VPC and networking resources
Ensure you've backed up any important data before cleanup.