Customizing Your Stack
After deploying a data stack, you can customize it by enabling additional components or adding new ones.
If you haven't deployed a stack yet, start with one of the pre-configured data stacks using default settings. Once running, return here to customize.
- Core Infrastructure: Always deployed; provides foundational capabilities such as autoscaling, monitoring, and GitOps
- Optional Components: Enable on demand via Terraform variables
- Custom Components: Add your own via ArgoCD for experimentation
Core Configuration Variables
These variables configure the basic infrastructure settings:
| Variable | Description | Terraform Variable | Type | Default |
|---|---|---|---|---|
| Name | Name used as an identifier on all resources | name | string | data-on-eks |
| Region | AWS region | region | string | us-west-2 |
| Tags | A map of tags to add to all resources | tags | map(string) | {} |
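As a rough sketch, the core variables could be set in your stack's tfvars file as shown below. The values are placeholders, and the map syntax assumes that tags is declared as a map of strings, as its description and {} default suggest.

```hcl
# terraform/data-stack.tfvars (illustrative values only)
name   = "my-data-platform" # identifier applied to all resources
region = "us-east-1"        # AWS region to deploy into

# Assumption: tags is a map of strings; these keys and values are placeholders.
tags = {
  Environment = "dev"
  Team        = "data-platform"
}
```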
Core Infrastructure (Always Enabled)
These components are deployed by default in all data stacks:
| Component | Description | Variable | Status |
|---|---|---|---|
| AWS Load Balancer Controller | AWS ALB/NLB integration | N/A | ✅ |
| Argo Events | Event-driven workflow automation | N/A | ✅ |
| Argo Workflows | Workflow engine for Kubernetes | N/A | ✅ |
| ArgoCD | GitOps continuous delivery | N/A | ✅ |
| Cert Manager | TLS certificate management | N/A | ✅ |
| Fluent Bit | Log forwarding to CloudWatch | N/A | ✅ |
| Karpenter | Node autoscaling and provisioning | N/A | ✅ |
| Kube Prometheus Stack | Prometheus and Grafana monitoring | N/A | ✅ |
| Spark History Server | Spark job history and metrics | N/A | ✅ |
| Spark Operator | Apache Spark job orchestration | N/A | ✅ |
| YuniKorn | Advanced batch scheduling | N/A | ✅ |
Optional Components
These components can be enabled by setting their corresponding Terraform variable.
How to Enable
- Edit your stack's terraform/data-stack.tfvars file
- Set the corresponding enable_* variable to true
- Redeploy: ./deploy.sh
Example:
name = "my-data-platform"
region = "us-east-1"
enable_datahub = true
enable_superset = true
Available Optional Components
| Component | Description | Variable | Default |
|---|---|---|---|
| Airflow | Enable Apache Airflow for workflow orchestration | enable_airflow | ❌ |
| Amazon Prometheus | Enable Amazon Managed Service for Prometheus | enable_amazon_prometheus | ❌ |
| Celeborn | Enable Apache Celeborn as a remote shuffle service | enable_celeborn | ❌ |
| Cluster Addons | A map of EKS addon names to boolean values that controls whether each addon is enabled, giving fine-grained control over which addons this Terraform stack deploys. To enable or disable an addon, set its value to true or false in your tfvars file. To add a new addon, update this variable definition and adjust the logic in the EKS module (e.g., the locals in eks.tf) to include any custom configuration it needs. | enable_cluster_addons | Per addon; see EKS Cluster Addons |
| DataHub | Enable DataHub for metadata management | enable_datahub | ❌ |
| Ingress Nginx | Enable ingress-nginx | enable_ingress_nginx | ✅ |
| IPv6 | Enable IPv6 for the EKS cluster and its components | enable_ipv6 | ❌ |
| JupyterHub | Enable JupyterHub | enable_jupyterhub | ✅ |
| NVIDIA Device Plugin | Enable the NVIDIA device plugin addon for GPU workloads | enable_nvidia_device_plugin | ❌ |
| Ray Data | Enable Ray Data via ArgoCD | enable_raydata | ❌ |
| Superset | Enable Apache Superset for data exploration and visualization | enable_superset | ❌ |
EKS Cluster Addons
The enable_cluster_addons map variable gives fine-grained control over individual EKS addons:
| Addon | Description | Variable | Default |
|---|---|---|---|
| Amazon CloudWatch Observability | Amazon CloudWatch observability | enable_cluster_addons["amazon-cloudwatch-observability"] | ❌ |
| AWS EBS CSI Driver | Amazon EBS CSI driver for persistent volumes | enable_cluster_addons["aws-ebs-csi-driver"] | ✅ |
| AWS Mountpoint S3 CSI Driver | Mountpoint for Amazon S3 CSI driver | enable_cluster_addons["aws-mountpoint-s3-csi-driver"] | ✅ |
| EKS Node Monitoring Agent | EKS node monitoring agent | enable_cluster_addons["eks-node-monitoring-agent"] | ✅ |
| Metrics Server | Kubernetes metrics server for autoscaling | enable_cluster_addons["metrics-server"] | ✅ |
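A minimal sketch of overriding the addon map in your tfvars file follows. It assumes the map keys are exactly the addon names listed in the table above and that each value is a boolean; the particular choices are illustrative.

```hcl
# terraform/data-stack.tfvars
# Assumption: keys match the addon names in the table above; values are booleans.
enable_cluster_addons = {
  "amazon-cloudwatch-observability" = true  # off by default, turned on here
  "aws-ebs-csi-driver"              = true
  "aws-mountpoint-s3-csi-driver"    = false # on by default, turned off here
  "eks-node-monitoring-agent"       = true
  "metrics-server"                  = true
}
```

Terraform replaces a variable's default wholesale when you set it, so unless the stack merges your map with its defaults (not confirmed here), listing every addon explicitly is the safer choice.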
Adding New Components
After deploying a data stack, you can add more components for experimentation.
Quick Method: Deploy via ArgoCD
The fastest way to add a new component:
- Create an ArgoCD Application manifest
- Apply it: kubectl apply -f my-component.yaml
- Monitor: kubectl get application my-component -n argocd
Example: Adding Kro (Kubernetes Resource Orchestrator)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kro
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.k8s.io/kro/charts
    chart: kro
    targetRevision: 0.7.1
  destination:
    server: https://kubernetes.default.svc
    namespace: kro-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Save the manifest as kro-app.yaml and apply it:
kubectl apply -f kro-app.yaml
Verify deployment:
kubectl get application kro -n argocd
kubectl get pods -n kro-system
Clean up when done:
kubectl delete application kro -n argocd
Advanced: Integrate into Stack (Optional)
If you want the component managed by Terraform for reproducible deployments:
- Create overlay files in your stack's terraform/ directory:
  - terraform/kro.tf - Terraform resource definitions
  - terraform/argocd-applications/kro.yaml - ArgoCD app manifest
  - terraform/helm-values/kro.yaml - Helm values (if needed)
- Redeploy your stack: ./deploy.sh
This approach is useful if you're building a reusable stack for your team. See the Contributing Guide for detailed instructions.
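For orientation, a terraform/kro.tf overlay might look roughly like the sketch below. Everything in it is an assumption rather than the stack's confirmed layout: the enable_kro variable is hypothetical (named after the enable_* convention used for optional components), and the kubectl_manifest resource assumes the gavinbunney/kubectl provider is already configured for the cluster. Check the Contributing Guide for the pattern the stacks actually use.

```hcl
# terraform/kro.tf (hypothetical sketch, not the stack's confirmed layout)

# Hypothetical toggle, following the enable_* naming of the optional components.
variable "enable_kro" {
  description = "Enable Kro (Kubernetes Resource Orchestrator) via ArgoCD"
  type        = bool
  default     = false
}

# Assumption: the gavinbunney/kubectl provider is configured for this cluster.
# Registers the ArgoCD Application manifest so that ArgoCD deploys and syncs the chart.
resource "kubectl_manifest" "kro" {
  count = var.enable_kro ? 1 : 0

  yaml_body = file("${path.module}/argocd-applications/kro.yaml")
}
```

With a toggle like this, setting enable_kro = true in the tfvars file and rerunning ./deploy.sh would create the ArgoCD Application; setting it back to false would remove it on the next deploy.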
Note: This page is auto-generated from Terraform source code. To update, run:
./website/scripts/generate-available-components.sh