Customizing Your Stack

After deploying a data stack, you can customize it by enabling additional components or adding new ones.

Start with Defaults First

If you haven't deployed a stack yet, start with one of the pre-configured data stacks using default settings. Once running, return here to customize.

About Components
  • Core Infrastructure: Always deployed - provides foundational capabilities like autoscaling, monitoring, and GitOps
  • Optional Components: Enable on-demand via Terraform variables
  • Custom Components: Add your own via ArgoCD for experimentation

Core Configuration Variables

These variables configure the basic infrastructure settings:

| Variable | Description | Terraform Variable | Type | Default |
|----------|-------------|--------------------|------|---------|
| Name | Name used as an identifier on all resources | `name` | string | `data-on-eks` |
| Region | AWS region | `region` | string | `us-west-2` |
| Tags | A map of tags to add to all resources | `tags` | string | `{}` |
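As a minimal sketch, these core variables can be set together in a tfvars file; the values shown here are illustrative, not defaults:

```hcl
# Illustrative values only -- adjust to your environment.
name   = "analytics-stack"   # identifier applied to all resources
region = "us-west-2"         # AWS region to deploy into

tags = {                     # map of tags added to every resource
  Team        = "data-platform"
  Environment = "dev"
}
```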

Core Infrastructure (Always Enabled)

These components are deployed by default in all data stacks:

| Component | Description | Variable | Status |
|-----------|-------------|----------|--------|
| AWS Load Balancer Controller | AWS ALB/NLB integration | N/A | Always enabled |
| Argo Events | Event-driven workflow automation | N/A | Always enabled |
| Argo Workflows | Workflow engine for Kubernetes | N/A | Always enabled |
| ArgoCD | GitOps continuous delivery | N/A | Always enabled |
| Cert Manager | TLS certificate management | N/A | Always enabled |
| Fluent Bit | Log forwarding to CloudWatch | N/A | Always enabled |
| Karpenter | Node autoscaling and provisioning | N/A | Always enabled |
| Kube Prometheus Stack | Prometheus and Grafana monitoring | N/A | Always enabled |
| Spark History Server | Spark job history and metrics | N/A | Always enabled |
| Spark Operator | Apache Spark job orchestration | N/A | Always enabled |
| YuniKorn | Advanced batch scheduling | N/A | Always enabled |

Optional Components

These components can be enabled by setting their corresponding Terraform variable.

How to Enable

  1. Edit your stack's terraform/data-stack.tfvars file
  2. Set the corresponding enable_* variable to true
  3. Redeploy: ./deploy.sh

Example:

name   = "my-data-platform"
region = "us-east-1"

enable_datahub = true
enable_superset = true

Available Optional Components

| Component | Description | Variable | Default |
|-----------|-------------|----------|---------|
| Airflow | Enable Apache Airflow for workflow orchestration | `enable_airflow` | `false` |
| Amazon Prometheus | Enable AWS Managed Prometheus service | `enable_amazon_prometheus` | `false` |
| Celeborn | Enable Apache Celeborn remote shuffle service | `enable_celeborn` | `false` |
| Cluster Addons | A map of EKS addon names to boolean values controlling whether each addon is enabled, giving fine-grained control over which addons this Terraform stack deploys. Set values to true or false in your blueprint.tfvars file; to add a new addon, update this variable definition and adjust the EKS module logic (e.g., the locals in eks.tf). | `enable_cluster_addons` | |
| DataHub | Enable DataHub for metadata management | `enable_datahub` | `false` |
| Ingress NGINX | Enable ingress-nginx | `enable_ingress_nginx` | `false` |
| IPv6 | Enable IPv6 for the EKS cluster and its components | `enable_ipv6` | `false` |
| JupyterHub | Enable JupyterHub | `enable_jupyterhub` | `false` |
| NVIDIA Device Plugin | Enable the NVIDIA device plugin addon for GPU workloads | `enable_nvidia_device_plugin` | `false` |
| Ray Data | Enable Ray Data via ArgoCD | `enable_raydata` | `false` |
| Superset | Enable Apache Superset for data exploration and visualization | `enable_superset` | `false` |

EKS Cluster Addons

Fine-grained control over EKS addons via the enable_cluster_addons map variable:

| Addon | Description | Variable | Default |
|-------|-------------|----------|---------|
| Amazon CloudWatch Observability | Amazon CloudWatch observability | `enable_cluster_addons["amazon-cloudwatch-observability"]` | |
| AWS EBS CSI Driver | Amazon EBS CSI driver for persistent volumes | `enable_cluster_addons["aws-ebs-csi-driver"]` | |
| AWS Mountpoint S3 CSI Driver | Mountpoint for Amazon S3 CSI driver | `enable_cluster_addons["aws-mountpoint-s3-csi-driver"]` | |
| EKS Node Monitoring Agent | EKS node monitoring agent | `enable_cluster_addons["eks-node-monitoring-agent"]` | |
| Metrics Server | Kubernetes metrics server for autoscaling | `enable_cluster_addons["metrics-server"]` | |
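As a sketch, the map can be set in your tfvars file like this; the true/false values below are illustrative choices, not the shipped defaults:

```hcl
# Illustrative settings for the enable_cluster_addons map.
enable_cluster_addons = {
  "amazon-cloudwatch-observability" = false
  "aws-ebs-csi-driver"              = true
  "aws-mountpoint-s3-csi-driver"    = false
  "eks-node-monitoring-agent"       = true
  "metrics-server"                  = true
}
```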

Adding New Components

After deploying a data stack, you can add additional components for experimentation.

Quick Method: Deploy via ArgoCD

The fastest way to add a new component:

  1. Create an ArgoCD Application manifest
  2. Apply it: kubectl apply -f my-component.yaml
  3. Monitor: kubectl get application my-component -n argocd

Example: Adding Kro (Kubernetes Resource Orchestrator)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kro
  namespace: argocd
spec:
  project: default
  source:
    repoURL: oci://registry.k8s.io/kro/charts
    chart: kro
    targetRevision: 0.7.1
  destination:
    server: https://kubernetes.default.svc
    namespace: kro-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Apply the manifest:

kubectl apply -f kro-app.yaml

Verify deployment:

kubectl get application kro -n argocd
kubectl get pods -n kro-system

Clean up when done:

kubectl delete application kro -n argocd

Advanced: Integrate into Stack (Optional)

If you want the component managed by Terraform for reproducible deployments:

  1. Create overlay files in your stack's terraform/ directory:

    • terraform/kro.tf - Terraform resource definitions
    • terraform/argocd-applications/kro.yaml - ArgoCD app manifest
    • terraform/helm-values/kro.yaml - Helm values (if needed)
  2. Redeploy your stack: ./deploy.sh

This approach is useful if you're building a reusable stack for your team. See the Contributing Guide for detailed instructions.
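As a rough sketch only (not the stack's actual wiring), a terraform/kro.tf overlay could install the same chart with the Terraform Helm provider instead of an ArgoCD Application; the resource below is hypothetical:

```hcl
# Hypothetical overlay sketch: installs the kro chart directly via the
# Terraform Helm provider. Assumes a "helm" provider already configured
# against the stack's EKS cluster.
resource "helm_release" "kro" {
  name             = "kro"
  repository       = "oci://registry.k8s.io/kro/charts"
  chart            = "kro"
  version          = "0.7.1"
  namespace        = "kro-system"
  create_namespace = true
}
```

In practice the stack's convention of an ArgoCD Application manifest (plus optional helm-values file) keeps GitOps as the single source of truth, so prefer that pattern when contributing a reusable component.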


Note: This page is auto-generated from Terraform source code. To update, run:

./website/scripts/generate-available-components.sh