Flink on EKS Infrastructure Deployment
Complete guide for deploying and configuring the Flink on EKS infrastructure for streaming workloads.
Prerequisites
Before deploying, ensure you have the following tools installed:
- AWS CLI - Install Guide
- Terraform (>= 1.0) - Install Guide
- kubectl - Install Guide
- Helm (>= 3.0) - Install Guide
- AWS credentials configured - Run aws configure or use IAM roles
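Before moving on, it can be worth confirming the tools above are actually on your PATH. A minimal check (the check_tool helper is illustrative, not part of the repository):

```shell
# Report whether each required CLI is installed and on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# The four tools listed in the prerequisites above.
for tool in aws terraform kubectl helm; do
  check_tool "$tool"
done
```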
Overview
The Flink on EKS infrastructure provides a production-ready foundation for Apache Flink streaming workloads on Amazon EKS. It includes:
- EKS Cluster with streaming-optimized configurations
- Flink Operator for native Kubernetes Flink job management
- Kafka Cluster for event streaming and data ingestion
- State Backend with S3 storage for fault tolerance
- Monitoring Stack with Flink-specific metrics and dashboards
Quick Start
1. Clone the Repository
# Clone the repository
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/data-stacks/emr-on-eks
2. Review Configuration
Edit terraform/data-stack.tfvars to customize your deployment:
# EMR on EKS Data Stack Configuration
# This file enables EMR on EKS Virtual Clusters for running Spark jobs
name = "emr-on-eks"
region = "us-west-2"
deployment_id = "your-unique-id"
# Enable EMR on EKS Virtual Clusters
enable_emr_on_eks = true
# Enable EMR Spark Operator for declarative Spark job management
enable_emr_spark_operator = true
# Enable the EMR Flink Kubernetes Operator, replacing the open-source Flink Kubernetes Operator
enable_emr_flink_operator = true
# Optional: Enable additional addons as needed
enable_ingress_nginx = true
enable_ipv6 = false
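Before running the deploy script, you can sanity-check your edits to the variable file with Terraform itself. A sketch, assuming the Terraform root sits under terraform/ as the examples in this guide suggest (the validate_stack helper is illustrative):

```shell
# Validate the Terraform configuration without touching any backend state.
validate_stack() {
  if command -v terraform >/dev/null 2>&1 && [ -d "$1" ]; then
    terraform -chdir="$1" init -backend=false \
      && terraform -chdir="$1" validate
  else
    echo "skipped: terraform or $1 not found"
  fi
}

validate_stack terraform
```

This catches syntax errors and invalid variable names early, without the 30-40 minute wait of a full deploy.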
3. Deploy Infrastructure
./deploy.sh
This script will:
- Initialize Terraform
- Create VPC and networking (if they do not already exist)
- Deploy EKS cluster with managed node groups
- Install Karpenter for autoscaling
- Install YuniKorn scheduler
- Create EMR virtual clusters for Team A and Team B
- Configure IAM roles and pod identity associations
- Set up S3 buckets for logs and data
- Deploy EMR Flink Kubernetes Operator
Deployment Time
Initial deployment takes approximately 30-40 minutes. Subsequent updates are faster.
4. View Terraform Outputs
After deployment completes, view the infrastructure details:
cd terraform/_local
terraform output
You should see output similar to:
cluster_arn = "arn:aws:eks:us-west-2:123456789:cluster/emr-on-eks"
cluster_name = "emr-on-eks"
configure_kubectl = "aws eks --region us-west-2 update-kubeconfig --name emr-on-eks"
deployment_id = "abcdefg"
emr_on_eks = {
"cloudwatch_log_groups" = {
"emr-data-team-a" = {
"arn" = "arn:aws:logs:us-west-2:301444719761:log-group:/emr-on-eks-logs/emr-on-eks/emr-data-team-a"
"name" = "/emr-on-eks-logs/emr-on-eks/emr-data-team-a"
}
"emr-data-team-b" = {
"arn" = "arn:aws:logs:us-west-2:301444719761:log-group:/emr-on-eks-logs/emr-on-eks/emr-data-team-b"
"name" = "/emr-on-eks-logs/emr-on-eks/emr-data-team-b"
}
}
"job_execution_role_arns" = {
"emr-data-team-a" = "arn:aws:iam::301444719761:role/emr-on-eks-emr-data-team-a"
"emr-data-team-b" = "arn:aws:iam::301444719761:role/emr-on-eks-emr-data-team-b"
}
"namespaces" = {
"emr-data-team-a" = "emr-data-team-a"
"emr-data-team-b" = "emr-data-team-b"
}
"virtual_clusters" = {
"emr-data-team-a" = {
"arn" = "arn:aws:emr-containers:us-west-2:301444719761:/virtualclusters/rthjrl76dgz7x1xixlf11lbc0"
"id" = "rthjrl76dgz7x1xixlf11lbc0"
"name" = "emr-on-eks-emr-data-team-a"
"namespace" = "emr-data-team-a"
}
"emr-data-team-b" = {
"arn" = "arn:aws:emr-containers:us-west-2:301444719761:/virtualclusters/agvpvoyl5poe1to9mwjizrbsk"
"id" = "agvpvoyl5poe1to9mwjizrbsk"
"name" = "emr-on-eks-emr-data-team-b"
"namespace" = "emr-data-team-b"
}
}
}
emr_s3_bucket_name = "emr-on-eks-spark-logs-123456789"
region = "us-west-2"
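These outputs can also be consumed programmatically instead of copy-pasted. A sketch (the get_output helper is hypothetical; the output names and the terraform/_local path match the sample above):

```shell
# Read a single Terraform output by name, e.g. for use in scripts.
get_output() {
  if command -v terraform >/dev/null 2>&1 && [ -d terraform/_local ]; then
    terraform -chdir=terraform/_local output -raw "$1"
  else
    echo "unavailable: run from the repo root after a successful deploy" >&2
    return 1
  fi
}

# e.g. point kubectl at the new cluster:
#   eval "$(get_output configure_kubectl)"
# or look up the log bucket:
get_output emr_s3_bucket_name || true
```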
note
If deployment fails:
- Rerun the same command: ./deploy.sh
- If it still fails, debug using kubectl commands or raise an issue
Post-Deployment Verification
The deployment script automatically configures kubectl. Verify the cluster is ready:
source set-env.sh
# Set kubeconfig
export KUBECONFIG=kubeconfig.yaml
# Verify cluster nodes
kubectl get nodes
# Check all namespaces
kubectl get namespaces
# Verify ArgoCD applications
kubectl get applications -n argocd
Quick Verification
Run these commands to verify successful deployment:
# 1. Check nodes are ready
kubectl get nodes
# Expected: 4-5 nodes with STATUS=Ready
# 2. Check ArgoCD applications are synced
kubectl get applications -n argocd
# Expected: All apps showing "Synced" and "Healthy"
# 3. Check Karpenter NodePools are ready
kubectl get nodepools
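A further check worth running: confirm the EMR Flink operator pods are up. A sketch (the helper is illustrative; the operator's namespace depends on the install, so this greps across all namespaces rather than guessing it):

```shell
# Look for Flink operator pods anywhere in the cluster.
check_flink_operator() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found; skipping"
    return 0
  fi
  kubectl get pods -A 2>/dev/null | grep -i flink \
    || echo "no flink pods found"
}

check_flink_operator
```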
Cleanup
Infrastructure Cleanup
# Complete cleanup
./cleanup.sh
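To confirm the teardown actually removed the cluster, you can list the EKS clusters left in the account afterwards. A sketch, assuming the us-west-2 region from the tfvars above (the helper is illustrative):

```shell
# List remaining EKS clusters; after a successful cleanup the deployed
# cluster should no longer appear.
confirm_cleanup() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found; skipping"
    return 0
  fi
  aws eks list-clusters --region us-west-2 2>/dev/null \
    || echo "could not list clusters (check AWS credentials)"
}

confirm_cleanup
```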
Next Steps
After deploying the infrastructure: