Skip to main content

Flink Autoscaler

Prerequisites

Autoscaler Example

Set the env variables for the job execution role and s3 bucket name :

source set-env.sh

Navigate to example directory and submit the Flink job.

cd data-on-eks/data-stacks/emr-on-eks/examples/flink/karpenter

Modify the autoscaler-example.yaml by replacing the placeholders with values from the two env variables above.

envsubst < autoscaler-example.yaml > autoscaler-example1.yaml

The job example contains Autoscaler configuration:

...
flinkConfiguration:

taskmanager.numberOfTaskSlots: "4"
# Autotuning parameters
kubernetes.operator.job.autoscaler.enabled: "true"
kubernetes.operator.job.autoscaler.stabilization.interval: 1m
kubernetes.operator.job.autoscaler.metrics.window: 1m
kubernetes.operator.job.autoscaler.target.utilization: "0.5"
kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.2"
kubernetes.operator.job.autoscaler.restart.time: 1m
kubernetes.operator.job.autoscaler.catch-up.duration: 5m
kubernetes.operator.job.autoscaler.vertex.exclude.ids: ""

...

It starts a LoadSimulationPipeline job that simulates flucating loads from zero to the max (1,2,4, and 8) loads:

job:
# if you have your job jar in S3 bucket you can use that path as well
jarURI: local:///opt/flink/examples/streaming/AutoscalingExample.jar
entryClass: org.apache.flink.streaming.examples.autoscaling.LoadSimulationPipeline
args:
- "--maxLoadPerTask"
- "1;2;4;8;16;"
- "--repeatsAfterMinutes"
- "60"

Deploy the job with the kubectl deploy command.

kubectl apply -f autoscaler-example1.yaml

Monitor the job status using the below command. You should see the new nodes triggered by the karpenter and the YuniKorn will schedule Job manager pods and two Taskmanager pods. As the load increases, autoscaler changes the parallelism of the tasks and more task manager pods are added as needed:

NAME                                             READY   STATUS    RESTARTS   AGE
autoscaler-example-5cbd4c9864-bt2qj 2/2 Running 0 81m
autoscaler-example-5cbd4c9864-r4gbg 2/2 Running 0 81m
autoscaler-example-taskmanager-1-1 2/2 Running 0 80m
autoscaler-example-taskmanager-1-2 2/2 Running 0 80m
autoscaler-example-taskmanager-1-4 2/2 Running 0 78m
autoscaler-example-taskmanager-1-5 2/2 Running 0 78m
autoscaler-example-taskmanager-1-6 2/2 Running 0 38m
autoscaler-example-taskmanager-1-7 2/2 Running 0 38m
autoscaler-example-taskmanager-1-8 2/2 Running 0 38m

Access the Flink WebUI for the job by running this command:

kubectl port-forward svc/autoscaler-example-rest 8081 -n emr-data-team-a

Then browse to http://localhost:8081:

Flink Job UI

Examine the max_load and parallelism of each task.