Spark with EBS Node-Level Storage
Learn to use shared EBS volumes per node for Spark shuffle storage - a cost-effective alternative to per-pod PVCs.
Prerequisites
- Deploy Spark on EKS infrastructure: Infrastructure Setup
- Nodes provisioned with a 200Gi gp3 EBS root volume and a dedicated Spark scratch directory (/mnt/spark-local)
This approach shares one EBS volume per node among all Spark pods scheduled on that node. While cost-effective, it can suffer from noisy-neighbor issues when multiple workloads compete for the same storage I/O.
Architecture: Shared EBS Volume per Node
Key Benefits:
- 💰 Cost Effective: One EBS volume per node vs per pod
- ⚡ Higher Performance: Direct node storage access
- 🔄 Simplified Management: No PVC lifecycle complexity
- ⚠️ Trade-off: Potential noisy neighbor issues
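The shared directory itself is created when the node is provisioned, not by the Spark job. As a rough illustration, a Karpenter EC2NodeClass can size the node's root volume and create the scratch directory at bootstrap; the resource name, IAM role, and AMI selector below are placeholders, and the actual provisioning for this example is handled by the Terraform setup linked above.
# Illustrative Karpenter EC2NodeClass sketch (names, role, and AMI selector are
# placeholders; the example's Terraform provisions the real node class).
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: spark-compute-optimized
spec:
  amiSelectorTerms:
    - alias: al2023@latest        # placeholder; match your cluster's AMI family
  role: "<karpenter-node-role>"   # placeholder IAM role name
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi         # one shared volume per node
        volumeType: gp3
        deleteOnTermination: true
  userData: |
    # Create the shared Spark scratch directory at bootstrap
    # (userData format depends on the AMI family; shown here as plain shell)
    mkdir -p /mnt/spark-local
    chmod 1777 /mnt/spark-local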
What is Shuffle Storage in Spark?
Shuffle storage holds intermediate data during Spark operations like groupBy, join, and reduceByKey. When data is redistributed across executors, it's temporarily stored before being read by subsequent stages.
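To make this concrete, here is a minimal PySpark sketch (not part of the taxi-trip job) showing a wide transformation that triggers a shuffle; the intermediate blockmgr-* files land under spark.local.dir, which this guide points at the shared node volume through the /data1 mount.
# Minimal, self-contained sketch: a groupBy forces a shuffle, and Spark writes the
# intermediate files under spark.local.dir (assumed here to be the /data1 mount).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("shuffle-demo")
    .config("spark.local.dir", "/data1")  # shared node-level EBS mount in this guide
    .getOrCreate()
)

df = spark.range(0, 10_000_000)
# The groupBy redistributes rows across executors; blockmgr-* directories appear under /data1
counts = df.groupBy((F.col("id") % 100).alias("bucket")).count()
print(counts.orderBy("bucket").limit(5).collect())

spark.stop()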
Spark Shuffle Storage Options
| Storage Type | Performance | Cost | Use Case |
|---|---|---|---|
| NVMe SSD Instances | 🔥 Very High | 💰 High | Maximum performance workloads |
| EBS Node Storage | ⚡ High | 💵 Medium | Featured - Cost-effective shared storage |
| EBS Dynamic PVC | 📊 Medium | 💰 Medium | Per-pod isolation and fault tolerance |
| FSx for Lustre | 📊 Medium | 💵 Low | Parallel filesystem for HPC |
| S3 Express + Mountpoint | 📊 Medium | 💵 Low | Very large datasets |
| Remote Shuffle (Celeborn) | ⚡ High | 💰 Medium | Resource disaggregation |
Benefits: Performance & Cost
- EBS Node Storage: Balance of performance and cost with shared volumes
- Higher throughput: Direct access without Kubernetes volume overhead
- Cost optimization: Fewer EBS volumes than per-pod approach
Example Code
View the complete configuration:
📄 Complete EBS Node Storage Configuration
# Pre-requisites before running this job:
# 1/ Export S3_BUCKET and REGION from the Terraform outputs (see "Create Test Data and Run Example" below)
# 2/ Run taxi-trip-execute.sh $S3_BUCKET $REGION to stage the input data in S3
# 3/ Substitute $S3_BUCKET in this manifest (e.g., with envsubst) before applying it
# This example demonstrates EBS Node-Level Storage features
# Shared EBS volume per node (mounted at /mnt/spark-local)
# Cost-effective alternative to per-pod PVCs
# Higher performance than individual PVCs but potential noisy neighbor issues
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: "taxi-trip"
  namespace: spark-team-a
  labels:
    app: "taxi-trip"
    applicationId: "taxi-trip-ebs"
    queue: root.test
spec:
  # To create Ingress object for Spark driver.
  # Ensure Spark Operator Helm Chart deployed with Ingress enabled to use this feature
  # sparkUIOptions:
  #   servicePort: 4040
  #   servicePortName: taxi-trip-ui-svc
  #   serviceType: ClusterIP
  #   ingressAnnotations:
  #     kubernetes.io/ingress.class: nginx
  #     nginx.ingress.kubernetes.io/use-regex: "true"
  type: Python
  sparkVersion: "3.5.3"
  mode: cluster
  image: "public.ecr.aws/data-on-eks/spark:3.5.3-scala2.12-java17-python3-ubuntu"
  imagePullPolicy: IfNotPresent
  mainApplicationFile: "s3a://$S3_BUCKET/taxi-trip/scripts/pyspark-taxi-trip.py" # MainFile is the path to a bundled JAR, Python, or R file of the application
  arguments:
    - "s3a://$S3_BUCKET/taxi-trip/input/"
    - "s3a://$S3_BUCKET/taxi-trip/output/"
  sparkConf:
    "spark.app.name": "taxi-trip"
    "spark.kubernetes.driver.pod.name": "taxi-trip"
    "spark.kubernetes.executor.podNamePrefix": "taxi-trip"
    "spark.local.dir": "/data1"
    "spark.speculation": "false"
    "spark.network.timeout": "2400"
    "spark.hadoop.fs.s3a.connection.timeout": "1200000"
    "spark.hadoop.fs.s3a.path.style.access": "true"
    "spark.hadoop.fs.s3a.connection.maximum": "200"
    "spark.hadoop.fs.s3a.fast.upload": "true"
    "spark.hadoop.fs.s3a.readahead.range": "256K"
    "spark.hadoop.fs.s3a.input.fadvise": "random"
    "spark.hadoop.fs.s3a.aws.credentials.provider.mapping": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider=software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider"
    "spark.hadoop.fs.s3a.aws.credentials.provider": "software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider" # AWS SDK V2 https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/aws_sdk_upgrade.html
    "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
    # Spark Event logs
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "s3a://$S3_BUCKET/spark-event-logs"
    "spark.eventLog.rolling.enabled": "true"
    "spark.eventLog.rolling.maxFileSize": "64m"
    # "spark.history.fs.eventLog.rolling.maxFilesToRetain": 100
    # Expose Spark metrics for Prometheus
    "spark.ui.prometheus.enabled": "true"
    "spark.executor.processTreeMetrics.enabled": "true"
    "spark.metrics.conf.*.sink.prometheusServlet.class": "org.apache.spark.metrics.sink.PrometheusServlet"
    "spark.metrics.conf.driver.sink.prometheusServlet.path": "/metrics/driver/prometheus/"
    "spark.metrics.conf.executor.sink.prometheusServlet.path": "/metrics/executors/prometheus/"
    # EBS Node-Level Storage Configuration
    # Use node-level EBS volume mounted at /mnt/spark-local
    # This provides shared storage per node, reducing costs compared to per-pod PVCs
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "4g"
    memoryOverhead: "4g"
    serviceAccount: spark-team-a
    labels:
      version: 3.5.3
    nodeSelector:
      node.kubernetes.io/workload-type: "compute-optimized-x86"
      karpenter.sh/capacity-type: "on-demand"
    tolerations:
      - key: "spark-compute-optimized"
        operator: "Exists"
        effect: "NoSchedule"
  executor:
    cores: 1
    coreLimit: "1200m"
    instances: 2
    memory: "4g"
    memoryOverhead: "4g"
    serviceAccount: spark-team-a
    labels:
      version: 3.5.3
    nodeSelector:
      node.kubernetes.io/workload-type: "compute-optimized-x86"
    tolerations:
      - key: "spark-compute-optimized"
        operator: "Exists"
        effect: "NoSchedule"
EBS Node Storage Configuration
Key configuration for shared node-level storage:
sparkConf:
  # Node-level EBS volume - Driver
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"
  # Node-level EBS volume - Executor
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"
Features:
- hostPath: Uses a node-level directory on the root volume
- /mnt/spark-local: Shared directory per node (200Gi GP3 root volume)
- Directory: Ensures the directory already exists on the node
- All pods on the same node share the storage
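For reference, these settings translate into a plain hostPath volume on the driver and executor pods; because the volume name uses the spark-local-dir- prefix, Spark treats the mount as local scratch space. A rough sketch of the rendered executor pod fragment (illustrative, not a standalone manifest):
# Approximate fragment of the executor pod spec that Spark renders from the
# configuration above (illustrative only).
volumes:
  - name: spark-local-dir-1
    hostPath:
      path: /mnt/spark-local
      type: Directory
containers:
  - name: spark-kubernetes-executor
    volumeMounts:
      - name: spark-local-dir-1
        mountPath: /data1
        readOnly: false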
Create Test Data and Run Example
Process NYC taxi data to demonstrate EBS node-level storage with shared volumes.
1. Prepare Test Data
cd data-stacks/spark-on-eks/terraform/_local/
# Export S3 bucket and region from Terraform outputs
export S3_BUCKET=$(terraform output -raw s3_bucket_id_spark_history_server)
export REGION=$(terraform output -raw region)
# Navigate to scripts directory and create test data
cd ../../scripts/
./taxi-trip-execute.sh $S3_BUCKET $REGION
The script downloads NYC taxi trip data (1.1GB total) and uploads it to your S3 bucket.
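Optionally, confirm the input data and script landed in S3 before submitting the job (assumes S3_BUCKET is still exported in the same shell):
# Sanity-check the staged input data and PySpark script
aws s3 ls s3://$S3_BUCKET/taxi-trip/input/ --human-readable --summarize
aws s3 ls s3://$S3_BUCKET/taxi-trip/scripts/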
2. Execute Spark Job
# Navigate to example directory
cd ../examples/
# Submit the EBS Node Storage job
envsubst < ebs-node-storage.yaml | kubectl apply -f -
# Monitor job progress
kubectl get sparkapplications -n spark-team-a --watch
Expected output:
NAME STATUS ATTEMPTS START FINISH AGE
taxi-trip COMPLETED 1 2025-09-28T17:03:31Z 2025-09-28T17:08:15Z 4m44s
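If the application does not reach COMPLETED, the following commands usually surface the cause (for example scheduling, image pull, or S3 permission issues):
# Inspect the SparkApplication status and recent namespace events
kubectl describe sparkapplication taxi-trip -n spark-team-a
kubectl get events -n spark-team-a --sort-by=.lastTimestamp | tail -n 20
# Tail the driver log via the spark-role label
kubectl logs -n spark-team-a -l spark-role=driver --tail=50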
Verify Data and Storage
Monitor Node Storage Usage
# Check which nodes have Spark pods
kubectl get pods -n spark-team-a -o wide
# Check node-level storage usage with an ephemeral debug pod (host filesystem is mounted under /host)
kubectl debug node/<node-name> -it --image=busybox -- df -h /host/
# Check directory structure on nodes
kubectl debug node/<node-name> -it --image=busybox -- ls -la /host/mnt/spark-local/
Check Pod Status and Storage
# Check driver and executor pods
kubectl get pods -n spark-team-a -l app=taxi-trip
# Verify node storage is mounted correctly (/dev/nvme0n1p1 at /data1)
kubectl exec -n spark-team-a taxi-trip-exec-1 -- df -h
# Expected output:
# /dev/nvme0n1p1 200G 7.9G 193G 4% /data1
# Check Spark shuffle data directories (multiple blockmgr per node)
kubectl exec -n spark-team-a taxi-trip-exec-1 -- ls -la /data1/
# Expected output shows shared storage with multiple block managers:
# drwxr-xr-x. 22 spark spark 16384 Sep 28 22:09 blockmgr-7c0ac908-26a3-4395-8a8f-2221b4d5d7c3
# drwxr-xr-x. 13 spark spark 116 Sep 28 22:09 blockmgr-9ed9c2fd-53e1-4337-8a68-9a48e1e63d5f
# drwxr-xr-x. 13 spark spark 116 Sep 28 22:09 blockmgr-ecc9fa35-a82e-4486-85fe-c8ef963d6eb7
# Verify shared storage between pods on same node
kubectl exec -n spark-team-a taxi-trip-driver -- ls -la /data1/
# View Spark application logs
kubectl logs -n spark-team-a -l spark-role=driver --follow
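To see how much of the shared volume each block manager is using (the executor pod name is taken from the run above; adjust if yours differs):
# Per-directory shuffle usage on the shared node volume
kubectl exec -n spark-team-a taxi-trip-exec-1 -- sh -c 'du -sh /data1/blockmgr-* 2>/dev/null'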
Verify Output Data
# Check processed output in S3
aws s3 ls s3://$S3_BUCKET/taxi-trip/output/
# Verify event logs
aws s3 ls s3://$S3_BUCKET/spark-event-logs/
Node Storage Considerations
Advantages
- Cost Savings: ~70% cost reduction vs per-pod PVCs
- Higher Performance: Direct node storage access
- Simplified Operations: No PVC management overhead
- Better Resource Utilization: Shared storage pool per node
Disadvantages & Mitigation
- Noisy Neighbors: Multiple pods compete for same storage I/O
- Mitigation: Use compute-optimized instances with higher IOPS
- No Isolation: Pods can see each other's temporary data
- Mitigation: Configure proper directory permissions (see the bootstrap sketch after this list)
- Storage Sizing: Must pre-size for all workloads on node
- Mitigation: Monitor usage and adjust volume size
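One way to implement the permissions mitigation is to set the sticky bit on the shared directory at node bootstrap, mirroring /tmp semantics; the snippet below is illustrative, intended for your node userData or provisioning script, and is not applied automatically by this example:
# Sticky bit: only a file's owner (or root) can delete it, so workloads cannot
# remove each other's shuffle files on the shared directory.
mkdir -p /mnt/spark-local
chmod 1777 /mnt/spark-local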
When to Use Node Storage
✅ Good for:
- Cost-sensitive workloads
- Predictable I/O patterns
- Trusted multi-tenant environments
- Development/testing environments
❌ Avoid for:
- Security-sensitive workloads requiring isolation
- Unpredictable I/O bursts
- Critical production workloads needing guarantees
Storage Class Options
# gp3 - balanced price/performance for most workloads (default)
volumeType: gp3
# io1 - provisioned IOPS for I/O-intensive workloads
volumeType: io1
# st1 - throughput-optimized HDD for large sequential I/O
volumeType: st1
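These volume types are set wherever the node's volume is defined, for example in the blockDeviceMappings of a Karpenter EC2NodeClass; the fragment below is illustrative, with the gp3 IOPS and throughput values chosen as examples rather than recommendations:
# Illustrative blockDeviceMappings fragment for the node's root volume
blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      volumeType: gp3
      volumeSize: 200Gi
      iops: 6000        # gp3 baseline is 3,000 IOPS, provisionable up to 16,000
      throughput: 500   # MiB/s; gp3 baseline is 125, up to 1,000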
Cleanup
# Delete the Spark application
kubectl delete sparkapplication taxi-trip -n spark-team-a
# Node storage persists until node termination
# Data is automatically cleaned up when nodes are replaced
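If you also want to remove the data generated by this run (assumes S3_BUCKET is still exported):
# Optional: clean up the S3 output and event logs created by this example
aws s3 rm s3://$S3_BUCKET/taxi-trip/output/ --recursive
aws s3 rm s3://$S3_BUCKET/spark-event-logs/ --recursive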
Next Steps
- EBS Dynamic PVC Storage - Per-pod storage isolation
- NVMe Instance Storage - High-performance local SSD
- Infrastructure Setup - Deploy base infrastructure