Spark with EBS Node-Level Storage

Learn how to use a shared EBS volume per node for Spark shuffle storage, a cost-effective alternative to per-pod PVCs.

Prerequisites

  • Deploy Spark on EKS infrastructure: Infrastructure Setup
  • An existing 200Gi GP3 EBS volume on each node with a dedicated Spark directory (/mnt/spark-local)
Node Storage Considerations

This approach shares one EBS volume per node among all Spark pods scheduled on that node. While cost-effective, it can suffer from noisy neighbor issues when multiple workloads compete for the same storage I/O.
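
To see the sharing in practice, you can check how many Spark pods land on each node; every pod on a node writes its shuffle data to that node's volume. A minimal check, assuming the taxi-trip example from later in this guide is running in the spark-team-a namespace:

# Count Spark pods per node; all pods on a node share that node's EBS-backed storage
kubectl get pods -n spark-team-a -o wide --no-headers | awk '{print $7}' | sort | uniq -c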

Architecture: Shared EBS Volume per Node

Key Benefits:

  • 💰 Cost Effective: One EBS volume per node vs per pod
  • ⚡ Higher Performance: Direct node storage access
  • 🔄 Simplified Management: No PVC lifecycle complexity
  • ⚠️ Trade-off: Potential noisy neighbor issues

What is Shuffle Storage in Spark?

Shuffle storage holds intermediate data during Spark operations like groupBy, join, and reduceByKey. When data is redistributed across executors, it's temporarily stored before being read by subsequent stages.
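
To make this concrete, you can peek at the shuffle files Spark writes under spark.local.dir while a shuffle-heavy stage (such as a join) is running. This is an optional check and assumes the taxi-trip executor pod from the example below is up:

# List shuffle index/data files under the block manager directories in spark.local.dir (/data1)
kubectl exec -n spark-team-a taxi-trip-exec-1 -- sh -c 'find /data1 -name "shuffle_*" | head'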

Spark Shuffle Storage Options

| Storage Type | Performance | Cost | Use Case |
|---|---|---|---|
| NVMe SSD Instances | 🔥 Very High | 💰 High | Maximum performance workloads |
| EBS Node Storage | ⚡ High | 💵 Medium | Featured - Cost-effective shared storage |
| EBS Dynamic PVC | 📊 Medium | 💰 Medium | Per-pod isolation and fault tolerance |
| FSx for Lustre | 📊 Medium | 💵 Low | Parallel filesystem for HPC |
| S3 Express + Mountpoint | 📊 Medium | 💵 Low | Very large datasets |
| Remote Shuffle (Celeborn) | ⚡ High | 💰 Medium | Resource disaggregation |

Benefits: Performance & Cost

  • EBS Node Storage: A balance of performance and cost with shared volumes
  • Higher throughput: Direct hostPath access, with no PVC provisioning or volume-attach latency per pod
  • Cost optimization: Fewer EBS volumes than the per-pod approach (for example, two nodes running ten executors need two node volumes instead of ten per-executor PVCs)

Example Code

View the complete configuration:

📄 Complete EBS Node Storage Configuration
examples/ebs-node-storage.yaml
# Pre-requisite before running this job
# 1/ Open taxi-trip-execute.sh and update $S3_BUCKET and <REGION>
# 2/ Replace $S3_BUCKET with your S3 bucket created by this example (Check Terraform outputs)
# 3/ execute taxi-trip-execute.sh

# This example demonstrates EBS Node-Level Storage features
# Shared EBS volume per node (mounted at /mnt/spark-local)
# Cost-effective alternative to per-pod PVCs
# Higher performance than individual PVCs but potential noisy neighbor issues

---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: "taxi-trip"
  namespace: spark-team-a
  labels:
    app: "taxi-trip"
    applicationId: "taxi-trip-ebs"
    queue: root.test
spec:
  # To create Ingress object for Spark driver.
  # Ensure Spark Operator Helm Chart deployed with Ingress enabled to use this feature
  # sparkUIOptions:
  #   servicePort: 4040
  #   servicePortName: taxi-trip-ui-svc
  #   serviceType: ClusterIP
  #   ingressAnnotations:
  #     kubernetes.io/ingress.class: nginx
  #     nginx.ingress.kubernetes.io/use-regex: "true"
  type: Python
  sparkVersion: "3.5.3"
  mode: cluster
  image: "public.ecr.aws/data-on-eks/spark:3.5.3-scala2.12-java17-python3-ubuntu"
  imagePullPolicy: IfNotPresent
  mainApplicationFile: "s3a://$S3_BUCKET/taxi-trip/scripts/pyspark-taxi-trip.py"  # MainFile is the path to a bundled JAR, Python, or R file of the application
  arguments:
    - "s3a://$S3_BUCKET/taxi-trip/input/"
    - "s3a://$S3_BUCKET/taxi-trip/output/"
  sparkConf:
    "spark.app.name": "taxi-trip"
    "spark.kubernetes.driver.pod.name": "taxi-trip"
    "spark.kubernetes.executor.podNamePrefix": "taxi-trip"
    "spark.local.dir": "/data1"
    "spark.speculation": "false"
    "spark.network.timeout": "2400"
    "spark.hadoop.fs.s3a.connection.timeout": "1200000"
    "spark.hadoop.fs.s3a.path.style.access": "true"
    "spark.hadoop.fs.s3a.connection.maximum": "200"
    "spark.hadoop.fs.s3a.fast.upload": "true"
    "spark.hadoop.fs.s3a.readahead.range": "256K"
    "spark.hadoop.fs.s3a.input.fadvise": "random"
    "spark.hadoop.fs.s3a.aws.credentials.provider.mapping": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider=software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider"
    "spark.hadoop.fs.s3a.aws.credentials.provider": "software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider"  # AWS SDK V2 https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/aws_sdk_upgrade.html
    "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"

    # Spark Event logs
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "s3a://$S3_BUCKET/spark-event-logs"
    "spark.eventLog.rolling.enabled": "true"
    "spark.eventLog.rolling.maxFileSize": "64m"
    # "spark.history.fs.eventLog.rolling.maxFilesToRetain": 100

    # Expose Spark metrics for Prometheus
    "spark.ui.prometheus.enabled": "true"
    "spark.executor.processTreeMetrics.enabled": "true"
    "spark.metrics.conf.*.sink.prometheusServlet.class": "org.apache.spark.metrics.sink.PrometheusServlet"
    "spark.metrics.conf.driver.sink.prometheusServlet.path": "/metrics/driver/prometheus/"
    "spark.metrics.conf.executor.sink.prometheusServlet.path": "/metrics/executors/prometheus/"

    # EBS Node-Level Storage Configuration
    # Use node-level EBS volume mounted at /mnt/spark-local
    # This provides shared storage per node, reducing costs compared to per-pod PVCs
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
    "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"

    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
    "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"

  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20

  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "4g"
    memoryOverhead: "4g"
    serviceAccount: spark-team-a
    labels:
      version: 3.5.3
    nodeSelector:
      node.kubernetes.io/workload-type: "compute-optimized-x86"
      karpenter.sh/capacity-type: "on-demand"
    tolerations:
      - key: "spark-compute-optimized"
        operator: "Exists"
        effect: "NoSchedule"
  executor:
    cores: 1
    coreLimit: "1200m"
    instances: 2
    memory: "4g"
    memoryOverhead: "4g"
    serviceAccount: spark-team-a
    labels:
      version: 3.5.3
    nodeSelector:
      node.kubernetes.io/workload-type: "compute-optimized-x86"
    tolerations:
      - key: "spark-compute-optimized"
        operator: "Exists"
        effect: "NoSchedule"

EBS Node Storage Configuration

Key configuration for shared node-level storage:

Essential Node Storage Settings
sparkConf:
  # Node-level EBS volume - Driver
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
  "spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"

  # Node-level EBS volume - Executor
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path": "/mnt/spark-local"
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.type": "Directory"
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path": "/data1"
  "spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.readOnly": "false"

Features:

  • hostPath: Uses a node-level directory on the node's EBS-backed root volume
  • /mnt/spark-local: Shared directory per node (200Gi GP3 root volume)
  • Directory: Requires the directory to already exist on the node (see the preparation sketch below); DirectoryOrCreate would let the kubelet create it instead
  • All Spark pods on the same node share this storage
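
Because the hostPath type is Directory, the kubelet will not create /mnt/spark-local; it must already exist on every node before pods are scheduled. How you create it depends on your node provisioning; a minimal sketch, assuming you can add commands to the node bootstrap (for example via Karpenter EC2NodeClass userData), is shown below. This is an illustration, not part of the infrastructure module:

# Assumed node bootstrap snippet: create the shared Spark scratch directory on the
# node's 200Gi GP3 root volume and make it writable by non-root Spark containers
mkdir -p /mnt/spark-local
chmod 1777 /mnt/spark-local   # world-writable with sticky bit, like /tmp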

Create Test Data and Run Example

Process NYC taxi data to demonstrate EBS node-level storage with shared volumes.

1. Prepare Test Data

cd data-stacks/spark-on-eks/terraform/_local/

# Export S3 bucket and region from Terraform outputs
export S3_BUCKET=$(terraform output -raw s3_bucket_id_spark_history_server)
export REGION=$(terraform output -raw region)

# Navigate to scripts directory and create test data
cd ../../scripts/
./taxi-trip-execute.sh $S3_BUCKET $REGION

Downloads NYC taxi data (1.1GB total) and uploads to S3
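
Before submitting the job, you can confirm the upload landed where the SparkApplication expects it:

# Verify the input dataset and the PySpark script are in place
aws s3 ls s3://$S3_BUCKET/taxi-trip/input/ --human-readable --summarize
aws s3 ls s3://$S3_BUCKET/taxi-trip/scripts/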

2. Execute Spark Job

# Navigate to example directory
cd ../examples/

# Submit the EBS Node Storage job
envsubst < ebs-node-storage.yaml | kubectl apply -f -

# Monitor job progress
kubectl get sparkapplications -n spark-team-a --watch

Expected output:

NAME        STATUS      ATTEMPTS   START                  FINISH                 AGE
taxi-trip   COMPLETED   1          2025-09-28T17:03:31Z   2025-09-28T17:08:15Z   4m44s
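
If the application sits in SUBMITTED or ends up FAILED instead, the operator's status and events usually explain why (for example, a missing /mnt/spark-local directory on the node or an S3 permissions problem):

# Inspect application status and recent events reported by the Spark Operator
kubectl describe sparkapplication taxi-trip -n spark-team-a
kubectl get events -n spark-team-a --sort-by=.lastTimestamp | tail -20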

Verify Data and Storage

Monitor Node Storage Usage

# Check which nodes have Spark pods
kubectl get pods -n spark-team-a -o wide

# Launch a node debug pod to check storage usage (the node filesystem is mounted at /host)
kubectl debug node/<node-name> -it --image=busybox -- df -h /host/

# Check directory structure on nodes
kubectl debug node/<node-name> -it --image=busybox -- ls -la /host/mnt/spark-local/

Check Pod Status and Storage

# Check driver and executor pods
kubectl get pods -n spark-team-a -l app=taxi-trip

# Verify node storage is mounted correctly (/dev/nvme0n1p1 at /data1)
kubectl exec -n spark-team-a taxi-trip-exec-1 -- df -h

# Expected output:
# /dev/nvme0n1p1 200G 7.9G 193G 4% /data1

# Check Spark shuffle data directories (multiple blockmgr per node)
kubectl exec -n spark-team-a taxi-trip-exec-1 -- ls -la /data1/

# Expected output shows shared storage with multiple block managers:
# drwxr-xr-x. 22 spark spark 16384 Sep 28 22:09 blockmgr-7c0ac908-26a3-4395-8a8f-2221b4d5d7c3
# drwxr-xr-x. 13 spark spark 116 Sep 28 22:09 blockmgr-9ed9c2fd-53e1-4337-8a68-9a48e1e63d5f
# drwxr-xr-x. 13 spark spark 116 Sep 28 22:09 blockmgr-ecc9fa35-a82e-4486-85fe-c8ef963d6eb7

# Verify shared storage between pods on same node
kubectl exec -n spark-team-a taxi-trip-driver -- ls -la /data1/

# View Spark application logs
kubectl logs -n spark-team-a -l spark-role=driver --follow

Verify Output Data

# Check processed output in S3
aws s3 ls s3://$S3_BUCKET/taxi-trip/output/

# Verify event logs
aws s3 ls s3://$S3_BUCKET/spark-event-logs/

Node Storage Considerations

Advantages

  • Cost Savings: ~70% cost reduction vs per-pod PVCs
  • Higher Performance: Direct node storage access
  • Simplified Operations: No PVC management overhead
  • Better Resource Utilization: Shared storage pool per node

Disadvantages & Mitigation

  • Noisy Neighbors: Multiple pods compete for same storage I/O
    • Mitigation: Use compute-optimized instances with higher IOPS
  • No Isolation: Pods can see each other's temporary data
    • Mitigation: Configure proper directory permissions
  • Storage Sizing: Must pre-size for all workloads on node
    • Mitigation: Monitor usage and adjust volume size (see the monitoring sketch below)
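
A simple way to watch the shared directory, reusing the node debug approach shown earlier (<node-name> is a placeholder):

# Spot-check free space and per-directory usage of the shared Spark scratch space on a node
kubectl debug node/<node-name> -it --image=busybox -- sh -c \
  'df -h /host/mnt/spark-local && du -sh /host/mnt/spark-local/*'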

When to Use Node Storage

Good for:

  • Cost-sensitive workloads
  • Predictable I/O patterns
  • Trusted multi-tenant environments
  • Development/testing environments

Avoid for:

  • Security-sensitive workloads requiring isolation
  • Unpredictable I/O bursts
  • Critical production workloads needing guarantees

Storage Class Options

# GP3 - Better price/performance (default)
volumeType: gp3

# IO1 - High IOPS workloads
volumeType: io1

# ST1 - Throughput-optimized
volumeType: st1
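
For gp3 node volumes, IOPS and throughput can be raised independently of volume size. A hedged example using the AWS CLI; the volume ID is a placeholder for the EBS volume backing /mnt/spark-local, and the values are illustrative:

# Raise gp3 performance on an existing node volume without resizing it
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3 --iops 6000 --throughput 500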

Cleanup

# Delete the Spark application
kubectl delete sparkapplication taxi-trip -n spark-team-a

# Node storage persists until node termination
# Data is automatically cleaned up when nodes are replaced
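
If you want to reclaim space before a node is recycled, leftover Spark scratch directories can be removed manually. This is optional; <node-name> is a placeholder, and it assumes no other Spark jobs are currently using that node:

# Remove leftover block manager and temp directories from the shared node volume
kubectl debug node/<node-name> -it --image=busybox -- sh -c \
  'rm -rf /host/mnt/spark-local/blockmgr-* /host/mnt/spark-local/spark-*'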

Next Steps