

Apache Spark with Apache DataFusion Comet Benchmarks

Apache Spark powers large-scale analytics, but its JVM-based execution faces performance limitations. Apache DataFusion Comet attempts to address this by offloading compute operations to a native Rust execution engine built on Apache Arrow DataFusion.

This benchmark evaluates Comet's performance on Amazon EKS using the TPC-DS 3TB workload.

TL;DR

Our TPC-DS 3TB benchmark shows that Apache DataFusion Comet (v0.15.0) delivered 11% faster overall performance compared to native Spark SQL, with variable query-level results. Some queries saw ~50% improvements, while others saw ~35% degradation.

TPC-DS 3TB Benchmark Results

Summary

Our comprehensive TPC-DS 3TB benchmark on Amazon EKS demonstrates that Apache DataFusion Comet (v0.15.0) provides an overall speedup (11% faster) compared to native Spark SQL, with individual queries varying from ~50% faster to ~35% slower (one outlier at 90%, likely a measurement artifact).

| Name | Completion Time (seconds) | Speedup |
|------|---------------------------|---------|
| Native Spark | 3,650.56 | Baseline |
| Comet | 3,246.87 | 11% faster |

Benchmark Infrastructure

Benchmark Methodology

Benchmarks ran sequentially on the same cluster to ensure identical hardware and eliminate resource contention. Native Spark executed first, followed by Comet.

To ensure an apples-to-apples comparison, both native Spark and Comet jobs ran on identical hardware, storage, and data. Only the execution engine and related Spark settings differed.

Test Environment

| Component | Configuration |
|-----------|---------------|
| EKS Cluster | Amazon EKS 1.34 |
| Node Instance Type | r8gd.12xlarge (48 vCPUs, 384GB RAM, 1.8TB NVMe SSD) |
| Node Group | 24 nodes dedicated for benchmark workloads |
| Executor Configuration | 23 executors × 5 cores × 58GB RAM each |
| Driver Configuration | 5 cores × 20GB RAM |
| Dataset | TPC-DS 3TB (Parquet format) |
| Storage | Amazon S3 with optimized S3A connector |
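
Expressed as Spark properties, the executor and driver sizing above corresponds roughly to the sketch below. The property names are standard Spark; the exact benchmark manifests may include additional overhead and locality settings not shown here.

# Approximate sizing sketch derived from the table above
"spark.executor.instances": "23"
"spark.executor.cores": "5"
"spark.executor.memory": "58g"
"spark.driver.cores": "5"
"spark.driver.memory": "20g"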

Spark Configuration Comparison

| Configuration | Native Spark | Comet |
|---------------|--------------|-------|
| Spark Version | 3.5.8 | 3.5.8 |
| Comet Version | N/A | 0.15.0 |
| Java Runtime | OpenJDK 17 | OpenJDK 17 |
| Execution Engine | JVM-based Tungsten | Rust + JVM hybrid |
| Key Plugins | Standard Spark | CometPlugin, CometShuffleManager |
| Off-heap Memory | 32GB enabled | 32GB enabled |
| Memory Management | JVM GC | Unified native + JVM |
| memoryThrottlingFactor | 0.7 | 0.7 |

Critical Comet-Specific Configurations

# Essential Comet Configuration
"spark.plugins": "org.apache.spark.CometPlugin"
"spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"

# Memory Configuration - Critical for Comet
"spark.memory.offHeap.enabled": "true"
"spark.memory.offHeap.size": "32g" # Required: 16GB minimum, 32GB recommended

# Comet Execution Settings
"spark.comet.exec.enabled": "true"
"spark.comet.exec.shuffle.enabled": "true"
"spark.comet.exec.shuffle.mode": "auto"
"spark.comet.explainFallback.enabled": "true"
"spark.comet.cast.allowIncompatible": "true"
"spark.comet.dppFallback.enabled": "true"

# AWS-Specific: Required for S3 region detection
"spark.hadoop.fs.s3a.endpoint.region": "us-west-2"

Performance Results

Overall Performance

| Name | Completion Time (seconds) | Speedup |
|------|---------------------------|---------|
| Native Spark | 3,650.56 | Baseline |
| Comet | 3,246.87 | 11% faster |

Performance Distribution

| Performance Range | Query Count | Percentage |
|-------------------|-------------|------------|
| 20%+ improvement | 32 | 31% |
| 10-20% improvement | 19 | 18% |
| ±10% | 44 | 43% |
| 10-20% degradation | 4 | 4% |
| 20%+ degradation | 4 | 4% |

Top 10 Query Improvements

| Query | Native Spark (s) | Comet (s) | Improvement |
|-------|------------------|-----------|-------------|
| q5-v4.0 | 40.8 | 20.0 | 51% faster |
| q56-v4.0 | 5.8 | 3.0 | 49% faster |
| q41-v4.0 | 0.8 | 0.4 | 47% faster |
| q45-v4.0 | 12.3 | 7.0 | 43% faster |
| q9-v4.0 | 83.8 | 49.0 | 42% faster |
| q80-v4.0 | 32.6 | 20.1 | 38% faster |
| q58-v4.0 | 5.4 | 3.3 | 38% faster |
| q86-v4.0 | 12.8 | 8.1 | 36% faster |
| q83-v4.0 | 2.8 | 1.8 | 36% faster |
| q73-v4.0 | 3.4 | 2.2 | 36% faster |

Top 10 Query Regressions

| Query | Native Spark (s) | Comet (s) | Degradation |
|-------|------------------|-----------|-------------|
| q32-v4.0 | 2.9 | 5.6 | 90% slower |
| q50-v4.0 | 82.7 | 111.5 | 35% slower |
| q39a-v4.0 | 5.1 | 6.3 | 23% slower |
| q68-v4.0 | 5.9 | 7.1 | 21% slower |
| q20-v4.0 | 5.1 | 5.8 | 14% slower |
| q30-v4.0 | 15.9 | 17.7 | 11% slower |
| q67-v4.0 | 141.2 | 156.8 | 11% slower |
| q25-v4.0 | 8.9 | 9.9 | 10% slower |
| q29-v4.0 | 34.4 | 37.8 | 10% slower |
| q91-v4.0 | 2.1 | 2.3 | 10% slower |

Notes on Performance Differences

Why Comet Underperforms on Some Queries:

Example - Query 32 (-90.1%)

  • The regressing stages run identical Spark WholeStageCodegen in both applications; no Comet operators are involved in the slow stages
  • Comet's best iteration (2,132ms) is faster than Default's best (2,338ms); the median regression is driven by single-task stragglers reaching 7.1s (10.5x p50) in Comet vs 1.8s max in Default
  • Measurement artifact from infrastructure-level straggler interference on a single task out of 118 in Comet's non-Comet scan stages, amplified by 6.6x wider iteration variance (range 8,856ms vs 1,351ms)

Example - Query 39a (-23.1%)

  • Inventory scan stage median task time 35-77% higher in Comet (234ms vs 173ms) due to columnar shuffle serialization overhead
  • CometColumnarExchange retained 200 partitions where Default used 7, inflating broadcast stage from 32ms to 358ms and producing 3.2x more shuffle data (23.3 MB vs 7.2 MB)
  • Two compounding factors: slower scan-stage serialization to Comet's columnar format (+330-538ms), and CometColumnarExchange overriding the natural partition count with 200 partitions for a broadcast exchange that only needed 7
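
One mitigation to experiment with for this pattern is AQE partition coalescing, which lets Spark shrink an oversized exchange at runtime. This is a hypothetical tuning sketch, not a setting from the benchmark, and whether coalescing applies to CometColumnarExchange should be verified against your Comet version:

# Allow AQE to coalesce small shuffle partitions instead of
# keeping the default 200 post-shuffle partitions
"spark.sql.adaptive.enabled": "true"
"spark.sql.adaptive.coalescePartitions.enabled": "true"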

Why Comet Outperforms Some Queries:

Comet's native Rust execution engine excels at CPU-intensive operations, as the following examples show.

Example - Query 5 (+50.9%)

  • Shuffle read on the SortMergeJoin stage dropped 92% (27.9 GB to 2.2 GB) due to Comet's native columnar shuffle for web_sales (2.16B rows)
  • GC time in the scan stage dropped from 7,236ms to 77ms; CometSort was 56% faster than Spark Sort (13.9 min vs 31.9 min total)
  • Comet's native scan-filter-exchange pipeline for web_sales produced compact Arrow columnar shuffle data, massively reducing downstream shuffle read and sort time in the join stage, with cascading contention reduction benefiting non-Comet stages

Example - Query 56 (+49.0%)

  • AQE converted all three SortMergeJoins to BroadcastHashJoins because CometExchange reported customer_address shuffle size as 2.7 MiB (vs 10.7 MiB in Default), falling below the 10 MB broadcast threshold
  • Combined join stage time dropped from 8.6s to 1.4s; GC dropped from 2.8s to 206ms
  • Comet's Arrow columnar shuffle encoding produced a more compact representation that caused AQE to select broadcast joins, eliminating sort buffers, shuffle overhead, and GC pressure from sort buffer allocation
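
The 10 MB threshold referenced above is Spark's automatic broadcast join threshold, which AQE compares against the shuffle sizes that exchanges report at runtime. Shown below with its default value for reference; raising it is one way to reproduce this plan shape deliberately, at the cost of larger broadcasts:

# AQE converts a SortMergeJoin to a BroadcastHashJoin when the
# reported size of one side falls below this threshold (default 10 MB)
"spark.sql.autoBroadcastJoinThreshold": "10MB"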

Resource Usage Analysis

Both benchmarks ran sequentially on identical hardware to enable direct comparison.

CPU Utilization

CPU utilization showed similar patterns between Native Spark and Comet, with both sustaining comparable peak and average usage throughout the benchmark.

[Figure: CPU utilization, Native Spark (Default)]
[Figure: CPU utilization, Comet]

Memory Utilization

Comet consumed more memory than Native Spark, peaking around ~52GB compared to ~48GB for Native Spark. Native Spark's memory usage returned to a baseline of ~20GB between queries, while Comet remained elevated. This is expected — Comet offloads most compute operations to its native Rust engine using off-heap memory, so the JVM heap sees less activity while native allocations persist between queries. This suggests that on-heap JVM memory per executor could potentially be reduced when running Comet, shifting the memory budget toward off-heap allocation.
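
A hypothetical rebalancing along these lines keeps the per-executor footprint roughly constant while favoring the native engine. The values below are illustrative only and were not validated in this benchmark:

# Baseline used in this benchmark
"spark.executor.memory": "58g"
"spark.memory.offHeap.size": "32g"

# Hypothetical Comet-oriented split: smaller JVM heap, larger native pool
"spark.executor.memory": "40g"
"spark.memory.offHeap.size": "48g"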

[Figure: Memory utilization, Native Spark (Default)]
[Figure: Memory utilization, Comet]

Network Bandwidth

Network bandwidth was comparable between Native Spark and Comet, with both showing similar transmit and receive throughput.

[Figure: Network bandwidth, Native Spark (Default)]
[Figure: Network bandwidth, Comet]

Storage I/O

Storage utilization (IOPS and throughput) was broadly comparable, with Comet reaching higher peak IOPS than Native Spark.

[Figure: Storage IOPS, Native Spark (Default)]
[Figure: Storage IOPS, Comet]
[Figure: Full Grafana dashboard, Native Spark (Default)]
[Figure: Full Grafana dashboard, Comet]

Summary: Comet uses slightly more memory than Native Spark (higher peak and elevated baseline between queries), while CPU, network, and storage utilization remain comparable. The 11% overall speedup comes without a significant increase in resource consumption. Since Comet performs most compute off-heap in its native Rust engine, operators may be able to reduce on-heap JVM memory per executor and allocate more toward off-heap, potentially improving cost efficiency.

When to Consider Comet

Comet may be beneficial for:

  • Workloads dominated by specific query patterns - If your workload consists primarily of queries similar to q5, q56, q41, q45 (40%+ faster), Comet could provide net benefits
  • Targeted query optimization - Scenarios where you can isolate and route specific query patterns that align with Comet's strengths
  • Future versions - As the project matures, performance characteristics may change and improve significantly
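
Because Comet is enabled through Spark configuration rather than a separate build, this routing can be done per job. A sketch, assuming the spark.comet.enabled master switch exposed by recent Comet releases:

# Jobs resembling the improved patterns (e.g., q5, q56): run on Comet
"spark.comet.enabled": "true"

# Jobs resembling the regressions (e.g., q32, q50): fall back to JVM execution
"spark.comet.enabled": "false"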

Running Benchmarks

Prerequisites

  • EKS cluster with Spark Operator installed
  • S3 bucket with TPC-DS benchmark data
  • Docker registry access (ECR or other)

Step 1: Build Docker Image

Build the Comet-enabled Spark image:

cd data-stacks/spark-on-eks/benchmarks/datafusion-comet-benchmarks

# Build and push to your registry
docker build -t <your-registry>/spark-comet:3.5.8 -f Dockerfile-comet .
docker push <your-registry>/spark-comet:3.5.8

Step 2: Install NodeLocal DNS Cache

Large Spark jobs on EKS can generate excessive DNS query volume against the VPC's Route53 Resolver. To mitigate this, install NodeLocal DNS Cache:

kubectl apply -f node-local-cache.yaml

This caches DNS results locally on each node, preventing excessive DNS queries from reaching the Route53 Resolver.
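
To confirm the cache DaemonSet is running on every node, a typical check looks like the following; the label matches the upstream NodeLocal DNSCache manifests, so adjust it if your manifest differs:

kubectl get pods -n kube-system -l k8s-app=node-local-dns -o wide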

Step 3: Update Bucket Names

Update the S3 bucket references in the benchmark manifests:

# For Comet benchmark
export S3_BUCKET=your-bucket-name
envsubst < tpcds-benchmark-comet-parquet.yaml | kubectl apply -f -

# For native Spark baseline
envsubst < tpcds-benchmark-parquet.yaml | kubectl apply -f -

Or manually edit the YAML files and replace ${S3_BUCKET} with your bucket name.

Step 4: Monitor Benchmark Progress

# Check job status
kubectl get sparkapplications -n spark-team-a

# View logs
kubectl logs -f -n spark-team-a -l spark-app-name=tpcds-benchmark-comet