spark-datafusion-comet-benchmark
Apache Spark with Apache DataFusion Comet Benchmarks
Apache Spark powers large-scale analytics, but its JVM-based execution faces performance limitations. Apache DataFusion Comet attempts to address this by offloading compute operations to a native Rust execution engine built on Apache Arrow DataFusion.
This benchmark evaluates Comet's performance on Amazon EKS using the TPC-DS 1TB workload.
Our TPC-DS 1TB benchmark shows that Apache DataFusion Comet (v0.13.0) ran 18% slower overall than native Spark SQL, with highly variable query-level results: some queries improved by ~50%, while others degraded by ~1,300%. The degradation stems primarily from Comet's lack of support for Dynamic Partition Pruning (DPP); performance is expected to improve significantly in future releases as DPP support is added.
TPC-DS 1TB Benchmark Results
Summary
Our TPC-DS 1TB benchmark on Amazon EKS shows that Apache DataFusion Comet provides no overall speedup (it ran 18% slower) compared to native Spark SQL, with individual queries showing mixed results. Comet also required significantly more memory (~32GB more per executor) than Spark's default execution engine to complete all queries successfully.
Benchmark Infrastructure
Benchmarks ran sequentially on the same cluster, native Spark first and then Comet, so both jobs saw identical hardware, storage, and data with no resource contention between them. Only the execution engine and related Spark settings differed, ensuring an apples-to-apples comparison.
Test Environment
| Component | Configuration |
|---|---|
| EKS Cluster | Amazon EKS 1.34 |
| Node Instance Type | c5d.12xlarge (48 vCPUs, 96GB RAM, 1.8TB NVMe SSD) |
| Node Group | 24 nodes dedicated for benchmark workloads |
| Executor Configuration | 23 executors × 5 cores × 58GB RAM each |
| Driver Configuration | 5 cores × 20GB RAM |
| Dataset | TPC-DS 1TB (Parquet format) |
| Storage | Amazon S3 with optimized S3A connector |
Spark Configuration Comparison
| Configuration | Native Spark | Comet |
|---|---|---|
| Spark Version | 3.5.7 | 3.5.7 |
| Comet Version | N/A | 0.13.0 |
| Java Runtime | OpenJDK 17 | OpenJDK 17 |
| Execution Engine | JVM-based Tungsten | Rust + JVM hybrid |
| Key Plugins | Standard Spark | CometPlugin, CometShuffleManager |
| Off-heap Memory | 32GB enabled | 32GB enabled |
| Memory Management | JVM GC | Unified native + JVM |
Critical Comet-Specific Configurations
```
# Essential Comet Configuration
"spark.plugins": "org.apache.spark.CometPlugin"
"spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"

# Memory Configuration - Critical for Comet
"spark.memory.offHeap.enabled": "true"
"spark.memory.offHeap.size": "32g"  # Required: 16GB minimum, 32GB recommended

# Comet Execution Settings
"spark.comet.exec.enabled": "true"
"spark.comet.exec.shuffle.enabled": "true"
"spark.comet.exec.shuffle.mode": "auto"
"spark.comet.explainFallback.enabled": "true"
"spark.comet.cast.allowIncompatible": "true"

# AWS-Specific: Required for S3 region detection
"spark.hadoop.fs.s3a.endpoint.region": "us-west-2"
```
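For reference, the same settings can be assembled programmatically before submission; a minimal Python sketch (the helper name `comet_conf` and the default values are illustrative, taken from this benchmark's configuration):

```python
# Sketch: collect the Comet-related Spark settings into a dict usable
# with spark-submit --conf flags or SparkSession.builder.config().
def comet_conf(offheap_size="32g", region="us-west-2"):
    return {
        # Plugin and shuffle manager that activate Comet
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.shuffle.manager":
            "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
        # Off-heap memory is mandatory for Comet's native engine
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
        # Native execution and shuffle
        "spark.comet.exec.enabled": "true",
        "spark.comet.exec.shuffle.enabled": "true",
        "spark.comet.exec.shuffle.mode": "auto",
        "spark.comet.explainFallback.enabled": "true",
        "spark.comet.cast.allowIncompatible": "true",
        # Explicit S3 region works around Comet's region-detection issue
        "spark.hadoop.fs.s3a.endpoint.region": region,
    }

# Render as spark-submit flags:
flags = " ".join(f"--conf {k}={v}" for k, v in comet_conf().items())
```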
Performance Results
Overall Performance
| Name | Completion Time (seconds) | Performance |
|---|---|---|
| Native Spark | 2,090.46 | Baseline |
| Comet | 2,470.43 | -18% |
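The -18% figure follows directly from the wall-clock times in the table:

```python
native = 2090.46  # seconds, native Spark (baseline)
comet = 2470.43   # seconds, Comet

# Slowdown relative to the native baseline
slowdown = (comet - native) / native
print(f"{slowdown:.1%}")  # ~18.2% slower overall
```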
Performance Distribution
| Performance Range | Query Count | Percentage |
|---|---|---|
| 20%+ improvement | 25 | 24% |
| 10-20% improvement | 15 | 14% |
| ±10% (neutral) | 38 | 37% |
| 10-20% degradation | 4 | 4% |
| 20%+ degradation | 22 | 21% |
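As a sanity check, the bucket counts above cover all 104 queries in this TPC-DS run (the 99 standard queries plus variants), and the rounded percentages sum to 100:

```python
# Distribution buckets from the table above
counts = {"20%+ faster": 25, "10-20% faster": 15, "within ±10%": 38,
          "10-20% slower": 4, "20%+ slower": 22}

total = sum(counts.values())                              # 104 queries
pcts = {k: round(100 * v / total) for k, v in counts.items()}
```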
Top 10 Query Improvements
| Query | Native Spark (s) | Comet (s) | Improvement |
|---|---|---|---|
| q8-v2.4 | 5.9 | 2.7 | 54% faster |
| q5-v2.4 | 20.4 | 10.8 | 47% faster |
| q41-v2.4 | 1.2 | 0.7 | 40% faster |
| q93-v2.4 | 77.9 | 47.1 | 40% faster |
| q9-v2.4 | 56.4 | 37.2 | 34% faster |
| q76-v2.4 | 33.5 | 22.1 | 34% faster |
| q90-v2.4 | 14.6 | 9.7 | 33% faster |
| q73-v2.4 | 3.8 | 2.6 | 32% faster |
| q97-v2.4 | 19.5 | 13.5 | 31% faster |
| q44-v2.4 | 25.2 | 17.6 | 30% faster |
Top 10 Query Regressions
| Query | Native Spark (s) | Comet (s) | Degradation |
|---|---|---|---|
| q25-v2.4 | 5.7 | 79.9 | 1,308% slower |
| q17-v2.4 | 6.8 | 76.5 | 1,020% slower |
| q54-v2.4 | 6.2 | 42.3 | 586% slower |
| q29-v2.4 | 17.7 | 76.0 | 330% slower |
| q45-v2.4 | 6.6 | 27.6 | 316% slower |
| q6-v2.4 | 10.2 | 31.6 | 210% slower |
| q18-v2.4 | 10.5 | 31.3 | 199% slower |
| q68-v2.4 | 6.6 | 17.6 | 168% slower |
| q11-v2.4 | 25.7 | 67.4 | 162% slower |
| q74-v2.4 | 22.1 | 52.3 | 137% slower |
Extreme regressions (>100%) may indicate queries falling back to JVM execution while still incurring Comet's coordination overhead.
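Both tables measure change relative to the native-Spark baseline, which is why regressions can exceed 100% while improvements cannot; a quick check against the rounded timings shown (the table's 1,308% figure comes from unrounded measurements):

```python
def pct_faster(native, comet):
    """Improvement relative to the native-Spark baseline."""
    return (native - comet) / native * 100

def pct_slower(native, comet):
    """Regression relative to the native-Spark baseline (can exceed 100%)."""
    return (comet - native) / native * 100

q8 = pct_faster(5.9, 2.7)    # ~54% faster
q25 = pct_slower(5.7, 79.9)  # ~1,300% slower
```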
Notes on Performance Differences
Why Comet Underperforms on Some Queries:
As of Comet v0.13.0, Dynamic Partition Pruning (DPP) is not fully supported. This limitation significantly impacts queries with partitioned fact tables joined to filtered dimension tables. Without DPP, Comet scans entire datasets instead of pruning irrelevant partitions at runtime.
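DPP matters for the classic star-join shape sketched below (table and column names follow the TPC-DS schema; the partitioning assumption is illustrative). One way to confirm DPP is the culprit is to rerun a regressed query on native Spark with DPP disabled via the standard Spark SQL flag and see whether native times approach Comet's:

```python
# Hypothetical star-join where DPP prunes fact-table partitions at runtime.
# Assuming store_sales is partitioned by ss_sold_date_sk, the filter on
# date_dim lets Spark read only matching partitions; without DPP
# (Comet 0.13.0), every partition of store_sales is scanned.
query = """
SELECT d.d_year, SUM(ss.ss_net_paid)
FROM store_sales ss
JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 2001
GROUP BY d.d_year
"""

# To isolate DPP's effect, rerun regressed queries on native Spark
# with DPP turned off and compare against Comet's times:
dpp_off = {"spark.sql.optimizer.dynamicPartitionPruning.enabled": "false"}
```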
Example - Query 25 (1,308% slower):
- Comet: Scanned all 1,823 parquet files, spending ~34 minutes total CPU time
- Native Spark: Applied DPP to read only 30 files, spending ~3 minutes total CPU time
- Impact: 60x more files read, resulting in 14x slower execution
This pattern appears across the worst-performing queries (q25, q17, q54, q29), where Comet reads 10-60x more data than necessary.
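The q25 file-count and runtime ratios follow directly from the numbers above:

```python
files_comet, files_native = 1823, 30  # parquet files scanned
t_comet, t_native = 79.9, 5.7         # seconds wall clock

file_ratio = files_comet / files_native  # ~60x more files scanned
time_ratio = t_comet / t_native          # ~14x slower execution
```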
Why Comet Outperforms on Other Queries:
Comet's native Rust execution engine excels at CPU-intensive operations such as scans, joins, and aggregations.
Example - Query 8 (54% faster):
- Parquet Scanning: Both engines read the same files, but Comet's native reader processed 12M rows in 4.0s vs Spark's 6.0s (1.5x faster)
- Sort-Merge Join: Comet completed the join in 1.4s vs Spark's 2.7s (1.9x faster)
- Aggregations: Comet's vectorized execution significantly outperformed Spark's JVM-based aggregation
Resource Usage Analysis
Both benchmarks ran sequentially on identical hardware to enable direct comparison. Native Spark executed first, followed by Comet. The metrics below show two distinct time periods—the first representing Native Spark, the second representing Comet.
CPU Utilization
Comet demonstrated significantly higher and more sustained CPU utilization compared to Native Spark:
- Native Spark: 3-6 cores per executor (typical), with occasional spikes to ~6.5 cores
- Comet: 5-9 cores per executor (sustained), with frequent peaks reaching 9+ cores
The higher CPU utilization reflects Comet's native Rust execution engine performing more intensive computation.

Memory Consumption
Comet required significantly more memory than Native Spark:
- Native Spark: ~24 GB per executor (observed steady state)
- Comet: ~40 GB per executor (observed steady state)
This represents a 67% increase in actual memory consumption and aligns with Comet's off-heap memory requirements and dual runtime architecture (Rust + JVM).

Network Bandwidth
Network utilization patterns were comparable between Native Spark and Comet, with both showing similar bandwidth consumption for transmit and receive operations.

Storage I/O
Storage utilization (IOPS and throughput) showed similar patterns between Native Spark and Comet, indicating comparable disk I/O characteristics.

Key Takeaway: Comet's native execution engine consumes significantly more CPU and memory resources compared to Native Spark, while network and storage utilization remain comparable. Despite the increased CPU and memory consumption, the 18% overall slowdown indicates these additional resources do not translate to net performance improvements for TPC-DS workloads.
When to Consider Comet
Despite the overall performance regression, Comet may be beneficial for:
- Workloads dominated by specific query patterns - If your workload consists primarily of queries similar to q8, q5, q41, q93 (30-54% faster), Comet could provide net benefits
- Targeted query optimization - Scenarios where you can isolate and route specific query patterns that align with Comet's strengths
- Future versions - As the project matures, performance characteristics may improve significantly
Not recommended for:
- General-purpose TPC-DS-like analytical workloads
- Memory-constrained environments (Comet needed ~2.2x the memory of native Spark in this benchmark)
- Workloads requiring consistent, predictable performance across diverse query patterns
Issues
AWS S3 Region Detection
Comet cannot reliably determine AWS region automatically. You must explicitly set the S3 endpoint region to avoid failures.
Solution:
```
"spark.hadoop.fs.s3a.endpoint.region": "us-west-2"
```
Without this configuration, you may encounter:
```
org.apache.comet.CometNativeException: General execution error with reason: Generic S3 error: Failed to resolve region: error sending request for url
  at org.apache.comet.parquet.Native.initRecordBatchReader(Native Method)
  at org.apache.comet.parquet.NativeBatchReader.init(NativeBatchReader.java:568)
  at org.apache.comet.parquet.CometParquetFileFormat.$anonfun$buildReaderWithPartitionValues$1(CometParquetFileFormat.scala:175)
```
Excessive DNS Query Volume
Comet generates significantly higher DNS query volume compared to native Spark, potentially hitting Route53 Resolver limits.
Observed behavior:
| Metric | Native Spark | Comet | Impact |
|---|---|---|---|
| DNS queries/sec | 5-10 | Up to 5,000 | 500x increase |
| Route53 limit | Well below 1,024/sec/ENI | Approaching/exceeding 1,024/sec/ENI | Limit reached |
Symptoms:
- `UnknownHostException` errors during job execution
- Intermittent S3 connectivity failures
- Job failures under high concurrency
Root cause: Comet's native Rust layer may not leverage JVM DNS caching mechanisms, resulting in excessive DNS lookups.
Solution: Deploy NodeLocal DNS Cache to cache DNS results on each node:
```bash
kubectl apply -f node-local-cache.yaml
```
Reference: Kubernetes NodeLocal DNSCache
Memory Requirements
Comet requires significantly more off-heap memory compared to native Spark.
Memory comparison:
Both benchmarks used identical pod configurations (58GB RAM per executor) to ensure fair comparison. The key differences:
| Metric | Native Spark | Comet | Difference |
|---|---|---|---|
| Configured executor memory | 58 GB | 58 GB | Same |
| Off-heap memory configuration | Default (minimal) | 32 GB | Required for Comet |
| Observed memory usage | ~24 GB | ~40 GB | +67% |
Key takeaways:
- Native Spark only utilized ~24GB of the 58GB allocation, leaving significant headroom
- Comet required ~40GB and needs the 32GB off-heap configuration to avoid OOM
- While Native Spark can run efficiently with 26-30GB total memory, Comet requires 58GB+ to operate reliably
- This means Comet needs 2.2x more memory than Native Spark for the same workload
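The memory claims above reduce to two simple ratios, computed from the observed figures:

```python
native_used, comet_used = 24, 40  # GB observed steady state per executor
increase = (comet_used - native_used) / native_used  # ~0.67 -> +67% usage

native_min, comet_min = 26, 58    # GB needed to run reliably (lower bound
                                  # of the 26-30GB range vs 58GB for Comet)
ratio = comet_min / native_min    # ~2.2x total memory requirement
```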
Running Benchmarks
Prerequisites
- EKS cluster with Spark Operator installed
- S3 bucket with TPC-DS benchmark data
- Docker registry access (ECR or other)
Step 1: Build Docker Image
Build the Comet-enabled Spark image:
```bash
cd data-stacks/spark-on-eks/benchmarks/datafusion-comet

# Build and push to your registry
docker build -t <your-registry>/spark-comet:3.5.7 -f Dockerfile-comet .
docker push <your-registry>/spark-comet:3.5.7
```
Step 2: Install NodeLocal DNS Cache
To mitigate the DNS query volume issue, install NodeLocal DNS Cache:
```bash
kubectl apply -f node-local-cache.yaml
```
This caches DNS results locally on each node, preventing the excessive DNS queries to Route53 Resolver.
Step 3: Update Bucket Names
Update the S3 bucket references in the benchmark manifests:
```bash
# For Comet benchmark
export S3_BUCKET=your-bucket-name
envsubst < tpcds-benchmark-comet-template.yaml | kubectl apply -f -

# For native Spark baseline
envsubst < tpcds-benchmark-357-template.yaml | kubectl apply -f -
```
Or manually edit the YAML files and replace ${S3_BUCKET} with your bucket name.
Step 4: Monitor Benchmark Progress
```bash
# Check job status
kubectl get sparkapplications -n spark-team-a

# View logs
kubectl logs -f -n spark-team-a -l spark-app-name=tpcds-benchmark-comet
```