Apache Spark with Apache Gluten + Velox Benchmarks

Apache Spark powers much of today’s large-scale analytics, but its default SQL engine is still JVM-bound and row-oriented. Even with Project Tungsten’s code generation and vectorized readers, operators often pay heavy costs for Java object creation, garbage collection, and row-to-column conversions. These costs become visible on analytic workloads that scan large Parquet or ORC tables, perform wide joins, or run memory-intensive aggregations—leading to slower queries and inefficient CPU use.

Modern C++ engines such as Velox, ClickHouse, and DuckDB show that SIMD-optimized, cache-aware vectorization can process the same data far faster. But replacing Spark is impractical given its ecosystem and scheduling model. Apache Gluten solves this by translating Spark SQL plans into the open Substrait IR and offloading execution to a native C++ backend (Velox, ClickHouse, etc.). This approach keeps Spark’s APIs and Kubernetes deployment model while accelerating the CPU-bound SQL layer—the focus of this deep dive and benchmark study on Amazon EKS.

In this guide you will:

Understand how the Spark + Gluten + Velox stack is assembled on Amazon EKS
Review TPC-DS 1TB benchmark results against native Spark
Learn the configuration, deployment, and troubleshooting steps required to reproduce the study

TL;DR

Benchmark scope: TPC-DS 1TB, three iterations on Amazon EKS
Toolchain: Apache Spark + Apache Gluten + Velox
Performance: 1.72× faster runtime overall, with peak 5.48× speedups on aggregation-heavy queries
Cost impact: ≈42% lower compute spend from shorter runs and higher CPU efficiency

TPC-DS 1TB Benchmark Results: Native Spark vs. Gluten + Velox Performance Analysis

Interactive Performance Dashboard

We benchmarked TPC-DS 1TB workloads on a dedicated Amazon EKS cluster to compare native Spark SQL execution with Spark enhanced by Gluten and the Velox backend. The interactive dashboard below provides a comprehensive view of performance gains and business impact.

Overall Performance Gain: 1.72x (72% faster execution across all 104 TPC-DS queries)
Cost Reduction Potential: 42% (compute cost falls in direct proportion to runtime)
Success Rate: 86.5% (90 out of 104 queries improved; 14 showed degradation)
Time Saved: 42 minutes on the TPC-DS 1TB benchmark suite (1.7h → 1.0h)

[Chart: Performance Comparison: Runtime Analysis]
[Chart: Query Speedup Distribution]
[Chart: Top 10 Performance Improvements]
[Chart: Performance Improvement Distribution (improved vs. degraded queries); median speedup 1.77x, success rate 86.5%]

Performance Categories

Excellent (3x+): 15 queries
Good (2x-3x): 25 queries
Moderate (1.5x-2x): 23 queries
Slight (1x-1.5x): 27 queries
Degraded (<1x): 14 queries

[Chart: Query Execution Time: Spark vs Gluten+Velox (Top 30 Queries by Improvement)]

🔍 Performance Analysis Insights

Complex Analytical Queries: Queries with heavy joins and aggregations (q93, q49, q50) show the highest improvements (3.9x-5.5x)
Scan-Heavy Operations: Large table scans benefit significantly from native columnar processing
Vectorization Benefits: Mathematical operations and filters see consistent 2x-3x improvements
Memory-Intensive Queries: Queries like q23b (147s→53s) demonstrate native memory management advantages
Edge Cases: 14 queries showed degradation, primarily those with simple operations where JNI overhead exceeded benefits
Cost Savings: The 1.72x speedup cuts total execution time by ≈42% (1 - 1/1.72), which translates to ~42% lower compute costs on EKS

Summary

Our TPC-DS 1TB benchmark on Amazon EKS shows that Apache Gluten with Velox delivers a 1.72x overall speedup (72% faster) compared to native Spark SQL. Among the 90 queries that ran faster, individual improvements range from 1.1x to 5.5x; the remaining 14 of the 104 queries ran slower.

📊 View complete benchmark results and raw data →

Benchmark Infrastructure Configuration

To ensure an apples-to-apples comparison, both native Spark and Gluten + Velox jobs ran on identical hardware, storage, and data. Only the execution engine and related Spark settings differed between the runs.

Test Environment Specifications

Component	Configuration
EKS Cluster	Amazon EKS 1.33
Node Instance Type	c5d.12xlarge (48 vCPUs, 96GB RAM, 1.8TB NVMe SSD)
Node Group	8 nodes dedicated for benchmark workloads
Executor Configuration	23 executors × 5 cores × 20GB RAM each
Driver Configuration	5 cores × 20GB RAM
Dataset	TPC-DS 1TB (Parquet format)
Storage	Amazon S3 with optimized S3A connector

Spark Configuration Comparison

Configuration	Native Spark	Gluten + Velox
Spark Version	3.5.3	3.5.2
Java Runtime	OpenJDK 17	OpenJDK 17
Execution Engine	JVM-based Tungsten	Native C++ Velox
Key Plugins	Standard Spark	`GlutenPlugin`, `ColumnarShuffleManager`
Off-heap Memory	Default	2GB enabled
Vectorized Processing	Limited Java SIMD	Full C++ vectorization
Memory Management	JVM GC	Unified native + JVM

Critical Gluten-Specific Configurations

# Essential Gluten Plugin Configuration
spark.plugins: "org.apache.gluten.GlutenPlugin"
spark.shuffle.manager: "org.apache.spark.shuffle.sort.ColumnarShuffleManager"
spark.memory.offHeap.enabled: "true"
spark.memory.offHeap.size: "2g"

# Java 17 Compatibility for Gluten-Velox
spark.driver.extraJavaOptions: "--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED"
spark.executor.extraJavaOptions: "--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED"
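
If you launch jobs with spark-submit rather than a SparkApplication manifest, the same settings can be passed as --conf flags. The sketch below is one way to assemble them. GLUTEN_CONF and to_submit_args are illustrative names, not part of Spark or Gluten, and the values simply mirror the benchmark settings listed above.

```python
import shlex

# Hypothetical helper for this guide: turn the Gluten settings above into
# spark-submit flags. Neither name below exists in Spark or Gluten.
ADD_OPENS = (
    "--add-opens=java.base/java.nio=ALL-UNNAMED "
    "--add-opens=java.base/sun.misc=ALL-UNNAMED"
)

GLUTEN_CONF = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
    "spark.driver.extraJavaOptions": ADD_OPENS,
    "spark.executor.extraJavaOptions": ADD_OPENS,
}


def to_submit_args(conf):
    """Return a flat list of ['--conf', 'key=value', ...] in insertion order."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# shlex.join quotes values that contain spaces (the JVM options do).
print(shlex.join(["spark-submit"] + to_submit_args(GLUTEN_CONF)))
```

Keeping the settings in one dictionary makes it easy to run the native baseline with an empty dictionary and the Gluten run with the full one.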

Performance Analysis: Top 20 Query Improvements

Gluten’s native execution path shines on wide, compute-heavy SQL. The table lists the 20 largest gains across the 104 TPC-DS queries, comparing median runtimes over the three benchmark iterations.

Rank	TPC-DS Query	Native Spark (s)	Gluten + Velox (s)	Speedup	% Improvement
1	q93-v2.4	80.18	14.63	5.48×	448.1%
2	q49-v2.4	25.68	6.66	3.86×	285.5%
3	q50-v2.4	38.57	10.00	3.86×	285.5%
4	q59-v2.4	17.57	4.82	3.65×	264.8%
5	q5-v2.4	23.18	6.42	3.61×	261.4%
6	q62-v2.4	9.41	2.88	3.27×	227.0%
7	q97-v2.4	18.68	5.99	3.12×	211.7%
8	q40-v2.4	15.17	5.05	3.00×	200.2%
9	q90-v2.4	12.05	4.21	2.86×	186.2%
10	q23b-v2.4	147.17	52.96	2.78×	177.9%
11	q29-v2.4	17.33	6.45	2.69×	168.7%
12	q9-v2.4	60.90	23.03	2.64×	164.5%
13	q96-v2.4	9.19	3.55	2.59×	158.8%
14	q84-v2.4	7.99	3.12	2.56×	156.1%
15	q6-v2.4	9.87	3.87	2.55×	155.3%
16	q99-v2.4	9.70	3.81	2.55×	154.6%
17	q43-v2.4	4.70	1.87	2.51×	151.1%
18	q65-v2.4	17.51	7.00	2.50×	150.2%
19	q88-v2.4	50.90	20.69	2.46×	146.1%
20	q44-v2.4	22.90	9.36	2.45×	144.7%
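
The Speedup and % Improvement columns appear to follow directly from the two runtime columns: speedup is native time divided by Gluten time, and the percentage is (speedup - 1) × 100. The sketch below recomputes two rows under that assumption; the function names are illustrative.

```python
def speedup(native_s, gluten_s):
    """Speedup factor: how many times faster the Gluten run finished."""
    return native_s / gluten_s


def pct_improvement(native_s, gluten_s):
    """Percentage improvement, assumed to be (speedup - 1) * 100."""
    return (speedup(native_s, gluten_s) - 1.0) * 100.0


# Runtimes in seconds copied from the q93-v2.4 and q23b-v2.4 rows above.
rows = {"q93-v2.4": (80.18, 14.63), "q23b-v2.4": (147.17, 52.96)}

for query, (native_s, gluten_s) in rows.items():
    print(
        f"{query}: {speedup(native_s, gluten_s):.2f}x, "
        f"{pct_improvement(native_s, gluten_s):.1f}%"
    )
```

Note that a 2× speedup is a 100% improvement under this definition but only a 50% reduction in runtime, so the two figures should not be used interchangeably.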

Speedup Distribution Across Queries

Speedup Range	Count	% of Total (≈97 queries)
≥ 3× and < 5×	9	≈ 9%
≥ 2× and < 3×	29	≈ 30%
≥ 1.5× and < 2×	30	≈ 31%
≥ 1× and < 1.5×	21	≈ 22%
< 1× (slower with Gluten)	8	≈ 8%
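
To build a distribution like this from your own run, each query's speedup can be assigned to a half-open range and counted. The sketch below uses the dashboard's five categories (3x and above is a single bucket here); the labels, function names, and sample list are illustrative.

```python
from collections import Counter

# Lower bounds of half-open speedup ranges, checked from highest to lowest.
BUCKETS = [
    (3.0, "3x and above"),
    (2.0, "2x to <3x"),
    (1.5, "1.5x to <2x"),
    (1.0, "1x to <1.5x"),
]


def bucket(speedup):
    """Return the label of the first range whose lower bound is met."""
    for lower, label in BUCKETS:
        if speedup >= lower:
            return label
    return "below 1x (slower with Gluten)"


def distribution(speedups):
    """Count how many queries fall into each range."""
    return Counter(bucket(s) for s in speedups)


# Speedups from the first ten rows of the top-20 table, for illustration only.
sample = [5.48, 3.86, 3.86, 3.65, 3.61, 3.27, 3.12, 3.00, 2.86, 2.78]
print(distribution(sample))
```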

Key Performance Insights

Dimension	Insight	Impact
Aggregate Gains	Total runtime dropped from 1.7 hours to 1.0 hour (42 minutes saved); overall speedup of 1.72× across the TPC-DS suite; peak single-query speedup of 5.48× (q93-v2.4)	Shorter batch windows and faster SLAs; operational stability preserved via seamless Spark fallbacks
Query Patterns	Complex analytical queries accelerate by 3×-5.5×; join-heavy workloads benefit from Velox hash joins; aggregations and scans see consistent 2×-3× improvements	Prioritize Gluten adoption for compute-bound SQL pipelines; plan for faster dimensional modeling and BI refreshes
Resource Utilization	Work completed per CPU-hour rises by ~72% (same hardware, 1.72× faster); unified native memory reduces GC pressure; columnar shuffle + native readers boost I/O throughput	Lower infrastructure spend for the same workload; smoother execution with fewer GC pauses; more predictable runtimes under heavy data scans

Business Impact Assessment

Cost Optimization Summary

note

With a 1.72× speedup, organizations can achieve:

≈42% lower compute spend for batch processing workloads
Faster time-to-insight for business-critical analytics
Higher cluster utilization through reduced job runtimes
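
The ≈42% figure follows from the speedup if cost scales linearly with runtime on a fixed-size cluster billed per node-hour: the fraction of runtime removed is 1 - 1/speedup. The sketch below works through that arithmetic. The hourly price is a placeholder (only the ratio matters), and the 8 nodes and 1.7-hour baseline come from the benchmark setup.

```python
def runtime_reduction(speedup):
    """Fraction of runtime removed by a given speedup: 1 - 1/speedup."""
    return 1.0 - 1.0 / speedup


def compute_cost(hours, nodes, price_per_node_hour):
    """Node-hours multiplied by an hourly price."""
    return hours * nodes * price_per_node_hour


reduction = runtime_reduction(1.72)
print(f"runtime reduction: {reduction:.1%}")

# Placeholder price: the saving is a ratio, so the actual rate cancels out.
price = 1.0
baseline = compute_cost(1.7, 8, price)
accelerated = compute_cost(1.7 / 1.72, 8, price)
saving = 1.0 - accelerated / baseline
print(f"cost saving: {saving:.1%}")
```

The linear assumption breaks down if the cluster is shared, billed with minimum increments, or kept running between jobs, so treat the result as an upper bound for those setups.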

Operational Benefits

tip

Minimal migration effort: Drop-in plugin with existing Spark SQL code
Automatic fallback to Spark's JVM engine preserves operational stability when an operator is not supported natively
Kubernetes-native integration keeps parity with existing EKS data platforms

Technical Recommendations

When to Deploy Gluten + Velox

High-Volume Analytics: TPC-DS-style complex queries with joins and aggregations
Cost-Sensitive Workloads: Where 40%+ compute cost reduction justifies integration effort
Performance-Critical Pipelines: SLA-driven workloads requiring faster execution

Implementation Considerations

Query Compatibility: Test edge cases in your specific workload patterns
Memory Tuning: Optimize off-heap allocation based on data characteristics
Monitoring: Leverage native metrics for performance debugging and optimization

The benchmark results demonstrate that Gluten + Velox represents a significant leap forward in Spark SQL performance, delivering production-ready native acceleration without sacrificing Spark's distributed computing advantages.

Why do a few queries regress?

caution

While Spark + Gluten + Velox was ~1.7× faster overall, a small set of TPC-DS queries ran slower. Gluten intentionally falls back to Spark’s JVM engine when an operator or expression isn’t fully supported natively. Those fallbacks introduce row↔columnar conversion boundaries and can change shuffle or partition behavior—explaining isolated regressions (q22, q67, q72 in our run).

To diagnose these cases:

Inspect the Spark physical plan for GlutenRowToArrowColumnar or VeloxColumnarToRowExec nodes surrounding a non-native operator.
Confirm native coverage by checking for WholeStageTransformer stages in the Gluten job.
Compare shuffle partition counts; Gluten fallbacks can alter skew handling versus native Spark.
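
The first two checks can be roughed out by counting node names in the plan text, for example the output of df.explain() or a plan copied from the SQL tab. The sketch below assumes the node names listed above appear verbatim in the plan; summarize_plan is a hypothetical helper, and the sample plan is hand-written, not output from a real job.

```python
# Node names taken from the checklist above; adjust them if your Gluten
# build prints different transition operators.
TRANSITION_NODES = ("GlutenRowToArrowColumnar", "VeloxColumnarToRowExec")
NATIVE_STAGE = "WholeStageTransformer"


def summarize_plan(plan_text):
    """Count native stages and row<->columnar transitions in a plan string."""
    counts = {name: plan_text.count(name) for name in TRANSITION_NODES}
    counts[NATIVE_STAGE] = plan_text.count(NATIVE_STAGE)
    return counts


# Hand-written plan fragment for illustration only.
sample_plan = """
VeloxColumnarToRowExec
+- WholeStageTransformer (2)
   +- GlutenRowToArrowColumnar
      +- SortAggregate(key=[k], functions=[collect_list(v)])
         +- VeloxColumnarToRowExec
            +- WholeStageTransformer (1)
"""
print(summarize_plan(sample_plan))
```

A plan with several transitions between native stages, as in the sample, is a candidate for the conversion overhead described above.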

Version differences are unlikely to have skewed the benchmark: Spark 3.5.3 (native) and Spark 3.5.2 (Gluten) are adjacent maintenance releases of the 3.5 line, which carry security and correctness fixes rather than performance changes.

Architecture Overview — Apache Spark vs. Apache Spark with Gluten + Velox

Understanding how Gluten intercepts Spark plans clarifies why certain workloads accelerate so sharply. The diagrams and tables below contrast the native execution flow with the Velox-enhanced path.

Execution Path Comparison

Memory & Processing Comparison

Aspect	Native Spark	Gluten + Velox	Impact
Memory Model	JVM heap objects	Apache Arrow off-heap columnar	40% less GC overhead
Processing	Row-by-row iteration	SIMD vectorized batches	8-16 rows per CPU cycle
CPU Cache	Poor locality	Cache-friendly columns	85% vs 60% efficiency
Memory Bandwidth	40 GB/s typical	65+ GB/s sustained	60% bandwidth increase

What Is Apache Gluten — Why It Matters

Apache Gluten is a middleware layer that offloads Spark SQL execution from the JVM to high-performance native execution engines. For data engineers, this means:

Core Technical Benefits

Zero Application Changes: Existing Spark SQL and DataFrame code works unchanged
Automatic Fallback: Unsupported operations gracefully fall back to native Spark
Cross-Engine Compatibility: Uses Substrait as intermediate representation
Production Ready: Handles complex enterprise workloads without code changes

Gluten Plugin Architecture

Key Configuration Parameters

# Essential Gluten Configuration
sparkConf:
  # Core Plugin Activation
  "spark.plugins": "org.apache.gluten.GlutenPlugin"
  "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager"

  # Memory Configuration
  "spark.memory.offHeap.enabled": "true"
  "spark.memory.offHeap.size": "4g"  # Critical for Velox performance

  # Backend and Join Strategy
  "spark.gluten.sql.columnar.backend.velox.enabled": "true"
  "spark.gluten.sql.columnar.forceShuffledHashJoin": "true"

What Is Velox — Why Gluten Needs It (Alternatives)

Velox is Meta's C++ vectorized execution engine optimized for analytical workloads. It serves as the computational backend for Gluten, providing:

Velox Core Components

Layer	Component	Purpose
Operators	Filter, Project, Aggregate, Join	Vectorized SQL operations
Expressions	Vector functions, Type system	SIMD-optimized computations
Memory	Apache Arrow buffers, Custom allocators	Cache-efficient data layout
I/O	Parquet/ORC readers, Compression	High-throughput data ingestion
CPU	AVX2/AVX-512, ARM Neon	Hardware-accelerated processing

Velox vs Alternative Backends

Feature	Velox	ClickHouse	Apache Arrow DataFusion
Language	C++	C++	Rust
SIMD Support	AVX2/AVX-512/Neon	AVX2/AVX-512	Limited
Memory Model	Apache Arrow Columnar	Native Columnar	Apache Arrow Native
Spark Integration	Native via Gluten	Via Gluten	Experimental
Performance	Excellent	Excellent	Good
Maturity	Production (Meta)	Production	Developing

Configuring Spark + Gluten + Velox

The instructions in this section walk through the baseline artifacts you need to build an image, configure Spark defaults, and deploy workloads on the Spark Operator.

Docker Image Configuration

Create a production-ready Spark image with Gluten + Velox:

You can find the sample Dockerfile here: Dockerfile-spark-gluten-velox

Spark Configuration Examples

Use the templates below to bootstrap both shared Spark defaults and a sample SparkApplication manifest.

spark-defaults.conf

# spark-defaults.conf - Optimized for Gluten + Velox

# Core Gluten Configuration
spark.plugins                           org.apache.gluten.GlutenPlugin
spark.shuffle.manager                   org.apache.spark.shuffle.sort.ColumnarShuffleManager

# Memory Configuration - Critical for Performance
spark.memory.offHeap.enabled           true
spark.memory.offHeap.size               4g
spark.memory.fraction                   0.8
spark.executor.memory                   20g
spark.executor.memoryOverhead           6g

# Velox-specific Optimizations
spark.gluten.sql.columnar.backend.velox.enabled              true
spark.gluten.sql.columnar.forceShuffledHashJoin              true
spark.gluten.sql.columnar.backend.velox.bloom_filter.enabled true

# Java 17 Module Access (Required)
spark.driver.extraJavaOptions   --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED
spark.executor.extraJavaOptions --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED

# Adaptive Query Execution
spark.sql.adaptive.enabled                     true
spark.sql.adaptive.coalescePartitions.enabled  true
spark.sql.adaptive.skewJoin.enabled            true

# S3 Optimizations
spark.hadoop.fs.s3a.fast.upload.buffer         disk
spark.hadoop.fs.s3a.multipart.size             128M
spark.hadoop.fs.s3a.connection.maximum         200

SparkApplication YAML
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: "test-gluten-velox"
  namespace: spark-team-a
spec:
  type: Scala
  mode: cluster
  image: "your-registry/spark-gluten-velox:latest"
  imagePullPolicy: Always
  sparkVersion: "3.5.2"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar"
  arguments:
    - "1000"  # Number of partitions for SparkPi; this RDD job only smoke-tests that the image and plugin load

  driver:
    cores: 2
    memory: "4g"
    memoryOverhead: "1g"
    serviceAccount: spark-team-a
    env:
      - name: JAVA_HOME
        value: "/usr/lib/jvm/java-17-openjdk-amd64"

  executor:
    cores: 4
    memory: "8g"
    memoryOverhead: "2g"
    instances: 2
    serviceAccount: spark-team-a
    env:
      - name: JAVA_HOME
        value: "/usr/lib/jvm/java-17-openjdk-amd64"

  sparkConf:
    # Gluten Configuration
    "spark.plugins": "org.apache.gluten.GlutenPlugin"
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager"
    "spark.memory.offHeap.enabled": "true"
    "spark.memory.offHeap.size": "2g"

    # Debugging and Monitoring
    "spark.gluten.sql.debug": "true"
    "spark.sql.planChangeLog.level": "WARN"
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "s3a://your-bucket/spark-event-logs"

    # Java 17 Compatibility
    "spark.driver.extraJavaOptions": "--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED"
    "spark.executor.extraJavaOptions": "--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED"

Why these defaults?

spark.plugins activates the Apache Gluten runtime so query plans can offload to Velox.
Off-heap configuration reserves memory outside the JVM heap for Arrow and Velox buffers, which reduces garbage collection pressure.
Adaptive query execution settings keep shuffle partitions balanced under both native and Gluten runs.
S3 connector tuning avoids bottlenecks when scanning the 1TB TPC-DS dataset from Amazon S3.
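
The roles described above suggest a few checks worth automating before a job is submitted. The sketch below inspects a Spark configuration dictionary for the plugin, off-heap, and Java 17 settings. preflight and parse_size are hypothetical helpers, and the 1g minimum is an arbitrary threshold chosen for illustration, not a Gluten requirement.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a Spark size string such as '512m' or '4g' to bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def preflight(conf, min_offheap="1g"):
    """Return a list of human-readable problems found in a Spark conf dict."""
    problems = []
    if "org.apache.gluten.GlutenPlugin" not in conf.get("spark.plugins", ""):
        problems.append("spark.plugins does not include GlutenPlugin")
    if conf.get("spark.memory.offHeap.enabled", "false").lower() != "true":
        problems.append("off-heap memory is not enabled")
    elif parse_size(conf.get("spark.memory.offHeap.size", "0m")) < parse_size(min_offheap):
        problems.append("spark.memory.offHeap.size is below " + min_offheap)
    if "--add-opens" not in conf.get("spark.executor.extraJavaOptions", ""):
        problems.append("executor JVM options lack --add-opens flags (Java 17)")
    return problems


example = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "512m",
}
for problem in preflight(example):
    print("WARN:", problem)
```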

Running Benchmarks

Follow the workflow below to reproduce the benchmark from data generation through post-run analysis.

TPC-DS Benchmark Setup

The complete TPC-DS harness is available in the repository: examples/benchmark/tpcds-benchmark-spark-gluten-velox/README.md.

Step 1: Generate TPC-DS Data (1TB scale)

Follow the linked guide to generate the test data in your S3 bucket.

Step 2: Submit Native & Gluten Jobs

Prerequisites

Before submitting benchmark jobs, ensure:

S3 Bucket is configured: Export the S3 bucket name from your Terraform outputs
Benchmark data is available: Verify TPC-DS 1TB data exists in the same S3 bucket

Export S3 bucket name from Terraform outputs:

Export S3 bucket variable
# Get S3 bucket name from Terraform outputs
export S3_BUCKET=$(terraform -chdir=path/to/your/terraform output -raw s3_bucket_id_data)

# Verify the bucket and data exist
aws s3 ls s3://$S3_BUCKET/blog/BLOG_TPCDS-TEST-3T-partitioned/

Submit benchmark jobs:


Submit native Spark benchmark
envsubst < tpcds-benchmark-native-c5d.yaml | kubectl apply -f -

Submit Gluten + Velox benchmark
envsubst < tpcds-benchmark-gluten-c5d.yaml | kubectl apply -f -

Step 3: Monitor Benchmark Progress


Check SparkApplication status
kubectl get sparkapplications -n spark-team-a

Tail benchmark logs
kubectl logs -f -n spark-team-a -l spark-app-name=tpcds-benchmark-native-c5d
kubectl logs -f -n spark-team-a -l spark-app-name=tpcds-benchmark-gluten-c5d

Port-forward Spark History Server
kubectl port-forward svc/spark-history-server 18080:80 -n spark-history-server

Step 4: Spark History Server Analysis

Access detailed execution plans and metrics:

Open Spark History Server locally
kubectl port-forward svc/spark-history-server 18080:80 -n spark-history-server

Navigation Checklist

Point your browser to http://localhost:18080.
Locate both spark-<ID>-native and spark-<ID>-gluten applications.
In the Spark UI, inspect:
1. SQL tab execution plans
2. Presence of WholeStageTransformer stages in Gluten jobs
3. Stage execution times across both runs
4. Executor metrics for off-heap memory usage
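
The same comparison can be scripted against the History Server's REST API, which lists applications at /api/v1/applications and reports each attempt's duration in milliseconds. The sketch below assumes the port-forward from this step is active and that the application names match the manifest names used earlier; the sample payload is made up, with durations rounded from the suite totals.

```python
import json
from urllib.request import urlopen


def fetch_applications(base_url="http://localhost:18080"):
    """Fetch the application list from the History Server REST API."""
    with urlopen(f"{base_url}/api/v1/applications") as resp:
        return json.load(resp)


def durations_by_name(applications):
    """Map application name -> seconds for its first completed attempt listed."""
    result = {}
    for app in applications:
        completed = [a for a in app.get("attempts", []) if a.get("completed")]
        if completed:
            result[app["name"]] = completed[0]["duration"] / 1000.0
    return result


# Made-up payload in the API's shape; replace with fetch_applications()
# once the port-forward is running.
sample = [
    {"id": "spark-aaa", "name": "tpcds-benchmark-native-c5d",
     "attempts": [{"completed": True, "duration": 6120000}]},
    {"id": "spark-bbb", "name": "tpcds-benchmark-gluten-c5d",
     "attempts": [{"completed": True, "duration": 3600000}]},
]

durations = durations_by_name(sample)
ratio = durations["tpcds-benchmark-native-c5d"] / durations["tpcds-benchmark-gluten-c5d"]
print(f"end-to-end speedup: {ratio:.2f}x")
```

Application duration includes driver startup and executor scheduling, so it will differ slightly from the sum of per-query runtimes.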

Step 5: Summarize Findings

tip

Export runtime metrics from the Spark UI or event logs for both jobs.
Capture query-level comparisons (duration, stage counts, fallbacks) to document where Gluten accelerated or regressed.
Feed the results into cost or capacity planning discussions—speedups translate directly into smaller clusters or faster SLA achievement.
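
One way to produce the query-level comparison is to take the median runtime per query across iterations for each engine and sort by speedup so regressions surface first. The sketch below does that with the standard library. The query names and timings are invented for illustration, and compare is a hypothetical helper.

```python
from statistics import median


def median_runtimes(runs):
    """Collapse {query: [seconds per iteration]} to {query: median seconds}."""
    return {query: median(times) for query, times in runs.items()}


def compare(native_runs, gluten_runs):
    """Return (query, native_s, gluten_s, speedup) rows, lowest speedup first."""
    native = median_runtimes(native_runs)
    gluten = median_runtimes(gluten_runs)
    rows = [
        (q, native[q], gluten[q], native[q] / gluten[q])
        for q in native
        if q in gluten
    ]
    return sorted(rows, key=lambda row: row[3])


# Invented per-iteration timings (seconds) for two hypothetical queries.
native_runs = {"qA": [10.0, 12.0, 11.0], "qB": [4.0, 5.0, 4.5]}
gluten_runs = {"qA": [5.0, 5.5, 6.0], "qB": [5.0, 4.9, 5.2]}

for query, n, g, s in compare(native_runs, gluten_runs):
    flag = "REGRESSION" if s < 1.0 else "ok"
    print(f"{query}: native={n:.1f}s gluten={g:.1f}s speedup={s:.2f}x {flag}")
```

Using the median rather than the mean keeps a single slow iteration, such as a cold-cache first run, from dominating the comparison.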

Key Metrics to Analyze

tip

As you compare native and Gluten runs, focus on the following signals:

Query Plan Differences:
- Native: WholeStageCodegen stages
- Gluten: WholeStageTransformer stages
Memory Usage Patterns:
- Native: High on-heap usage, frequent GC
- Gluten: Off-heap Arrow buffers, minimal GC
CPU Utilization:
- Native: 60-70% efficiency
- Gluten: 80-90%+ efficiency with SIMD

Performance Analysis and Pitfalls

Gluten reduces friction for Spark adopters, but a few tuning habits help avoid regressions. Use the notes below as a checklist during rollout.

Common Configuration Pitfalls

caution

# ❌ WRONG - Insufficient off-heap memory
"spark.memory.offHeap.size": "512m"  # Too small for real workloads

# ✅ CORRECT - Adequate off-heap allocation
"spark.memory.offHeap.size": "4g"    # 20-30% of executor memory

# ❌ WRONG - Missing Java module access
# Results in: java.lang.IllegalAccessError

# ✅ CORRECT - Required for Java 17
"spark.executor.extraJavaOptions": "--add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.misc=ALL-UNNAMED"

# ❌ WRONG - Velox backend not enabled
"spark.gluten.sql.columnar.backend.ch.enabled": "true"  # ClickHouse, not Velox!

# ✅ CORRECT - Velox backend configuration
"spark.gluten.sql.columnar.backend.velox.enabled": "true"

Performance Optimization Tips

tip

Memory Sizing:
- Off-heap: 20-30% of executor memory
- Executor overhead: 15-20% reserved for Arrow buffers
- Driver memory: 4-8 GB for complex queries
CPU Optimization:
- Use AVX2-capable instance types (Intel Xeon, AMD EPYC)
- Avoid ARM instances for maximum SIMD benefit
- Set spark.executor.cores = 4-8 for optimal vectorization
I/O Configuration:
- Enable S3A fast upload: spark.hadoop.fs.s3a.fast.upload.buffer=disk
- Increase connection pool to 200 connections: spark.hadoop.fs.s3a.connection.maximum=200
- Use larger multipart sizes of 128 MB: spark.hadoop.fs.s3a.multipart.size=128M
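
The memory rules of thumb above are simple percentage bands, so they are easy to turn into starting values for a given executor size. The sketch below applies them to the 20 GB executors used in the benchmark; size_executor is a hypothetical helper, and the output is a starting range to tune from, not a guarantee.

```python
def pct_range(total_gb, low_pct, high_pct):
    """Return (low, high) in GB for a percentage band of total_gb."""
    return (total_gb * low_pct / 100, total_gb * high_pct / 100)


def size_executor(executor_memory_gb):
    """Apply the 20-30% off-heap and 15-20% overhead rules of thumb."""
    return {
        "off_heap_gb": pct_range(executor_memory_gb, 20, 30),
        "overhead_gb": pct_range(executor_memory_gb, 15, 20),
    }


# 20 GB matches the executor memory used in the benchmark.
print(size_executor(20))
```

For a 20 GB executor this gives 4 to 6 GB of off-heap memory, which is consistent with the 4g value recommended in the pitfalls section.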

Debugging Gluten Issues

note

# Enable debug logging
"spark.gluten.sql.debug": "true"
"spark.sql.planChangeLog.level": "WARN"

# Check for fallback operations
kubectl logs <spark-pod> | grep -i "fallback"

# Verify Velox library loading
kubectl exec <spark-pod> -- find /opt/spark -name "*velox*"

# Monitor container memory usage (off-heap allocations count toward the container total)
kubectl top pod <spark-pod> --containers

Verifying Gluten+Velox Execution in Spark History Server

When Gluten+Velox is working correctly, you'll see distinctive execution patterns in the Spark History Server that indicate native acceleration:

Key Indicators of Gluten+Velox Execution:

VeloxSparkPlanExecApi.scala references in stages and tasks
WholeStageCodegenTransformer nodes in the DAG visualization
ColumnarBroadcastExchange operations instead of standard broadcast
GlutenWholeStageColumnarRDD in the RDD lineage
Methods like executeColumnar and mapPartitions at VeloxSparkPlanExecApi.scala lines

Example DAG Pattern:

AQEShuffleRead
├── ColumnarBroadcastExchange
├── ShuffledColumnarBatchRDD [Unordered]
│   └── executeColumnar at VeloxSparkPlanExecApi.scala:630
└── MapPartitionsRDD [Unordered]
    └── mapPartitions at VeloxSparkPlanExecApi.scala:632

What This Means:

VeloxSparkPlanExecApi: Gluten's interface layer to the Velox execution engine
Columnar operations: Data processed in columnar format (more efficient than row-by-row)
WholeStageTransformer: Multiple Spark operations fused into single native Velox operations
Off-heap processing: Memory management handled by Velox, not JVM garbage collector

If you see traditional Spark operations like mapPartitions at <WholeStageCodegen> without Velox references, Gluten may have fallen back to JVM execution for unsupported operations.

Conclusion

Apache Gluten with the Velox backend accelerated 90 of the 104 TPC-DS queries on Amazon EKS, delivering a 1.72× overall speedup and ≈42% lower compute spend in our TPC-DS 1TB benchmark. The gains stem from offloading compute-intensive operators to a native, vectorized engine, which reduces JVM overhead and improves CPU efficiency.

When planning your rollout:

Start by mirroring the configurations documented above, then tune off-heap memory and shuffle behavior based on workload shape.
Use the Spark Operator deployment flow to A/B test native and Gluten runs so you can quantify gains and detect fallbacks early.
Monitor Spark UI and metrics exports to build a data-backed case for production adoption or cluster right-sizing.

With the Docker image, Spark defaults, and example manifests provided in this guide, you can reproduce the benchmark end-to-end and adapt the pattern for your own cost and performance goals.

For complete implementation examples and benchmark results, see the GitHub repository.
