Skip to main content

infra

Deploying Apache Pinot (๐Ÿท) on EKSโ€‹

Apache Pinot is real-time distributed OLAP datastore, purpose built for low-latency and high-throughput analytics. You can use pinot to ingest and immediately query data from streaming or batch data sources e.g. Apache Kafka, Amazon Kinesis Data Streams, Amazon S3, etc).

Apache Pinot includes the following characteristics:

  • Ultra low-latency analytics even at extremely high throughput.
  • Columnar data store with several smart indexing and pre-aggregation techniques.
  • Scaling up and out with no upper bound.
  • Consistent performance based on the size of your cluster and an expected query per second (QPS) threshold.

It's a perfect solution for user-facing real-time analytics and other analytical use cases, including internal dashboards, anomaly detection, and ad hoc data exploration. You can learn more about Apache Pinot and its components in its documentation.

In this stack, we will deploy Apache Pinot on Kubernetes cluster managed by Elastic Kubernetes Service (EKS). Some of the benefits of deploying Apache Pinot on EKS cluster are

  • Manage Apache Pinot Cluster using Kubernetes
  • Scale each layer independently
  • No single point of failure
  • Auto recovery

Architectureโ€‹

In this setup we deploy all Apache Pinot components in private subnets across 2 availability zones. This allows for greater flexibility and resilience. The architecture uses:

  • Controllers & Brokers: m7g.2xlarge on-demand instances (8 vCPU, 32 GiB RAM, Graviton3)
  • Servers: r7g.2xlarge on-demand instances (8 vCPU, 64 GiB RAM, memory-optimized for data serving)
  • Minions: Spot instances when available (stateless background tasks)
  • Zookeeper: 3-node ensemble for cluster coordination (m7g instances with persistent storage)

Component Detailsโ€‹

Zookeeper Cluster

  • Deployed as a StatefulSet with 3 replicas for high availability
  • Manages cluster metadata, leader election, and distributed coordination
  • Uses persistent volumes (10Gi data + datalog) with pod anti-affinity to ensure pods run on different nodes
  • Configured with jute.maxbuffer=4000000 to handle large Pinot table schemas and segment metadata
note

We deploy Zookeeper using a custom StatefulSet manifest instead of the Bitnami Helm chart subchart, as the Bitnami Zookeeper chart has been deprecated. This approach gives us full control over the Zookeeper configuration and ensures long-term maintainability.

Storage Configuration

  • Server segments: 100Gi gp3 EBS volumes per server pod
  • Zookeeper: 10Gi gp3 EBS volumes for data and transaction logs
  • Storage class: gp3 (AWS default)
  • DeepStore: S3 bucket for segment backup and recovery (shared across data stacks)

Metrics & Monitoring

  • JMX Prometheus exporter enabled on port 8008 for all components
  • ServiceMonitor resources configured for Prometheus scraping (30s interval)
  • Metrics exposed for controllers, brokers, and servers

S3 DeepStore Integration

  • Segments are automatically backed up to S3 after creation
  • Enables faster recovery and pod replacement without rebuilding segments
  • Reduces dependency on local EBS storage
  • Configured at the infrastructure level (no per-table configuration needed)

Access to Controller and Broker components is via kubectl port-forwarding.

tip

To see Pinot in action with real-time data ingestion, check out the Kafka Integration example that demonstrates streaming data from Kafka into Pinot tables.

Note: Based on your use case, you will need to update the cluster size and configuration to better suite your use case. You can read more about Apache Pinot capacity planning here and here.

S3 DeepStore Configurationโ€‹

This deployment includes S3 DeepStore for segment backup and recovery. DeepStore provides:

  • Disaster Recovery: Segments persist in S3 even if all server pods are deleted
  • Faster Scaling: New servers download pre-built segments from S3 instead of rebuilding
  • Operational Flexibility: Safely delete/recreate pods without data loss

The infrastructure automatically configures:

  • Shared S3 bucket for all data stacks (with prefix-based organization)
  • IAM permissions via EKS Pod Identity
  • S3PinotFS plugin on controller and server components
  • Server-side upload to S3 when segments complete

Tables automatically use DeepStore without additional configuration. See the Kafka Integration example to see it in action.

Prerequisites ๐Ÿ“โ€‹

Ensure that you have following tools installed on your machine.

  1. aws cli
  2. kubectl
  3. terraform

Deployment โš™๏ธโ€‹

Deploy the EKS Cluster with Apache Pinotโ€‹

Step 1: Clone Repository & Navigateโ€‹

First, clone the repository. Navigate to apache pinot folder.

git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/data-stacks/pinot-on-eks

Step 2: Customize Stackโ€‹

Edit the terraform/data-stack.tfvars file to customize settings if required. For example, you can open it with vi, nano, or any other text editor.

Step 3: Deploy Infrastructureโ€‹

Run the deployment script:

./deploy.sh
note

If deployment fails:

  • Rerun the same command: ./deploy.sh
  • If it still fails, debug using kubectl commands or raise an issue
info

Expected deployment time: 15-20 minutes

Verify Deploymentโ€‹

Verify the Amazon EKS Cluster

aws eks describe-cluster --name pinot-on-eks

Update local kubeconfig so we can access kubernetes cluster.

export KUBECONFIG=kubeconfig.yaml

Let's verify all the pods are running.

kubectl get pods -n pinot

Outputโ€‹

NAME                                                   READY   STATUS      RESTARTS   AGE
pinot-broker-0 1/1 Running 0 11d
pinot-broker-1 1/1 Running 0 11d
pinot-broker-2 1/1 Running 0 11d
pinot-controller-0 1/1 Running 0 11d
pinot-controller-1 1/1 Running 0 11d
pinot-controller-2 1/1 Running 0 11d
pinot-minion-stateless-86cf65f89-rlpwn 1/1 Running 0 12d
pinot-minion-stateless-86cf65f89-tkbjf 1/1 Running 0 12d
pinot-minion-stateless-86cf65f89-twp8n 1/1 Running 0 12d
pinot-server-0 1/1 Running 0 11d
pinot-server-1 1/1 Running 0 11d
pinot-server-2 1/1 Running 0 11d
pinot-zookeeper-0 1/1 Running 0 12d
pinot-zookeeper-1 1/1 Running 0 12d
pinot-zookeeper-2 1/1 Running 0 12d

We have also deployed prometheus and grafana under monitoring namespace. So also make sure all the pods for monitoring are also running.

kubectl get pods -n kube-prometheus-stack

Outputโ€‹

prometheus-grafana-85b4584dbf-4l72l                    3/3     Running   0          12d
prometheus-kube-prometheus-operator-84dcddccfc-pv8nv 1/1 Running 0 12d
prometheus-kube-state-metrics-57f6b6b4fd-txjtb 1/1 Running 0 12d
prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 4d3h
prometheus-prometheus-node-exporter-4jh8q 1/1 Running 0 12d
prometheus-prometheus-node-exporter-f5znb 1/1 Running 0 12d
prometheus-prometheus-node-exporter-f9xrz 1/1 Running 0 12d

Now lets access Apache Pinot Console using the below command. Console consist of Cluster Manager, Query Explorer, Zookeeper Browser and Swagger REST API Explorer.

kubectl port-forward service/pinot-controller 9000:9000 -n pinot

This will allow you to access Apache Pinot Console like the one shown below using http://localhost:9000

Apache Pinot Web Console

Apache Pinot supports exporting metrics using Prometheus JMX exporter that is packaged within the Apache Pinot docker image. ServiceMonitor resources are configured to scrape metrics from controller, broker, and server components on port 8008 every 30 seconds.

Let's verify metrics from all Apache Pinot components are getting published to Prometheus:

kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090 -n kube-prometheus-stack

Navigate to the prometheus UI at http://localhost:9090, type pinot in the search box and you should be able to see all the metrics.

Prometheus

Next, let's use Grafana to visualize the Apache Pinot metrics. In order to access Grafana, we need to get the grafana password from Kubernetes secrets:

kubectl get secret grafana-admin-secret -n kube-prometheus-stack -o jsonpath="{.data.admin-password}" | base64 --decode

Now use port-forwarding to access Grafana at port 8080:

kubectl port-forward service/prometheus-grafana 8080:80 -n kube-prometheus-stack

Login to grafana dashboard using admin and the password retrieved in the previous step. Navigate to Dashboard โ†’ New โ†’ Import.

Download the official Pinot dashboard JSON from the Apache Pinot documentation. This is a starter dashboard that you should customize for production use.

Grafana Dashboard for Pinot

To learn more about monitoring Apache Pinot using Prometheus and Grafana, see the official guide.

Next Stepsโ€‹

Cleanup ๐Ÿงนโ€‹

To delete all the components provisioned as part of this stack, using the following command to destroy all the resources.

./cleanup.sh
caution

To avoid unwanted charges to your AWS account, delete all the AWS resources created during this deployment.