Valkey on EKS Stack
Valkey is an open-source, in-memory key/value datastore: a Linux Foundation–maintained fork of Redis 7.2.4 that continues the BSD-licensed lineage. It is wire-compatible with Redis, with the same protocol, the same client libraries (Lettuce, Jedis, ioredis, redis-py, go-redis), and the same data structures (strings, hashes, lists, sets, sorted sets, streams).
This data stack delivers a production-grade Valkey deployment on Amazon EKS. Replication mode uses the official valkey-io/valkey-helm chart; cluster mode (sharded, with gossip) uses a local Helm chart. Both run on a dedicated Karpenter NodePool of on-demand Graviton instances (r7g/r8g/r7gn/r8gn/m7gn) spread across three Availability Zones. The local cluster-mode chart will be retired in favor of the official chart's native support once valkey-helm #18 ships.
Why this stack
- Official chart only. No vendor-licensing risk. Chart and image versions pinned in Git.
- Multi-AZ by default. Pods are hard-spread across AZs (`whenUnsatisfiable: DoNotSchedule`) on EBS gp3 PVCs with `WaitForFirstConsumer` binding.
- ACL authentication. Two-user setup (`default` for applications, `replication-user` for inter-pod replication) with passwords sourced from a Terraform-generated `kubernetes_secret`.
- Pod Identity for AWS access. The restore initContainer reads from the migration S3 bucket via the `valkey-sa` ServiceAccount associated with a least-privilege IAM role, so no AWS keys appear in pod specs or Helm values.
- Single Terraform variable. `enable_valkey = true` flips on the entire component; everything else lives in `infra/terraform/helm-values/valkey.yaml`.
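
To make the hard-spread rule concrete, here is a simplified Python model of a `DoNotSchedule` topology spread constraint. It is a sketch only: the `max_skew=1` value, the zone names, and the greedy placement order are assumptions for illustration, not values taken from this stack's chart, and the real scheduler weighs many more factors.

```python
def place_pods(num_pods, zones, schedulable=None, max_skew=1):
    """Simulate a hard (DoNotSchedule) spread of pods across zones.

    zones:        every zone that counts as a topology domain
    schedulable:  zones that can currently accept a pod (default: all)
    Returns (pods_per_zone, pending_pod_count).
    """
    schedulable = list(zones) if schedulable is None else list(schedulable)
    counts = {zone: 0 for zone in zones}
    pending = 0
    for _ in range(num_pods):
        floor = min(counts.values())  # global minimum across all domains
        # A zone is allowed only if placing the pod keeps the skew in bounds.
        allowed = [z for z in schedulable if counts[z] + 1 - floor <= max_skew]
        if not allowed:
            pending += 1  # hard constraint: the pod waits instead of piling up
            continue
        target = min(allowed, key=lambda z: counts[z])
        counts[target] += 1
    return counts, pending


# Hypothetical zone names; 4 pods matches the default 1 primary + 3 replicas.
ZONES = ["az-a", "az-b", "az-c"]
healthy, healthy_pending = place_pods(4, ZONES)
degraded, degraded_pending = place_pods(4, ZONES, schedulable=["az-a", "az-b"])
print(healthy, healthy_pending)
print(degraded, degraded_pending)
```

In this model, four pods over three zones settle as 2/1/1, and when one zone cannot accept pods the remaining pods stay pending rather than stacking into the other two zones. That is the trade-off a hard spread makes: availability-zone balance is preferred over getting every replica scheduled.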
Infrastructure Deployment
EKS, VPC across 3 AZs, Graviton Karpenter NodePool, ArgoCD, and the official Valkey Helm release. End-to-end via ./deploy.sh.
Replication Cluster Verification
Confirm the 1-primary + N-replicas StatefulSet is healthy, run the smoke-test workload, and verify the read/write split end-to-end.
Cluster Mode
3 primaries × 1 replica with hash-slot sharding, AZ-aware bootstrap, gossip-based failover. Local Helm chart with post-install bootstrap Job, until upstream cluster mode ships (valkey-helm #18).
Upgrades
Chart bumps, Valkey minor/patch upgrades, Karpenter AMI rollovers — all PDB-protected with rolling pod restarts and an ArgoCD-driven rollback path.
EC2 → EKS Migration
Move a self-managed Valkey or Redis instance from EC2 onto EKS via offline RDB snapshot through S3 and a restore initContainer.
Topology Support Matrix
| Mode | Architecture | HA | Write scale | Use when | Status (this stack) |
|---|---|---|---|---|---|
| Standalone | 1 pod | ✗ | ✗ | Dev / test, ephemeral cache | Available — set replica.enabled: false |
| Primary + Replica | 1 primary + N replicas | Manual failover | ✗ | Read-heavy, < 25 GB dataset, Lua scripts, multi-key ops | Default deployment (official chart) |
| Sentinel | Primary + replicas + Sentinel | ✓ | ✗ | HA without sharding | Not yet in official chart |
| Cluster Mode | 3+ primaries × 1+ replica each | ✓ (automatic via gossip) | ✓ | Large datasets, write throughput | Local Helm chart (examples/cluster-mode-helm-chart/) — see Cluster Mode guide. Switches to upstream chart when valkey-helm #18 lands. |
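
The "multi-key ops" caveat in the matrix comes from how cluster mode maps keys to hash slots. The sketch below follows the publicly documented Redis/Valkey cluster key-distribution rule (CRC16 of the key, modulo 16384, with `{...}` hash tags); it is an illustration using Python's standard library, not code from this stack, and the key names are made up.

```python
import binascii

NUM_SLOTS = 16384  # fixed number of hash slots in cluster mode


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed.

    If the key contains '{' followed later by '}' with at least one
    character in between, only that substring is hashed.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Compute the cluster hash slot for a key."""
    data = hash_tag(key.encode("utf-8"))
    # binascii.crc_hqx with initial value 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


print(key_slot("foo"))
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Keys that share a hash tag land in the same slot, so multi-key commands and Lua scripts over them can still work in cluster mode. Keys without a common tag may map to different primaries, which is why workloads that rely heavily on arbitrary multi-key operations are steered toward the Primary + Replica topology.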
Quick Reference
| Knob | Default | Notes |
|---|---|---|
| Pods | 4 (1 primary + 3 replicas) | Set via replica.replicas in helm-values/valkey.yaml |
| Nodes | 3+ (one per AZ) | Karpenter provisions on demand from r7g/r8g family |
| Instance type | r7g.large (sized for 12 GiB workload memory) | Bump resources.requests.memory and let Karpenter pick a larger size for production |
| Storage | 50 GiB gp3 EBS PVC per pod | Resizable; uses volumeBindingMode: WaitForFirstConsumer |
| Client port | 6379 | Application connects here |
| Metrics port | 9121 | Prometheus exporter sidecar (oliver006/redis_exporter) |
| Per-pod DNS | valkey-N.valkey-headless.valkey.svc.cluster.local | Stable across pod restarts |
| Endpoints | valkey (write) · valkey-read (replicas, read) · valkey-headless (per-pod) | Two-Service read/write split |
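
As a rough illustration of the read/write split, the Python sketch below builds connection settings for the two Services and the per-pod DNS names. Several details are assumptions rather than facts from this stack: the `valkey` namespace is inferred from the per-pod DNS pattern above, the `VALKEY_PASSWORD` environment variable is a hypothetical way to inject the `default` user's password, and `make_clients` requires the redis-py package plus network access to the cluster.

```python
import os

NAMESPACE = "valkey"  # assumption: inferred from the per-pod DNS pattern
PORT = 6379           # client port from the table above


def service_host(service: str, namespace: str = NAMESPACE) -> str:
    """Cluster-internal DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.cluster.local"


def pod_host(ordinal: int) -> str:
    """Stable DNS name of a single StatefulSet pod via the headless Service."""
    return f"valkey-{ordinal}.{service_host('valkey-headless')}"


def connection_kwargs(read_only: bool) -> dict:
    """Connection settings: writes go to `valkey`, reads to `valkey-read`."""
    return {
        "host": service_host("valkey-read" if read_only else "valkey"),
        "port": PORT,
        "username": "default",  # application ACL user
        # Hypothetical env var; wire it to the Terraform-generated secret.
        "password": os.environ.get("VALKEY_PASSWORD", ""),
    }


def make_clients():
    """Create (writer, reader) clients; needs redis-py and a reachable cluster."""
    import redis  # imported lazily so the helpers above stay dependency-free

    writer = redis.Redis(**connection_kwargs(read_only=False))
    reader = redis.Redis(**connection_kwargs(read_only=True))
    return writer, reader


print(connection_kwargs(read_only=False)["host"])
print(pod_host(0))
```

Because replication is asynchronous, a value written through the write endpoint may not be visible immediately through the read endpoint; reads that must observe the latest write should use the write endpoint.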