
Valkey on EKS Stack

Valkey is an open-source, in-memory key/value datastore: a Linux Foundation-maintained fork of Redis 7.2.4 that continues the BSD-licensed lineage. It is wire-compatible with Redis, with the same protocol, the same client libraries (Lettuce, Jedis, ioredis, redis-py, go-redis), and the same data structures (strings, hashes, lists, sets, sorted sets, streams).
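
Wire compatibility means a client built for Redis sends the same RESP bytes to Valkey. As a rough illustration, the sketch below encodes a command as a RESP array of bulk strings using only the Python standard library. The name encode_command is hypothetical and not part of any client library; applications would normally rely on one of the clients listed above.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings.

    Illustrative helper (hypothetical name), not a client-library API.
    """
    # Array header: '*' followed by the number of elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        # Every argument travels as a bulk string: '$<byte length>', then the bytes.
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


if __name__ == "__main__":
    # The same bytes would be sent to either a Redis or a Valkey server.
    print(encode_command("SET", "greeting", "hello"))
```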

This data stack delivers a production-grade Valkey deployment on Amazon EKS. Replication mode uses the official valkey-io/valkey-helm chart; cluster mode (sharded, with gossip) uses a local Helm chart. Both run on a dedicated Karpenter NodePool of on-demand Graviton instances (r7g/r8g/r7gn/r8gn/m7gn) spread across three Availability Zones. The local cluster-mode chart will be retired in favor of the official chart's native support once valkey-helm #18 ships.

Why this stack

  • Official chart for the default deployment. No vendor-licensing risk. Chart and image versions are pinned in Git.
  • Multi-AZ by default. Pods are hard-spread across AZs (whenUnsatisfiable: DoNotSchedule) on EBS gp3 PVCs with WaitForFirstConsumer binding.
  • ACL authentication. Two-user setup (default for applications, replication-user for inter-pod replication) with passwords sourced from a Terraform-generated kubernetes_secret.
  • Pod Identity for AWS access. The restore initContainer reads from the migration S3 bucket via the valkey-sa ServiceAccount associated with a least-privilege IAM role — no AWS keys in pod specs or Helm values.
  • Single Terraform variable. enable_valkey = true flips on the entire component; everything else lives in infra/terraform/helm-values/valkey.yaml.

Replication Cluster Verification

Confirm the 1-primary + N-replicas StatefulSet is healthy, run the smoke-test workload, and verify the read/write split end-to-end.

Topology · Read/Write Split · HA

Cluster Mode

3 primaries × 1 replica with hash-slot sharding, AZ-aware bootstrap, gossip-based failover. Local Helm chart with post-install bootstrap Job, until upstream cluster mode ships (valkey-helm #18).

Sharding · Gossip · Multi-AZ · Helm
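
Hash-slot sharding maps every key to one of 16384 slots, and each primary owns a share of those slots. The sketch below shows that mapping, assuming the standard cluster key-hashing rule (CRC16/XMODEM of the key, modulo 16384, with {hash tag} support). The name key_slot is an illustrative helper, not part of this stack; on a live cluster, CLUSTER KEYSLOT remains the authoritative answer.

```python
import binascii

SLOT_COUNT = 16384  # fixed number of hash slots in cluster mode


def key_slot(key):
    """Return the hash slot for a key (illustrative helper, hypothetical name)."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    # Hash-tag rule: if the key contains '{...}' with at least one character
    # between the first '{' and the next '}', only that substring is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    for k in ("foo", "{user1000}.following", "{user1000}.followers"):
        print(k, key_slot(k))
```

Keys that share a hash tag, such as {user1000}.following and {user1000}.followers, land in the same slot, which is what keeps multi-key operations possible on a sharded deployment.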

Upgrades

Chart bumps, Valkey minor/patch upgrades, Karpenter AMI rollovers — all PDB-protected with rolling pod restarts and an ArgoCD-driven rollback path.

Runbook · ArgoCD · Rolling Update

EC2 → EKS Migration

Move a self-managed Valkey or Redis instance from EC2 onto EKS via offline RDB snapshot through S3 and a restore initContainer.

Runbook · Migration · S3

Topology Support Matrix

| Mode | Architecture | HA | Write scale | Use when | Status (this stack) |
|---|---|---|---|---|---|
| Standalone | 1 pod | | | Dev / test, ephemeral cache | Available; set replica.enabled: false |
| Primary + Replica | 1 primary + N replicas | Manual failover | | Read-heavy, < 25 GB dataset, Lua scripts, multi-key ops | Default deployment (official chart) |
| Sentinel | Primary + replicas + Sentinel | | | HA without sharding | Not yet in official chart |
| Cluster Mode | 3+ primaries × 1+ replica each | ✓ (automatic via gossip) | | Large datasets, write throughput | Local Helm chart (examples/cluster-mode-helm-chart/); see Cluster Mode guide. Switches to upstream chart when valkey-helm #18 lands. |

Quick Reference

| Knob | Default | Notes |
|---|---|---|
| Pods | 4 (1 primary + 3 replicas) | Set via replica.replicas in helm-values/valkey.yaml |
| Nodes | 3+ (one per AZ) | Karpenter provisions on demand from r7g/r8g family |
| Instance type | r7g.large (sized for 12 GiB workload memory) | Bump resources.requests.memory and let Karpenter pick a larger size for production |
| Storage | 50 GiB gp3 EBS PVC per pod | Resizable; uses volumeBindingMode: WaitForFirstConsumer |
| Client port | 6379 | Application connects here |
| Metrics port | 9121 | Prometheus exporter sidecar (oliver006/redis_exporter) |
| Per-pod DNS | valkey-N.valkey-headless.valkey.svc.cluster.local | Stable across pod restarts |
| Endpoints | valkey (write) · valkey-read (replicas, read) · valkey-headless (per-pod) | Two-Service read/write split |
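
The endpoint names in the quick reference can be turned into connection targets with a small helper. This is a sketch under stated assumptions: the namespace valkey and the cluster.local domain are inferred from the per-pod DNS pattern shown in the table, and the function names (service_host, pod_host, endpoint_for) are hypothetical, not part of the stack.

```python
NAMESPACE = "valkey"                 # assumed from the per-pod DNS pattern
CLUSTER_DOMAIN = "svc.cluster.local"
CLIENT_PORT = 6379                   # client port from the quick reference


def service_host(service):
    """Fully qualified in-cluster DNS name for a Service."""
    return f"{service}.{NAMESPACE}.{CLUSTER_DOMAIN}"


def pod_host(ordinal):
    """Stable per-pod name resolved through the headless Service."""
    return f"valkey-{ordinal}.{service_host('valkey-headless')}"


def endpoint_for(read_only):
    """Pick the write Service or the replica read Service for a connection."""
    service = "valkey-read" if read_only else "valkey"
    return service_host(service), CLIENT_PORT


if __name__ == "__main__":
    print(endpoint_for(read_only=False))  # writes go to the primary Service
    print(endpoint_for(read_only=True))   # reads go to the replica Service
    print(pod_host(0))                    # address one specific pod
```

An application would typically hold two client connections, one per endpoint, and send commands that mutate data only to the write endpoint.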