Deploying PostgreSQL Database on EKS using CloudNativePG Operator
Introduction
CloudNativePG is an open source operator designed to manage PostgreSQL workloads Kubernetes.
It defines a new Kubernetes resource called Cluster representing a PostgreSQL
cluster made up of a single primary and an optional number of replicas that co-exist
in a chosen Kubernetes namespace for High Availability and offloading of
read-only queries.
Applications that reside in the same Kubernetes cluster can access the PostgreSQL database using a service which is solely managed by the operator, without having to worry about changes of the primary role following a failover or a switchover. Applications that reside outside the Kubernetes cluster, need to configure a Service or Ingress object to expose the Postgres via TCP. Web applications can take advantage of the native connection pooler based on PgBouncer.
CloudNativePG was originally built by EDB, then released open source under Apache License 2.0 and submitted for CNCF Sandbox in April 2022. The source code repository is in Github.
More details about the project will be found on this link
Deploying the Solution
Let's go through the deployment steps
Prerequisites
Ensure that you have installed the following tools on your machine.
Deploy the EKS Cluster with CloudNativePG Operator
First, clone the repository
git clone https://github.com/awslabs/data-on-eks.git
Navigate into cloudnative-postgres folder and run install.sh script. By default the script deploys EKS cluster to us-west-2 region. Update variables.tf to change the region. This is also the time to update any other input variables or make any other changes to the terraform template.
cd data-on-eks/distributed-databases/cloudnative-postgres
./install.sh
Verify Deployment
Verify the Amazon EKS Cluster
aws eks describe-cluster --name cnpg
Update local kubeconfig so we can access kubernetes cluster
aws eks update-kubeconfig --name cnpg --region us-west-2
First, lets verify that we have worker nodes running in the cluster.
kubectl get nodes
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-1-10-68.us-west-2.compute.internal    Ready    <none>   94m   v1.32.8-eks-99d6cc0
ip-10-1-11-124.us-west-2.compute.internal   Ready    <none>   97m   v1.32.8-eks-99d6cc0
ip-10-1-11-187.us-west-2.compute.internal   Ready    <none>   97m   v1.32.8-eks-99d6cc0
ip-10-1-12-158.us-west-2.compute.internal   Ready    <none>   97m   v1.32.8-eks-99d6cc0
Next, lets verify all the pods are running.
kubectl get pods --namespace=monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          94m
prometheus-grafana-679d5bbf76-mvtp5                      3/3     Running   0          91m
prometheus-kube-prometheus-operator-579b8cf467-h7fwm     1/1     Running   0          83m
prometheus-kube-state-metrics-6d476dd454-p52mr           1/1     Running   0          94m
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          80m
kubectl get pods --namespace=cnpg-system
NAME                                   READY   STATUS    RESTARTS   AGE
cnpg-cloudnative-pg-85949d9bc8-dp8vq   1/1     Running   0          83m
Deploy a PostgreSQL cluster
First of all, we need to create a storageclass using the ebs-csi-driver, a demo namespace and kubernetes secrets for login/password for database authentication app-auth. Check examples folder for all kubernetes manifests.
Storage
For running a highly scalable and durable self-managed PostgreSQL database on Kubernetes with Amazon EKS and EC2, it is recommended to use Amazon Elastic Block Store (EBS) volumes that provide high performance and fault tolerance. The preferred EBS volume types for this use case are:
1.Provisioned IOPS SSD (io2 or io1):
- Designed for I/O-intensive workloads such as databases.
- Offers consistent and low-latency performance.
- Allows you to provision a specific number of IOPS (input/output operations per second) according to your requirements.
- Provides up to 64,000 IOPS per volume and 1,000 MB/s throughput, making it suitable for demanding database workloads.
2.General Purpose SSD (gp3 or gp2):
- Suitable for most workloads and offers a balance between performance and cost.
- Provides a baseline performance of 3,000 IOPS and 125 MB/s throughput per volume, which can be increased if needed (up to 16,000 IOPS and 1,000 MB/s for gp3).
- Recommended for less I/O-intensive database workloads or when cost is a primary concern.
You can find both storageclass template in examples folder.
kubectl create -f examples/storageclass.yaml
kubectl create -f examples/auth-prod.yaml
As with any other deployment in Kubernetes, to deploy a PostgreSQL cluster you need to apply a configuration file that defines your desired Cluster. CloudNativePG operator offers two type of Bootstrapping a new database:
- Bootstrap an empty cluster
- Bootstrap From another cluster.
In this first example, we are going to create a new empty database cluster using initdb flags. We are going to use the template below by modifying the IAM role for IRSA configuration 1 and S3 bucket for backup restore process and WAL archiving 2. The Terraform could already created this use terraform output to extract these parameters:
cd data-on-eks/distributed-databases/cloudnative-postgres
terraform output
barman_backup_irsa = "arn:aws:iam::<your_account_id>:role/cnpg-prod-irsa"
barman_s3_bucket = "XXXX-cnpg-barman-bucket"
configure_kubectl = "aws eks --region us-west-2 update-kubeconfig --name cnpg"
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: prod
  namespace: demo
spec:
  description: "Cluster Demo for DoEKS"
  # Choose your PostGres Database Version
  imageName: ghcr.io/cloudnative-pg/postgresql:17.2
  # Number of Replicas
  instances: 3
  startDelay: 300
  stopDelay: 300
  replicationSlots:
    highAvailability:
      enabled: true
    updateInterval: 300
  primaryUpdateStrategy: unsupervised
  serviceAccountTemplate:
    # For backup and restore, we use IRSA for barman tool.
    # You will find this IAM role on terraform outputs.
    metadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::<<account_id>>:role/cnpg-on-eks-prod-irsa #1
  postgresql:
    parameters:
      shared_buffers: 256MB
      pg_stat_statements.max: '10000'
      pg_stat_statements.track: all
      auto_explain.log_min_duration: '10s'
    pg_hba:
      # - hostssl app all all cert
      - host app app all password
  logLevel: debug
  # Choose the right storageclass for type of workload.
  storage:
    storageClass: storageclass-io2 # change this if you want to use a different storage class (ex: storageclass-gp3)
    size: 4Gi
  walStorage:
    storageClass: storageclass-io2 # change this if you want to use a different storage class (ex: storageclass-gp3)
    size: 4Gi
  monitoring:
    enablePodMonitor: true
  bootstrap:
    initdb: # Deploying a new cluster
      database: WorldDB
      owner: app
      secret:
        name: app-auth
  backup:
    barmanObjectStore:
    # For backup, we S3 bucket to store data.
    # On this Blueprint, we create an S3 check the terraform output for it.
      destinationPath: s3://<your-s3-barman-bucket> # ie: s3://xxxx-cnpg-barman-bucket, #2
      s3Credentials:
        inheritFromIAMRole: true
      wal:
        compression: gzip
        maxParallel: 8
    retentionPolicy: "30d"
  resources:
    requests:
      memory: "512Mi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "2"
  affinity:
    enablePodAntiAffinity: true
    topologyKey: failure-domain.beta.kubernetes.io/zone
  nodeMaintenanceWindow:
    inProgress: false
    reusePVC: false
Once updated, you can apply your template.
kubectl create -f examples/cluster-prod.yaml
Verify that CloudNatvicePG operator has created three pods: one primary and two standby.
kubectl get pods,svc -n demo
NAME         READY   STATUS    RESTARTS   AGE
pod/prod-1   1/1     Running   0          4m36s
pod/prod-2   1/1     Running   0          3m45s
pod/prod-3   1/1     Running   0          3m9s
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/prod-any   ClusterIP   172.20.230.153   <none>        5432/TCP   4m54s
service/prod-r     ClusterIP   172.20.33.61     <none>        5432/TCP   4m54s
service/prod-ro    ClusterIP   172.20.96.16     <none>        5432/TCP   4m53s
service/prod-rw    ClusterIP   172.20.236.1     <none>        5432/TCP   4m53s
The operator created also three services:
- -rw: points only to the primary instances of cluster database
- -ropoints only to hot standby replicas for read-only-workloads
- -rpoints to any of the instances for read-only workloads
Note that -any points on all the instances.
Another way to check Cluster status is by using cloudnative-pg kubectl plugin offered by the CloudNativePG community,
kubectl cnpg status prod -n demo
Cluster Summary
Name:               prod
Namespace:          demo
System ID:          7214866198623563798
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:15.2
Primary instance:   prod-1
Status:             Cluster in healthy state
Instances:          3
Ready instances:    3
Current Write LSN:  0/6000000 (Timeline: 1 - WAL File: 000000010000000000000005)
Certificates Status
Certificate Name  Expiration Date                Days Left Until Expiration
----------------  ---------------                --------------------------
prod-ca           2023-06-24 14:40:27 +0000 UTC  89.96
prod-replication  2023-06-24 14:40:27 +0000 UTC  89.96
prod-server       2023-06-24 14:40:27 +0000 UTC  89.96
Continuous Backup status
First Point of Recoverability:  Not Available
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              000000010000000000000005   @   2023-03-26T14:52:09.24307Z
Last Failed WAL:                -
Streaming Replication status
Replication Slots Enabled
Name    Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----    --------   ---------  ---------  ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
prod-2  0/6000000  0/6000000  0/6000000  0/6000000   00:00:00   00:00:00   00:00:00    streaming  async       0              active
prod-3  0/6000000  0/6000000  0/6000000  0/6000000   00:00:00   00:00:00   00:00:00    streaming  async       0              active
Unmanaged Replication Slot Status
No unmanaged replication slots found
Instances status
Name    Database Size  Current LSN  Replication role  Status  QoS         Manager Version  Node
----    -------------  -----------  ----------------  ------  ---         ---------------  ----
prod-1  29 MB          0/6000000    Primary           OK      BestEffort  1.19.0           ip-10-1-10-192.us-west-2.compute.internal
prod-2  29 MB          0/6000000    Standby (async)   OK      BestEffort  1.19.0           ip-10-1-12-195.us-west-2.compute.internal
prod-3  29 MB          0/6000000    Standby (async)   OK      BestEffort  1.19.0           ip-10-1-11-38.us-west-2.compute.internal
Monitoring
In this example, we deployed a Prometheus and Grafana addons to monitor all database clusters created by CloudNativePG. Let's check Grafana dashboard.
kubectl -n monitoring port-forward svc/prometheus-grafana 8080:80
You can get the username and password by running the following command:
kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-user}" | base64 --decode ; echo
kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
Now, you can go to http://localhost:8080 to log into the Grafana dashboard.

Import database sample
You can expose your database outside the cluster using ingress-controller or kubernetes service type LoadBalancer. However, for internal usage inside your EKS cluster, you can use kubernetes service prod-rw and prod-ro.
In this section, we are going to expose read-write service -rwusing kubectl port-forward.
kubectl port-forward svc/prod-rw 5432:5432 -n demo
First, let's get the app-auth secret.
kubectl get secret app-auth -n demo -o=jsonpath='{.data.password}' | base64 --decode ; echo
Now, we use psql cli to import world.sql into our database instance WorldDB using credentials from app-auth secrets.
psql -h localhost --port 5432 -U app -d WorldDB < world.sql
# Quick check on db tables.
psql -h localhost --port 5432 -U app -d WorldDB -c '\dt'
Password for user app:
            List of relations
 Schema |      Name       | Type  | Owner
--------+-----------------+-------+-------
 public | city            | table | app
 public | country         | table | app
 public | countrylanguage | table | app
(3 rows)
Create Backup to S3
Now that we had a running database with data, CloudNativePG operator offers backup-restore feature using barman tool. CloudNativePG allows database admin to create on-demand database or Scheduled backups and for more details on documentations.
In this example, we will create a Backup object to start a backup process immediately.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: ondemand
spec:
  cluster:
    name: prod
 kubectl create -f examples/backup-od.yaml
It will take couple minutes to run, then, check the backup process
kubectl describe backup ondemand
Events:
  Type    Reason     Age   From                   Message
  ----    ------     ----  ----                   -------
  Normal  Starting   60s   cloudnative-pg-backup  Starting backup for cluster prod
  Normal  Starting   60s   instance-manager       Backup started
  Normal  Completed  56s   instance-manager       Backup completed
Restore
For restore, we use bootstrap a new cluster using backup file on S3. The backup tool barman manages restore process, but, it doesn't support backup and restore for kubernetes secrets. This must be managed separately, like using csi-secrets-driver with AWS SecretsManager.
First let's delete prod database.
kubectl delete cluster prod -n demo
Then, update your template examples/cluster-restore.yaml with your S3 bucket and IAM role. Note that on restore template, CloudNativePG use externalClusters to point on the database.
  kubectl create -f examples/cluster-restore.yaml
  Type    Reason                       Age    From            Message
  ----    ------                       ----   ----            -------
  Normal  CreatingPodDisruptionBudget  7m12s  cloudnative-pg  Creating PodDisruptionBudget prod-primary
  Normal  CreatingPodDisruptionBudget  7m12s  cloudnative-pg  Creating PodDisruptionBudget prod
  Normal  CreatingServiceAccount       7m12s  cloudnative-pg  Creating ServiceAccount
  Normal  CreatingRole                 7m12s  cloudnative-pg  Creating Cluster Role
  Normal  CreatingInstance             7m12s  cloudnative-pg  Primary instance (from backup)
  Normal  CreatingInstance             6m33s  cloudnative-pg  Creating instance prod-2
  Normal  CreatingInstance             5m51s  cloudnative-pg  Creating instance prod-3
When creating a new cluster, the operator will create a ServiceAccount with IRSA configuration as described on Cluster resources. Make sure the trust policy points the right ServiceAccount.
Let's check if the data were covered as expected.
psql -h localhost --port 5432 -U app -d WorldDB -c '\dt'
Password for user app:
            List of relations
 Schema |      Name       | Type  | Owner
--------+-----------------+-------+-------
 public | city            | table | app
 public | country         | table | app
 public | countrylanguage | table | app
(3 rows)
psql -h localhost --port 5432 -U app -d WorldDB -c 'SELECT CURRENT_TIME;'
Conclusion
CloudNativePG operator provides Level 5 from Operator Capability Levels. In this example, we share a blueprint that deploy the operator as an addon along with its monitoring stack (Prometheus and grafana). Among many features, we highlighted couple of examples on creating cluster, importing data and restoring database in case of disaster (or cluster deletion). More features are available on this documentation