Load Balancer

Expose Kubeflow over Load Balancer on AWS

This tutorial shows how to expose Kubeflow over a load balancer on AWS.

Before you begin

Follow this guide only if you are not using Cognito as the authentication provider in your deployment. Cognito-integrated deployments are already configured with the AWS Load Balancer controller, which creates an ingress-managed Application Load Balancer and exposes Kubeflow through a hosted domain.

Note: Terraform deployment users should skip the steps below that carry a note saying so.

Background

Kubeflow does not offer a generic solution for connecting to Kubeflow over a Load Balancer because this process is highly dependent on your environment and cloud provider. On AWS, we use the AWS Load Balancer controller, which fulfills Kubernetes Ingress resources by provisioning Application Load Balancers (ALBs). When you create a Kubernetes Ingress, the controller provisions an ALB that load balances application traffic.

To connect to Kubeflow using a Load Balancer, we need to set up HTTPS. Many of the Kubeflow web apps (e.g. Tensorboard Web App, Jupyter Web App, Katib UI) use secure cookies, and browsers send cookies marked Secure only over HTTPS (localhost is treated as an exception), so accessing Kubeflow with HTTP over a non-localhost domain does not work.

To secure the traffic and use HTTPS, we must associate a Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificate with the Load Balancer. AWS Certificate Manager (ACM) is a service that lets you provision, manage, and deploy public and private SSL/TLS certificates for use with AWS services and your internal connected resources. To create a certificate for use with the Load Balancer, you must specify a domain name that you control; certificates cannot be created for the DNS name that AWS assigns to the ALB. You can register your domain with any domain service provider, such as Route53 or GoDaddy.

Prerequisites

This guide assumes that you have:

  • A Kubeflow deployment on EKS with Dex as your authentication provider (Dex is the default authentication provider in the Vanilla deployment of Kubeflow on AWS).
  • Installed the tools mentioned in the general prerequisites guide on the client machine.
  • Verified that you are connected to the right cluster, that the cluster has compute, and that the AWS region is set to the region of your cluster.
    • Verify that your cluster name and region are exported:
      echo $CLUSTER_REGION
      echo $CLUSTER_NAME
      
    • Display the current cluster that kubeconfig points to:
      kubectl config current-context
      aws eks describe-cluster --name $CLUSTER_NAME --region $CLUSTER_REGION
      
  • Verify that the current directory is the root of the repository by running the pwd command. The output should be <path/to/kubeflow-manifests>.
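
The checks above can be bundled into a small pre-flight function so that a missing variable or a wrong working directory fails fast. This is a minimal sketch: check_prereqs is a hypothetical name, and treating the awsconfigs/ directory as the marker for the repository root is an assumption based on the paths used later in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the kubeflow-manifests repository.
# Prints a summary on success, or a reason on stderr and returns 1 on failure.
check_prereqs() {
  # CLUSTER_NAME and CLUSTER_REGION are used by the AWS commands in this guide.
  if [ -z "${CLUSTER_NAME:-}" ] || [ -z "${CLUSTER_REGION:-}" ]; then
    echo "CLUSTER_NAME and CLUSTER_REGION must both be exported" >&2
    return 1
  fi
  # Assumption: the repository root is the directory that contains awsconfigs/.
  if [ ! -d awsconfigs ]; then
    echo "run this from the root of kubeflow-manifests (no awsconfigs/ in $(pwd))" >&2
    return 1
  fi
  echo "cluster=$CLUSTER_NAME region=$CLUSTER_REGION dir=$(pwd)"
}

# Usage sketch:
# check_prereqs && kubectl config current-context
```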

Create Load Balancer

Setup for Manifest deployments

If you prefer to create a load balancer using automated scripts, you only need to follow the steps in the automated script section. You can read the following sections in this guide to understand what happens when you run the automated script or to walk through all of the steps manually.

Setup for Terraform deployments

Follow the manual steps below.

Create domain and certificates

You need a registered domain and a TLS certificate to use HTTPS with the Load Balancer. Your root domain (e.g. example.com) can be registered with any service provider. For uniformity, and to take advantage of the integration between Route53, ACM, and Application Load Balancer, you will create a separate subdomain (e.g. platform.example.com) to host Kubeflow, along with a corresponding hosted zone in Route53 that routes traffic for this subdomain. To get TLS support, you will need certificates for both the root domain (*.example.com) and the subdomain (*.platform.example.com) in the region where your platform will run (your EKS cluster region).
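
The second certificate is needed because a wildcard name covers exactly one extra DNS label: *.example.com matches platform.example.com but not kubeflow.platform.example.com. The check below illustrates that rule. cert_covers is a hypothetical helper, and it assumes lowercase names without a trailing dot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if certificate name CERT (exact or "*." wildcard)
# covers HOST. A wildcard matches exactly one additional label.
cert_covers() {
  local cert host suffix label
  cert=$1
  host=$2
  case $cert in
    '*.'*)
      suffix=${cert#??}                    # drop the leading "*."
      label=${host%".$suffix"}             # what is left after removing ".suffix"
      [ "$label" != "$host" ] || return 1  # host is not under the suffix at all
      [ -n "$label" ] || return 1
      case $label in
        *.*) return 1 ;;                   # more than one extra level
      esac
      return 0
      ;;
    *)
      [ "$cert" = "$host" ]                # non-wildcard names must match exactly
      ;;
  esac
}

# Usage sketch:
# cert_covers '*.platform.example.com' kubeflow.platform.example.com && echo covered
```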

Create a subdomain

  1. Register a domain with any domain provider, such as Route53 or GoDaddy. For this guide, we assume that this domain is example.com. It is convenient to have the domain managed by Route53, because you will have to add several DNS records (a wildcard record for the ALB DNS name, validation records for the certificate manager, etc.).
  2. Go to Route53 and create a subdomain to host Kubeflow:
    • Create a hosted zone for the desired subdomain (e.g. platform.example.com).

    • Copy the value of the NS type record from the subdomain hosted zone (platform.example.com).

    • Create an NS type record for the subdomain platform.example.com in the root example.com hosted zone, using the values you copied.


      Verify the creation of your NS record in the Route53 console.


From this point on, you create and update the DNS records only in the subdomain. All of the images of the hosted zone in the following steps of this guide are for the subdomain.
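
The NS delegation in step 2 can also be done from the AWS CLI. A minimal sketch: ns_change_batch is a hypothetical helper, the TTL of 300 seconds is an arbitrary choice, and SUB_ZONE_ID and ROOT_ZONE_ID are placeholders for your hosted zone IDs. The aws route53 get-hosted-zone command returns the subdomain zone's name servers under DelegationSet.NameServers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Route53 change batch that delegates SUBDOMAIN
# to the given name servers (the NS values of the subdomain hosted zone).
ns_change_batch() {
  local subdomain records sep ns
  subdomain=$1
  shift
  records=""
  sep=""
  for ns in "$@"; do
    records="$records$sep{\"Value\":\"$ns\"}"
    sep=","
  done
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"NS","TTL":300,"ResourceRecords":[%s]}}]}\n' \
    "$subdomain" "$records"
}

# Usage sketch (requires AWS credentials; zone IDs are placeholders):
# NS_VALUES=$(aws route53 get-hosted-zone --id "$SUB_ZONE_ID" \
#   --query 'DelegationSet.NameServers' --output text)
# ns_change_batch platform.example.com $NS_VALUES > ns-batch.json
# aws route53 change-resource-record-sets --hosted-zone-id "$ROOT_ZONE_ID" \
#   --change-batch file://ns-batch.json
```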

Create certificates for domain

To create the certificates for the domains in the region where your platform will run (i.e. EKS cluster region), follow the steps in the Request a public certificate using the console guide.

Note: The certificates are valid only after successful validation of domain ownership.

The following image is a screenshot showing that a certificate has been issued.

Note: Status turns to Issued after a few minutes of validation.

If you choose DNS validation for the validation of the certificates, you will be asked to create a CNAME type record in the hosted zone. The following image is a screenshot of the CNAME record of the certificate in the platform.example.com hosted zone for DNS validation.

  1. Create a certificate for *.example.com in the region where your platform will run.
  2. Create a certificate for *.platform.example.com in the region where your platform will run.
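
The two certificates can also be requested from the AWS CLI. A minimal sketch: request_wildcard_cert is a hypothetical helper. The commented commands use aws acm describe-certificate to show the validation CNAME and aws acm wait certificate-validated to block until the certificate is issued; the validation record can take a few seconds to appear after the request.

```shell
#!/usr/bin/env bash
# Hypothetical helper: request a DNS-validated public certificate for *.DOMAIN
# in REGION and print its ARN. Requires the AWS CLI and credentials.
request_wildcard_cert() {
  local domain region
  domain=$1
  region=$2
  aws acm request-certificate \
    --domain-name "*.$domain" \
    --validation-method DNS \
    --region "$region" \
    --query CertificateArn --output text
}

# Usage sketch:
# ROOT_CERT_ARN=$(request_wildcard_cert example.com "$CLUSTER_REGION")
# SUB_CERT_ARN=$(request_wildcard_cert platform.example.com "$CLUSTER_REGION")
# aws acm describe-certificate --certificate-arn "$SUB_CERT_ARN" --region "$CLUSTER_REGION" \
#   --query 'Certificate.DomainValidationOptions[0].ResourceRecord'
# aws acm wait certificate-validated --certificate-arn "$SUB_CERT_ARN" --region "$CLUSTER_REGION"
```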

Configure Ingress

  1. Export the ARN of the certificate created for *.platform.example.com:
    export certArn=<>
    
  2. Write the certificate ARN of the subdomain to the ingress parameters file.
    printf 'certArn=%s\n' "$certArn" > awsconfigs/common/istio-ingress/overlays/https/params.env
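
An empty or mistyped certArn is easy to miss at this point, so it can help to sanity-check the value before writing params.env. A minimal sketch: check_cert_arn is a hypothetical helper that only checks the general shape of an ACM ARN (arn:partition:acm:region:account:certificate/id) and that its region matches your cluster region; it does not confirm that the certificate exists or has been issued.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shape check for an ACM certificate ARN in REGION.
# This is a sanity check only, not full ARN validation.
check_cert_arn() {
  local arn region
  arn=$1
  region=$2
  case $arn in
    arn:aws*:acm:"$region":*:certificate/?*) return 0 ;;
  esac
  echo "certArn '$arn' is not an ACM certificate ARN in $region" >&2
  return 1
}

# Usage sketch: write params.env only if the ARN passes the check.
# check_cert_arn "$certArn" "$CLUSTER_REGION" &&
#   printf 'certArn=%s\n' "$certArn" > awsconfigs/common/istio-ingress/overlays/https/params.env
```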
    

Configure Load Balancer Controller

Important: Skip this step if you are using a Terraform deployment since the AWS Load Balancer Controller is installed by default unless you set enable_aws_load_balancer_controller = false.

Set up resources required for the Load Balancer controller:

  1. Make sure that all the subnets (public and private) corresponding to the EKS cluster are tagged according to the Prerequisites section in the Application load balancing on Amazon EKS guide. Ignore the requirement to have the AWS Load Balancer controller already deployed on the cluster; we will deploy Load Balancer controller version 1.1.5 later on.

    • Check if the following tags exist on the subnets:
      • kubernetes.io/cluster/cluster-name (replace cluster-name with your cluster name, e.g. kubernetes.io/cluster/my-k8s-cluster). Add this tag to both private and public subnets. If you created the cluster using eksctl, this might be the only tag you are missing. Use the following commands to tag all subnets, setting TAG_VALUE to owned or shared. Use shared if more than one cluster uses the subnets:
        export TAG_VALUE=<>
        # jq prints the subnet IDs as one newline-separated string
        export CLUSTER_SUBNET_IDS=$(aws ec2 describe-subnets --region $CLUSTER_REGION --filters Name=tag:alpha.eksctl.io/cluster-name,Values=$CLUSTER_NAME --output json | jq -r '.Subnets[].SubnetId')
        # Leave the variable unquoted so the shell splits it into one ID per iteration
        for i in $CLUSTER_SUBNET_IDS
        do
            aws ec2 create-tags --region $CLUSTER_REGION --resources ${i} --tags Key=kubernetes.io/cluster/${CLUSTER_NAME},Value=${TAG_VALUE}
        done
        
      • kubernetes.io/role/internal-elb. Add this tag only to private subnets.
      • kubernetes.io/role/elb. Add this tag only to public subnets.
  2. The Load Balancer controller uses IAM roles for service accounts (IRSA) to access AWS services. An OIDC provider must exist for your cluster to use IRSA. If your cluster doesn’t already have one, create an OIDC provider and associate it with your EKS cluster by running the following command:

    eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --approve
    
  3. Create an IAM policy with the permissions the Load Balancer controller needs, and an IAM role that the controller's service account uses to access AWS services:

    export LBC_POLICY_NAME=alb_ingress_controller_${CLUSTER_REGION}_${CLUSTER_NAME}
    export LBC_POLICY_ARN=$(aws iam create-policy --policy-name $LBC_POLICY_NAME --policy-document file://awsconfigs/infra_configs/iam_alb_ingress_policy.json --output text --query 'Policy.Arn')
    eksctl create iamserviceaccount --name aws-load-balancer-controller --namespace kube-system --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --attach-policy-arn ${LBC_POLICY_ARN} --override-existing-serviceaccounts --approve
    
  4. Write the cluster name to the parameters file for the Load Balancer controller.

    printf 'clusterName=%s\n' "$CLUSTER_NAME" > awsconfigs/common/aws-alb-ingress-controller/base/params.env
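
The role tags from step 1 can be added with the AWS CLI once you know which subnets are public and which are private. A minimal sketch: tag_subnet_role is a hypothetical helper, the subnet IDs in the usage lines are placeholders, and the tag value 1 follows the Amazon EKS documentation for these tags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the ELB role tag used for subnet discovery.
# public  -> kubernetes.io/role/elb
# private -> kubernetes.io/role/internal-elb
tag_subnet_role() {
  local subnet kind region key
  subnet=$1
  kind=$2
  region=$3
  case $kind in
    public) key=kubernetes.io/role/elb ;;
    private) key=kubernetes.io/role/internal-elb ;;
    *) echo "kind must be public or private, got '$kind'" >&2; return 1 ;;
  esac
  aws ec2 create-tags --region "$region" --resources "$subnet" --tags "Key=$key,Value=1"
}

# Usage sketch (placeholder subnet IDs):
# tag_subnet_role subnet-0123456789abcdef0 public "$CLUSTER_REGION"
# tag_subnet_role subnet-0fedcba9876543210 private "$CLUSTER_REGION"
```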
    
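Steps 2 and 3 can be guarded with two small checks: one that skips the OIDC association when IAM already has a provider for the cluster, and one that catches a policy name IAM would reject. IAM policy names are limited to 128 characters, and alb_ingress_controller_${CLUSTER_REGION}_${CLUSTER_NAME} can exceed that with a long cluster name. A minimal sketch: the helper names are hypothetical, and matching the issuer ID against the aws iam list-open-id-connect-providers output mirrors the check described in the Amazon EKS documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the provider ID is the last path segment of the issuer URL,
# e.g. https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: succeed if IAM already has an OIDC provider for the cluster.
has_oidc_provider() {
  local issuer id
  issuer=$(aws eks describe-cluster --name "$CLUSTER_NAME" --region "$CLUSTER_REGION" \
    --query 'cluster.identity.oidc.issuer' --output text) || return 1
  id=$(oidc_id_from_issuer "$issuer")
  [ -n "$id" ] || return 1
  aws iam list-open-id-connect-providers --output text | grep -q -- "$id"
}

# Hypothetical helper: IAM policy names allow 1-128 characters drawn from
# letters, digits and + = , . @ _ -
valid_iam_policy_name() {
  local name
  name=$1
  [ -n "$name" ] && [ "${#name}" -le 128 ] || return 1
  case $name in
    *[!A-Za-z0-9+=,.@_-]*) return 1 ;;   # a character outside the allowed set
  esac
  return 0
}

# Usage sketch:
# has_oidc_provider || eksctl utils associate-iam-oidc-provider \
#   --cluster "$CLUSTER_NAME" --region "$CLUSTER_REGION" --approve
# valid_iam_policy_name "alb_ingress_controller_${CLUSTER_REGION}_${CLUSTER_NAME}" ||
#   echo "policy name is too long or has invalid characters; pick a shorter LBC_POLICY_NAME" >&2
```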

Install Load Balancer Controller

Important: Skip this step if you are using a Terraform deployment since the AWS Load Balancer Controller is installed by default unless you set enable_aws_load_balancer_controller = false.

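Run the following commands to build and install the Load Balancer controller with kustomize. The manifests are applied twice: the first apply creates the CustomResourceDefinitions (and may report errors for objects that depend on them), kubectl wait blocks until the IngressClassParams CRD is established, and the second apply creates the remaining objects.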

kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f -
kubectl wait --for condition=established crd/ingressclassparams.elbv2.k8s.aws
kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f -
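
If you script the installation, a generic retry wrapper is one way to tolerate the expected errors from the first apply. A minimal sketch: retry and apply_lbc are hypothetical helpers, and the --timeout values and the deployment name aws-load-balancer-controller in kube-system are assumptions (they match the service account created earlier).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 1 if every attempt fails.
retry() {
  local attempts delay n
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "command failed after $attempts attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage sketch:
# apply_lbc() { kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f -; }
# retry 3 10 apply_lbc
# kubectl wait --for condition=established --timeout=60s crd/ingressclassparams.elbv2.k8s.aws
# kubectl -n kube-system rollout status deployment/aws-load-balancer-controller --timeout=120s
```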

Create Ingress

Create an ingress that uses the certificate you specified in certArn.

kustomize build awsconfigs/common/istio-ingress/overlays/https | kubectl apply -f -
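
The ALB is provisioned asynchronously after the ingress is applied. Rather than rerunning kubectl get ingress by hand, you can poll for the address. A minimal sketch, assuming kubectl points at your cluster: alb_address and wait_for_alb are hypothetical helpers, and the default of 30 attempts 20 seconds apart is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ALB hostname recorded on the istio-ingress
# Ingress, or nothing while the load balancer is still being provisioned.
alb_address() {
  kubectl get ingress -n istio-system istio-ingress \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Hypothetical helper: poll until the address appears or the attempts run out.
wait_for_alb() {
  local attempts delay addr i
  attempts=${1:-30}
  delay=${2:-20}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    addr=$(alb_address 2>/dev/null)
    if [ -n "$addr" ]; then
      echo "$addr"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no ALB address after $attempts attempts; check the controller logs" >&2
  return 1
}

# Usage sketch:
# ALB_DNS=$(wait_for_alb) && echo "ALB: $ALB_DNS"
```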

Update the domain with ALB address

  1. Check if ALB is provisioned. This may take a few minutes.
        kubectl get ingress -n istio-system istio-ingress
        NAME            CLASS    HOSTS   ADDRESS                                                              PORTS   AGE
        istio-ingress   <none>   *       k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com   80      15d
    
    If ADDRESS is empty after a few minutes, check the logs of the controller by following the troubleshooting steps in ALB fails to provision.
  2. When the ALB is ready, copy its DNS name and create a CNAME record for *.platform.example.com that points to it in the Route53 hosted zone of the subdomain (platform.example.com). It might take five to ten minutes for DNS changes to propagate and for your URL to work.

Note: Check whether the DNS entry has propagated with the Google Admin Toolbox.

  3. The central dashboard should now be available at https://kubeflow.platform.example.com. Open a browser and navigate to this URL.
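
The CNAME record from step 2 can also be created with the AWS CLI, and the propagation check can be automated with dig. A minimal sketch: cname_change_batch and dns_points_to_alb are hypothetical helpers, the TTL of 300 seconds is an arbitrary choice, SUB_ZONE_ID is a placeholder for the platform.example.com hosted zone ID, and dig must be installed. Your local resolver may cache answers, so a public lookup tool can show the new record before your machine does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Route53 change batch that points *.SUBDOMAIN at ALB_DNS.
cname_change_batch() {
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"*.%s","Type":"CNAME","TTL":300,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$1" "$2"
}

# Hypothetical helper: succeed once HOST resolves via CNAME to ALB_DNS.
# dig prints names with a trailing dot, which is stripped before comparing.
dns_points_to_alb() {
  local answer
  answer=$(dig +short CNAME "$1" | head -n 1)
  [ -n "$answer" ] && [ "${answer%.}" = "$2" ]
}

# Usage sketch:
# ALB_DNS=$(kubectl get ingress -n istio-system istio-ingress \
#   -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
# cname_change_batch platform.example.com "$ALB_DNS" > cname-batch.json
# aws route53 change-resource-record-sets --hosted-zone-id "$SUB_ZONE_ID" \
#   --change-batch file://cname-batch.json
# until dns_points_to_alb kubeflow.platform.example.com "$ALB_DNS"; do sleep 30; done
```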

Automated script

Important: Terraform deployment users should not follow these Automated setup instructions and should follow the Manual setup instructions.

  1. Install dependencies for the script
    cd tests/e2e
    pip install -r requirements.txt
    
  2. Substitute values in tests/e2e/utils/load_balancer/config.yaml.
    • Root domain in route53.rootDomain.name. For this example, assume that this domain is example.com.
      • If your domain is managed in Route53, enter the Hosted zone ID found under Hosted zone details in route53.rootDomain.hostedZoneId. Skip this step if your domain is managed by another domain provider.
    • Name of the subdomain that you want to use to host Kubeflow (e.g. platform.example.com) in route53.subDomain.name.
    • Cluster name and region where Kubeflow is deployed in cluster.name and cluster.region (e.g. us-west-2), respectively.
    • Load balancer scheme (e.g. internet-facing or internal). Default is set to internet-facing. Use internal as the load balancer scheme if you want the load balancer to be accessible only within your VPC. See Load balancer scheme in the AWS documentation for more details.
    • The Config file will look something like:
      cluster:
          name: kube-eks-cluster
          region: us-west-2
      kubeflow:
          alb:
              scheme: internet-facing
      route53:
          rootDomain:
              hostedZoneId: XXXX
              name: example.com
          subDomain:
              name: platform.example.com
      
  3. Run the script to create the resources.
    PYTHONPATH=.. python utils/load_balancer/setup_load_balancer.py
    
  4. The script will update the Config file with the resource names, IDs, and ARNs that it created. Refer to the following example for more information:
    kubeflow:
        alb:
            dns: xxxxxx-istiosystem-istio-2af2-1100502020.us-west-2.elb.amazonaws.com
            scheme: internet-facing
            serviceAccount:
                name: alb-ingress-controller
                namespace: kubeflow
                policyArn: arn:aws:iam::123456789012:policy/alb_ingress_controller_kube-eks-clusterxxx
    cluster:
        name: kube-eks-cluster
        region: us-west-2
    route53:
        rootDomain:
            certARN: arn:aws:acm:us-west-2:123456789012:certificate/9d8c4bbc-3b02-4a48-8c7d-d91441c6e5af
            hostedZoneId: XXXXX
            name: example.com
        subDomain:
            certARN: arn:aws:acm:us-west-2:123456789012:certificate/d1d7b641c238-4bc7-f525-b7bf-373cc726
            hostedZoneId: XXXXX
            name: platform.example.com
    
  5. The central dashboard should now be available at https://kubeflow.platform.example.com. Open a browser and navigate to this URL.

Note: It might take a few minutes for DNS changes to propagate and for your URL to work. Check whether the DNS entry has propagated with the Google Admin Toolbox.
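
Two values in config.yaml are easy to get wrong: the subdomain must sit under the root domain, and the scheme must be internet-facing or internal. A minimal sketch that checks them before you run the script. check_lb_config is a hypothetical helper, and it takes the values as arguments instead of parsing the YAML file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate ROOT_DOMAIN, SUBDOMAIN and SCHEME before
# putting them into tests/e2e/utils/load_balancer/config.yaml.
check_lb_config() {
  local root sub scheme
  root=$1
  sub=$2
  scheme=$3
  case $sub in
    *."$root") ;;
    *) echo "route53.subDomain.name '$sub' is not under '$root'" >&2; return 1 ;;
  esac
  case $scheme in
    internet-facing|internal) ;;
    *) echo "kubeflow.alb.scheme must be internet-facing or internal, got '$scheme'" >&2; return 1 ;;
  esac
  return 0
}

# Usage sketch:
# check_lb_config example.com platform.example.com internet-facing
```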

Clean up

Important: Terraform deployment users should not follow these clean up steps and should manually delete resources created while following the Manual setup instructions.

To delete the resources created in this guide, run the following commands from the root of your repository:

Note: Make sure that you have the configuration file created by the script in tests/e2e/utils/load_balancer/config.yaml. If you did not use the script, plug in the name, ARN, or ID of the resources that you created in the configuration file by referring to the sample in Step 4 of the previous section.

cd tests/e2e
PYTHONPATH=.. python utils/load_balancer/lb_resources_cleanup.py
cd -
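
For Terraform users cleaning up by hand, the order of deletion matters: ACM will not delete a certificate that is still in use, and Route53 will not delete a hosted zone that still holds records other than its NS and SOA records. A minimal dry-run sketch, assuming you created the resources with the manual steps: print_manual_cleanup is a hypothetical helper that only prints commands for you to review, and its arguments (subdomain hosted zone ID, certificate ARNs, region) are placeholders you supply.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print cleanup commands for the manually created
# resources in a dependency-safe order. Nothing is executed.
print_manual_cleanup() {
  local sub_zone_id sub_cert_arn root_cert_arn region
  sub_zone_id=$1
  sub_cert_arn=$2
  root_cert_arn=$3
  region=$4
  # Deleting the Ingress lets the Load Balancer controller delete the ALB.
  echo "kustomize build awsconfigs/common/istio-ingress/overlays/https | kubectl delete -f -"
  # ACM refuses to delete a certificate that is still attached to a load balancer.
  echo "aws acm delete-certificate --certificate-arn $sub_cert_arn --region $region"
  echo "aws acm delete-certificate --certificate-arn $root_cert_arn --region $region"
  # A hosted zone can be deleted only when it holds nothing but its NS and SOA records.
  echo "# first delete the remaining CNAME records (ALB and certificate validation) in $sub_zone_id"
  echo "aws route53 delete-hosted-zone --id $sub_zone_id"
  echo "# finally delete the platform.example.com NS record from the root hosted zone"
}
```

Wait until the ALB has been removed before running the delete-certificate commands.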
Last modified September 1, 2023: v1.7.0-aws-b1.0.3 website changes (#791) (7faf1a5)