Automated Deployment Guide
This guide describes how to deploy Kubeflow on AWS EKS using Cognito as identity provider. Kubeflow uses Istio to manage internal traffic. In this guide we will be creating an Ingress to manage external traffic to the Kubernetes services and an Application Load Balancer (ALB) to provide public DNS and enable TLS authentication at the load balancer. We will also be creating a custom domain to host Kubeflow since certificates (needed for TLS) for ALB’s public DNS names are not supported.
Prerequisites
This guide assumes you have Python 3.8 installed and that you have completed the prerequisites.
Create required resources and deploy Kubeflow
- The following steps automate three sections of the cognito guide: section 1.0 (Custom domain and certificates), which creates a custom domain to host Kubeflow and TLS certificates for that domain; section 2.0 (Cognito user pool), which creates the Cognito user pool used for user authentication; and section 3.0 (Configure Ingress), which configures the ingress and load balancer controller manifests.
- Install dependencies for the scripts
pip install -r tests/e2e/requirements.txt
- Substitute values in tests/e2e/utils/cognito_bootstrap/config.yaml:
  - The registered root domain in route53.rootDomain.name. Let's assume this domain is example.com.
    - If your domain is managed in Route 53, enter the Hosted zone ID found under Hosted zone details in route53.rootDomain.hostedZoneId. Skip this step if your domain is managed by another domain provider.
  - The name of the subdomain you want to host Kubeflow on (e.g. platform.example.com) in route53.subDomain.name. Please read this section to understand why we use a subdomain.
  - The cluster name and the region where Kubeflow will be deployed in cluster.name and cluster.region (e.g. us-west-2), respectively.
  - The name of the Cognito user pool in cognitoUserpool.name (e.g. kubeflow-users).
  - The config file will look something like:

      cognitoUserpool:
        name: kubeflow-users
      cluster:
        name: kube-eks-cluster
        region: us-west-2
      route53:
        rootDomain:
          hostedZoneId: XXXX
          name: example.com
        subDomain:
          name: platform.example.com
- Run the script to create the resources
cd tests/e2e
PYTHONPATH=.. python utils/cognito_bootstrap/cognito_pre_deployment.py
cd -
- The script will update the config file with the resource names/ids/ARNs it created. It will look something like:

    cognitoUserpool:
      ARN: arn:aws:cognito-idp:us-west-2:123456789012:userpool/us-west-2_yasI9dbxF
      appClientId: 5jmk7ljl2a74jk3n0a0fvj3l31
      domainAliasTarget: xxxxxxxxxx.cloudfront.net
      domain: auth.platform.example.com
      name: kubeflow-users
    kubeflow:
      alb:
        serviceAccount:
          name: alb-ingress-controller
          namespace: kubeflow
          policyArn: arn:aws:iam::123456789012:policy/alb_ingress_controller_kube-eks-clusterxxx
    cluster:
      name: kube-eks-cluster
      region: us-west-2
    route53:
      rootDomain:
        certARN: arn:aws:acm:us-east-1:123456789012:certificate/9d8c4bbc-3b02-4a48-8c7d-d91441c6e5af
        hostedZoneId: XXXXX
        name: example.com
      subDomain:
        us-west-2-certARN: arn:aws:acm:us-west-2:123456789012:certificate/d1d7b641c238-4bc7-f525-b7bf-373cc726
        hostedZoneId: XXXXX
        name: platform.example.com
        us-east-1-certARN: arn:aws:acm:us-east-1:123456789012:certificate/373cc726-f525-4bc7-b7bf-d1d7b641c238

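Before running the pre-deployment script, it can help to confirm that the fields listed in the substitution step are filled in. The following is a minimal sketch, not part of the repository's scripts: `REQUIRED_FIELDS`, `lookup`, and `find_problems` are hypothetical names, and the config is assumed to be already loaded into a dictionary (for example with PyYAML's `yaml.safe_load`). `route53.rootDomain.hostedZoneId` is deliberately left out of the required list because the guide allows skipping it for domains not managed in Route 53.

```python
from typing import Any, Dict, List

# Dotted paths taken from the substitution step in this guide.
REQUIRED_FIELDS = [
    "route53.rootDomain.name",
    "route53.subDomain.name",
    "cluster.name",
    "cluster.region",
    "cognitoUserpool.name",
]


def lookup(config: Dict[str, Any], dotted_path: str) -> Any:
    """Return the value at a dotted path, or None if any key is missing."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_problems(config: Dict[str, Any]) -> List[str]:
    """List human-readable problems; an empty list means the checks passed."""
    problems = [
        f"missing value for {path}"
        for path in REQUIRED_FIELDS
        if not lookup(config, path)
    ]
    root = lookup(config, "route53.rootDomain.name")
    sub = lookup(config, "route53.subDomain.name")
    # The guide hosts Kubeflow on a subdomain of the registered root domain.
    if root and sub and not sub.endswith("." + root):
        problems.append(f"{sub} is not a subdomain of {root}")
    return problems


# Example values copied from the sample config in this guide.
example = {
    "cognitoUserpool": {"name": "kubeflow-users"},
    "cluster": {"name": "kube-eks-cluster", "region": "us-west-2"},
    "route53": {
        "rootDomain": {"hostedZoneId": "XXXX", "name": "example.com"},
        "subDomain": {"name": "platform.example.com"},
    },
}
print(find_problems(example))  # prints []
```

An empty list only means the required keys are present and the subdomain sits under the root domain; it does not verify that the domain, cluster, or hosted zone actually exist in your AWS account.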
- Install Kubeflow using the following command:
while ! kustomize build deployments/cognito | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 30; done
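The install command above retries because the first `kubectl apply` passes can fail until all resources are accepted. For readers who drive the deployment from Python, the same retry pattern might be sketched as follows. This is an illustration only: `apply_manifests` and `retry` are hypothetical helpers, and `max_attempts` is an added assumption (the shell loop above retries indefinitely).

```python
import subprocess
import time
from typing import Callable


def apply_manifests(kustomize_dir: str = "deployments/cognito") -> bool:
    """Run `kustomize build <dir> | kubectl apply -f -`; return True on success."""
    build = subprocess.run(["kustomize", "build", kustomize_dir], capture_output=True)
    if build.returncode != 0:
        return False
    # Feed the rendered manifests to kubectl on stdin, like the shell pipe.
    apply = subprocess.run(["kubectl", "apply", "-f", "-"], input=build.stdout)
    return apply.returncode == 0


def retry(
    action: Callable[[], bool],
    delay_seconds: float = 30,
    max_attempts: int = 20,  # assumption: the shell loop has no upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `action` until it returns True or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if action():
            return True
        if attempt < max_attempts:
            print("Retrying to apply resources")
            sleep(delay_seconds)
    return False
```

Calling `retry(apply_manifests)` from the root of the repository would mirror the shell loop, with the difference that it gives up after `max_attempts` tries instead of looping forever. The `sleep` parameter exists only so the wait can be replaced in tests.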
- Update the domain with the ALB address:
- Check whether the ALB is provisioned. This takes around 3-5 minutes.
kubectl get ingress -n istio-system
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
NAME            CLASS    HOSTS   ADDRESS                                                              PORTS   AGE
istio-ingress   <none>   *       k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com   80      15d
  - If ADDRESS is empty after a few minutes, check the logs of alb-ingress-controller by following this guide.
- Substitute the ALB address under kubeflow.alb.dns in tests/e2e/utils/cognito_bootstrap/config.yaml. The kubeflow section of the config file will look like:

    kubeflow:
      alb:
        dns: ebde55ee-istiosystem-istio-2af2-1100502020.us-west-2.elb.amazonaws.com
        serviceAccount:
          name: alb-ingress-controller
          policyArn: arn:aws:iam::123456789012:policy/alb_ingress_controller_kube-eks-clusterxxx

- Run the following script to update the subdomain with ALB address
cd tests/e2e
PYTHONPATH=.. python utils/cognito_bootstrap/cognito_post_deployment.py
cd -
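The ALB address that appears in the ADDRESS column can also be read programmatically instead of being copied by hand. The sketch below assumes the Ingress object has been fetched as JSON, for example with `kubectl get ingress istio-ingress -n istio-system -o json`, and reads the standard `status.loadBalancer.ingress[].hostname` field. `alb_hostname` is a hypothetical helper, and the sample document is trimmed to the fields the function uses.

```python
import json
from typing import Optional


def alb_hostname(ingress_json: str) -> Optional[str]:
    """Return the first load balancer hostname in an Ingress object, if any."""
    ingress = json.loads(ingress_json)
    entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in entries:
        hostname = entry.get("hostname")
        if hostname:
            return hostname
    # No hostname yet: the ALB has not been provisioned (ADDRESS is empty).
    return None


# Trimmed sample using the address from the kubectl output in this guide.
sample = json.dumps({
    "kind": "Ingress",
    "metadata": {"name": "istio-ingress", "namespace": "istio-system"},
    "status": {"loadBalancer": {"ingress": [
        {"hostname": "k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com"}
    ]}},
})
print(alb_hostname(sample))
```

A return value of `None` corresponds to the empty ADDRESS case described in the ALB check, so a caller could wait and query again before writing the value to kubeflow.alb.dns.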
- Follow the rest of the cognito guide from section 6.0 (Connecting to central dashboard) to:
- Create a user in Cognito user pool
- Create a profile for the user from the user pool
- Connect to the central dashboard
Uninstall Kubeflow
Note: Delete all the resources you might have created in your profile namespaces before running these steps.
- Run the following commands to delete the profiles, ingress and corresponding ingress managed load balancer
kubectl delete profiles --all
kubectl delete ingress -n istio-system istio-ingress
- Delete the kubeflow deployment
kustomize build deployments/cognito | kubectl delete -f -
- To delete the rest of the resources (subdomain, certificates, etc.), run the following commands from the root of your repository:
Note: Make sure that you have the configuration file created by the script in tests/e2e/utils/cognito_bootstrap/config.yaml. If you did not use the script, enter the name, ARN, or ID of each resource that you created in tests/e2e/utils/cognito_bootstrap/config.yaml by referring to the following sample:
  - Sample config file:

      cognitoUserpool:
        ARN: arn:aws:cognito-idp:us-west-2:123456789012:userpool/us-west-2_yasI9dbxF
        appClientId: 5jmk7ljl2a74jk3n0a0fvj3l31
        domainAliasTarget: xxxxxxxxxx.cloudfront.net
        domain: auth.platform.example.com
        name: kubeflow-users
      kubeflow:
        alb:
          serviceAccount:
            name: alb-ingress-controller
            namespace: kubeflow
            policyArn: arn:aws:iam::123456789012:policy/alb_ingress_controller_kube-eks-clusterxxx
      cluster:
        name: kube-eks-cluster
        region: us-west-2
      route53:
        rootDomain:
          certARN: arn:aws:acm:us-east-1:123456789012:certificate/9d8c4bbc-3b02-4a48-8c7d-d91441c6e5af
          hostedZoneId: XXXXX
          name: example.com
        subDomain:
          us-west-2-certARN: arn:aws:acm:us-west-2:123456789012:certificate/d1d7b641c238-4bc7-f525-b7bf-373cc726
          hostedZoneId: XXXXX
          name: platform.example.com
          us-east-1-certARN: arn:aws:acm:us-east-1:123456789012:certificate/373cc726-f525-4bc7-b7bf-d1d7b641c238

- Run the following command to install the script dependencies and delete the resources:
Note: You can rerun the script in case some resources fail to delete.
cd tests/e2e
pip install -r requirements.txt
PYTHONPATH=.. python utils/cognito_bootstrap/cognito_resources_cleanup.py
cd -