KServe with Kubeflow on AWS

Serve prediction requests using Knative Serving and AWS Load Balancer

This tutorial shows how to set up a load balancer endpoint for serving prediction requests over an external DNS on AWS.

Note: The KFServing project is now called KServe. To migrate from KFServing to KServe, follow this guide.

Read the background section of the Load Balancer installation guide to familiarize yourself with the requirements for creating an Application Load Balancer on AWS.


This guide assumes that you have:

  1. The necessary prerequisites, including a Kubeflow deployment.
  2. The AWS Load Balancer controller configured with one of the following deployment options:
  3. A subdomain for hosting Kubeflow. For this guide, we will use the domain platform.example.com.
  4. An existing profile namespace for a user in Kubeflow. For this guide, we will use the example profile namespace staging.
  5. Verified that your current directory is the root of the repository by running the pwd command. The output should be the <path/to/kubeflow-manifests> directory.

Configure a default domain with Knative Serving

Use Knative Serving to set up network routing resources.

The default fully qualified domain name (FQDN) for a route in Knative Serving is {route}.{namespace}.{default-domain}. Knative Serving routes use example.com as the default domain. If you create an InferenceService resource called sklearn-iris in the staging namespace without changing the default domain, the resulting InferenceService domain would be http://sklearn-iris.staging.example.com.
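As a minimal sketch of the naming pattern above, the following Python helper assembles the FQDN from its three parts. The function name route_fqdn is hypothetical and is not part of Knative or KServe.

```python
def route_fqdn(route: str, namespace: str, default_domain: str) -> str:
    """Join the parts following the {route}.{namespace}.{default-domain} pattern."""
    return ".".join([route, namespace, default_domain])


# With the unchanged default domain (the example from the text):
print(route_fqdn("sklearn-iris", "staging", "example.com"))
# With the default domain changed to the Kubeflow subdomain used in this guide:
print(route_fqdn("sklearn-iris", "staging", "platform.example.com"))
```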

To host an InferenceService on the same domain that you use to host Kubeflow (for example, platform.example.com), edit the config-domain ConfigMap in the knative-serving namespace so that platform.example.com is used as the domain for the routes.

Edit the ConfigMap to change the default domain as per your deployment. Remove the _example key and replace example.com with your domain (e.g. platform.example.com).

apiVersion: v1
kind: ConfigMap
data:
  platform.example.com: ""

For more detailed instructions, see the Knative Serving Changing the default domain procedure.
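If you prefer a non-interactive change, the edit described above can also be expressed as a JSON merge patch, in which a null value removes a key. The sketch below only builds the patch document. The helper name config_domain_patch is hypothetical, and it assumes the existing keys are _example and example.com, as described in this section.

```python
import json


def config_domain_patch(new_domain: str, old_domain: str = "example.com") -> dict:
    """Build a JSON merge patch for the data of the config-domain ConfigMap."""
    # A null (None) value in a merge patch deletes that key.
    data = {"_example": None, new_domain: ""}
    if old_domain != new_domain:
        data[old_domain] = None
    return {"data": data}


patch = config_domain_patch("platform.example.com")
print(json.dumps(patch))
```

The printed JSON could then be passed to kubectl patch configmap config-domain -n knative-serving --type merge -p '<json>'. Review the resulting ConfigMap afterwards, since your deployment may contain other keys.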

Request a certificate

Request a certificate in AWS Certificate Manager (ACM) to get TLS support from the Load Balancer.

Certificate request background

By default, Knative includes the namespace in the FQDN for a route, with the parts delimited by dots. As a result, the URLs for InferenceService resources created in each namespace will be in a different subdomain.

  • For example, if you have two namespaces, staging and prod, and create an InferenceService resource called sklearn-iris in both of these namespaces, then the URLs for each resource will be http://sklearn-iris.staging.platform.example.com and http://sklearn-iris.prod.platform.example.com, respectively.

This means that you need to specify all subdomains in which you plan to create an InferenceService resource while creating the SSL certificate in ACM.

  • For example, for staging and prod namespaces, you will need to add *.prod.platform.example.com, *.staging.platform.example.com and *.platform.example.com to the certificate.
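The list of names to request can be derived mechanically from the namespaces you plan to use. The following is a small sketch; the function name certificate_domains is hypothetical.

```python
def certificate_domains(base_domain: str, namespaces: list[str]) -> list[str]:
    """Return the wildcard names to put on the certificate.

    One wildcard for the base domain, plus one per namespace subdomain.
    """
    names = [f"*.{base_domain}"]
    names += [f"*.{ns}.{base_domain}" for ns in sorted(namespaces)]
    return names


for name in certificate_domains("platform.example.com", ["staging", "prod"]):
    print(name)
```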

DNS only supports wildcard placeholders in the leftmost part of the domain name. When you request a wildcard certificate using ACM, the asterisk (*) must be in the leftmost position of the domain name and can protect only one subdomain level.

  • For example, *.platform.example.com can protect staging.platform.example.com and prod.platform.example.com, but it cannot protect sklearn-iris.staging.platform.example.com.
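The single-level rule can be illustrated with a short check. This is a simplified sketch of leftmost-label wildcard matching, not the exact logic used by ACM or by any TLS library. The function name wildcard_covers is hypothetical.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (optionally '*.'-prefixed) covers hostname."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if pattern_labels[0] != "*":
        # No wildcard: the names must be identical.
        return pattern_labels == host_labels
    # The asterisk stands for exactly one label, so the label counts must match
    # and every label after the first must be identical.
    return (
        len(pattern_labels) == len(host_labels)
        and pattern_labels[1:] == host_labels[1:]
    )


print(wildcard_covers("*.platform.example.com", "staging.platform.example.com"))
print(wildcard_covers("*.platform.example.com", "sklearn-iris.staging.platform.example.com"))
print(wildcard_covers("*.staging.platform.example.com", "sklearn-iris.staging.platform.example.com"))
```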

Create a certificate

Note: Request both of the domains below in the same certificate.

Create an ACM certificate for *.platform.example.com and *.staging.platform.example.com in your cluster’s region by following the create certificates for domain steps in the Load Balancer installation guide.

Once the certificate status changes to Issued, export the ARN of the certificate created:

export certArn=<>
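Because the certificate has to be in your cluster's region, it can help to sanity-check the ARN before using it. The sketch below assumes the standard ACM ARN layout (arn:partition:acm:region:account-id:certificate/id). The function name and the sample ARN are made up for illustration.

```python
def acm_certificate_region(arn: str) -> str:
    """Return the region encoded in an ACM certificate ARN, or raise ValueError."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "acm":
        raise ValueError(f"not an ACM ARN: {arn!r}")
    if not parts[5].startswith("certificate/"):
        raise ValueError(f"not a certificate ARN: {arn!r}")
    return parts[3]


# Placeholder ARN for illustration only; substitute the value of $certArn.
sample_arn = "arn:aws:acm:us-west-2:123456789012:certificate/00000000-0000-0000-0000-000000000000"
print(acm_certificate_region(sample_arn))
```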

If you are using Cognito for user authentication, see Cognito. If you use Dex as the auth provider in your Kubeflow deployment, see Dex.

Cognito ingress

It is not currently possible to programmatically authenticate a request that uses Amazon Cognito for user authentication through the Load Balancer. You cannot generate AWSELBAuthSessionCookie cookies by using the access tokens from Cognito.

To work around this, it is necessary to create a new Load Balancer endpoint for serving traffic that authorizes based on custom strings specified in a predefined HTTP header.

Use an ingress to set the HTTP header conditions on your Load Balancer. This creates rules that route requests based on HTTP headers. This can be used for service-to-service communication in your application.
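To make the header-based approach concrete, here is a sketch of how a client might build a prediction request that carries the custom header. It is not the sample script shipped with the repository. The function name is hypothetical, the /v1/models/<name>:predict path assumes the KServe v1 inference protocol, HTTPS is assumed because a certificate is attached to the Load Balancer, and the input rows are assumed sample values.

```python
import json
import urllib.request


def build_predict_request(host, model, header_name, header_value, instances):
    """Build (but do not send) a prediction request carrying the custom header."""
    url = f"https://{host}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json", header_name: header_value}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_predict_request(
    host="sklearn-iris.staging.platform.example.com",
    model="sklearn-iris",
    header_name="x-api-key",  # default header name used in this guide
    header_value="token1",    # placeholder; use your own strong value
    instances=[[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],  # assumed sample input
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would do it.
```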

Create ingress

  1. Configure the following parameters for ingress:
    • certArn: ARN of the certificate created during the Request a certificate step.
    • (optional) httpHeaderName: Custom HTTP header name that you want to configure for the rule evaluation. Defaults to x-api-key.
    • httpHeaderValues: One or more match strings that are compared against the header value in the request received. You only need to pass one of the tokens in the request. Pick strong values.

Note: The httpHeaderName and httpHeaderValues values correspond to the HttpHeaderConfig values.
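One way to pick a strong value is to generate it from a cryptographically secure source. The sketch below uses Python's standard secrets module. The helper name new_header_token is hypothetical, and hex output is chosen here so that the token contains only lowercase letters and digits.

```python
import secrets


def new_header_token(num_bytes: int = 32) -> str:
    """Return a random hex token with num_bytes of entropy (2 hex chars per byte)."""
    return secrets.token_hex(num_bytes)


token = new_header_token()
print(len(token))  # 64 characters for the default of 32 bytes
```

Treat the generated value like a password: store it in a secret manager rather than committing params.env to version control.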

  1. Replace the token1 string with a token of your choice. Optionally, replace the httpHeaderName string as well.
    printf 'certArn=%s\nhttpHeaderName=x-api-key\nhttpHeaderValues=["token1"]\n' "$certArn" > awsconfigs/common/istio-ingress/overlays/api/params.env
  2. Create the ingress with the following command:
    kustomize build awsconfigs/common/istio-ingress/overlays/api | kubectl apply -f -
  3. Check if the ingress-managed Load Balancer is provisioned. This may take a few minutes to complete.
    kubectl get ingress -n istio-system istio-ingress-api
    NAME                CLASS    HOSTS   ADDRESS                                                              PORTS   AGE
    istio-ingress-api   <none>   *       k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com   80      14m

Once your Load Balancer is ready, move on to the Add DNS records step to add a DNS record for the staging subdomain.
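If you want to script the readiness check, the address shown in the ADDRESS column can be read from the ingress status. The sketch below assumes you have captured the output of kubectl get ingress -n istio-system istio-ingress-api -o json and parsed it into a dictionary. The function name is hypothetical, and the sample document is trimmed to the fields used.

```python
from typing import Optional


def load_balancer_hostname(ingress: dict) -> Optional[str]:
    """Return the first Load Balancer hostname in an Ingress status, if any."""
    entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in entries:
        if entry.get("hostname"):
            return entry["hostname"]
    return None  # not provisioned yet


# Trimmed sample using the placeholder address from the output above.
sample = {
    "status": {
        "loadBalancer": {
            "ingress": [
                {"hostname": "k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com"}
            ]
        }
    }
}
print(load_balancer_hostname(sample))
```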

Dex ingress

Update the certificate for your Load Balancer

  1. Configure the parameters for ingress with the ARN of the certificate created during the Request a certificate step.
    printf 'certArn=%s\n' "$certArn" > awsconfigs/common/istio-ingress/overlays/https/params.env
  2. Update the Load Balancer with the following command:
    kustomize build awsconfigs/common/istio-ingress/overlays/https | kubectl apply -f -
  3. Get the Load Balancer address:
    kubectl get ingress -n istio-system istio-ingress
    NAME            CLASS    HOSTS   ADDRESS                                                              PORTS   AGE
    istio-ingress   <none>   *       k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com   80      15d

Once your Load Balancer is ready, move on to the Add DNS records step to add a DNS record for the staging subdomain.

Add DNS records

Once your ingress-managed Load Balancer is ready, copy the ADDRESS of that Load Balancer and create a CNAME entry to it in Amazon Route 53 under your subdomain (e.g. platform.example.com) for *.staging.platform.example.com.
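If you manage DNS from the command line rather than the console, the same record can be described as a Route 53 change batch. The sketch below only builds the JSON document. The function name is hypothetical and the TTL of 300 seconds is an arbitrary assumption.

```python
import json


def cname_change_batch(record_name: str, target: str, ttl: int = 300) -> dict:
    """Build a Route 53 change batch that upserts a CNAME record."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ]
    }


batch = cname_change_batch(
    "*.staging.platform.example.com",
    # Placeholder address; use the ADDRESS of your own Load Balancer.
    "k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com",
)
print(json.dumps(batch, indent=2))
```

Saved to a file, this document could be passed to aws route53 change-resource-record-sets with --hosted-zone-id set to the hosted zone of your subdomain and --change-batch pointing at the file.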

Run a sample InferenceService

Create an AuthorizationPolicy

Namespaces created by the Kubeflow profile controller are missing an authorization policy, which prevents the KServe predictor and transformer from working.

Known Issue: See kserve/kserve#1558 and kubeflow/kubeflow#5965 for more information.

Create the AuthorizationPolicy as mentioned in issue #82 as a workaround until this is resolved. Verify that the policies have been created by listing the authorizationpolicies in the istio-system namespace:

kubectl get authorizationpolicies -n istio-system

Create an InferenceService

Set the environment variable value for PROFILE_NAMESPACE (e.g. staging) according to your environment:

export PROFILE_NAMESPACE="staging"

Create a scikit-learn InferenceService using a sample from the KServe repository and wait for READY to be True.

kubectl apply -n ${PROFILE_NAMESPACE} -f https://raw.githubusercontent.com/kserve/kserve/release-0.8/docs/samples/v1beta1/sklearn/v1/sklearn.yaml

Check InferenceService status

Check the InferenceService status. Once it is ready, copy the URL to use for sending a prediction request.

kubectl get inferenceservices sklearn-iris -n ${PROFILE_NAMESPACE}

NAME           URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.staging.platform.example.com   True           100                              sklearn-iris-predictor-default-00001   3m31s
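The same check can be scripted against the JSON form of the resource. The sketch below assumes the output of kubectl get inferenceservices sklearn-iris -n ${PROFILE_NAMESPACE} -o json has been parsed into a dictionary, and that the status carries a url field and a list of conditions with a Ready entry. The function name is hypothetical, and the sample is trimmed to those fields.

```python
from typing import Optional


def ready_url(inference_service: dict) -> Optional[str]:
    """Return the service URL once the Ready condition is True, else None."""
    status = inference_service.get("status", {})
    for condition in status.get("conditions", []):
        if condition.get("type") == "Ready" and condition.get("status") == "True":
            return status.get("url")
    return None


# Trimmed sample mirroring the table output above.
sample_isvc = {
    "status": {
        "url": "http://sklearn-iris.staging.platform.example.com",
        "conditions": [{"type": "Ready", "status": "True"}],
    }
}
print(ready_url(sample_isvc))
```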

Send an inference request

Set the environment variable value for KUBEFLOW_DOMAIN (e.g. platform.example.com) according to your environment:

export KUBEFLOW_DOMAIN="platform.example.com"

Install dependencies for the script by running:

cd tests/e2e
pip install -r requirements.txt

Run the sample Python script to send an inference request based on your auth provider:

Cognito inference

Export the values for HTTP_HEADER_NAME (e.g. x-api-key) and HTTP_HEADER_VALUE (e.g. token1) according to the values configured in the Cognito ingress section, then run the inference_sample.py Python script.

export AUTH_PROVIDER="cognito"
export HTTP_HEADER_NAME="x-api-key"
export HTTP_HEADER_VALUE="token1"
PYTHONPATH=.. python utils/kserve/inference_sample.py

The output should look similar to the following:

Status Code 200
JSON Response  {
"predictions": [1, 1]
}

Dex inference

Export the values for USERNAME (e.g. user@example.com) and PASSWORD according to the user profile, then run the inference_sample.py Python script.

export AUTH_PROVIDER="dex"
export USERNAME="user@example.com"
export PASSWORD="12341234"
PYTHONPATH=.. python utils/kserve/inference_sample.py

The output should look similar to the following:

Status Code 200
JSON Response  {
"predictions": [1, 1]
}

Last modified September 1, 2023: v1.7.0-aws-b1.0.3 website changes (#791) (7faf1a5)