KServe
This tutorial shows how to set up a load balancer endpoint for serving prediction requests over an external DNS on AWS.
Note: Kubeflow on AWS v1.4 uses KFServing. The KFServing project is now called KServe.
Read the background section of the Load Balancer installation guide to familiarize yourself with the requirements for creating an Application Load Balancer on AWS.
Prerequisites
This guide assumes that you have:
- The necessary prerequisites, including a Kubeflow deployment.
- The AWS Load Balancer controller configured with one of the following deployment options:
  - A Cognito-integrated deployment, which is configured with the AWS Load Balancer controller by default.
  - A deployment that is not integrated with Cognito (for example, the Vanilla deployment, which uses Dex as an auth provider), for which you have followed the Exposing Kubeflow over Load Balancer guide.
- A subdomain for hosting Kubeflow. This guide uses the domain platform.example.com.
- An existing profile namespace for a user in Kubeflow. This guide uses the example profile namespace staging.
- Verified that your current directory is the root of the repository by running the pwd command. The output should be the <path/to/kubeflow-manifests> directory.
Configure a default domain with Knative Serving
Use Knative Serving to set up network routing resources.
The default fully qualified domain name (FQDN) for a route in Knative Serving is {route}.{namespace}.{default-domain}. Knative Serving routes use example.com as the default domain. If you create an InferenceService resource called sklearn-iris in the staging namespace without changing the default domain, the resulting InferenceService domain would be http://sklearn-iris.staging.example.com.
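As a rough illustration of the naming scheme described above, the following Python sketch assembles a route FQDN from its three parts. The function name knative_route_fqdn is hypothetical and exists only for this example; it is not part of Knative or KServe.

```python
def knative_route_fqdn(route: str, namespace: str, default_domain: str = "example.com") -> str:
    """Assemble {route}.{namespace}.{default-domain} (hypothetical helper)."""
    return f"{route}.{namespace}.{default_domain}"


# With the unchanged default domain
print(knative_route_fqdn("sklearn-iris", "staging"))

# After switching the default domain to the Kubeflow subdomain used in this guide
print(knative_route_fqdn("sklearn-iris", "staging", "platform.example.com"))
```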
To host an InferenceService on the same domain that you use to host Kubeflow (for example, platform.example.com), edit the config-domain ConfigMap in the knative-serving namespace so that platform.example.com is used as the domain for the routes.
Edit the ConfigMap to change the default domain to match your deployment. Remove the _example key and replace example.com with your domain (e.g. platform.example.com).
apiVersion: v1
kind: ConfigMap
data:
  platform.example.com: ""
...
For more detailed instructions, see the Changing the default domain procedure in the Knative Serving documentation.
Request a certificate
Request a certificate in AWS Certificate Manager (ACM) to get TLS support from the Load Balancer.
Certificate request background
By default, Knative includes the namespace in the FQDN of a route, with the parts delimited by dots. As a result, the URLs for InferenceService resources created in different namespaces fall under different subdomains.
- For example, if you have two namespaces, staging and prod, and create an InferenceService resource called sklearn-iris in both of these namespaces, then the URLs for each resource will be http://sklearn-iris.staging.platform.example.com and http://sklearn-iris.prod.platform.example.com, respectively.
This means that you need to specify all subdomains in which you plan to create an InferenceService resource while creating the SSL certificate in ACM.
- For example, for staging and prod namespaces, you will need to add *.prod.platform.example.com, *.staging.platform.example.com and *.platform.example.com to the certificate.
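The list of names to request can be derived mechanically from the namespaces you plan to use. The Python sketch below is only an illustration of that rule; certificate_domains is a hypothetical helper, not an ACM or Kubeflow API.

```python
def certificate_domains(base_domain: str, namespaces: list[str]) -> list[str]:
    """Return the wildcard names to put on one certificate (hypothetical helper).

    One wildcard covers the Kubeflow subdomain itself, plus one wildcard
    per profile namespace that will host InferenceService resources.
    """
    domains = [f"*.{base_domain}"]
    domains += [f"*.{namespace}.{base_domain}" for namespace in sorted(namespaces)]
    return domains


for name in certificate_domains("platform.example.com", ["staging", "prod"]):
    print(name)
```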
DNS only supports wildcard placeholders in the leftmost part of the domain name. When you request a wildcard certificate using ACM, the asterisk (*) must be in the leftmost position of the domain name and can protect only one subdomain level.
- For example, *.platform.example.com can protect staging.platform.example.com and prod.platform.example.com, but it cannot protect sklearn-iris.staging.platform.example.com.
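The single-level rule can be checked with a few lines of Python. This is a simplified sketch of the matching behavior described above, not ACM's or any TLS library's implementation; wildcard_covers is a hypothetical name.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Check whether a certificate name covers a hostname (simplified sketch).

    A leading "*" stands for exactly one DNS label, so the hostname must
    have the same number of labels as the pattern.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if pattern_labels[0] != "*":
        # No wildcard: the names must be identical.
        return pattern_labels == host_labels
    return (
        len(host_labels) == len(pattern_labels)
        and host_labels[0] != ""
        and host_labels[1:] == pattern_labels[1:]
    )


print(wildcard_covers("*.platform.example.com", "staging.platform.example.com"))
print(wildcard_covers("*.platform.example.com", "sklearn-iris.staging.platform.example.com"))
print(wildcard_covers("*.staging.platform.example.com", "sklearn-iris.staging.platform.example.com"))
```

The second call returns False, which is why the certificate needs a separate wildcard entry for each namespace subdomain.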
Create a certificate
Note: Request both of the following domains in the same certificate.
Create an ACM certificate for *.platform.example.com and *.staging.platform.example.com in your cluster’s region by following the create certificates for domain steps in the Load Balancer installation guide.
Once the certificate status changes to Issued, export the ARN of the certificate created:
export certArn=<>
If you are using Cognito for user authentication, see Cognito. If you use Dex as the auth provider in your Kubeflow deployment, see Dex.
Cognito ingress
It is not currently possible to programmatically authenticate a request that uses Amazon Cognito for user authentication through the Load Balancer. You cannot generate AWSELBAuthSessionCookie cookies by using the access tokens from Cognito.
To work around this, it is necessary to create a new Load Balancer endpoint for serving traffic that authorizes based on custom strings specified in a predefined HTTP header.
Use an ingress to set the HTTP header conditions on your Load Balancer. This creates rules that route requests based on HTTP headers. This can be used for service-to-service communication in your application.
Create ingress
- Configure the following parameters for ingress:
  - certArn: ARN of the certificate created during the Request a certificate step.
  - (optional) httpHeaderName: Custom HTTP header name that you want to configure for the rule evaluation. Defaults to x-api-key.
  - httpHeaderValues: One or more match strings that are compared against the header value of the received request. A request only needs to pass one of the tokens. Pick strong values.
  Note: The httpHeaderName and httpHeaderValues values correspond to the HttpHeaderConfig values.
- Replace the token1 string with a token of your choice. Optionally, replace the httpHeaderName string as well.
printf '
certArn='$certArn'
httpHeaderName=x-api-key
httpHeaderValues=["token1"]
' > awsconfigs/common/istio-ingress/overlays/api/params.env
- Create the ingress with the following command:
kustomize build awsconfigs/common/istio-ingress/overlays/api | kubectl apply -f -
- Check if the ingress-managed Load Balancer is provisioned. This may take a few minutes to complete.
kubectl get ingress -n istio-system istio-ingress-api
NAME                CLASS    HOSTS   ADDRESS                                                               PORTS   AGE
istio-ingress-api   <none>   *       k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com   80      14m
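Because the httpHeaderValues token is the only credential protecting this endpoint, it helps to generate it randomly rather than pick it by hand. The following Python sketch shows one way to do that with the standard library; the helper name and the 32-byte length are assumptions made for this example, not requirements from this guide.

```python
import secrets


def generate_header_token(num_bytes: int = 32) -> str:
    """Return a random URL-safe token (hypothetical helper, assumed length)."""
    return secrets.token_urlsafe(num_bytes)


token = generate_header_token()
# Print the line in the same shape as the params.env entry used in this guide.
print(f'httpHeaderValues=["{token}"]')
```

Store the generated value somewhere safe, since clients need to send the same string in the configured header.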
Once your Load Balancer is ready, move on to the Add DNS records step to add a DNS record for the staging subdomain.
Dex ingress
Update the certificate for your Load Balancer
- Configure the parameters for ingress with the ARN of the certificate created during the Request a certificate step.
printf 'certArn='$certArn'' > awsconfigs/common/istio-ingress/overlays/https/params.env
- Update the Load Balancer with the following command:
kustomize build awsconfigs/common/istio-ingress/overlays/https | kubectl apply -f -
- Get the Load Balancer address
kubectl get ingress -n istio-system istio-ingress
NAME            CLASS    HOSTS   ADDRESS                                                               PORTS   AGE
istio-ingress   <none>   *       k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com   80      15d
Once your Load Balancer is ready, move on to the Add DNS records step to add a DNS record for the staging subdomain.
Add DNS records
Once your ingress-managed Load Balancer is ready, copy the ADDRESS of that Load Balancer and create a CNAME entry to it in Amazon Route 53 under your subdomain (e.g. platform.example.com) for *.staging.platform.example.com.
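If you prefer to script the record instead of using the console, the change can be expressed as a Route 53 change batch. The Python sketch below only builds and prints that JSON document; the helper name, the UPSERT action, and the TTL of 300 seconds are assumptions for illustration. The Load Balancer address is the placeholder from the example output in this guide.

```python
import json


def cname_change_batch(record_name: str, lb_address: str, ttl: int = 300) -> dict:
    """Build a Route 53 change batch for one CNAME record (hypothetical helper)."""
    return {
        "Comment": "Route InferenceService traffic to the ingress Load Balancer",
        "Changes": [
            {
                "Action": "UPSERT",  # assumed: create the record or update it if present
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # assumed value
                    "ResourceRecords": [{"Value": lb_address}],
                },
            }
        ],
    }


batch = cname_change_batch(
    "*.staging.platform.example.com",
    "k8s-istiosys-istioing-xxxxxx-110050202.us-west-2.elb.amazonaws.com",
)
print(json.dumps(batch, indent=2))
```

A document of this shape can be passed to the Route 53 ChangeResourceRecordSets operation together with the ID of the hosted zone for your subdomain.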
Run a sample InferenceService
Create an AuthorizationPolicy
Namespaces created by the Kubeflow profile controller are missing an authorization policy, which prevents the KFServing predictor and transformer from working.
Known Issue: See kserve/kserve#1558 and kubeflow/kubeflow#5965 for more information.
Create the AuthorizationPolicy as mentioned in issue #82 as a workaround until this is resolved. Verify that the policies have been created by listing the authorizationpolicies in the istio-system namespace:
kubectl get authorizationpolicies -n istio-system
Create an InferenceService
Set the environment variable value for PROFILE_NAMESPACE (e.g. staging) according to your environment:
export PROFILE_NAMESPACE="staging"
Create a scikit-learn InferenceService using a sample from the KServe repository and wait for READY to be True.
kubectl apply -n ${PROFILE_NAMESPACE} -f https://raw.githubusercontent.com/kserve/kserve/release-0.7/docs/samples/v1beta1/sklearn/v2/sklearn.yaml
Check InferenceService status
Check the InferenceService status. Once it is ready, copy the URL to use for sending a prediction request.
kubectl get inferenceservices sklearn-irisv2 -n ${PROFILE_NAMESPACE}
NAME             URL                                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                      AGE
sklearn-irisv2   http://sklearn-irisv2.staging.platform.example.com    True           100                              sklearn-irisv2-predictor-default-00001   3m31s
Send an inference request
Set the environment variable value for KUBEFLOW_DOMAIN (e.g. platform.example.com) according to your environment:
export KUBEFLOW_DOMAIN="platform.example.com"
Install dependencies for the script by running:
cd tests/e2e
pip install -r requirements.txt
Run the sample Python script to send an inference request based on your auth provider:
Cognito inference
Export the values for HTTP_HEADER_NAME (e.g. x-api-key) and HTTP_HEADER_VALUE (e.g. token1) according to the values configured in the Create ingress section, then run the inference_sample.py Python script:
export AUTH_PROVIDER="cognito"
export HTTP_HEADER_NAME="x-api-key"
export HTTP_HEADER_VALUE="token1"
PYTHONPATH=.. python utils/kserve/inference_sample.py
The output should look similar to the following:
Status Code 200
JSON Response {
    "model_name": "sklearn-irisv2",
    "model_version": null,
    "id": "e5fc40ba-5f02-42f7-aff8-34042facbe11",
    "parameters": null,
    "outputs": [
        {
            "name": "predict",
            "shape": [
                2
            ],
            "datatype": "FP32",
            "parameters": null,
            "data": [
                1,
                2
            ]
        }
    ]
}
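To make the header-based authorization concrete, the following Python sketch builds (but does not send) a request of the kind the Load Balancer rule would accept. It is not the contents of inference_sample.py. The https scheme, the /v2/models/{model}/infer path, and the input payload are assumptions based on the V2 inference protocol and the KServe scikit-learn sample, and build_inference_request is a hypothetical helper.

```python
import json
import urllib.request


def build_inference_request(model_name, namespace, domain, header_name, header_value):
    """Build a POST request carrying the custom auth header (hypothetical helper)."""
    # Assumed URL layout: InferenceService host plus the V2 protocol infer path.
    url = f"https://{model_name}.{namespace}.{domain}/v2/models/{model_name}/infer"
    # Assumed payload: two iris samples with four features each.
    payload = {
        "inputs": [
            {
                "name": "input-0",
                "shape": [2, 4],
                "datatype": "FP32",
                "data": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={header_name: header_value, "Content-Type": "application/json"},
        method="POST",
    )


request = build_inference_request(
    "sklearn-irisv2", "staging", "platform.example.com", "x-api-key", "token1"
)
print(request.get_method(), request.full_url)
# To send it against a live deployment: urllib.request.urlopen(request)
```

A request without the header, or with a value that is not in httpHeaderValues, would not match the Load Balancer rule.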
Dex inference
Export the values for USERNAME (e.g. user@example.com) and PASSWORD according to your user profile, then run the inference_sample.py Python script:
export AUTH_PROVIDER="dex"
export USERNAME="user@example.com"
export PASSWORD="12341234"
PYTHONPATH=.. python utils/kserve/inference_sample.py
The output should look similar to the following:
Status Code 200
JSON Response {
    "model_name": "sklearn-irisv2",
    "model_version": null,
    "id": "e5fc40ba-5f02-42f7-aff8-34042facbe11",
    "parameters": null,
    "outputs": [
        {
            "name": "predict",
            "shape": [
                2
            ],
            "datatype": "FP32",
            "parameters": null,
            "data": [
                1,
                2
            ]
        }
    ]
}
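If you want to consume the response in your own client code, the predictions sit in the data field of each entry under outputs. The Python sketch below parses a response of the shape shown above; extract_predictions is a hypothetical helper, and the sample text is a trimmed copy of the example output.

```python
import json

# Trimmed copy of the example response shown in this guide.
SAMPLE_RESPONSE = """
{
    "model_name": "sklearn-irisv2",
    "model_version": null,
    "outputs": [
        {"name": "predict", "shape": [2], "datatype": "FP32", "data": [1, 2]}
    ]
}
"""


def extract_predictions(response_text: str) -> dict:
    """Map each output name to its data list (hypothetical helper)."""
    body = json.loads(response_text)
    return {output["name"]: output["data"] for output in body["outputs"]}


print(extract_predictions(SAMPLE_RESPONSE))  # {'predict': [1, 2]}
```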