Katib

Get started with Katib on Amazon EKS

Access AWS Services from Katib

To grant Katib experiment pods access to AWS resources, the profile in which the experiment is created must be configured with the AwsIamForServiceAccount plugin. To configure the plugin to work with Profiles, follow the steps below.

Prerequisites

Steps to configure Profiles with AWS IAM permissions can be found in the Profiles component guide. Follow those steps to configure the profile controller to work with the AwsIamForServiceAccount plugin.

The following is an example of a profile using the AwsIamForServiceAccount plugin:

apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: some-profile
spec:
  owner:
    kind: User
    name: some-user@kubeflow.org
  plugins:
  - kind: AwsIamForServiceAccount
    spec:
      awsIamRole: arn:aws:iam::123456789012:role/some-profile-role

The AWS IAM permissions granted to the experiment pods are specified in the profile’s awsIamRole.

Configuration

Verify Prerequisites

You can verify that the profile was configured correctly by running the following commands:

export PROFILE_NAME=<name of the created profile>

kubectl get serviceaccount -n ${PROFILE_NAME} default-editor -oyaml | grep "eks.amazonaws.com/role-arn"

The output should look similar to the following:

eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/some-profile-role
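
As an alternative to `grep`, the annotation value can be read directly with a JSONPath query. The following is a minimal sketch; the helper name `get_profile_role_arn` is hypothetical and is not part of Kubeflow or kubectl.

```shell
# Hypothetical helper: print the IAM role ARN annotated on a profile's
# default-editor service account. Prints nothing if the annotation is absent.
get_profile_role_arn() {
  profile_name="$1"
  # Dots inside the annotation key must be escaped in kubectl JSONPath.
  kubectl get serviceaccount default-editor \
    -n "${profile_name}" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Usage: get_profile_role_arn "${PROFILE_NAME}"
```

An empty result suggests that the profile controller has not annotated the service account with the role.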

Experiment trial spec configuration

When creating Katib experiments, you must correctly configure the experiment trial spec.

The default-editor service account needs to be added to the trialSpec section of an experiment spec.

Specifically, the serviceAccountName field needs to be added under the Pod spec section with a value of default-editor.

For example, in the following experiment spec, the serviceAccountName field is added under the Pod spec of the Job spec:

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: some-profile
  name: some-name
spec:
  objective:
  ...
  trialTemplate:
    ...
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
          spec:
            containers:
              - name: training-container
                image: public.ecr.aws/z1j2m4o4/kubeflow-katib-mxnet-mnist:latest
                command:
                ...
            restartPolicy: Never
            serviceAccountName: default-editor    # This addition is necessary

As another example, in the following experiment spec the serviceAccountName field is added under the Pod spec of the TFJob spec:

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: some-profile
  name: some-name
spec:
  objective:
  ...
  trialTemplate:
    ...
    trialSpec:
      apiVersion: kubeflow.org/v1
      kind: TFJob
      metadata:
        generateName: tfjob
        namespace: your-user-namespace
      spec:
        tfReplicaSpecs:
          PS:
            replicas: 1
            ...
            template:
              spec:
                containers:
                  - name: tensorflow
                    image: gcr.io/your-project/your-image
                    command:
                    ...
                serviceAccountName: default-editor    # This addition is necessary
          Worker:
            replicas: 3
            ...
            template:
              spec:
                containers:
                  - name: tensorflow
                    image: gcr.io/your-project/your-image
                    command:
                    ...
                serviceAccountName: default-editor    # This addition is necessary
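
Before applying an experiment, it can help to confirm that every Pod spec in the manifest sets the service account. The following is a rough text-level sketch; the helper name `count_default_editor` is hypothetical, and the check does not parse YAML, so it only counts matching lines.

```shell
# Hypothetical helper: count the lines in a manifest that set
# serviceAccountName to exactly default-editor.
count_default_editor() {
  manifest="$1"
  grep -Ec '^[[:space:]]*serviceAccountName:[[:space:]]*default-editor([[:space:]]|$)' "${manifest}"
}

# Usage: count_default_editor experiment.yaml
```

For a Job trial spec the expected count is 1. For a TFJob it should equal the number of replica types, which is 2 in the example above (PS and Worker).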

Config map configuration for `katib-config`

This configuration is only required if your Katib algorithm pod needs access to AWS services.

The katib-config component contains configurations involving metrics collection, tuning algorithms, and early stopping algorithms.

By default, pods that will run the tuning (suggestion) algorithm are created under the default service account present in the profile namespace. However, the AwsIamForServiceAccount plugin annotates the default-editor service account with the profile’s awsIamRole, which means that only pods created under the default-editor service account will be granted the desired AWS permissions.

The following steps modify katib-config so that suggestion pods are created under the default-editor service account and are granted the desired permissions.

Open the katib-config config map for editing.

kubectl edit configMap katib-config -n kubeflow

Navigate to the suggestion settings. They will look similar to the following:

suggestion: |-
  {
    "random": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
      ...
    },
    "tpe": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.13.0",
      ...
    }
    ...
  }

For each algorithm (e.g., random, tpe), add a serviceAccountName key with a value of default-editor:

suggestion: |-
  {
    "random": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt",
      "serviceAccountName": "default-editor"
      ...
    },
    "tpe": {
      "image": "docker.io/kubeflowkatib/suggestion-hyperopt:v0.13.0",
      "serviceAccountName": "default-editor"
      ...
    }
    ...
  }

Save your changes and close the editor. This applies the configuration.
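
To confirm that the change was stored, the suggestion settings can be read back from the config map. The following is a minimal sketch that assumes the settings live under a `suggestion` data key, as in the snippets above; both helper names are hypothetical.

```shell
# Hypothetical helper: print the suggestion settings stored in katib-config.
get_katib_suggestion_config() {
  kubectl get configmap katib-config -n kubeflow \
    -o jsonpath='{.data.suggestion}'
}

# Hypothetical helper: count the lines that assign default-editor
# as the service account. Assumes one key per line, as shown above.
count_suggestion_service_accounts() {
  get_katib_suggestion_config |
    grep -Ec '"serviceAccountName":[[:space:]]*"default-editor"'
}

# Usage: count_suggestion_service_accounts
```

The count should match the number of algorithms that were edited.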

Example: S3 Access from Katib experiment pods

The following steps walk through creating an experiment with pods that have permissions to list buckets in S3.

Prerequisites

Make sure that you have completed the configuration steps.
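
The role referenced by the profile’s awsIamRole also needs permission to list S3 buckets. The following sketch writes one possible policy document to a file. The file name, role name, and policy name are placeholders chosen for illustration, and the commented `aws` command is only one way to attach the policy.

```shell
# Assumption: the file name is illustrative; any path works.
POLICY_FILE="s3-list-buckets-policy.json"

# s3:ListAllMyBuckets is the IAM action that permits listing buckets.
cat <<'EOF' > "${POLICY_FILE}"
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
EOF

# To attach it as an inline policy (role and policy names are placeholders):
# aws iam put-role-policy \
#   --role-name some-profile-role \
#   --policy-name katib-s3-list-buckets \
#   --policy-document "file://${POLICY_FILE}"
```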

Steps

Export the name of the profile created in the configuration steps:
```
export PROFILE_NAME=<the created profile name>
```

Create the following Katib experiment yaml file:

cat <<EOF > experiment.yaml

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: ${PROFILE_NAME}
  name: test
spec:
  objective:
    type: maximize
    goal: 0.90
    objectiveMetricName: Validation-accuracy
    additionalMetricNames:
      - Train-accuracy
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 1
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: num-layers
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: optimizer
      parameterType: categorical
      feasibleSpace:
        list:
          - sgd
          - adam
          - ftrl
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: numberLayers
        description: Number of training model layers
        reference: num-layers
      - name: optimizer
        description: Training model optimizer (sgd, adam or ftrl)
        reference: optimizer
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
          spec:
            containers:
              - name: training-container
                image: public.ecr.aws/z1j2m4o4/kubeflow-katib-mxnet-mnist:latest
                command:
                  - "python3"
                  - "/opt/mxnet-mnist/list_s3_buckets.py"
                  - "&&"
                  - "python3"
                  - "/opt/mxnet-mnist/mnist.py"
                  - "--batch-size=64"
                  - "--lr=\${trialParameters.learningRate}"
                  - "--num-layers=\${trialParameters.numberLayers}"
                  - "--optimizer=\${trialParameters.optimizer}"
            restartPolicy: Never
            serviceAccountName: default-editor
EOF

Create the experiment.
```
kubectl apply -f experiment.yaml
```
Describe the experiment.
```
kubectl describe experiments -n ${PROFILE_NAME} test
```
After about five minutes, the experiment status should show that it succeeded.
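
Instead of reading the full `describe` output, the relevant condition can be queried directly. The following is a minimal sketch that assumes the experiment reports a condition of type `Succeeded` in `status.conditions`; the helper name `experiment_succeeded_status` is hypothetical.

```shell
# Hypothetical helper: print the status of the experiment's Succeeded
# condition. Prints nothing if the condition has not been reported yet.
experiment_succeeded_status() {
  namespace="$1"
  experiment_name="$2"
  kubectl get experiments "${experiment_name}" -n "${namespace}" \
    -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].status}'
}

# Usage: experiment_succeeded_status "${PROFILE_NAME}" test
```

Under that assumption, a value of `True` means that the experiment finished successfully.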

Last modified June 27, 2022: Webside doc fixes (#275) (4fb88f32)
