Notebooks
Kubeflow Notebooks provide a way to run web-based development environments inside your Kubernetes cluster by running them inside Pods. Users can create Notebook containers directly in the cluster, rather than locally on their workstations. Access control is managed by Kubeflow’s RBAC, enabling easier notebook sharing across the organization.
You can use Notebooks with Kubeflow on AWS to:
- Experiment with training scripts and model development.
- Manage Kubeflow pipeline runs.
- Integrate with TensorBoard for visualization.
- Use EFS and FSx to share data and models across nodes.
- Use EFS and FSx for dynamic volume sizing.
AWS-optimized Kubeflow Notebook servers
Use AWS-optimized Kubeflow Notebook server images to quickly get started with a range of framework, library, and hardware options. These images are built on top of the AWS Deep Learning Containers along with other Kubeflow-specific packages.
These container images are available on the Amazon Elastic Container Registry (Amazon ECR). The following images are available as part of this release; however, you can always find the latest updated images in the linked ECR repository.
public.ecr.aws/kubeflow-on-aws/notebook-servers/jupyter-tensorflow:2.12.0-gpu-py310-cu118-ubuntu20.04-ec2-v1.0
public.ecr.aws/kubeflow-on-aws/notebook-servers/jupyter-tensorflow:2.12.0-cpu-py310-ubuntu20.04-ec2-v1.0
public.ecr.aws/kubeflow-on-aws/notebook-servers/jupyter-pytorch:2.0.0-gpu-py310-cu118-ubuntu20.04-ec2-v1.0
public.ecr.aws/kubeflow-on-aws/notebook-servers/jupyter-pytorch:2.0.0-cpu-py310-ubuntu20.04-ec2-v1.0
AWS Deep Learning Containers provide optimized environments with popular machine learning frameworks such as TensorFlow and PyTorch, and are available in Amazon ECR. For more information on AWS Deep Learning Container options, see Available Deep Learning Containers Images.
Along with specific machine learning frameworks, these container images have additional pre-installed packages:
kfp
kfserving
awscli
boto3
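Inside a running Notebook, you can quickly confirm that these packages are present. This is a minimal sketch; it only checks importability, not versions:

```python
import importlib.util

def importable(name: str) -> bool:
    """Return True if a package with this import name is installed."""
    return importlib.util.find_spec(name) is not None

# Packages pre-installed in the AWS-optimized notebook images:
for pkg in ["kfp", "awscli", "boto3"]:
    print(pkg, importable(pkg))
```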
For more information on getting started with Kubeflow Notebooks, see the Quickstart Guide.
Access AWS Services from Notebooks
Use AWS IAM to securely access AWS resources through Kubeflow Notebooks.
Configuration
Prerequisites for setting up AWS IAM for Kubeflow Profiles can be found in the Profiles component guide. These steps go through creating a profile that uses the AwsIamForServiceAccount plugin. No additional configuration steps are required.
Try it out
- Create a Notebook server through the central dashboard.
- Navigate to the top left drop-down menu and select the name of the profile that you created.
- Create a Notebook using the Verify Profile IAM Notebook sample.
- Run the Notebook. You should see the S3 buckets present in your account.
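The sample Notebook's check can be sketched as follows. The helper takes any S3 client so the logic is shown on its own; inside a Notebook created under the IAM-enabled profile, boto3 picks up credentials from the profile's service account automatically:

```python
def list_bucket_names(s3_client) -> list:
    """Return the names of all S3 buckets visible to the given client."""
    return [b["Name"] for b in s3_client.list_buckets()["Buckets"]]

# In the Notebook (credentials come from the profile's IAM role):
#   import boto3
#   print(list_bucket_names(boto3.client("s3")))
```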
RDS and S3 credentials for Kubeflow Pipelines and Notebooks
Set up RDS and S3 credential access to be able to:
- Use boto3 or other AWS libraries that require credentials in a Notebook, specify credentials without hard-coding them, and access the credentials through environment variables.
- Explore metadata using ml-metadata in a Notebook and specify the necessary credentials using environment variables.
- Use ml-metadata to query metadata during a pipeline run by passing a Kubernetes Secret to a pipeline component.
- Use boto3 or other AWS libraries that require credentials in a Kubeflow Pipelines component.
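Once the Secrets are injected as environment variables, Notebook code can read credentials without hard-coding them. A minimal sketch, assuming the mysql-secret keys host, port, username, and password are exposed under those names (the exact key names depend on your deployment):

```python
import os

def rds_connection_params() -> dict:
    """Read RDS connection settings from injected environment variables.
    The key names below are assumptions based on the mysql-secret keys."""
    return {
        "host": os.environ["host"],
        "port": int(os.environ["port"]),
        "user": os.environ["username"],
        "password": os.environ["password"],
    }
```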
The following steps create mysql-secret and mlpipeline-minio-artifact Kubernetes Secrets with the RDS and S3 credentials stored in AWS Secrets Manager when the platform was deployed. This is a sample for demonstrating how you can use PodDefault resources and Secrets in Notebooks to access the metadata database and the artifacts in the S3 bucket created by pipelines. If you want fine-grained access control and auditing, make sure you create separate database and IAM users, with corresponding secrets in Secrets Manager, for each of your users.
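For reference, a PodDefault that injects these Secrets as environment variables might look like the following. This is an illustrative sketch, not the exact resource the script creates; apart from the Secret names and the add-aws-secret label used later on this page, names and values are assumptions:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: add-aws-secret
  namespace: kubeflow-user-example-com
spec:
  desc: Add RDS and S3 credentials as environment variables
  selector:
    matchLabels:
      add-aws-secret: "true"
  envFrom:
    - secretRef:
        name: mysql-secret
    - secretRef:
        name: mlpipeline-minio-artifact
```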
Set up Secrets access
- Verify that you are in the root of your repository by running the pwd command. The path should be PATH/kubeflow-manifests.
pwd
- Navigate to the test scripts directory and install the dependencies.
cd tests/e2e
pip install -r requirements.txt
- Replace YOUR_CLUSTER_REGION, YOUR_CLUSTER_NAME, and YOUR_NAMESPACE with the appropriate values and run the script.
Note: YOUR_NAMESPACE represents the namespace that the Secrets will be set up in. For example, if your Notebooks and pipelines will be in the kubeflow-user-example-com namespace, then you would use kubeflow-user-example-com in place of YOUR_NAMESPACE. The namespace must exist before executing the script.
PYTHONPATH=.. python utils/notebooks/setup_secrets_access.py --region YOUR_CLUSTER_REGION --cluster YOUR_CLUSTER_NAME --profile-namespace YOUR_NAMESPACE
Use the help flag to learn more about available parameters:
PYTHONPATH=.. python utils/notebooks/setup_secrets_access.py --help
(Optional) Update default Notebook configurations
No Kubeflow Notebook configuration is selected by default. You can make the PodDefault resources that you created the default credential configuration for new Notebooks. If you do not follow this step, you must manually select the configuration in the Notebook UI. For more information on setup details, see the Detailed Steps in the Kubeflow Notebooks Quickstart Guide.
Note: Making this configuration the default introduces a dependency: the Secrets and PodDefault resources must be available in all Profile namespaces. If they are not available in a Profile namespace, newly created Notebook servers in that namespace will fail.
Update the default Kubeflow Notebook configuration either before or after installing Kubeflow.
Option 1: Before installing Kubeflow
Modify the file awsconfigs/apps/jupyter-web-app/configs/spawner_ui_config.yaml
configurations:
# List of labels to be selected, these are the labels from PodDefaults
value:
- add-aws-secret
Option 2: After installing Kubeflow
Update the Notebook configuration at runtime with the following command:
kubectl edit $(kubectl get cm -n kubeflow -l app=jupyter-web-app -o=name | grep 'web-app-config') -n kubeflow
Modify the configuration:
configurations:
# List of labels to be selected, these are the labels from PodDefaults
value:
- add-aws-secret
Save and exit your editor. Then, restart the Notebook deployment to apply the changes.
kubectl rollout restart deployment jupyter-web-app-deployment -n kubeflow
Verify Notebook credentials
Find the PodDefault configuration in the Notebook creation page to verify that your setup was successful.
Create a Notebook and check that the environment variables are accessible.
import os
# 'port' is one of the keys from the mysql-secret injected by the PodDefault
print(os.environ['port'])