Usage Tracking

Learn about usage tracking for Kubeflow on AWS

AWS uses customer feedback and usage information to improve the quality of the services and software we offer to customers. We have added usage data collection to the AWS Kubeflow distribution in order to better understand customer usage and guide future improvements. Usage tracking for Kubeflow is activated by default, but is entirely voluntary and can be deactivated at any time.

Usage tracking for Kubeflow on AWS collects the EC2 instance ID of one of the worker nodes in a customer’s cluster and sends it to AWS once per day. It does not collect or export any other data. To deactivate usage tracking, follow the instructions below.

Activate usage tracking

Usage tracking is activated by default. If you deactivated usage tracking for your Kubeflow deployment and would like to reactivate it, you can do so at any time with the following command:

kustomize build awsconfigs/common/aws-telemetry | kubectl apply -f -
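
After applying the manifests, it can be useful to confirm that the telemetry CronJob is actually present. The following is a minimal sketch, not part of the Kubeflow on AWS tooling: `telemetry_status` is a hypothetical helper name, and it assumes the CronJob is named aws-kubeflow-telemetry in the kubeflow namespace, as in the deactivation command later on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Kubeflow on AWS).
# Reports whether the aws-kubeflow-telemetry CronJob can be found
# in the kubeflow namespace of the current kubectl context.
telemetry_status() {
  if kubectl get cronjob -n kubeflow aws-kubeflow-telemetry >/dev/null 2>&1; then
    echo "usage tracking: active"
  else
    # kubectl also fails when the cluster is unreachable, so treat
    # this result as "not found or could not be checked".
    echo "usage tracking: not found"
  fi
}
```

Calling `telemetry_status` after the apply command should report the tracking as active once the CronJob has been created.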

Deactivate usage tracking

Before deploying Kubeflow:

You can deactivate usage tracking by skipping the telemetry component installation:

  • For single-line installation, comment out the aws-telemetry line in the kustomization.yaml file of your choosing:
    # ../../aws-telemetry
    
  • For make command installation, comment out the aws-telemetry lines in the installation config for your deployment:
    • The installation configs can be found here
    • Example for the vanilla installation config:
      #AWS Telemetry (Optional)
      # aws-telemetry:
      #     installation_options:
      #     kustomize: 
      #         - "../../awsconfigs/common/aws-telemetry"
      #     helm: "../../charts/common/aws-telemetry"
      
  • For Terraform installation, export the following variable:
    export TF_VAR_enable_aws_telemetry="false"
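
For the kustomize-based options above, one way to check that the telemetry component was skipped is to search the rendered manifests before applying them. This is a sketch under assumptions: `manifests_include_telemetry` is a hypothetical helper, and it assumes the rendered manifests contain the CronJob name aws-kubeflow-telemetry whenever the component is included. It does not apply to the Terraform option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Kubeflow on AWS).
# Reads rendered manifests on stdin and reports whether the telemetry
# CronJob name (assumed to be aws-kubeflow-telemetry) appears in them.
manifests_include_telemetry() {
  if grep -q 'aws-kubeflow-telemetry'; then
    echo "telemetry component present"
  else
    echo "telemetry component absent"
  fi
}
```

To use it, pipe the output of `kustomize build` for the directory containing your edited kustomization.yaml into `manifests_include_telemetry`.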
    

After deploying Kubeflow:

To deactivate usage tracking on an existing deployment, delete the aws-kubeflow-telemetry cronjob with the following command:

kubectl delete cronjob -n kubeflow aws-kubeflow-telemetry
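
When the deletion runs from a script that may execute more than once, the plain command fails on the second run because the CronJob no longer exists. A minimal sketch of an idempotent variant follows; `deactivate_usage_tracking` is a hypothetical wrapper name, and `--ignore-not-found` is a standard `kubectl delete` flag that makes a missing resource a no-op rather than an error.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with Kubeflow on AWS).
# Deletes the telemetry CronJob and succeeds even if it was already removed.
deactivate_usage_tracking() {
  kubectl delete cronjob -n kubeflow aws-kubeflow-telemetry --ignore-not-found
}
```

Running `deactivate_usage_tracking` a second time should exit successfully without changing the cluster.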

Information collected by usage tracking

  • Instance ID - We collect the instance ID used by one of the worker nodes in the customer’s EKS cluster. This collection occurs once per day.

Learn more

We collect telemetry data in accordance with AWS data privacy policies. For more information, see the following:

Last modified September 21, 2022: Doc fixes (#426) (b572ba57)