To deploy this example for fine-tuning an LLM on EKS, you need access to AWS Trainium (Trn1) EC2 instances. If deployment fails, verify that your account has access to this instance type. If nodes aren't starting, check the Karpenter or node group logs.
Note: The Llama 3 model is governed by the Meta license. To download the model weights and tokenizer, visit the Meta Llama website and accept the license before requesting access.
We are working to improve this blueprint with better observability, logging, and scalability.
Llama 3 fine-tuning on Trn1 with HuggingFace Optimum Neuron
This guide shows you how to fine-tune the Llama3-8B language model using AWS Trainium (Trn1) EC2 instances. We'll use HuggingFace Optimum Neuron to make integration with Neuron easy.
What is Llama 3?
Llama 3 is a large language model (LLM) for tasks like text generation, summarization, translation, and question answering. You can fine-tune it for your specific needs.
AWS Trainium
AWS Trainium (Trn1) instances are designed for high-throughput, low-latency deep learning, making them well suited to training large models like Llama 3. The AWS Neuron SDK enhances Trainium's performance by optimizing models with advanced compiler techniques and mixed-precision training, speeding up training while preserving accuracy.
1. Deploying the Solution
Prerequisites
2. Launch the Llama training job
Before we launch the training script, let's first deploy a utility pod. You will use an interactive shell on this pod to monitor the progress of the fine-tuning job, access the fine-tuned model weights, and view the output the fine-tuned model generates for sample prompts.
kubectl apply -f training-artifact-access-pod.yaml
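To make the pod's role concrete, here is a minimal sketch of a roughly equivalent pod, applied inline. This is an illustration only, not the blueprint's actual manifest: the pod name, the image, and the fsx-claim PVC name are assumptions, and the shipped training-artifact-access-pod.yaml is what you should actually deploy.
# Hypothetical sketch of a utility pod; the blueprint's YAML is authoritative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: artifact-access-sketch          # illustrative name, avoids clashing with the real pod
spec:
  containers:
    - name: shell
      image: public.ecr.aws/docker/library/ubuntu:22.04   # any image with a shell works
      command: ["sleep", "infinity"]                      # keep the pod alive for kubectl exec
      volumeMounts:
        - name: shared
          mountPath: /shared                              # where training artifacts appear
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: fsx-claim                              # assumed PVC name
EOF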
Create the ConfigMap for the training script:
kubectl apply -f llama3-finetuning-script-configmap.yaml
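Incidentally, a ConfigMap like this can be regenerated from a local copy of the training script if you want to customize it. The script filename and output filename below are illustrative, not the blueprint's actual names:
# Generate a ConfigMap manifest from a local training script (names are assumptions).
kubectl create configmap llama3-finetuning-script \
  --from-file=finetune.py \
  --dry-run=client -o yaml > my-finetuning-script-configmap.yaml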
The training script needs your HuggingFace Hub access token to authenticate and download the Llama 3 model from HuggingFace. For guidance on creating and managing HuggingFace tokens, see Hugging Face Token Management.
We'll set your HuggingFace Hub token as an environment variable. Replace your_huggingface_hub_access_token with your actual HuggingFace Hub access token.
export HUGGINGFACE_HUB_ACCESS_TOKEN=$(echo -n "your_huggingface_hub_access_token" | base64)
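The token is base64-encoded because Kubernetes Secret data fields must hold base64-encoded values (the manifest presumably injects this variable into a Secret's data block via envsubst). To sanity-check the variable before deploying:
# Decode the variable to confirm it round-trips to your original token.
echo -n "$HUGGINGFACE_HUB_ACCESS_TOKEN" | base64 --decode; echo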
Deploy the Secret and fine-tuning Job resources by running the following command. It substitutes the HUGGINGFACE_HUB_ACCESS_TOKEN environment variable into the YAML before applying the resources to your Kubernetes cluster.
Note: The fine-tuning container image is fetched from an ECR repository in us-west-2. Check the HuggingFace website to see whether the image is also published in a region that better matches the one you selected for this fine-tuning example. If you choose a different supported region, update the AWS account ID and region in the container image URL in the lora-finetune-resources.yaml file before running the command below.
envsubst < lora-finetune-resources.yaml | kubectl apply -f -
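If you'd like to inspect exactly what will be created before applying it, render the substituted manifest to stdout or run a client-side dry run first:
# Preview the manifest with the token substituted (prints YAML, applies nothing).
envsubst < lora-finetune-resources.yaml | less

# Validate the rendered resources without creating them.
envsubst < lora-finetune-resources.yaml | kubectl apply --dry-run=client -f -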
3. Verify the fine-tuned Llama 3 model
Check the job status:
kubectl get jobs
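A few optional ways to follow progress; the job name below is a placeholder, so take the real name from the kubectl get jobs output:
# Watch the Job's pod get scheduled and start (Jobs label their pods with job-name).
kubectl get pods -l job-name=<job-name> -w

# Stream the training log once the pod is running.
kubectl logs -f job/<job-name>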
Note: If the container isn't scheduled, check Karpenter logs for errors. This might happen if the chosen availability zones (AZs) or subnets lack an available trn1.32xlarge EC2 instance. To fix this, update the local.azs field in the main.tf file, located at ai-on-eks/infra/base/terraform. Ensure the trainium-trn1 EC2NodeClass in the addons.tf file, also at ai-on-eks/infra/base/terraform, references the correct subnets for these AZs. Then, rerun install.sh from ai-on-eks/infra/trainium-inferentia to apply the changes via Terraform.
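For the Karpenter check mentioned in the note, the following commands are a reasonable starting point. The karpenter namespace and deployment name are common defaults and may differ in your cluster, and kubectl get nodeclaims requires a recent Karpenter version:
# See whether Karpenter has created (or failed to create) capacity for the job.
kubectl get nodeclaims

# Look for provisioning errors, e.g. no trn1.32xlarge capacity in the selected AZs.
kubectl logs -n karpenter deploy/karpenter | tail -n 100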
To monitor the log for the fine-tuning job, access the tuned model, or check the generated text-to-SQL outputs from the test run with the fine-tuned model, open a shell in the utility pod and navigate to the /shared folder where these can be found. The fine-tuned model will be saved in a folder named llama3_tuned_model_<timestamp>, and the generated SQL queries from sample prompts can be found in a log file named llama3_finetuning.out alongside the model folder.
kubectl exec -it training-artifact-access-pod -- /bin/bash
cd /shared
ls -l llama3_tuned_model* llama3_finetuning*
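From the same shell you can follow the training log as it is written; from your workstation you can copy the tuned model out of the pod. The <timestamp> suffix is a placeholder for the actual value shown by ls:
# Inside the pod: follow the fine-tuning log.
tail -f /shared/llama3_finetuning.out

# From your workstation: copy the tuned model locally.
kubectl cp training-artifact-access-pod:/shared/llama3_tuned_model_<timestamp> ./llama3_tuned_model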
4. Cleaning up
Note: Always run the cleanup steps to avoid extra AWS costs.
To remove the resources created by this solution, run these commands from the root of the ai-on-eks repository:
# Delete the Kubernetes resources:
cd blueprints/training/llama-lora-finetuning-trn1
envsubst < lora-finetune-resources.yaml | kubectl delete -f -
kubectl delete -f llama3-finetuning-script-configmap.yaml
kubectl delete -f training-artifact-access-pod.yaml
Clean up the EKS cluster and related resources:
cd ../../../infra/trainium-inferentia/terraform
./cleanup.sh