Fine-Tuning & Customization¶

ML Container Creator includes a do/tune command that wraps SageMaker AI Managed Model Customization, a serverless fine-tuning capability that removes the need to select instances or manage containers. You provide a dataset and a technique; SageMaker handles infrastructure and optimization, then produces a deployable model artifact that feeds directly back into your project's deployment lifecycle.

Prerequisites¶

Requirement	Details
Deployed endpoint	Endpoint must be `InService` (run `./do/deploy` first)
AWS credentials	Configured via `aws configure` or environment variables
Framework	`transformers` only
Deployment target	Any target except `batch-transform`
Bootstrapped account	Run `ml-container-creator bootstrap` to provision the IAM permissions and the tune S3 bucket
Python SDK	`sagemaker>=2.232.0` installed in your Python environment

Supported Models Only

do/tune works with models in the Supported Model Catalog. If your model isn't supported, the script will tell you which models are available and suggest do/train for custom training workflows.

The Tune-Adapter-Deploy Feedback Loop¶

The fine-tuning workflow follows an iterative loop: prepare your dataset, tune the model, deploy the result, test it, and iterate until you're satisfied with quality.

graph LR
    A[Prepare<br>dataset] --> B[do/tune] --> C[do/adapter add<br>--from-tune] --> D[do/test] --> E{Quality<br>OK?}
    E -->|No| A
    E -->|Yes| F[Production]

Step-by-step flow¶

1. Prepare your dataset: format training data as JSONL matching the expected schema for your technique (see Dataset Formats below).

2. Run do/tune: submit a managed customization job:

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl

3. Deploy the output: when the job completes, the tune script prints next-step commands that match the output type.

For LoRA adapter output (default):

./do/adapter add tuned-sft --from-tune

For full merged model output (--training-type full-rank):

./do/add-ic tuned-v1 --from-tune

4. Test the result: verify the fine-tuned model behaves as expected:
```
./do/test
```
5. Iterate: if quality isn't satisfactory, adjust your dataset or hyperparameters and re-run do/tune with --force (without it, a re-run reports on the existing job for that technique; see Idempotency below). Each technique tracks its own state independently, so you can experiment with SFT and DPO in parallel without interference.

How Output Feeds Into Deployment¶

When do/tune completes, it stores the output artifact path in do/config and detects the output type based on the training type used:

Training Type	Output Type	Config Variable	Deployment Command
`lora` (default)	LoRA adapter weights	`TUNE_ADAPTER_PATH_<TECHNIQUE>`	`do/adapter add --from-tune`
`full-rank`	Full merged model	`TUNE_MODEL_PATH_<TECHNIQUE>`	`do/add-ic --from-tune`

The --from-tune flag reads the output path from do/config automatically — no need to copy S3 URIs manually.
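As a rough illustration of that lookup, the sketch below resolves a technique's output path from config text. It assumes do/config holds shell-style KEY=value lines and that the <TECHNIQUE> suffix is upper-cased; neither detail is confirmed on this page. `parse_config` and `resolve_tune_output` are hypothetical helper names, not part of the tool.

```python
def parse_config(text):
    """Parse shell-style KEY=value lines (assumed format) into a dict."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_tune_output(config, technique):
    """Return (deployment command, S3 path) for a technique, or None."""
    suffix = technique.upper()  # assumption: the suffix is upper-cased
    adapter = config.get(f"TUNE_ADAPTER_PATH_{suffix}")
    if adapter:
        return ("do/adapter add --from-tune", adapter)
    model = config.get(f"TUNE_MODEL_PATH_{suffix}")
    if model:
        return ("do/add-ic --from-tune", model)
    return None


# Illustrative config content; the values reuse the example URIs on this page.
example = """
TUNE_ADAPTER_PATH_SFT="s3://mlcc-tune-123456789012-us-east-1/output/adapter.tar.gz"
TUNE_MODEL_PATH_DPO=s3://mlcc-tune-123456789012-us-east-1/output/model.tar.gz
"""
config = parse_config(example)
print(resolve_tune_output(config, "sft"))
print(resolve_tune_output(config, "dpo"))
```

If both variables were set for one technique, this sketch prefers the adapter path; that tie-break is arbitrary and may not match what do/tune does.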

Adapter output (LoRA)¶

# Use the latest tune output (any technique)
./do/adapter add tuned-sft --from-tune

# Use a specific technique's output
./do/adapter add tuned-sft --from-tune sft

# Or pass the S3 path explicitly
./do/adapter add tuned-sft --weights s3://mlcc-tune-123456789012-us-east-1/output/adapter.tar.gz

Full model output¶

# Deploy as a new inference component
./do/add-ic tuned-v1 --from-tune

# Or pass the S3 path explicitly
./do/add-ic tuned-v1 --model-data s3://mlcc-tune-123456789012-us-east-1/output/model.tar.gz

# Replace the current base model
./do/deploy --force-ic --model-data s3://mlcc-tune-123456789012-us-east-1/output/model.tar.gz

Multiple techniques¶

Each technique's output is tracked independently. You can tune with SFT, then tune with DPO, and deploy either result:

# Tune with SFT
./do/tune --technique sft --dataset s3://my-bucket/sft-data.jsonl

# Tune with DPO (doesn't affect SFT output)
./do/tune --technique dpo --dataset s3://my-bucket/dpo-data.jsonl

# Deploy the SFT adapter
./do/adapter add tuned-sft --from-tune sft

# Or deploy the DPO adapter instead
./do/adapter add tuned-dpo --from-tune dpo

Supported Models¶

The following model families support managed customization via do/tune:

Provider	Model Family	Sizes
Alibaba	Qwen 2.5	7B, 14B, 32B, 72B
Alibaba	Qwen 3	0.6B, 1.7B, 4B, 8B, 14B, 32B
DeepSeek	R1 Distill (Llama)	8B, 70B
DeepSeek	R1 Distill (Qwen)	1.5B, 7B, 14B, 32B
Meta	Llama 3.1 Instruct	8B
Meta	Llama 3.2 Instruct	1B, 3B
Meta	Llama 3.3 Instruct	70B
OpenAI	GPT-OSS	20B, 120B

View the full catalog at any time:

./do/tune --list-models

Unsupported model behavior¶

If your configured model is not in the Supported Model Catalog, do/tune exits with a clear message:

❌ Model "my-custom-model-7b" is not yet supported for managed customization.

   Supported model families:
   • Alibaba Qwen 2.5 / Qwen 3
   • DeepSeek R1 Distill
   • Meta Llama 3.1 / 3.2 / 3.3
   • OpenAI GPT-OSS

   For custom training workflows, use do/train (coming in a future release).

The script validates your model at runtime against the catalog, so catalog updates take effect without regenerating your project.

Techniques¶

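do/tune supports four customization techniques. Each requires a different dataset format and trains the model against a different signal: example completions (SFT), preference pairs (DPO), an AI judge (RLAIF), or a code-based verifier (RLVR).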

Technique	Use Case	Dataset Format
SFT	Teach the model a specific style or task	Prompt/completion pairs
DPO	Align the model with human preferences	Prompt with chosen/rejected responses
RLAIF	Align using an AI judge	Prompts with reward prompt reference
RLVR	Align using code-based verification	Prompts with reward function Lambda

Not all models support all techniques. Check what's available for your model:

./do/tune --list-models

Training types¶

Each model+technique combination supports one or both training types:

lora (default) — Produces lightweight LoRA adapter weights. Faster to train, smaller artifacts, deployed via do/adapter add.
full-rank — Produces a full merged model. Longer training, larger artifacts, deployed via do/add-ic.

# LoRA adapter (default)
./do/tune --technique sft --dataset s3://my-bucket/train.jsonl

# Full-rank fine-tuning
./do/tune --technique sft --dataset s3://my-bucket/train.jsonl --training-type full-rank

Dataset Formats¶

Datasets must be in JSONL format (one JSON object per line). The expected schema depends on the technique and model family. The script validates the first 10 lines of your dataset before submitting the job.
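A similar check can be run locally before uploading a dataset. The sketch below is not the script's own validator: it assumes the default schemas listed in the technique sections of this page, ignores model-specific formats, and `validate_head` is a hypothetical helper name.

```python
import json

# Required keys and expected Python types, taken from the "Required keys"
# lines in the technique sections of this page (default schemas only).
REQUIRED_KEYS = {
    "sft": {"prompt": str, "completion": str},
    "dpo": {"prompt": str, "chosen": str, "rejected": str},
    "rlvr": {"prompt": list, "reward_model": str},
    "rlaif": {"prompt": list, "reward_model": str},
}


def validate_head(lines, technique, limit=10):
    """Return (line_number, problem) tuples for the first `limit` lines."""
    schema = REQUIRED_KEYS[technique]
    problems = []
    for number, line in enumerate(lines, start=1):
        if number > limit:
            break
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        for key, expected in schema.items():
            if key not in record:
                problems.append((number, f"missing key '{key}'"))
            elif not isinstance(record[key], expected):
                problems.append((number, f"'{key}' should be {expected.__name__}"))
    return problems


sample = [
    '{"prompt": "What is the capital of France?", "completion": "Paris."}',
    '{"prompt": "No completion here"}',
    "not json",
]
for number, problem in validate_head(sample, "sft"):
    print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle as `lines`, since iterating a text file yields one line at a time.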

SFT (Supervised Fine-Tuning)¶

Each line contains a prompt and the desired completion:

{"prompt": "What is the capital of France?", "completion": "The capital of France is Paris."}
{"prompt": "Summarize photosynthesis in one sentence.", "completion": "Photosynthesis converts sunlight, water, and CO2 into glucose and oxygen."}
{"prompt": "Write a haiku about coding.", "completion": "Bugs in the midnight\nStack traces illuminate\nCoffee grows colder"}

Required keys: prompt (string), completion (string)
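Multi-line completions, such as the haiku above, must still occupy a single JSONL line. A minimal sketch of producing such a file with Python's standard `json` module follows; `to_jsonl` is a hypothetical helper name.

```python
import json


def to_jsonl(records):
    """Serialize dicts to JSONL text: one JSON object per line."""
    # json.dumps escapes embedded newlines as \n, so each record stays on
    # one physical line; ensure_ascii=False keeps non-ASCII text readable.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


records = [
    {
        "prompt": "Write a haiku about coding.",
        "completion": "Bugs in the midnight\nStack traces illuminate\nCoffee grows colder",
    }
]
text = to_jsonl(records)
print(text, end="")
```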

DPO (Direct Preference Optimization)¶

Each line contains a prompt with a preferred ("chosen") and dispreferred ("rejected") response:

{"prompt": "Explain quantum computing", "chosen": "Quantum computing leverages quantum mechanical phenomena like superposition and entanglement to process information. Unlike classical bits that are 0 or 1, quantum bits (qubits) can exist in multiple states simultaneously.", "rejected": "Computers are really fast these days."}
{"prompt": "What causes rain?", "chosen": "Rain forms when water vapor in the atmosphere condenses into droplets heavy enough to fall due to gravity.", "rejected": "The sky cries sometimes."}

Required keys: prompt (string), chosen (string), rejected (string)
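If your raw data consists of several scored responses per prompt rather than ready-made pairs, one possible way to derive a DPO record is to take the highest- and lowest-scored responses. This is only an illustrative sketch: the scoring source and the `to_dpo_record` helper are assumptions, not something do/tune provides.

```python
def to_dpo_record(prompt, scored_responses):
    """Build a DPO record from (text, score) pairs, or return None.

    Returns None when there are fewer than two responses or when all
    scores tie, since no preference can be inferred in either case.
    """
    if len(scored_responses) < 2:
        return None
    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best[1] == worst[1]:
        return None
    return {"prompt": prompt, "chosen": best[0], "rejected": worst[0]}


record = to_dpo_record(
    "What causes rain?",
    [
        ("The sky cries sometimes.", 0.1),
        ("Rain forms when water vapor condenses into droplets heavy enough to fall.", 0.9),
    ],
)
print(record)
```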

RLVR (Reinforcement Learning with Verifiable Rewards)¶

Each line contains a prompt (as a message array) and a reference to a Lambda function that scores the model's output:

{"prompt": [{"role": "user", "content": "Solve: 2 + 2"}], "reward_model": "arn:aws:lambda:us-east-1:123456789012:function:math-verifier"}
{"prompt": [{"role": "user", "content": "Write a function that reverses a string in Python"}], "reward_model": "arn:aws:lambda:us-east-1:123456789012:function:code-verifier"}

Required keys: prompt (array of message objects), reward_model (string — Lambda ARN)

Use the --reward-function flag to specify the Lambda ARN:

./do/tune --technique rlvr \
  --dataset s3://my-bucket/prompts.jsonl \
  --reward-function arn:aws:lambda:us-east-1:123456789012:function:my-reward

RLAIF (Reinforcement Learning from AI Feedback)¶

Same format as RLVR, but uses a reward prompt (an LLM judge) instead of a Lambda function:

{"prompt": [{"role": "user", "content": "Explain gravity to a 5-year-old"}], "reward_model": "s3://my-bucket/reward-prompts/clarity-judge.txt"}
{"prompt": [{"role": "user", "content": "Write a professional email declining a meeting"}], "reward_model": "s3://my-bucket/reward-prompts/tone-judge.txt"}

Required keys: prompt (array of message objects), reward_model (string — S3 URI to reward prompt)

Use the --reward-prompt flag to specify the reward prompt location:

./do/tune --technique rlaif \
  --dataset s3://my-bucket/prompts.jsonl \
  --reward-prompt s3://my-bucket/reward-prompts/clarity-judge.txt
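Because RLVR and RLAIF share a record shape, a single helper can build records for both. The sketch below assumes single-turn prompts and checks only the general form of the `reward_model` value (a Lambda function ARN for RLVR, an S3 URI for RLAIF); `build_rl_record` is a hypothetical name.

```python
import re

# General shape of a Lambda function ARN, with an optional version or alias.
LAMBDA_ARN = re.compile(
    r"^arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:"
    r"[A-Za-z0-9_-]+(:[A-Za-z0-9$_-]+)?$"
)


def build_rl_record(user_text, reward_model, technique):
    """Build one RLVR or RLAIF dataset record as a dict."""
    if technique not in ("rlvr", "rlaif"):
        raise ValueError(f"unsupported technique: {technique}")
    if technique == "rlvr" and not LAMBDA_ARN.match(reward_model):
        raise ValueError("RLVR expects a Lambda function ARN")
    if technique == "rlaif" and not reward_model.startswith("s3://"):
        raise ValueError("RLAIF expects an S3 URI to a reward prompt")
    return {
        "prompt": [{"role": "user", "content": user_text}],
        "reward_model": reward_model,
    }


rlvr = build_rl_record(
    "Solve: 2 + 2",
    "arn:aws:lambda:us-east-1:123456789012:function:math-verifier",
    "rlvr",
)
rlaif = build_rl_record(
    "Explain gravity to a 5-year-old",
    "s3://my-bucket/reward-prompts/clarity-judge.txt",
    "rlaif",
)
print(rlvr)
print(rlaif)
```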

Model-specific formats¶

Some model families may expect a different format (e.g., Converse format). The Supported Model Catalog encodes the expected schema per model family and technique. If your model requires a non-default format, the validation error message will show the expected schema.

Dataset sources¶

Datasets can be provided from two sources, S3 or the Hugging Face Hub (optionally naming a split):

# From S3
./do/tune --technique sft --dataset s3://my-bucket/path/to/train.jsonl

# From Hugging Face Hub
./do/tune --technique sft --dataset hf://my-org/my-dataset

# From a specific HF split
./do/tune --technique sft --dataset hf://my-org/my-dataset/train

When using a Hugging Face dataset, the script downloads it to S3 automatically before submitting the job. If the dataset requires authentication, set HF_TOKEN in your environment or configure it via do/secrets.
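The two source forms can be told apart by URI scheme. The following sketch shows one way to split them into components using the standard library; it assumes the `hf://org/name[/split]` form listed in the CLI Reference and is not the script's actual parser. `parse_dataset_source` is a hypothetical name.

```python
from urllib.parse import urlparse


def parse_dataset_source(source):
    """Split an s3:// or hf:// dataset source into its components."""
    parsed = urlparse(source)
    if parsed.scheme == "s3":
        key = parsed.path.lstrip("/")
        if not parsed.netloc or not key:
            raise ValueError(f"expected s3://bucket/key, got {source!r}")
        return {"kind": "s3", "bucket": parsed.netloc, "key": key}
    if parsed.scheme == "hf":
        # The org lands in netloc; the dataset name and split are in the path.
        parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
        if not parts[0] or len(parts) not in (2, 3):
            raise ValueError(f"expected hf://org/name[/split], got {source!r}")
        split = parts[2] if len(parts) == 3 else None
        return {"kind": "hf", "repo": f"{parts[0]}/{parts[1]}", "split": split}
    raise ValueError(f"unsupported dataset source: {source!r}")


print(parse_dataset_source("s3://my-bucket/path/to/train.jsonl"))
print(parse_dataset_source("hf://my-org/my-dataset"))
print(parse_dataset_source("hf://my-org/my-dataset/train"))
```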

`do/tune` vs `do/train`¶

ML Container Creator offers two paths for model customization:

	`do/tune` (Managed Serverless)	`do/train` (Bespoke Training)
Status	Available now	Coming in a future release
Infrastructure	Fully managed by SageMaker	You choose instance types and containers
Supported models	Models in the Supported Model Catalog	Any model
Techniques	SFT, DPO, RLAIF, RLVR	Any training script
Configuration	Minimal — dataset + technique	Full control over training code
When to use	Your model is supported and you want the fastest path	You need custom training logic or an unsupported model

Recommendation: Start with do/tune if your model is in the Supported Model Catalog. It is the fastest path from dataset to deployed model, with no infrastructure to manage. When you need custom training logic or your model isn't supported, do/train is intended to cover that case once it is released.

CLI Reference¶

Synopsis¶

./do/tune --technique <technique> --dataset <source> [options]
./do/tune --status
./do/tune --list-models
./do/tune --help

Required flags¶

Flag	Values	Description
`--technique`	`sft`, `dpo`, `rlaif`, `rlvr`	Customization technique to apply
`--dataset`	S3 URI or `hf://org/name[/split]`	Training dataset location

Training type¶

Flag	Values	Default	Description
`--training-type`	`lora`, `full-rank`	`lora`	Whether to produce LoRA adapter weights or a full merged model

Hyperparameter overrides (all optional)¶

Flag	Type	Description
`--epochs`	integer	Number of training epochs (typically 1–5)
`--learning-rate`	float	Learning rate (e.g., `2e-4`)
`--max-seq-length`	integer	Maximum sequence length in tokens
`--lora-rank`	integer	LoRA rank (e.g., 16, 32, 64). Only applies when `--training-type lora`
`--lora-alpha`	integer	LoRA alpha scaling factor. Only applies when `--training-type lora`
`--batch-size`	integer	Global batch size

Evaluator flags (RLVR/RLAIF only)¶

Flag	Type	Description
`--reward-function`	Lambda ARN	ARN of the reward function Lambda (RLVR)
`--reward-prompt`	S3 URI	S3 path to reward prompt file (RLAIF)

Model and infrastructure overrides¶

Flag	Type	Description
`--model`	JumpStart model ID	Override the model to customize (defaults to `MODEL_ID` from `do/config`)
`--output-bucket`	S3 bucket name	Override the output bucket (defaults to `TUNE_S3_BUCKET`)
`--role`	IAM role ARN	Override the execution role

Job control¶

Flag	Description
`--force`	Force a new job even if a previous job exists for this technique
`--no-wait`	Submit the job and exit immediately without polling
`--status`	Show status of all tracked tune jobs
`--dry-run`	Validate inputs and show what would be submitted without creating a job
`--list-models`	Print the Supported Model Catalog and exit
`--help`	Show usage information

Examples¶

Basic SFT with S3 dataset¶

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl

DPO with Hugging Face dataset and custom learning rate¶

./do/tune --technique dpo --dataset hf://my-org/preference-data --learning-rate 1e-5

Full-rank fine-tuning¶

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl --training-type full-rank

Override model (tune a different model than what's deployed)¶

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl \
  --model meta-textgeneration-llama-3-3-70b-instruct

RLVR with reward function¶

./do/tune --technique rlvr \
  --dataset s3://my-bucket/prompts.jsonl \
  --reward-function arn:aws:lambda:us-east-1:123456789012:function:my-reward

Dry run (validate without submitting)¶

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl --dry-run

Force re-run after a failed job¶

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl --force

Idempotency¶

Re-running do/tune with the same technique resumes or reports on the existing job rather than creating a duplicate:

Job in progress — Polls and displays progress until completion
Job completed — Displays results and next-step commands
Job failed — Displays the failure reason and suggests --force to retry

Use --force to explicitly start a new job, overriding the previous one for that technique.
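The rules above amount to a small decision table keyed on the previous job's state. The sketch below restates them in code; the status strings and the `next_action` helper are illustrative labels, not values the script is known to use.

```python
def next_action(previous_status, force=False):
    """Decide what a re-run of do/tune does for one technique.

    previous_status is None when no job is tracked for the technique.
    """
    if force or previous_status is None:
        return "submit new job"
    actions = {
        "InProgress": "poll until completion",
        "Completed": "show results and next-step commands",
        "Failed": "show failure reason and suggest --force",
    }
    if previous_status not in actions:
        raise ValueError(f"unknown status: {previous_status!r}")
    return actions[previous_status]


for status in (None, "InProgress", "Completed", "Failed"):
    print(status, "->", next_action(status))
print("Failed with --force ->", next_action("Failed", force=True))
```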

MLflow Integration¶

When an MLflow tracking server is configured in your SageMaker domain, customization jobs automatically log training metrics, hyperparameters, and model artifacts to MLflow. The script displays the MLflow experiment URL after job submission.

If no MLflow server is configured, the script proceeds without tracking and prints a note suggesting MLflow setup for experiment comparison.

Future: Bedrock Custom Model Import¶

The output artifacts from managed customization are compatible with Amazon Bedrock Custom Model Import. This deployment path — importing your fine-tuned model into Bedrock for serverless inference — is planned for a future release. The current workflow deploys via SageMaker endpoints using do/adapter add or do/add-ic.

Troubleshooting¶

"Model not yet supported"¶

Your configured model isn't in the Supported Model Catalog. Run ./do/tune --list-models to see available models, or use --model to override with a supported model ID.

Dataset validation fails¶

The script validates the first 10 lines of your dataset. Check that:

The file is valid JSONL (one JSON object per line)
Each line contains the required keys for your technique
Values match the expected types (strings for SFT/DPO, arrays for RLVR/RLAIF prompts)

The error message shows the first malformed line and the expected format.

"Technique not supported for this model"¶

Not all models support all techniques. Run ./do/tune --list-models to see which techniques are available for your model.

Job fails with AccessDenied¶

Run ml-container-creator bootstrap to provision the required IAM permissions. The bootstrap stack adds SageMaker training, model package, and MLflow permissions.

Python SDK not installed¶

The script requires sagemaker>=2.232.0. Install it:

pip install "sagemaker>=2.232.0"

Job failed — how to retry¶

When a job fails, the script displays the failure reason. Fix the underlying issue and re-run with --force:

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl --force