slime training backend setup

This doc describes how to train an agent deployed on AgentCore Runtime with the slime training backend. The public entry point is the SlimeRunner class, which launches training.

For known issues (e.g. the norm-epsilon mismatch on Qwen2.5-32B-Instruct) see slime troubleshooting.

Prerequisites

A GPU cluster with CUDA 12.9 or later installed.
Python 3.12+ and uv.
AWS credentials with permission to invoke an AgentCore Runtime and read/write an S3 bucket.
An AgentCore Runtime deployment of your agent (follow the Prepare agent for RL guide). Save the resulting runtime ARN: SlimeRunner below requires it as the agent_runtime_arn argument for agent rollouts.
An S3 bucket for rollout result delivery — required as the s3_bucket argument on SlimeRunner below.
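The local part of this list can be checked before installing anything. The following is a minimal sketch, not part of the toolkit: check_prerequisites is a hypothetical helper, and looking for nvcc on PATH is only an assumed proxy for a CUDA install (it does not verify the CUDA version, AWS permissions, or the S3 bucket).

```python
import shutil
import sys


def check_prerequisites(python_version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: it only covers what can be checked locally
    without AWS calls (Python version, uv, and an nvcc binary).
    """
    version = tuple(python_version or sys.version_info[:2])
    problems = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if which("uv") is None:
        problems.append("uv not found on PATH")
    # Assumption: nvcc on PATH is used as a rough signal that CUDA is installed.
    if which("nvcc") is None:
        problems.append("nvcc not found on PATH; cannot confirm CUDA is installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The version and the lookup function are parameters so the check can be exercised without touching the real environment.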

Installation

Choose one of the two paths below to install slime, then install the toolkit into the same environment.

Option A: Official slime docker

Follow slime’s own installation docs and use the container image (slimerl/slime:latest). Inside the container, slime and Megatron-LM ship pre-installed at /root/slime and /root/Megatron-LM — use those paths for slime_dir / megatron_dir on SlimeRunner.

Install the toolkit with the slime-backend extras inside the container:

uv pip install -e ".[slime]"

Option B: Bare-metal install script

The provided script installs slime and its heavyweight dependency stack (Megatron-LM, Transformer Engine, Apex, flash-attn, sglang, torch_memory_saver). It clones slime and Megatron-LM into the current directory and applies slime’s official patches. Run it inside your activated Python environment:

uv pip install -e ".[slime]"
export CUDA_HOME=/usr/local/cuda-13.0
bash src/agentcore_rl_toolkit/backends/slime/scripts/install_slime.sh cu13

Point slime_dir / megatron_dir on SlimeRunner at the slime and Megatron-LM directories the script cloned.

Prepare data

The training dataset is a JSONL file where each line is one rollout request. Every line has the shape:

{"prompt": "...", "metadata": { /* whatever your agent expects */ }}

prompt — top-level string, used by slime for length filtering only.
metadata — copied verbatim as the payload dict your @rollout_entrypoint function receives. Put every per-rollout config the agent needs here (user prompt, ground-truth answer, task IDs, repo URIs, etc.).

Example (GSM8K):

{"prompt": "How many ...?", "metadata": {"prompt": "How many ...?", "answer": "42"}}
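A dataset in this shape can be generated with a few lines of standard-library Python. The sketch below is illustrative only: make_record, validate_record, and write_jsonl are hypothetical helper names, and the metadata keys (prompt, answer) simply mirror the GSM8K example above; your agent may expect different ones.

```python
import json
from pathlib import Path


def make_record(question, answer):
    """Build one rollout request in the documented shape."""
    return {
        "prompt": question,  # top-level copy, used only for length filtering
        "metadata": {"prompt": question, "answer": answer},  # agent payload
    }


def validate_record(record):
    """Raise ValueError if the record does not match the documented shape."""
    if not isinstance(record.get("prompt"), str):
        raise ValueError("'prompt' must be a top-level string")
    if not isinstance(record.get("metadata"), dict):
        raise ValueError("'metadata' must be a JSON object")


def write_jsonl(records, path):
    """Validate and write records, one JSON object per line; return the count."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            validate_record(record)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    rows = [make_record("How many ...?", "42")]
    write_jsonl(rows, "gsm8k_tiny.jsonl")
```

Validating before writing catches a missing metadata object at dataset-build time rather than partway through a training run.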

Launch training with `SlimeRunner`

SlimeRunner is the single entry point: a Python class that stops stale processes, starts a Ray head, submits the slime training job, and streams its output. The defaults target 8 × H100 (num_gpus=8, tp_size=2, rollout_gpus_per_engine=2); tune them for your cluster.

from agentcore_rl_toolkit.backends.slime import SlimeRunner

SlimeRunner(
    exp_id="gsm8k-3b-smoke",
    agent_runtime_arn="arn:aws:bedrock-agentcore:...",
    s3_bucket="your-bucket-name",
    model_dir="/path/to/Qwen2.5-3B-Instruct",
    data_path="/path/to/gsm8k_tiny.jsonl",
    model_type="qwen2.5-3B",
).train(num_rollout=1)   # 1 = smoke test; bump to 100 for a real run

Wandb: to log a run, set WANDB_API_KEY and WANDB_ENTITY in your environment and pass wandb_project / wandb_group to the constructor. If the environment variables are unset, wandb logging is skipped entirely.
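Because missing variables disable logging silently, it can help to check them before launching a long run. This is a minimal sketch under the assumption that both variables must be set for logging to happen; missing_wandb_vars is a hypothetical helper, not a toolkit function.

```python
import os

# The two environment variables named in this doc.
REQUIRED_WANDB_VARS = ("WANDB_API_KEY", "WANDB_ENTITY")


def missing_wandb_vars(env=None):
    """Return the wandb variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_WANDB_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_wandb_vars()
    if missing:
        print("wandb logging will likely be skipped; unset:", ", ".join(missing))
```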

Config-file workflow: dump the constructor kwargs to a YAML file and call SlimeRunner.from_yaml("my_run.yaml") instead of passing them inline.
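One way to produce such a file is sketched below. It assumes the YAML is a flat mapping whose keys mirror the constructor kwargs, which this doc implies but does not spell out. dump_flat_yaml is a hypothetical helper that avoids a YAML library by JSON-encoding each value, which is valid YAML for strings, integers, and booleans.

```python
import json
from pathlib import Path


def dump_flat_yaml(kwargs, path):
    """Write flat scalar kwargs as `key: value` lines and return the text."""
    lines = []
    for key, value in kwargs.items():
        # Only str, int, and bool are handled; their JSON forms are valid YAML.
        if not isinstance(value, (str, int, bool)):
            raise TypeError(f"{key!r}: only str, int, and bool values are handled")
        lines.append(f"{key}: {json.dumps(value)}")
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


# Same kwargs as the inline example above; the ARN and paths are placeholders.
run_kwargs = {
    "exp_id": "gsm8k-3b-smoke",
    "agent_runtime_arn": "arn:aws:bedrock-agentcore:...",
    "s3_bucket": "your-bucket-name",
    "model_dir": "/path/to/Qwen2.5-3B-Instruct",
    "data_path": "/path/to/gsm8k_tiny.jsonl",
    "model_type": "qwen2.5-3B",
}

if __name__ == "__main__":
    dump_flat_yaml(run_kwargs, "my_run.yaml")
    # On the training host, assuming from_yaml returns a configured runner:
    # from agentcore_rl_toolkit.backends.slime import SlimeRunner
    # SlimeRunner.from_yaml("my_run.yaml").train(num_rollout=1)
```

Keeping the run configuration in a file makes it easy to version alongside the dataset and to diff between experiments.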

SlimeRunner exposes the fields that most experiments tune as constructor arguments: cluster shape, training hyperparameters, per-rollout AgentCore Runtime (ACR) limits, and extra_flags for passing additional arguments directly to slime. See the API reference or help(SlimeRunner) for the full list.