SLIME training backend setup
This doc describes how to train an AgentCore Runtime-deployed agent with the slime training backend. The public user surface is exactly two things: the SlimeRunner class (for launching training) and the two SGLang patch scripts (applied once to the SGLang install).
For known issues (Megatron-LM regression on 32B, norm-epsilon mismatch, etc.) see slime troubleshooting.
Prerequisites
- Hardware and CUDA requirements: see slime’s README and the slime docker README for tested GPU configurations per model size.
- Python 3.10+ and uv.
- AWS credentials with permission to invoke an AgentCore Runtime and read/write an S3 bucket.
- An AgentCore Runtime deployment of your agent: follow the Prepare agent for RL guide. Save the resulting runtime ARN; it is required as the agent_runtime_arn argument on SlimeRunner below for agent rollouts.
- An S3 bucket for rollout result delivery, required as the s3_bucket argument on SlimeRunner below.
slime environment
Follow slime’s own installation docs, using either the container path (slimerl/slime:latest) or a bare-metal install. Everything below runs inside this environment.
Inside the slimerl/slime:latest container, slime and Megatron-LM
ship pre-installed at /root/slime and /root/Megatron-LM — use
those paths for slime_dir / megatron_dir on SlimeRunner. For a
bare-metal install, point at wherever you cloned slime + Megatron-LM.
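The path choice above can be captured in a small pre-flight helper. This is a minimal sketch, not part of the toolkit: `resolve_backend_dirs` is a hypothetical name, and it assumes you want to fall back to the container paths whenever no clone locations are given.

```python
from pathlib import Path

# Paths where slime and Megatron-LM ship inside slimerl/slime:latest.
CONTAINER_SLIME_DIR = Path("/root/slime")
CONTAINER_MEGATRON_DIR = Path("/root/Megatron-LM")


def resolve_backend_dirs(slime_clone=None, megatron_clone=None):
    """Return (slime_dir, megatron_dir) as strings.

    Hypothetical helper: uses the given bare-metal clone paths when provided,
    otherwise the container defaults, and fails early if either is missing.
    """
    slime_dir = Path(slime_clone) if slime_clone else CONTAINER_SLIME_DIR
    megatron_dir = Path(megatron_clone) if megatron_clone else CONTAINER_MEGATRON_DIR
    for name, path in (("slime_dir", slime_dir), ("megatron_dir", megatron_dir)):
        if not path.is_dir():
            raise FileNotFoundError(f"{name} does not exist: {path}")
    return str(slime_dir), str(megatron_dir)
```

The two returned strings are what you would pass as slime_dir and megatron_dir on SlimeRunner. Failing here is cheaper than discovering a wrong path after the Ray head has started.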
Install the toolkit + apply patches
Inside the slime environment:
# From a clone of this repo
cd /path/to/agentcore-rl-toolkit
# Install the toolkit plus the slime-backend extras
uv pip install -e ".[slime]"
Then apply the SGLang token_ids patch. It adds prompt_token_ids / token_ids fields to chat completion responses so the gateway can capture RL training trace data. The patch is idempotent:
python -m agentcore_rl_toolkit.backends.slime.patches.sglang_token_ids
# Verify the patch round-trips under greedy decoding (any HF checkpoint
# works; Qwen2.5-0.5B-Instruct is the fastest to download + load)
python -m agentcore_rl_toolkit.backends.slime.patches.verify_sglang_token_ids \
    --model-path /path/to/Qwen2.5-0.5B-Instruct
# Expect: "OK: 4/4 checks passed"
Prepare data
The training dataset is a JSONL file where each line is one rollout request. Every line has the shape:
{"prompt": "...", "metadata": { /* whatever your agent expects */ }}
- prompt: top-level string, used by slime for length filtering only.
- metadata: copied verbatim as the payload dict your @rollout_entrypoint function receives. Put every per-rollout config the agent needs here (user prompt, ground-truth answer, task IDs, repo URIs, etc.).
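A dataset in this shape can be produced with the standard library alone. The sketch below is illustrative: `make_rollout_line` and `write_dataset` are hypothetical helper names, and the metadata keys ("prompt", "answer") are an assumption that mirrors the GSM8K-style records in this guide; use whatever keys your agent actually reads from its payload.

```python
import json


def make_rollout_line(prompt, metadata):
    """Serialize one rollout request as a single JSONL line (no trailing newline)."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dict")
    return json.dumps({"prompt": prompt, "metadata": metadata}, ensure_ascii=False)


def write_dataset(path, examples):
    """Write (question, answer) pairs to a JSONL file, one request per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in examples:
            # The top-level prompt is duplicated into metadata because only
            # metadata reaches the agent as its payload.
            metadata = {"prompt": question, "answer": answer}
            f.write(make_rollout_line(question, metadata) + "\n")
```

For example, `write_dataset("gsm8k_tiny.jsonl", [("How many ...?", "42")])` writes a one-line file in the documented shape.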
Example (GSM8K):
{"prompt": "How many ...?", "metadata": {"prompt": "How many ...?", "answer": "42"}}
Launch training with SlimeRunner
SlimeRunner is the one and only entry point: a Python class that stops stale processes, starts a Ray head, submits the slime training job, and streams output. Defaults target 8 × H100 (num_gpus=8, tp_size=2, rollout_gpus_per_engine=2); tune them for your cluster.
from agentcore_rl_toolkit.backends.slime import SlimeRunner
SlimeRunner(
    exp_id="gsm8k-3b-smoke",
    agent_runtime_arn="arn:aws:bedrock-agentcore:...",
    s3_bucket="your-bucket-name",
    model_dir="/path/to/Qwen2.5-3B-Instruct",
    data_path="/path/to/gsm8k_tiny.jsonl",
    model_type="qwen2.5-3B",
).train(num_rollout=1)  # 1 = smoke test; bump to 100 for a real run
Wandb: set WANDB_API_KEY and WANDB_ENTITY in your environment (plus wandb_project / wandb_group on the constructor) to log a run. If these env vars are unset, wandb logging is skipped entirely.
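Because missing env vars skip wandb rather than raising an error, it can help to check them before launching a long run. The following is a minimal sketch, assuming both variables must be set and non-empty for logging to happen; `wandb_env_ready` is a hypothetical helper, not a toolkit function, and it does not reproduce the toolkit's own check.

```python
import os


def wandb_env_ready(env=None):
    """Return True when both WANDB_API_KEY and WANDB_ENTITY are set and non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name) for name in ("WANDB_API_KEY", "WANDB_ENTITY"))


if __name__ == "__main__":
    if wandb_env_ready():
        print("wandb env vars found; the run should be logged")
    else:
        print("WANDB_API_KEY / WANDB_ENTITY not both set; expect wandb to be skipped")
```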
Config-file workflow — dump kwargs to YAML and call
SlimeRunner.from_yaml("my_run.yaml") instead.
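One dependency-free way to dump kwargs to YAML is sketched below. It assumes the YAML file is a flat mapping of constructor kwargs, which the source implies but does not spell out, and `dump_flat_yaml` is a hypothetical helper. It relies on the fact that JSON strings, integers, booleans, and null are also valid YAML scalars. Floats are deliberately rejected, since some YAML parsers read exponent-only forms such as 1e-06 as strings; use a YAML library for those.

```python
import json


def dump_flat_yaml(kwargs, path):
    """Write a flat mapping of scalar kwargs as YAML, one `key: value` per line.

    Each value is rendered with json.dumps, so strings come out double-quoted
    and True/False/None become true/false/null.
    """
    lines = []
    for key, value in kwargs.items():
        # bool is a subclass of int, so it is covered by the int check.
        if not isinstance(value, (str, int, type(None))):
            raise TypeError(f"{key}: only str, int, bool, and None are supported in this sketch")
        lines.append(f"{key}: {json.dumps(value)}")
    text = "\n".join(lines) + "\n"
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Calling `dump_flat_yaml({"exp_id": "gsm8k-3b-smoke", "num_gpus": 8}, "my_run.yaml")` writes a file that can then be handed to SlimeRunner.from_yaml("my_run.yaml"). If PyYAML is available in your environment, `yaml.safe_dump` covers the same case and also handles nested values.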
SlimeRunner exposes every field most experiments tune as a constructor argument: cluster shape, training hyperparameters, per-rollout ACR limits, and extra_flags for extra arguments passed directly to slime. See the API reference or help(SlimeRunner) for the full list.