Prepare Agent for RL
Preparing an agent for RL training requires only a few targeted changes. The comparison below shows a deployed Bedrock AgentCore agent and its RL-adapted version. All four examples in this repo (math, AppWorld, migration, OfficeBench) follow this pattern.
Before vs After
Before: a math agent that uses Bedrock AgentCore Runtime and the Strands Agents framework.
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator

app = BedrockAgentCoreApp()


@app.entrypoint
def invoke_agent(payload):
    model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
    agent = Agent(
        model=model,
        tools=[calculator],
        system_prompt="Solve the math problem step by step.",
    )

    user_input = payload.get("prompt")
    response = agent(user_input)
    return response.message["content"][0]["text"]


if __name__ == "__main__":
    app.run()
```

After: the RL-adapted version. It changes the imports, the app class, the entrypoint decorator, how the model is built, and what the handler returns, and it adds a reward function. Everything else (tools, system prompt, and agent logic) is unchanged.
```python
from agentcore_rl_toolkit import AgentCoreRLApp
from strands import Agent
from strands.models.openai import OpenAIModel
from strands_tools import calculator

from reward import MyReward

app = AgentCoreRLApp()
reward_fn = MyReward()


# Switches the handler from "return text synchronously" to "fire-and-forget,
# persist the result." Without it, the trainer can't receive rollout artifacts
# after the HTTP call returns.
@app.rollout_entrypoint
def invoke_agent(payload: dict):
    # The training backend injects _rollout per request so each rollout can
    # target a different inference endpoint (vLLM/SGLang serving the *current*
    # policy weights, not a non-trainable model API).
    cfg = payload["_rollout"]
    model = OpenAIModel(
        client_args={"api_key": "EMPTY", "base_url": cfg["base_url"]},
        model_id=cfg["model_id"],
        params=cfg.get("sampling_params", {}),
    )
    agent = Agent(
        model=model,
        tools=[calculator],
        system_prompt="Solve the math problem step by step.",
    )

    response = agent(payload["prompt"])

    # reward_fn is a callable with flexible input: pass whatever context it
    # needs (response text, ground truth, env state, etc.). RL needs a scalar
    # (or per-turn vector) signal, returned in a dict under the `rewards` key,
    # which is the key training backends read.
    rewards = reward_fn(
        response_text=response.message["content"][0]["text"],
        ground_truth=payload["answer"],
    )
    return {"rewards": rewards}


if __name__ == "__main__":
    app.run()
```

The Reward Function
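The scoring logic itself is task-specific. For the math agent, which calls `reward_fn(response_text=..., ground_truth=...)`, one plausible approach is to compare the last number in the response with the ground truth. The following is a minimal sketch that assumes the final answer appears as the last number in the response text. `extract_final_number` and `math_reward` are hypothetical helpers, not part of agentcore_rl_toolkit.

```python
import re
from typing import Optional

# Matches integers and decimals with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number in `text` (thousands separators removed), or None."""
    matches = _NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def math_reward(response_text: str, ground_truth, tol: float = 1e-6) -> float:
    """Binary reward: 1.0 if the response's final number equals ground_truth."""
    predicted = extract_final_number(response_text)
    try:
        expected = float(str(ground_truth).replace(",", ""))
    except ValueError:
        # Non-numeric ground truth: this sketch cannot score it.
        return 0.0
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - expected) <= tol else 0.0


if __name__ == "__main__":
    print(math_reward("12 * 3 = 36, so the answer is 36.", "36"))  # 1.0
    print(math_reward("The answer is 35.", 36))  # 0.0
```

Inside a `RewardFunction` subclass, `__call__` could simply return `math_reward(kwargs["response_text"], kwargs["ground_truth"])`. A binary reward is the simplest signal. Partial credit or format checks could be layered on top if the task calls for them.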
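Reward logic lives in a subclass of the toolkit's `RewardFunction`, defined in the `reward` module that the entrypoint imports:

```python
from agentcore_rl_toolkit import RewardFunction


class MyReward(RewardFunction):
    def __call__(self, **kwargs) -> float:
        # Compute a scalar from response_text, ground_truth,
        # test_tracker, repo_dir, or whatever context you need.
        # Placeholder logic: 1.0 if the ground truth appears in the response.
        response_text = kwargs.get("response_text", "")
        reward = 1.0 if str(kwargs.get("ground_truth")) in response_text else 0.0
        return reward
```

Deploy to AgentCore Runtime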
Once the code is adapted, deploy it with the bedrock-agentcore-starter-toolkit CLI. A minimal deploy:
```bash
agentcore configure \
  --entrypoint rl_app.py \
  --name my_rl_agent \
  --requirements-file pyproject.toml \
  --deployment-type container \
  --non-interactive
agentcore deploy --agent my_rl_agent
```

A successful deploy prints the agent runtime ARN, which is the only value the training backend needs to reach this agent. Together with the name of your S3 bucket (where rewards are written), it makes up the full set of hand-off values for the training backend. For example, both are passed directly to `SlimeRunner` (from `agentcore_rl_toolkit.backends.slime.SlimeRunner`):
```python
from agentcore_rl_toolkit.backends.slime import SlimeRunner

SlimeRunner(
    agent_runtime_arn="arn:aws:bedrock-agentcore:us-west-2:...",
    s3_bucket="your-bucket-name",
    ...,
).train(...)
```

For a worked end-to-end deploy (IAM role, ECR push, first-time AWS setup), see the Strands Math Agent example. For containerized or advanced deploys (custom Dockerfiles, direct code deployment, non-CLI workflows), see AWS's Get started with AgentCore Runtime and its custom / without-CLI path.
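Each request the training backend sends must carry the fields that `invoke_agent` reads: `prompt`, `answer`, and a `_rollout` block containing `base_url`, `model_id`, and optional `sampling_params`. When exercising the agent outside a trainer (for example, against a locally served vLLM or SGLang endpoint), a small helper that builds this shape can be convenient. The following is a sketch under those assumptions. `make_rollout_payload` is a hypothetical helper, the URL, model name, and sampling values are placeholders, and a real backend may add further fields.

```python
from typing import Any, Dict, Optional


def make_rollout_payload(
    prompt: str,
    answer: str,
    base_url: str,
    model_id: str,
    sampling_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body with the keys invoke_agent reads (hypothetical helper)."""
    rollout: Dict[str, Any] = {"base_url": base_url, "model_id": model_id}
    if sampling_params is not None:
        # invoke_agent falls back to {} when sampling_params is absent.
        rollout["sampling_params"] = sampling_params
    return {"prompt": prompt, "answer": answer, "_rollout": rollout}


if __name__ == "__main__":
    payload = make_rollout_payload(
        prompt="What is 12 * 3?",
        answer="36",
        base_url="http://localhost:8000/v1",  # placeholder OpenAI-compatible endpoint
        model_id="my-policy",  # placeholder served model name
        sampling_params={"temperature": 1.0, "max_tokens": 1024},
    )
    print(payload["_rollout"]["model_id"])  # my-policy
```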
Pick a training backend: slime (self-hosted) · rllm (managed + self-hosted) · verl (self-hosted).