Prepare Agent for RL

Preparing an Agent for RL training is straightforward. The side-by-side below shows a deployed Bedrock AgentCore agent and the RL-adapted version — all four examples in this repo (math, AppWorld, migration, OfficeBench) follow this pattern.

Before vs After

Before
After — adapted for RL

A math agent that uses Bedrock AgentCore Runtime and the Strands Agents framework.

from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator

app = BedrockAgentCoreApp()


@app.entrypoint
def invoke_agent(payload):
    model = BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
    agent = Agent(
        model=model,
        tools=[calculator],
        system_prompt="Solve the math problem step by step.",
    )

    user_input = payload.get("prompt")
    response = agent(user_input)
    return response.message["content"][0]["text"]


if __name__ == "__main__":
    app.run()

Added/changed lines to support agentic RL are highlighted. Everything else (tools, system prompt, and agent logic) is unchanged.

from agentcore_rl_toolkit import AgentCoreRLApp
from strands import Agent
from strands.models.openai import OpenAIModel
from strands_tools import calculator
from reward import MyReward

app = AgentCoreRLApp()
reward_fn = MyReward()


# Switches handler from "return text sync" to "fire-and-forget, persist
# the result." Without it the trainer can't receive rollout artifacts
# after the HTTP call returns.
@app.rollout_entrypoint
def invoke_agent(payload: dict):
    # Training backend injects _rollout per request so each rollout
    # can target a different inference endpoint (vLLM/SGLang serving
    # the *current* policy weights, not a non-trainable model API).
    cfg = payload["_rollout"]
    model = OpenAIModel(
        client_args={"api_key": "EMPTY", "base_url": cfg["base_url"]},
        model_id=cfg["model_id"],
        params=cfg.get("sampling_params", {}),
    )
    agent = Agent(
        model=model,
        tools=[calculator],
        system_prompt="Solve the math problem step by step.",
    )

    response = agent(payload["prompt"])

    # reward_fn is a callable with flexible input — pass whatever
    # context it needs (response text, ground truth, env state, etc.).
    # Return a dict: RL needs a scalar (or per-turn vector) signal.
    # `rewards` is the key training backends read.
    rewards = reward_fn(
        response_text=response.message["content"][0]["text"],
        ground_truth=payload["answer"],
    )
    return {"rewards": rewards}


if __name__ == "__main__":
    app.run()
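To make the request contract concrete, here is a minimal sketch of the payload shape the handler above reads. The key names (`prompt`, `answer`, `_rollout`, `base_url`, `model_id`, `sampling_params`) come from the code above; the sample values and the `extract_rollout_config` helper are hypothetical and are not part of agentcore_rl_toolkit.

```python
def extract_rollout_config(payload: dict) -> dict:
    """Hypothetical helper: validate the fields invoke_agent reads from a request."""
    cfg = payload.get("_rollout")
    if not isinstance(cfg, dict):
        raise ValueError("payload is missing the '_rollout' config block")
    missing = [key for key in ("base_url", "model_id") if key not in cfg]
    if missing:
        raise ValueError(f"_rollout is missing required keys: {missing}")
    return {
        "base_url": cfg["base_url"],
        "model_id": cfg["model_id"],
        # sampling_params is optional in the handler above, so default to {}.
        "sampling_params": cfg.get("sampling_params", {}),
    }


# Illustrative request; every value here is made up.
example_payload = {
    "prompt": "What is 17 * 23?",
    "answer": "391",
    "_rollout": {
        "base_url": "http://localhost:8000/v1",
        "model_id": "my-policy-checkpoint",
        "sampling_params": {"temperature": 1.0},
    },
}

rollout_cfg = extract_rollout_config(example_payload)
```

Failing fast on a malformed `_rollout` block is a design choice for this sketch; the handler above would instead raise a `KeyError` at the first missing key.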

The Reward Function

from agentcore_rl_toolkit import RewardFunction


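class MyReward(RewardFunction):
    def __call__(self, **kwargs) -> float:
        # Compute a scalar from response_text, ground_truth,
        # test_tracker, repo_dir, or whatever context you need.
        response_text = kwargs.get("response_text", "")
        ground_truth = str(kwargs.get("ground_truth", ""))
        # Placeholder scoring: 1.0 if the ground truth appears in the
        # response, else 0.0. Replace with your own logic.
        reward = 1.0 if ground_truth and ground_truth in response_text else 0.0
        return reward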
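As one concrete possibility for the math agent, the scalar could be an exact-match check on the final number in the response. The sketch below is a standalone function, not the toolkit's implementation; the name `exact_match_reward` and the last-number heuristic are assumptions, and its body could be moved into `MyReward.__call__`.

```python
import re

# Matches integers and simple decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def exact_match_reward(response_text: str, ground_truth: str) -> float:
    """Return 1.0 if the last number in the response equals the ground truth, else 0.0."""
    # Drop commas so "1,024" parses as 1024 (a heuristic; it also joins "1,2").
    numbers = _NUMBER.findall(response_text.replace(",", ""))
    if not numbers:
        return 0.0
    try:
        return 1.0 if float(numbers[-1]) == float(ground_truth) else 0.0
    except ValueError:
        # Non-numeric ground truth: this heuristic cannot score it.
        return 0.0
```

A binary reward like this is easy to verify but sparse; partial credit or per-turn signals would need a different scoring function.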

Deploy to AgentCore Runtime

Once the code is adapted, deploy with the bedrock-agentcore-starter-toolkit CLI. A minimal deploy:

agentcore configure \
    --entrypoint rl_app.py \
    --name my_rl_agent \
    --requirements-file pyproject.toml \
    --deployment-type container \
    --non-interactive
agentcore deploy --agent my_rl_agent

A successful deploy prints the agent runtime ARN, which is the one value the training backend needs to reach this agent. Pair it with the name of your S3 bucket (where rewards will land) and you have the full set of hand-off values for the training backend. For example, they drop directly into SlimeRunner (agentcore_rl_toolkit.backends.slime.SlimeRunner):

SlimeRunner(
    agent_runtime_arn="arn:aws:bedrock-agentcore:us-west-2:...",
    s3_bucket="your-bucket-name",
    ...,
).train(...)
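Before handing these values to a backend, it can help to sanity-check them. The sketch below reads them from environment variables and checks that the ARN follows the general AWS ARN layout (`arn:partition:service:region:account-id:resource`) with the `bedrock-agentcore` service seen in the example above. The variable names `AGENT_RUNTIME_ARN` and `ROLLOUT_S3_BUCKET` and the `load_handoff_values` helper are hypothetical.

```python
import os


def load_handoff_values(env=os.environ) -> dict:
    """Hypothetical helper: collect and sanity-check the two hand-off values."""
    arn = env.get("AGENT_RUNTIME_ARN", "")
    bucket = env.get("ROLLOUT_S3_BUCKET", "")

    # General ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "bedrock-agentcore":
        raise ValueError(f"not a bedrock-agentcore ARN: {arn!r}")
    if not parts[5]:
        raise ValueError("ARN has no resource component")
    if not bucket or bucket.startswith("s3://"):
        raise ValueError("expected a bare S3 bucket name, without the s3:// scheme")

    return {"agent_runtime_arn": arn, "s3_bucket": bucket}
```

The returned keys mirror the two keyword arguments shown in the SlimeRunner call above, so the dict could be unpacked with `**` alongside whatever other arguments the backend requires.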

For a worked end-to-end deploy (IAM role, ECR push, first-time AWS setup), see the Strands Math Agent example. For containerized or advanced deploys (custom Dockerfiles, direct code deployment, non-CLI workflows), see AWS’s Get started with AgentCore Runtime and its custom / without-CLI path.

Next

Pick a training backend: slime (self-hosted) · rllm (managed + self-hosted) · verl (self-hosted).
