Strands AppWorld Agent
An RL-trainable AppWorld agent using Bedrock AgentCore RL Toolkit. The agent solves day-to-day tasks by interacting with simulated app APIs (Spotify, Venmo, Gmail, etc.) via a Python REPL.
The agent uses a ReAct code agent pattern: a single execute tool runs Python code inside an AppWorld sandbox. APIs are discovered on demand through the pre-loaded apis object, so tool context stays small regardless of how many APIs exist.
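The single-tool pattern can be illustrated with a toy stand-in. The sketch below is not the AppWorld API: `ToyApis`, `make_namespace`, and `execute` are hypothetical names, and plain `exec` takes the place of the real sandbox. It shows only the shape of the idea: one tool, a namespace that persists across calls, and API docs looked up at run time instead of being loaded into the prompt.

```python
import contextlib
import io
import traceback


class ToyApis:
    """Hypothetical stand-in for the pre-loaded `apis` object (not AppWorld's)."""

    _DOCS = {
        "spotify": {"search_songs": "search_songs(query) -> list of song titles"},
        "venmo": {"show_balance": "show_balance() -> float"},
    }

    def list_apps(self):
        # Cheap first step: the agent asks which apps exist.
        return sorted(self._DOCS)

    def describe(self, app):
        # Second step: fetch docs only for the app the task needs.
        return dict(self._DOCS[app])


def make_namespace():
    # One namespace per session, so variables persist between tool calls.
    return {"apis": ToyApis()}


def execute(code, namespace):
    """Run `code` in `namespace`; return captured stdout, plus a traceback on error."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # toy only: this provides no isolation
    except Exception:
        return buffer.getvalue() + traceback.format_exc()
    return buffer.getvalue()


ns = make_namespace()
print(execute("apps = apis.list_apps()\nprint(apps)", ns))
print(execute("print(apis.describe(apps[0]))", ns))
```

Returning the traceback as ordinary tool output, rather than raising, lets the model read the error and retry in the next step, which is how a ReAct loop usually recovers from bad code.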
Quickstart
    cd examples/strands_appworld_agent
    uv venv --python 3.12 && source .venv/bin/activate
    UV_GIT_LFS=1 uv pip install -e .
    uv pip install -e ../../ --force-reinstall --no-deps
    appworld install && appworld download data

    # Start the RL app (needs a local vLLM server running on :4000)
    python rl_app.py

Deploy via the Prepare agent for RL → Deploy flow, then evaluate with evaluate.py or train with SlimeRunner.
What’s in the example
- rl_app.py: the rollout entrypoint. It spins up an AppWorldServers + AppWorld context per session and binds the execute tool as a closure over the task's world.
- reward.py: AppWorldReward, the pass rate from the task's test_tracker.
- evaluate.py: async batch evaluation over the AppWorld splits (train / dev / test_normal / test_challenge).
- few_shot_example.py: few-shot demo messages seeded into the agent's context.
- Dockerfile + deploy.py + config.toml: container build and programmatic ACR deploy.
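Two of the ideas in this list, binding the tool as a closure and scoring by pass rate, can be sketched without AppWorld installed. The names below (`ToyWorld`, `make_execute_tool`, `pass_rate`) are hypothetical and do not reproduce the example's code. In particular, the real `test_tracker` may expose its results differently. The sketch assumes the reward is simply the fraction of a task's tests that pass.

```python
class ToyWorld:
    """Hypothetical per-session sandbox; stands in for the task's world."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.calls = []

    def execute(self, code):
        # Record the call and return a fake output string.
        self.calls.append(code)
        return f"[{self.task_id}] ran {len(code)} chars"


def make_execute_tool(world):
    # The inner function closes over `world`, so each session's tool
    # can only reach its own sandbox, with no shared global state.
    def execute(code: str) -> str:
        """Run Python code in this session's sandbox and return its output."""
        return world.execute(code)

    return execute


def pass_rate(num_passed, num_total):
    # Assumed reward shape: fraction of passed tests; 0.0 if there are none.
    if num_total == 0:
        return 0.0
    return num_passed / num_total


tool_a = make_execute_tool(ToyWorld("task_a"))
tool_b = make_execute_tool(ToyWorld("task_b"))
print(tool_a("print(1)"))
print(tool_b("x = 2"))
print(pass_rate(3, 4))
```

Building the tool per session, rather than registering one global tool, keeps concurrent rollouts from sharing a sandbox.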
For AppWorld data download, Docker + ECR setup, full VPC deploy config, and evaluation options, see the full README on GitHub.