Examples#
To optimize sequential decision-making purely from historical data, A2RL adopts the following key concepts:
Historical data
Problem formulation: Markovian test
Simulator
Agents
Historical Data#
Load your historical data into an a2rl.WiDataFrame with states, actions, and rewards columns.
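For example, here is a minimal sketch of wrapping a plain pandas frame; the column names are illustrative only, and the exact WiDataFrame constructor arguments should be checked against your installed a2rl version:

```python
import pandas as pd
import a2rl as wi

# Illustrative historical log: two state columns, one action, one reward.
df = pd.DataFrame(
    {
        "condenser_inlet_temp": [24.1, 25.3, 26.0, 24.8],
        "evaporator_heat_load_rt": [800, 920, 1010, 870],
        "staging": [0, 1, 1, 0],                            # action taken at each step
        "system_power_consumption": [5.2, 6.1, 6.8, 5.5],   # reward (cost) signal
    }
)

# Tell A2RL which columns play the states, actions, and rewards roles.
wi_df = wi.WiDataFrame(
    df,
    states=["condenser_inlet_temp", "evaporator_heat_load_rt"],
    actions=["staging"],
    rewards=["system_power_consumption"],
)
print(wi_df.head())  # behaves like a regular pandas DataFrame
```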
Markovian Test#
Quantify how Markovian (MDP-like) the historical data is, to determine whether your problem is best formulated as an MDP, where the next state can be predicted from previous states and actions, or as a multi-armed bandit problem, where past decisions are not good predictors of future states.
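A hedged sketch of running such a check is shown below; it assumes the assert_mdp helper and the bundled "chiller" sample dataset exist as in the A2RL examples, so verify the names and signatures against your installed version:

```python
import a2rl as wi
from a2rl.utils import NotMDPDataError, assert_mdp  # assumed helper; check your a2rl version

# Load a bundled sample dataset as a stand-in for your own historical data.
wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))

try:
    # Raises if lagged states/actions carry too little information about the next state.
    assert_mdp(wi_df, lags=10)
    print("Data looks Markovian enough to formulate as an MDP.")
except NotMDPDataError:
    print("Weak Markov property: consider a bandit-style formulation instead.")
```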
Simulator#
Train an a2rl.Simulator that can predict what happens when an action is taken in a certain state: the reward for taking the action, and the next state once the action is performed.
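A minimal sketch of the tokenize-then-train workflow from the A2RL examples follows; the AutoTokenizer, GPTBuilder, and Simulator arguments shown here are assumptions to verify against your installed version:

```python
import a2rl as wi

# Historical data with states/actions/rewards columns (see Historical Data above).
wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))

# Discretize the states, actions, and rewards into tokens the sequence model can learn.
tokenizer = wi.AutoTokenizer(wi_df, block_size_row=2)

# Fit a small GPT-style model on the tokenized historical trajectories.
builder = wi.GPTBuilder(tokenizer, model_dir="my-model")
model = builder.fit()

# Wrap tokenizer + model into a simulator that predicts rewards and next states.
simulator = wi.Simulator(tokenizer, model)
```

The tokenization step comes first because the simulator is an auto-regressive sequence model over the discretized state, action, and reward columns, rolling a trajectory forward one token at a time.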
Blog Series#
Part-1: Underfloor Heating Optimisation using Offline Reinforcement Learning
Part-2: Offline Reinforcement Learning with A2RL on Amazon SageMaker (coming soon)