Examples#
To optimize sequential decision-making purely from historical data, A2RL adopts the following key concepts:
Historical data
Problem formulation: Markovian test
Simulator
Agents
Historical Data#
Load your historical data into an a2rl.WiDataFrame with states, actions, and rewards columns.
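For example, here is a minimal sketch of wrapping a plain pandas frame; the column names are illustrative only, and the exact WiDataFrame constructor arguments should be checked against your installed a2rl version:

```python
import pandas as pd
import a2rl as wi

# Illustrative historical log: two state columns, one action, one reward.
df = pd.DataFrame(
    {
        "condenser_inlet_temp": [24.1, 25.3, 26.0, 24.8],
        "evaporator_heat_load_rt": [800, 920, 1010, 870],
        "staging": [0, 1, 1, 0],                            # action taken at each step
        "system_power_consumption": [5.2, 6.1, 6.8, 5.5],   # reward (cost) signal
    }
)

# Tell A2RL which columns play the states, actions, and rewards roles.
wi_df = wi.WiDataFrame(
    df,
    states=["condenser_inlet_temp", "evaporator_heat_load_rt"],
    actions=["staging"],
    rewards=["system_power_consumption"],
)
print(wi_df.head())  # behaves like a regular pandas DataFrame
```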
Markovian Test#
Quantify how Markovian (MDP-like) the historical data is, to determine whether your problem is best formulated as an MDP, where the next state can be predicted from previous states and actions, or as a multi-armed bandit problem, where past decisions are not good predictors of future states.
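A hedged sketch of running such a check is shown below; it assumes the assert_mdp helper and the bundled "chiller" sample dataset exist as in the A2RL examples, so verify the names and signatures against your installed version:

```python
import a2rl as wi
from a2rl.utils import NotMDPDataError, assert_mdp  # assumed helper; check your a2rl version

# Load a bundled sample dataset as a stand-in for your own historical data.
wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))

try:
    # Raises if lagged states/actions carry too little information about the next state.
    assert_mdp(wi_df, lags=10)
    print("Data looks Markovian enough to formulate as an MDP.")
except NotMDPDataError:
    print("Weak Markov property: consider a bandit-style formulation instead.")
```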
Simulator#
Train an a2rl.Simulator that can predict what happens when an action is taken in a certain state: the reward for taking the action, and the next state once the action is performed.
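A minimal sketch of the tokenize-then-train workflow from the A2RL examples follows; the AutoTokenizer, GPTBuilder, and Simulator arguments shown here are assumptions to verify against your installed version:

```python
import a2rl as wi

# Historical data with states/actions/rewards columns (see Historical Data above).
wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))

# Discretize the states, actions, and rewards into tokens the sequence model can learn.
tokenizer = wi.AutoTokenizer(wi_df, block_size_row=2)

# Fit a small GPT-style model on the tokenized historical trajectories.
builder = wi.GPTBuilder(tokenizer, model_dir="my-model")
model = builder.fit()

# Wrap tokenizer + model into a simulator that predicts rewards and next states.
simulator = wi.Simulator(tokenizer, model)
```

The tokenization step comes first because the simulator is an auto-regressive sequence model over the discretized state, action, and reward columns, rolling a trajectory forward one token at a time.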
Blog Series#
Part-1: Underfloor Heating Optimisation using Offline Reinforcement Learning
Part-2: Offline Reinforcement Learning with A2RL on Amazon SageMaker (coming soon)