Examples#

To optimize sequential decision-making purely from historical data, A2RL builds on the following key concepts:

  1. Historical data

  2. Problem formulation: Markovian test

  3. Simulator

  4. Agents

Historical Data#

Load your historical data into an a2rl.WiDataFrame with states, actions, and rewards columns.
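A minimal sketch of the expected layout. The column names here are illustrative, and the commented `WiDataFrame` call is an assumption about the constructor's keyword arguments rather than a verified snippet:

```python
# Toy historical log: one row per step, with state, action, and reward
# columns (names are made up for illustration).
rows = [
    # state: inventory level; action: units ordered; reward: profit
    {"inventory": 10, "order": 5, "profit": 3.0},
    {"inventory": 12, "order": 0, "profit": 4.5},
    {"inventory": 8,  "order": 7, "profit": 2.0},
]

# With a2rl installed, such data would be wrapped in a WiDataFrame that
# declares which columns play which role (sketch, not verified):
#
#   import a2rl
#   import pandas as pd
#   wi_df = a2rl.WiDataFrame(
#       pd.DataFrame(rows),
#       states=["inventory"],
#       actions=["order"],
#       rewards=["profit"],
#   )

print(sorted(rows[0]))  # column names: ['inventory', 'order', 'profit']
```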

Markovian Test#

Quantify the MDP-ness of the historical data, to determine whether your problem is best formulated as an MDP, where the next state can be predicted from the current state and action, or as a multi-armed bandit problem, where past decisions are not good predictors of future states.
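The idea behind the test can be illustrated with a toy check in plain Python (this is a conceptual stand-in, not the a2rl implementation): if the next state is fully determined by the current (state, action) pair, the data is Markovian.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy environment whose next state depends only on (state, action).
def step(state, action):
    return (state + action) % 5

# Collect a historical trajectory of (state, action, next_state) rows.
trajectory = []
state = 0
for _ in range(200):
    action = random.choice([1, 2])
    next_state = step(state, action)
    trajectory.append((state, action, next_state))
    state = next_state

# Group observed next states by (state, action). If every group has a
# single outcome, (state, action) fully predicts the future: Markovian.
outcomes = defaultdict(set)
for s, a, s_next in trajectory:
    outcomes[(s, a)].add(s_next)

is_markovian = all(len(v) == 1 for v in outcomes.values())
print(is_markovian)  # True
```

In a bandit-like dataset the same grouping would show many different next states per (state, action) pair, so the check would fail.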

Simulator#

Train an a2rl.Simulator that predicts what happens when an action is taken in a given state: the reward for taking the action, and the next state once the action is performed.
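As a rough illustration of what a simulator provides, here is a tabular stand-in built from historical rows. It answers the same query, "given (state, action), what reward and next state should I expect?", but it is only a sketch, not how a2rl's Simulator works internally:

```python
from collections import defaultdict

# Hypothetical historical rows: (state, action, reward, next_state).
transitions = [
    (0, 1, 5.0, 1),
    (1, 2, 3.0, 3),
    (0, 1, 5.0, 1),
    (3, 1, 2.0, 4),
]

# Index the observed outcomes by (state, action).
table = defaultdict(list)
for s, a, r, s_next in transitions:
    table[(s, a)].append((r, s_next))

def lookup(state, action):
    """Predict (average reward, most common next state) for (state, action)."""
    observed = table[(state, action)]
    avg_reward = sum(r for r, _ in observed) / len(observed)
    next_states = [s_next for _, s_next in observed]
    predicted_next = max(set(next_states), key=next_states.count)
    return avg_reward, predicted_next

print(lookup(0, 1))  # (5.0, 1)
```

A trained simulator generalizes this idea to unseen (state, action) pairs, which a lookup table cannot do.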

Blog Series#