a2rl.WiDataFrame.add_value_for_multi_episode_process

WiDataFrame.add_value_for_multi_episode_process(sarsa=True, alpha=0.1, gamma=0.6, value_col='value', episode_identifier='episode', override='replace')

Append the column value_col to this dataframe. Restriction: the dataframe must NOT contain columns named _state, _action, _reward, or value_col.

Parameters:
  • sarsa (bool) – When True, compute the value using the SARSA Bellman equation, which is a conservative on-policy temporal-difference update. When False, use the Q-Learning Bellman equation, which is an off-policy temporal-difference update.

  • alpha (float) – Learning rate in Q-Learning and SARSA. Must be between 0 and 1.

  • gamma (float) – Discount factor of future reward in Q-Learning and SARSA. Must be between 0 and 1.

  • value_col (str) – The column name for the computed values.

  • override (Literal['replace', 'warn', 'error']) – What to do when this dataframe already has a column named value_col. Valid values are replace to silently override, warn to show a warning, and error to raise a ValueError.

  • episode_identifier (str) – Group-by key in this dataframe. Ensure that breaks BETWEEN episodes are tagged with a 0 group name.

Return type:

WiDataFrame

Returns:

This dataframe, modified with an additional value_col column. This return value is provided to facilitate chaining, as per the functional programming style.
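
The difference between the two settings of sarsa can be illustrated with a small tabular sketch. This is not a2rl's implementation and does not call the library. The helper td_values, the (state, action, reward) tuple layout, the single in-order pass, and the treatment of the last step of an episode as terminal are all assumptions made for illustration. It applies the textbook update Q(s, a) += alpha * (target - Q(s, a)) one episode at a time, so that bootstrapping never crosses an episode boundary.

```python
from collections import defaultdict


def td_values(episodes, sarsa=True, alpha=0.1, gamma=0.6):
    """Hypothetical helper: one tabular TD pass over logged episodes.

    `episodes` maps an episode id to an ordered list of
    (state, action, reward) tuples. Returns {(state, action): value}.
    """
    q = defaultdict(float)
    for steps in episodes.values():
        # Pair each step with its successor inside the same episode only.
        for (s, a, r), nxt in zip(steps, steps[1:] + [None]):
            if nxt is None:
                # Assumption: the last logged step has no future reward.
                target = r
            elif sarsa:
                # On-policy: bootstrap from the action actually taken next.
                s2, a2, _ = nxt
                target = r + gamma * q[(s2, a2)]
            else:
                # Off-policy: bootstrap from the best known action in s2.
                s2 = nxt[0]
                known = [v for (st, _), v in q.items() if st == s2]
                target = r + gamma * max(known, default=0.0)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return dict(q)


# Two episodes; in state "s1" action "b" is rewarded, action "c" is not.
episodes = {
    "ep0": [("s1", "b", 10.0)],
    "ep1": [("s0", "a", 0.0), ("s1", "c", 0.0)],
}
sarsa_q = td_values(episodes, sarsa=True)
qlearn_q = td_values(episodes, sarsa=False)
```

With this data, the SARSA value of ("s0", "a") follows the action "c" that was actually logged next, while the Q-Learning value follows the better-valued action "b" seen earlier in "s1", so the Q-Learning estimate is the higher of the two.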