a2rl.WiDataFrame.add_value_for_multi_episode_process
- WiDataFrame.add_value_for_multi_episode_process(sarsa=True, alpha=0.1, gamma=0.6, value_col='value', episode_identifier='episode', override='replace')
Append column value_col to this dataframe (restriction: df must NOT contain the column names _state, _action, _reward, and the value_col).
- Parameters:
  - sarsa (bool) – When True, compute the value using the SARSA Bellman equation, which is a conservative on-policy temporal difference update. When False, use the Q-Learning Bellman equation, which is an off-policy temporal difference update.
  - alpha (float) – Learning rate in Q-Learning and SARSA. Must be within 0 and 1.
  - gamma (float) – Discount factor of future reward in Q-Learning and SARSA. Must be within 0 and 1.
  - value_col (str) – The column name for the computed values.
  - override (Literal['replace', 'warn', 'error']) – What to do when this dataframe already has a column value_col. Valid values are 'replace' to silently override, 'warn' to show a warning, and 'error' to raise a ValueError.
  - episode_identifier (str) – Group-by key in this dataframe. Ensure that breaks BETWEEN episodes are tagged with a 0 group name.
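The difference between the two settings of sarsa can be illustrated with a minimal, standard-library sketch of one temporal-difference pass. This is not a2rl's implementation: the function td_pass, the row layout (episode, state, action, reward), and the choice to stop bootstrapping at the last row of each episode are all assumptions made for illustration.

```python
from collections import defaultdict


def td_pass(rows, sarsa=True, alpha=0.1, gamma=0.6):
    """One temporal-difference pass over time-ordered rows (hypothetical helper).

    Each row is (episode, state, action, reward). Returns the Q-table as a
    dict keyed by (state, action).
    """
    q = defaultdict(float)
    actions = {a for _, _, a, _ in rows}
    for cur, nxt in zip(rows, rows[1:] + [None]):
        episode, s, a, r = cur
        # Assumption: bootstrap only from the next row of the SAME episode.
        if nxt is not None and nxt[0] == episode:
            _, s2, a2, _ = nxt
            if sarsa:
                # On-policy: use the action that was actually taken next.
                future = q[(s2, a2)]
            else:
                # Off-policy: use the best known action in the next state.
                future = max(q[(s2, b)] for b in actions)
        else:
            # Last row of an episode: no future value to bootstrap from.
            future = 0.0
        q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
    return dict(q)


# Two short episodes; only ("s1", "b") is ever rewarded.
rows = [
    (1, "s0", "a", 0.0),
    (1, "s1", "b", 1.0),
    (2, "s0", "a", 0.0),
    (2, "s1", "a", 0.0),
]
q_sarsa = td_pass(rows, sarsa=True, alpha=0.5, gamma=0.5)
q_qlearn = td_pass(rows, sarsa=False, alpha=0.5, gamma=0.5)
print(q_sarsa[("s0", "a")], q_qlearn[("s0", "a")])
```

In this toy data the second episode takes the unrewarded action in s1, so the SARSA value of ("s0", "a") stays lower than the Q-Learning value, which assumes the rewarded action would have been chosen. That is the sense in which the on-policy update is the more conservative of the two.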
- Return type:
WiDataFrame
- Returns:
This dataframe, modified with an additional value_col column. This return value is provided to facilitate chaining, as per the functional programming style.
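How the override policy and the chaining-friendly return value fit together can be pictured with a toy stand-in. ToyFrame and add_column below are hypothetical names, not part of a2rl. The sketch only assumes the three override values listed in the signature and a method that returns the object it modified.

```python
import warnings


class ToyFrame:
    """Hypothetical stand-in for a dataframe: a dict of column name -> list."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def add_column(self, name, values, override="replace"):
        if override not in ("replace", "warn", "error"):
            raise ValueError(f"Unknown override: {override!r}")
        if name in self.columns:
            if override == "error":
                raise ValueError(f"Column {name!r} already exists")
            if override == "warn":
                warnings.warn(f"Overwriting existing column {name!r}")
        self.columns[name] = list(values)
        # Returning self is what lets calls be chained.
        return self


frame = ToyFrame({"reward": [0.0, 1.0]})
# The second call silently replaces the column written by the first.
frame.add_column("value", [0.1, 0.5]).add_column("value", [0.2, 0.6])
print(frame.columns["value"])
```

With override="error" the second call would raise a ValueError instead of replacing the column, which is the safer choice when an existing value column must not be lost.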