a2rl.WiDataFrame.add_value_for_multi_episode_process
- WiDataFrame.add_value_for_multi_episode_process(sarsa=True, alpha=0.1, gamma=0.6, value_col='value', episode_identifier='episode', override='replace')
Append the column value_col to this dataframe (restriction: df must NOT contain column names _state, _action, _reward, and the value_col).
- Parameters:
  - sarsa (bool) – When True, compute the value using the SARSA Bellman equation, which is a conservative on-policy temporal difference update. When False, use the Q-Learning Bellman equation, which is an off-policy temporal difference update.
  - alpha (float) – Learning rate in Q-Learning and SARSA. Must be between 0 and 1.
  - gamma (float) – Discount factor of future reward in Q-Learning and SARSA. Must be between 0 and 1.
  - value_col (str) – The column name for the computed values.
  - override (Literal['replace', 'warn', 'error']) – What to do when this dataframe already has a column named value_col. Valid values are replace to silently override, warn to show a warning, and error to raise a ValueError.
  - episode_identifier (str) – Group-by key in this dataframe. Ensure that breaks BETWEEN episodes are tagged with a 0 group name.
- Return type:
- Returns:
This dataframe, modified with an additional value_col column. This return value is provided to facilitate chaining, as per the functional programming style.
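
To make the difference between the two Bellman updates concrete, here is a minimal tabular sketch in pure Python. It is not the a2rl implementation: the helper names (td_update, update_by_episode, new_table), the tuple layout, and the dictionary-based value table are all assumptions made for illustration. It only shows how sarsa, alpha, and gamma enter a standard temporal difference update, and why rows are grouped by episode so that no update bootstraps across an episode boundary.

```python
from collections import defaultdict
from itertools import groupby


def td_update(q, transitions, sarsa=True, alpha=0.1, gamma=0.6):
    """Apply one pass of tabular TD updates over a single episode.

    Hypothetical helper, not part of a2rl.
    q: mapping state -> mapping action -> value, updated in place.
    transitions: list of (state, action, reward) tuples in time order.
    """
    # Pair each step with its successor; the last step has no successor
    # inside the episode, so it is not updated.
    for (s, a, r), (s_next, a_next, _) in zip(transitions, transitions[1:]):
        if sarsa:
            # On-policy: bootstrap from the action actually taken next.
            target = r + gamma * q[s_next][a_next]
        else:
            # Off-policy: bootstrap from the best known action in the next state.
            target = r + gamma * max(q[s_next].values(), default=0.0)
        q[s][a] += alpha * (target - q[s][a])
    return q


def update_by_episode(rows, **kwargs):
    """Group (episode, state, action, reward) rows by episode, then update.

    Hypothetical helper. Rows of the same episode are assumed contiguous.
    """
    q = defaultdict(lambda: defaultdict(float))
    for _, group in groupby(rows, key=lambda row: row[0]):
        td_update(q, [(s, a, r) for _, s, a, r in group], **kwargs)
    return q


def new_table():
    """Value table pre-seeded so that the two updates give different results."""
    q = defaultdict(lambda: defaultdict(float))
    q["s1"]["a0"] = 1.0
    q["s1"]["a1"] = 3.0
    return q


episode = [("s0", "a0", 1.0), ("s1", "a0", 0.0)]
q_sarsa = td_update(new_table(), episode, sarsa=True)
q_qlearn = td_update(new_table(), episode, sarsa=False)
print(f"SARSA:      {q_sarsa['s0']['a0']:.2f}")   # 0.16
print(f"Q-Learning: {q_qlearn['s0']['a0']:.2f}")  # 0.28
```

In this toy table the next step takes action a0 (value 1.0) while the best action in s1 is a1 (value 3.0), so the SARSA target is lower than the Q-Learning target. That is the sense in which the on-policy update is the more conservative of the two.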