a2rl.Simulator.sample
- Simulator.sample(seq, max_size=3, as_token=False, correct_unseen_token=True)
Given a batch of contexts, perform one-step sampling of actions and rewards.
Example:

Input:

    seq = [[1, 2], [3, 4]]
    max_size = 2

Output:

    wi.WiDataFrame([
        [10, 11],  # From context [1,2]
        [12, 13],  # From context [1,2]
        [20, 21],  # From context [3,4]
        [22, 23],  # From context [3,4]
    ])
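In practice seq is built from a tokenized dataset rather than written by hand. Below is a minimal usage sketch assuming the typical a2rl workflow of AutoTokenizer, GPTBuilder, and Simulator; helper names such as sample_dataset_path, read_csv_dataset, GPTBuilder, and df_tokenized are assumptions drawn from that workflow and may differ between versions.

    import numpy as np
    import a2rl as wi

    # Load a bundled sample dataset and tokenize it (assumed workflow).
    df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))
    tokenizer = wi.AutoTokenizer(df, block_size_row=2)

    # Train a behaviour model and wrap it in a simulator (assumed workflow).
    builder = wi.GPTBuilder(tokenizer, "model-chiller")
    builder.fit()
    simulator = wi.Simulator(tokenizer, builder.model)

    # Take a short tokenized context that ends with a states token.
    # The slice length here is illustrative; it must cover (s, a, r, ..., s).
    context = tokenizer.df_tokenized.sequence[:7]

    # Pass a batch of one context, shape (1, context_length) as documented,
    # and draw 3 candidate samples of actions and rewards for it.
    recommendation_df = simulator.sample(np.array([context]), max_size=3)
    print(recommendation_df)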
- Parameters:
  - seq (ndarray) – A batch of contexts (s, a, r, ..., s). Each context must end with a states dataframe token. Shape is (batch_size, context_length). If context_length is greater than AutoTokenizer.block_size, the input sequence is silently trimmed to (batch_size, block_size); see the length check sketched after this list.
  - max_size (int) – Number of samples to return for each context.
  - as_token (bool) – Whether the returned dataframe is in tokenized format or in the original value space (approximated).
  - correct_unseen_token (bool) – When True, map unseen tokens to the closest valid token.
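Because the trim to block_size happens silently, callers who want to know when a context is too long can check it up front. A minimal sketch, assuming seq is already a tokenized 2-D ndarray and tokenizer is the AutoTokenizer used to build the simulator:

    import numpy as np

    seq = np.asarray(seq)
    # Fail loudly instead of letting Simulator.sample() trim the context
    # down to (batch_size, block_size) without telling us.
    if seq.shape[1] > tokenizer.block_size:
        raise ValueError(
            f"context_length {seq.shape[1]} exceeds block_size "
            f"{tokenizer.block_size}; the context would be silently trimmed"
        )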
- Return type:
  WiDataFrame
- Returns:
  Whatif dataframe where each row is a sample with actions and rewards columns. The as_token flag determines whether the dataframe contents are tokenized or in the original value space (approximated). Shape is (batch_size * max_size, ...): the first max_size rows are the samples for the 1st context, the next max_size rows are the samples for the 2nd context, and so on (see the slicing sketch at the end of this entry).
Note

- Ensure the correct context sequence (s, a, r, ..., s) is passed in.
- max_size samples are returned for each context. The results may not be unique.
- Each row of the returned result represents actions, rewards and values.
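Since the flat result is grouped by context, splitting it back into one block of max_size rows per input context is a simple slice. This sketch continues from the usage example above (simulator already built) and reuses the two contexts from the Example:

    import numpy as np

    seq = np.array([[1, 2], [3, 4]])        # two contexts, as in the Example
    batch_size, max_size = seq.shape[0], 2
    samples = simulator.sample(seq, max_size=max_size)

    # Rows come back as (batch_size * max_size, ...): the first max_size rows
    # belong to the first context, the next max_size rows to the second, etc.
    per_context = [
        samples.iloc[i * max_size : (i + 1) * max_size]
        for i in range(batch_size)
    ]
    for ctx_idx, block in enumerate(per_context):
        print(f"samples for context {ctx_idx}:")
        print(block)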