a2rl.Simulator.sample
- Simulator.sample(seq, max_size=3, as_token=False, correct_unseen_token=True)
- Given a batch of contexts, perform one step of sampling for actions and rewards.
- Example:
- Input:
    seq = [[1,2], [3,4]]
    max_size = 2
- Output:
    wi.WiDataFrame([
        [10, 11],  # From context [1,2]
        [12, 13],  # From context [1,2]
        [20, 21],  # From context [3,4]
        [22, 23],  # From context [3,4]
    ])
- Parameters:
- seq (ndarray) – a batch of contexts (s, a, r, ..., s). Each context must end with a states dataframe token. Shape is (batch_size, context_length). If context_length is greater than AutoTokenizer.block_size, the input sequence is silently trimmed to (batch_size, block_size).
- max_size (int) – number of samples to return for each context.
- as_token (bool) – whether the returned dataframe should be in tokenized format or in the original value space (approximated).
- correct_unseen_token (bool) – when True, map each unseen token to the closest valid token.
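As a rough illustration of the shape constraint on seq, the NumPy sketch below trims a batch of contexts to at most block_size columns. It assumes the trailing (most recent) tokens are the ones kept, which the description above does not state. The helper name trim_context and the block_size value are hypothetical and not part of a2rl.

```python
import numpy as np


def trim_context(seq: np.ndarray, block_size: int) -> np.ndarray:
    """Hypothetical helper: keep at most block_size trailing tokens per context."""
    if seq.ndim != 2:
        raise ValueError("seq must have shape (batch_size, context_length)")
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    # Assumption: the rightmost columns are kept, so each context still ends with states.
    return seq[:, -block_size:]


seq = np.arange(12).reshape(2, 6)          # batch_size=2, context_length=6
trimmed = trim_context(seq, block_size=4)  # illustrative block_size
print(trimmed.shape)                       # (2, 4)
```

Because the real trimming is silent, checking seq.shape[1] against the tokenizer's block size before calling sample makes any truncation explicit.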
 
- Return type:
- WiDataFrame
- Returns:
- Whatif dataframe where each row is a sample with actions and rewards columns. as_token determines whether the dataframe contents are tokenized or in the original value space (approximated). Shape is (batch_size * max_size, ...). Rows start with the 1st context's actions, followed by the 2nd context's actions, and so on.
- Note
- Ensure the correct context sequence (s, a, r, ..., s) is passed in.
- Returns max_size samples for each context. Results may not be unique.
- Each row of the returned result represents actions, rewards and values.
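The row ordering of the result can be sketched with plain NumPy. The snippet below does not call a2rl. It reuses the placeholder numbers from the example above to show how row i maps back to context i // max_size, and how the flat result could be regrouped per context. The variable names are illustrative only.

```python
import numpy as np

batch_size, max_size = 2, 2

# Row i of the result belongs to context i // max_size.
context_index = np.repeat(np.arange(batch_size), max_size)
print(context_index.tolist())  # [0, 0, 1, 1]

# Placeholder values copied from the docstring example, not real simulator output.
samples = np.array([[10, 11], [12, 13], [20, 21], [22, 23]])

# Regroup the flat (batch_size * max_size, n_cols) rows by context.
per_context = samples.reshape(batch_size, max_size, -1)
print(per_context[1].tolist())  # [[20, 21], [22, 23]]
```

Since the samples for one context are not guaranteed to be unique, any deduplication would have to be applied per context after this regrouping.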