a2rl.Simulator

class a2rl.Simulator(tokenizer, model, max_steps=100, reset_coldstart=2, test_mode=True)
This is a Simulator class that provides a recommendation for an action, and the associated value, given the current context. The simulator is to be used together with the AutoTokenizer and the GPT model trained by GPTBuilder, both supplied during instantiation.

Parameters:
- tokenizer (AutoTokenizer) – AutoTokenizer instance.
- model (GPT) – Trained model from GPTBuilder.
- max_steps (int) – Number of steps per episode.
- reset_coldstart (int) – Number of dataframe context rows.
- test_mode (bool) – When True, reset current rows to dataframe index zero.
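For instance, a simulator can be instantiated with non-default settings as below (a minimal sketch, assuming tokenizer and model were produced as in the Examples section that follows):

>>> simulator = wi.Simulator(
...     tokenizer,
...     model,
...     max_steps=50,        # shorter episodes
...     reset_coldstart=2,   # seed the context with two dataframe rows
...     test_mode=True,      # always reset to dataframe index zero
... )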
Examples

This example shows how to get a recommendation using a simple dataset.

First, load the data and generate the value column. Refer to WiDataFrame.

>>> import numpy as np
>>> import pandas as pd
>>> import a2rl as wi
>>>
>>> df = pd.DataFrame(
...     np.array(
...         [
...             [0, 10, 20, 200],
...             [1, 12, 21, 225],
...             [2, 15, 22, 237],
...         ]
...     ),
...     columns=["s1", "s2", "a", "r"],
... )
>>> wi_df = wi.WiDataFrame(df, states=["s1", "s2"], actions=["a"], rewards=["r"])
>>> wi_df.add_value()
   s1  s2   a    r   value
0   0  10  20  200  184...
1   1  12  21  225  154...
2   2  15  22  237    0...
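For intuition only: the value column behaves like an estimate of the discounted future return from each row. The naive sketch below illustrates the idea; it is NOT a2rl's actual add_value() formula, and its numbers will not match the output above:

>>> gamma = 0.9  # assumed discount factor, for illustration only
>>> rewards = [200.0, 225.0, 237.0]
>>> value, running = [], 0.0
>>> for r in reversed(rewards):
...     value.insert(0, running)  # discounted sum of *future* rewards (0 at the last row)
...     running = r + gamma * running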
Next, create an AutoTokenizer using the dataframe, indicating the desired block size in terms of number of rows. You can get the discretized dataframe tokens via AutoTokenizer properties.

>>> field_tokenizer = wi.DiscreteTokenizer(num_bins_strategy="uniform")
>>> tokenizer = wi.AutoTokenizer(wi_df, block_size_row=1, field_tokenizer=field_tokenizer)
>>> tokenizer.df_tokenized
   s1   s2    a    r  value
0   0  100  200  300    499
1  50  140  250  367    483
2  99  199  299  399    400
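Each dataframe row tokenizes to one token per column, so block_size_row roughly determines the GPT context length in tokens. A small sanity check, continuing the session above (the arithmetic here is illustrative, not an AutoTokenizer attribute):

>>> tokens_per_row = len(tokenizer.df_tokenized.columns)
>>> tokens_per_row
5
>>> 1 * tokens_per_row  # block_size_row=1 -> roughly this many tokens of context
5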
Train a GPT model using GPTBuilder by passing in the AutoTokenizer, along with model_dir and model_name.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as model_dir:
...     builder = wi.GPTBuilder(tokenizer, model_dir)
...     builder.fit()
GPT(...)
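The temporary directory above just keeps the doctest self-contained; in practice you would likely pass a persistent model_dir so the fitted model outlives the session. A sketch, where "my-model-dir" is a hypothetical path (fit() returns the trained GPT model, as the doctest output above suggests):

>>> builder = wi.GPTBuilder(tokenizer, "my-model-dir")  # hypothetical persistent path
>>> model = builder.fit()  # keep a handle on the trained GPT model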
Get a recommendation by giving a context, and performing max_size rounds of sampling.

Note

The context is in row-major order and MUST be in the format (s, a, r, ..., s), ending with states, in discretized dataframe tokens.

In this example, the context [0, 100, 200, 300, 499, 50, 140] represents [s1, s2, a, r, value, s1, s2].

>>> simulator = wi.Simulator(tokenizer, builder.model)
>>> custom_context = np.array([0, 100, 200, 300, 499, 50, 140])
>>> rec_df = simulator.sample(custom_context, max_size=2)
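sample() also accepts an as_token flag (see the method summary below). A sketch of requesting the recommendation in token space instead of de-discretized values (the exact shape of the token output may differ):

>>> rec_tokens = simulator.sample(custom_context, max_size=2, as_token=True)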
Finally, pick the action corresponding to the minimum or maximum of the value column, depending on your objective.

>>> rec_df
       a        r       value
0  21.01  224.975  106.057972
1  21.01  224.975  106.057972
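For example, plain pandas can select the recommended row (here maximizing value; use idxmin() instead if your objective is to minimize):

>>> best = rec_df.loc[rec_df["value"].idxmax()]
>>> best_action = best["a"]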
Methods

beam_search_n_steps(seq, n_steps, beam_width)
    This function largely replaces A2RL Simulator.gpt_sample_n_steps().

get_valid_actions(seq, max_size)
    Return a dataframe of sampled action tokens, given the input context.

gpt_sample(seq, cur_col_index[, sample])
    Predict the next GPT token given the input GPT tokens.

gpt_sample_n_steps(seq, n_steps, start_col_index)
    Given a GPT token sequence, sample the next n_steps of GPT tokens.

lookahead(seq, action[, correct_unseen_token])
    Given a batch of contexts and a batch of actions, simulate the expected rewards and next states for all combinations of contexts and actions (see the sketch after this table).

reset(**kwargs)
    Placeholder.

sample(seq[, max_size, as_token, ...])
    Given a batch of context, perform one-step sampling for actions and rewards.

step(action)
    Placeholder.
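As referenced above, a rough sketch of chaining get_valid_actions() and lookahead(), continuing the example session. The unpacking below assumes lookahead() returns the expected rewards and next states described in the table; check each method's own documentation for exact argument and return shapes:

>>> valid = simulator.get_valid_actions(custom_context, max_size=2)  # dataframe of action tokens
>>> rewards, next_states = simulator.lookahead(custom_context, valid.values.tolist())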
Attributes