a2rl.Simulator
class a2rl.Simulator(tokenizer, model, max_steps=100, reset_coldstart=2, test_mode=True)
Simulator provides a recommendation for an action, and the associated value, given the current context. The simulator is used together with the AutoTokenizer and the GPTBuilder-trained model supplied during instantiation.

Parameters:
- tokenizer (AutoTokenizer) – AutoTokenizer instance.
- model (GPT) – Trained model from GPTBuilder.
- max_steps (int) – Number of steps per episode.
- reset_coldstart (int) – Number of dataframe context rows.
- test_mode (bool) – When True, reset current rows to dataframe index zero.
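For orientation, a constructor call with every parameter spelled out might look like the sketch below. Here tokenizer and builder are assumed to come from a fitted AutoTokenizer and GPTBuilder, exactly as built step by step in the Examples that follow.

>>> simulator = wi.Simulator(
...     tokenizer,          # fitted wi.AutoTokenizer
...     builder.model,      # GPT model trained by wi.GPTBuilder
...     max_steps=100,      # steps per episode
...     reset_coldstart=2,  # number of dataframe context rows
...     test_mode=True,     # start episodes from dataframe index zero
... )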
Examples
This example shows how to get a recommendation using a simple dataset. First, load the data and generate the value column. Refer to WiDataFrame.

>>> import numpy as np
>>> import pandas as pd
>>> import a2rl as wi
>>>
>>> df = pd.DataFrame(
...     np.array(
...         [
...             [0, 10, 20, 200],
...             [1, 12, 21, 225],
...             [2, 15, 22, 237],
...         ]
...     ),
...     columns=["s1", "s2", "a", "r"],
... )
>>> wi_df = wi.WiDataFrame(df, states=["s1", "s2"], actions=["a"], rewards=["r"])
>>> wi_df.add_value()
   s1  s2   a    r   value
0   0  10  20  200  184...
1   1  12  21  225  154...
2   2  15  22  237    0...
Next, create an AutoTokenizer from the dataframe, indicating the desired block size in terms of number of rows. You can get the discretized dataframe tokens via the AutoTokenizer properties.

>>> field_tokenizer = wi.DiscreteTokenizer(num_bins_strategy="uniform")
>>> tokenizer = wi.AutoTokenizer(wi_df, block_size_row=1, field_tokenizer=field_tokenizer)
>>> tokenizer.df_tokenized
   s1   s2    a    r  value
0   0  100  200  300    499
1  50  140  250  367    483
2  99  199  299  399    400
Train a GPT model using GPTBuilder by passing in the AutoTokenizer, a model_dir, and optionally a model_name.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as model_dir:
...     builder = wi.GPTBuilder(tokenizer, model_dir)
...     builder.fit()
GPT(...)
Get a recommendation by giving a context and performing max_size rounds of sampling.

Note: The context is in row-major order and MUST be in the format (s, a, r, ..., s), ending with states, expressed as discretized dataframe tokens.

In this example, the context [0, 100, 200, 300, 499, 50, 140] represents [s1, s2, a, r, value, s1, s2].

>>> simulator = wi.Simulator(tokenizer, builder.model)
>>> custom_context = np.array([0, 100, 200, 300, 499, 50, 140])
>>> rec_df = simulator.sample(custom_context, max_size=2)
Finally, pick the action corresponding to the minimum or maximum of the value column, depending on your objective.

>>> rec_df
       a        r       value
0  21.01  224.975  106.057972
1  21.01  224.975  106.057972
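For instance, if a larger value is better for your objective, a plain pandas selection like the sketch below picks the recommended action. This snippet is illustrative only and not part of the Simulator API.

>>> # Illustrative only: select the action whose sampled value is largest;
>>> # use idxmin() instead when a smaller value is better.
>>> best_action = rec_df.loc[rec_df["value"].idxmax(), "a"]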
Methods
beam_search_n_steps(seq, n_steps, beam_width)
    This function largely replaces A2RL Simulator.gpt_sample_n_steps().

get_valid_actions(seq, max_size)
    Return a dataframe of sampled action tokens, given the input context.

gpt_sample(seq, cur_col_index[, sample])
    Predict the next GPT token given the input GPT tokens.

gpt_sample_n_steps(seq, n_steps, start_col_index)
    Given a GPT token sequence, sample the next n_steps of GPT tokens.

lookahead(seq, action[, correct_unseen_token])
    Given a batch of contexts and a batch of actions, simulate the expected rewards and next states for all combinations of contexts and actions (see the sketch after this list).

reset(**kwargs)
    Placeholder.

sample(seq[, max_size, as_token, ...])
    Given a batch of contexts, perform one-step sampling for actions and rewards.

step(action)
    Placeholder.
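As a rough illustration of how get_valid_actions and lookahead might be combined into a one-step planning loop, consider the sketch below. It assumes, based on the descriptions above, that lookahead returns the simulated rewards and next states as a pair, aligned with the actions passed in; treat this as a sketch to verify against your a2rl version, not a definitive reference.

>>> # Sketch (assumptions: get_valid_actions returns a dataframe of action
>>> # tokens, and lookahead returns (rewards, next_states) aligned with the
>>> # actions passed in -- verify against your a2rl version).
>>> valid_actions = simulator.get_valid_actions(custom_context, max_size=10)
>>> rewards, next_states = simulator.lookahead(
...     custom_context, valid_actions.values.tolist()
... )
>>> # Pick the action tokens with the best simulated immediate reward.
>>> best_idx = int(np.argmax(rewards))
>>> best_action_tokens = valid_actions.iloc[best_idx]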
Attributes