a2rl.Simulator#

class a2rl.Simulator(tokenizer, model, max_steps=100, reset_coldstart=2, test_mode=True)[source]#

Bases: Env[ndarray, list]

This is a Simulator class that can provide a recommendation for an action, and the associated value, given the current context.

The simulator must be instantiated with an AutoTokenizer and a model trained by GPTBuilder.

Parameters:
  • tokenizer (AutoTokenizer) – AutoTokenizer instance.

  • model (GPT) – Trained model from GPTBuilder.

  • max_steps (int) – Number of steps per episode.

  • reset_coldstart (int) – Number of dataframe context rows.

  • test_mode (bool) – When True, reset current rows to dataframe index zero.

Examples

This example shows how to get a recommendation using a simple dataset.

First, load the data and generate the value column. Refer to WiDataFrame.

>>> import numpy as np
>>> import pandas as pd
>>> import a2rl as wi
>>>
>>> df = pd.DataFrame(
...     np.array(
...         [
...             [0, 10, 20, 200],
...             [1, 12, 21, 225],
...             [2, 15, 22, 237],
...         ]
...     ),
...     columns=["s1", "s2", "a", "r"],
... )
>>> wi_df = wi.WiDataFrame(df, states=["s1", "s2"], actions=["a"], rewards=["r"])
>>> wi_df.add_value()  
   s1  s2   a    r  value
0   0  10  20  200  184...
1   1  12  21  225  154...
2   2  15  22  237    0...

Next, create an AutoTokenizer from the dataframe, indicating the desired block size in terms of number of rows. You can get the discretized dataframe tokens via the AutoTokenizer properties.

>>> field_tokenizer = wi.DiscreteTokenizer(num_bins_strategy="uniform")
>>> tokenizer = wi.AutoTokenizer(wi_df, block_size_row=1, field_tokenizer=field_tokenizer)
>>> tokenizer.df_tokenized
   s1   s2    a    r  value
0   0  100  200  300    499
1  50  140  250  367    483
2  99  199  299  399    400

Train a GPT model using GPTBuilder by passing in the AutoTokenizer, along with model_dir and model_name.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as model_dir:
...     builder = wi.GPTBuilder(tokenizer, model_dir)
...     builder.fit()
GPT(...)

Get a recommendation by giving a context and performing max_size rounds of sampling.

Note

The context is in row-major order and MUST be in the format (s, a, r, ..., s), ending with states, expressed as discretized dataframe tokens.

In this example, the context [0, 100, 200, 300, 499, 50, 140] represents [s1, s2, a, r, value, s1, s2].
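
The hard-coded context in the next step can equivalently be assembled from the tokenized dataframe shown earlier. A minimal sketch using plain pandas/numpy (not a Simulator API), taking the first row plus the next row's states:

>>> tok_df = tokenizer.df_tokenized
>>> context_from_df = np.concatenate(
...     [tok_df.iloc[0].values, tok_df.iloc[1][["s1", "s2"]].values]
... )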

>>> simulator = wi.Simulator(tokenizer, builder.model)
>>> custom_context = np.array([0,100,200,300,499,50,140])
>>> rec_df = simulator.sample(custom_context, max_size=2)

Finally, pick the action corresponding to the minimum or maximum of the value column, depending on your objective.

>>> rec_df 
       a        r       value
0  21.01  224.975  106.057972
1  21.01  224.975  106.057972
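
For example, if the objective is to maximize value, the recommended action can be picked with standard pandas (a sketch; the Simulator does not prescribe a particular selection method):

>>> best_action = rec_df.loc[rec_df["value"].idxmax(), "a"]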

Methods

beam_search_n_steps(seq, n_steps, beam_width)

This function largely replaces A2RL Simulator.gpt_sample_n_steps().

get_valid_actions(seq, max_size)

Return a dataframe of sampled action tokens, given the input context.

gpt_sample(seq, cur_col_index[, sample])

Predict the next GPT token given the input GPT tokens.

gpt_sample_n_steps(seq, n_steps, start_col_index)

Given a GPT token sequence, sample the next n_steps of GPT tokens.

lookahead(seq, action[, correct_unseen_token])

Given a batch of contexts and a batch of actions, simulate the expected rewards and next states for all combinations of contexts and actions (see the sketch below).

reset(**kwargs)

Placeholder.

sample(seq[, max_size, as_token, ...])

Given a batch of contexts, perform one-step sampling for actions and rewards.

step(action)

Placeholder.
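
Continuing the doctest above, a minimal sketch of pairing get_valid_actions() with lookahead() might look as follows. The shape of the action batch and the unpacking of the return value are assumptions; refer to the full method documentation.

>>> valid_actions = simulator.get_valid_actions(custom_context, max_size=2)
>>> rewards, next_states = simulator.lookahead(custom_context, valid_actions.values)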

Attributes

current_context