a2rl.DiscreteTokenizer.fit

DiscreteTokenizer.fit(df, check=True)

Fit the quantizer for the numeric columns, and the label encoder for the categorical columns.

Parameters:

df (WiDataFrame) – Training data.
check (bool) – When True, ensure that df has sufficient variance (i.e., no column holds just a single value) and that numerical columns contain only finite values.
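
As a rough illustration of the two conditions that check=True guards against, the sketch below flags columns with a single unique value and columns whose numbers include non-finite entries. It is not a2rl's implementation: the find_violations helper, the dict-of-lists input, and the sample values are all hypothetical stand-ins chosen so the snippet runs without a2rl installed.

```python
import math


def find_violations(columns):
    """Return (constant_columns, non_finite_columns) for a dict of column -> list of values.

    Hypothetical helper for illustration only; not part of a2rl.
    """
    constant, non_finite = [], []
    for name, values in columns.items():
        # A column with at most one distinct value carries no variance.
        if len(set(values)) <= 1:
            constant.append(name)
        # Only real numbers (excluding bool) are tested for NaN or +/-inf.
        numeric = [
            v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if any(not math.isfinite(v) for v in numeric):
            non_finite.append(name)
    return constant, non_finite


# Made-up values, loosely named after the chiller dataset columns.
data = {
    "condenser_inlet_temp": [29.5, 30.1, 28.7],
    "staging": ["0", "0", "0"],  # single value: not enough variance
    "system_power_consumption": [410.0, float("inf"), 395.2],  # contains inf
}
constant_cols, non_finite_cols = find_violations(data)
print(constant_cols, non_finite_cols)
```

Running a cheap pre-check like this before fitting can make it easier to see which columns need attention, although the exact rules a2rl applies are those documented under check_numerical_columns and check_categorical_columns.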

Return type:

DiscreteTokenizer

Returns:

This fitted discrete tokenizer.

Raises:

ValueError – when check=True and the input data fails validation.
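
Because a failed check surfaces as a ValueError, callers can catch it and report the message instead of letting it propagate. The following is a minimal sketch: fit_or_explain is a hypothetical wrapper (not part of a2rl) that works with any object exposing fit(df, check=...), and _StubTokenizer is a stand-in that mimics only the documented contract so the snippet runs without a2rl installed.

```python
def fit_or_explain(tokenizer, df):
    """Return (fitted_tokenizer, None) on success, or (None, message) if validation fails."""
    try:
        return tokenizer.fit(df, check=True), None
    except ValueError as exc:
        return None, str(exc)


class _StubTokenizer:
    """Hypothetical stand-in: returns itself when fitted, raises ValueError on constant data."""

    def fit(self, df, check=True):
        if check and len(set(df)) <= 1:
            raise ValueError("not enough variance")
        return self


stub = _StubTokenizer()
fitted, error = fit_or_explain(stub, [1, 2, 3])    # varied data: fit succeeds
failed, message = fit_or_explain(stub, [0, 0, 0])  # constant data: fit is rejected
```

With a2rl itself, the same wrapper would presumably be called as fit_or_explain(wi.DiscreteTokenizer(), wi_df).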

See also

check_numerical_columns: Checks performed on numerical columns.
check_categorical_columns: Checks performed on categorical columns.

Examples

Fitting a dataframe with enough variance (i.e., every column has more than one unique value).

>>> import a2rl as wi
>>> from a2rl.utils import tokenize
>>>
>>> wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller")).trim()
>>> wi_df.nunique()
condenser_inlet_temp          70
evaporator_heat_load_rt     5279
staging                       11
system_power_consumption    5354
dtype: int64

>>> tok = wi.DiscreteTokenizer().fit(wi_df)

Fitting a dataframe without enough variance. In this example, the training data contains only a single action.

>>> df_constant_action = wi_df.head().copy()
>>> df_constant_action["staging"] = "0"
>>> df_constant_action.nunique()
condenser_inlet_temp        5
evaporator_heat_load_rt     5
staging                     1
system_power_consumption    5
dtype: int64

>>> wi.DiscreteTokenizer().fit(df_constant_action)  
Traceback (most recent call last):
ValueError: Single numerical values detected on columns ['staging']