a2rl.DiscreteTokenizer.fit
- DiscreteTokenizer.fit(df, check=True)
Fit the quantizer for the numeric columns, and the label encoder for the categorical columns.
- Parameters:
df (WiDataFrame) – Training data.
check (bool) – When True, ensure that df contains sufficient variance (i.e., a column must not have just a single value), and that numerical columns contain only finite values.
- Return type:
DiscreteTokenizer
- Returns:
This fitted discrete tokenizer.
- Raises:
ValueError – when check=True and violations are found in the input data.
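As a rough illustration of what check=True guards against, the following pure-Python sketch flags columns with a single unique value and numeric columns holding non-finite values. The helper name find_violations, the dict-of-lists input, and the sample values are all hypothetical; this is not a2rl's implementation, which operates on a WiDataFrame and may differ in detail.

```python
import math
from numbers import Real


def find_violations(columns):
    """Return {column_name: reason} for columns that fail the sketch checks.

    `columns` is a hypothetical stand-in for a dataframe: a dict mapping
    column names to lists of values.
    """
    violations = {}
    for name, values in columns.items():
        # Numeric values (bools excluded) must all be finite: no NaN or +/-inf.
        numeric = [v for v in values if isinstance(v, Real) and not isinstance(v, bool)]
        if any(not math.isfinite(v) for v in numeric):
            violations[name] = "non-finite value"
        # A column with one unique value carries no variance to fit on.
        elif len(set(values)) <= 1:
            violations[name] = "single value"
    return violations


# Made-up values, for illustration only.
data = {
    "condenser_inlet_temp": [29.5, 30.1, 28.7],
    "staging": ["0", "0", "0"],  # constant column
    "system_power_consumption": [410.0, float("nan"), 395.2],
}
print(find_violations(data))
```

In this sketch a column is reported at most once, with the non-finite check taking precedence over the single-value check.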
See also
check_numerical_columns
Checks performed on numerical columns.
check_categorical_columns
Checks performed on categorical columns.
Examples
Fitting a dataframe with enough variance (i.e., every column has more than one unique value).
>>> import a2rl as wi
>>> from a2rl.utils import tokenize
>>>
>>> wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller")).trim()
>>> wi_df.nunique()
condenser_inlet_temp          70
evaporator_heat_load_rt     5279
staging                       11
system_power_consumption    5354
dtype: int64
>>> tok = wi.DiscreteTokenizer().fit(wi_df)
Fitting a dataframe without enough variance. In this example, the training data contains only a single action.
>>> df_constant_action = wi_df.head().copy()
>>> df_constant_action["staging"] = "0"
>>> df_constant_action.nunique()
condenser_inlet_temp        5
evaporator_heat_load_rt     5
staging                     1
system_power_consumption    5
dtype: int64
>>> wi.DiscreteTokenizer().fit(df_constant_action)
Traceback (most recent call last):
ValueError: Single numerical values detected on columns ['staging']
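Because fit raises ValueError when check=True finds violations, a caller may prefer to catch that error and report it rather than let it propagate. The sketch below shows one possible way to do so. try_fit is a hypothetical helper, not part of a2rl; it assumes only that the tokenizer exposes fit(df), which returns the fitted tokenizer or raises ValueError as documented above.

```python
def try_fit(tokenizer, df):
    """Fit `tokenizer` on `df`; return (fitted_tokenizer, None) or (None, message).

    Hypothetical helper: relies only on the documented behaviour that
    fit() returns the fitted tokenizer and raises ValueError on violations.
    """
    try:
        return tokenizer.fit(df), None
    except ValueError as err:
        # Surface the violation message instead of propagating the exception.
        return None, str(err)


# Intended usage with a2rl (not executed here):
#   tok, error = try_fit(wi.DiscreteTokenizer(), df_constant_action)
#   if error is not None:
#       print(f"Cannot fit tokenizer: {error}")
```

Returning the message alongside the result keeps the caller in control of whether to drop the offending columns, gather more data, or abort.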