How to do Back Testing?#
This notebook demonstrates how to perform back testing using the backtest utility.
[1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt
import a2rl as wi
from a2rl.nbtools import pprint, print # Enable color outputs when rich is installed.
from a2rl.utils import backtest
Setup#
Specify a train ratio of 0.8: the first 80% of the dataframe rows, starting from index 0, will be used for training. This results in 3992 training samples and 998 test samples.
[2]:
wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))
wi_df.add_value()
# Speed up training for demo purposes
wi_df = wi_df.iloc[:1000]
tokenizer = wi.AutoTokenizer(wi_df, block_size_row=2, train_ratio=0.8)
print(f"Train: {len(tokenizer.train_dataset)}, Test: {len(tokenizer.test_dataset)}")
/opt/hostedtoolcache/Python/3.10.11/x64/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:279: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 0 are removed. Consider decreasing the number of bins.
warnings.warn(
Train: 3992, Test: 998
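These counts follow directly from the 0.8 train ratio; a quick, purely illustrative arithmetic check:

n_total = 3992 + 998           # samples reported above
n_train = int(0.8 * n_total)   # 3992
n_test = n_total - n_train     # 998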
Train the model. In this example, we train for just 1 epoch to speed things up; you may need to adjust the training configuration for your own use case.
[3]:
model_dir = "model-backtest"
config = dict(
    epochs=1,
    batch_size=512,
    embedding_dim=512,
    gpt_n_layer=1,
    gpt_n_head=1,
    learning_rate=6e-4,
    num_workers=0,
    lr_decay=True,
)
config = {"train_config": config}
builder = wi.GPTBuilder(tokenizer, model_dir, config)
[4]:
%%time
model = builder.fit()
2023-05-22 10:06:39.174 | INFO | a2rl.simulator:fit:753 - {'epochs': 1, 'batch_size': 512, 'embedding_dim': 512, 'gpt_n_layer': 1, 'gpt_n_head': 1, 'learning_rate': 0.0006, 'num_workers': 0, 'lr_decay': True}
2023-05-22 10:06:43.693 | INFO | a2rl.mingpt.trainer:run_epoch:123 - test loss: 5.617769479751587
2023-05-22 10:06:43.695 | INFO | a2rl.simulator:fit:787 - Training time in mins: 0.074
CPU times: user 8.15 s, sys: 478 ms, total: 8.63 s
Wall time: 4.54 s
Back Test#
Prepare the backtest data using a subset of the test data, in this case rows with index -910:-900, which fall within the test dataset. Let's create a new dataframe, treating it as a hold-out set, and then tokenize it using the existing tokenizer.
Since you have trained the model, you can access your tokenizer from `tokenizer` directly. Alternatively, you can get it from `simulator.tokenizer`.
[5]:
simulator = wi.Simulator(tokenizer, model)
test_df = wi_df.iloc[-910:-900].reset_index(drop=True)
display(test_df)
test_df_tokenized = tokenizer.field_tokenizer.transform(test_df)
display(test_df_tokenized)
| | timestamp | staging | condenser_inlet_temp | evaporator_heat_load_rt | system_power_consumption | value |
|---|---|---|---|---|---|---|
| 0 | 2025-08-04 18:00:00 | 8 | 28.6 | 716.6 | 973.7 | 713.670159 |
| 1 | 2025-08-04 19:00:00 | 4 | 27.2 | 767.8 | 632.8 | 1274.284315 |
| 2 | 2025-08-04 20:00:00 | 1 | 29.5 | 1436.0 | 1001.4 | 1272.469218 |
| 3 | 2025-08-04 21:00:00 | 7 | 28.0 | 1200.5 | 1344.8 | 1031.874230 |
| 4 | 2025-08-04 22:00:00 | 10 | 28.2 | 811.4 | 1085.7 | 1283.630440 |
| 5 | 2025-08-04 23:00:00 | 9 | 27.7 | 803.8 | 824.9 | 916.735348 |
| 6 | 2025-08-05 00:00:00 | 4 | 30.1 | 852.4 | 1014.9 | 1006.722570 |
| 7 | 2025-08-05 01:00:00 | 8 | 28.6 | 631.5 | 1035.2 | 1331.785970 |
| 8 | 2025-08-05 02:00:00 | 3 | 31.0 | 970.5 | 1024.1 | 1206.725221 |
| 9 | 2025-08-05 03:00:00 | 7 | 26.8 | 896.5 | 520.0 | 1418.093945 |
| | condenser_inlet_temp | evaporator_heat_load_rt | staging | system_power_consumption | value |
|---|---|---|---|---|---|
| 0 | 17 | 68 | 352 | 216 | 252 |
| 1 | 3 | 76 | 348 | 156 | 326 |
| 2 | 26 | 142 | 344 | 220 | 326 |
| 3 | 11 | 138 | 351 | 242 | 303 |
| 4 | 13 | 85 | 345 | 232 | 328 |
| 5 | 7 | 83 | 353 | 186 | 284 |
| 6 | 32 | 93 | 348 | 223 | 300 |
| 7 | 17 | 57 | 352 | 226 | 333 |
| 8 | 40 | 116 | 347 | 224 | 320 |
| 9 | 1 | 102 | 351 | 147 | 338 |
Let's use the first 2 rows as context, and have the `backtest` function predict the next 8 rows. `true_df` is a convenient ground-truth dataframe returned for comparison.
[6]:
pred_df, true_df = backtest(
    test_df, simulator, start_row=0, context_rows=2, predict_rows=8, return_groudtruth=True
)
2023-05-22 10:06:43.808 | INFO | a2rl.utils:backtest:122 - Initial context.shape=(1, 7)
2023-05-22 10:06:43.809 | INFO | a2rl.utils:backtest:129 - Predicting row:1, curr_row_idx=1
2023-05-22 10:06:43.810 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[348]])
2023-05-22 10:06:43.822 | DEBUG | a2rl.utils:backtest:133 - reward=array([[223, 342]]), next_states=array([[17, 52]])
2023-05-22 10:06:43.823 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 12)
2023-05-22 10:06:43.824 | INFO | a2rl.utils:backtest:129 - Predicting row:2, curr_row_idx=2
2023-05-22 10:06:43.825 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[344]])
2023-05-22 10:06:43.836 | DEBUG | a2rl.utils:backtest:133 - reward=array([[165, 244]]), next_states=array([[ 33, 109]])
2023-05-22 10:06:43.837 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 17)
2023-05-22 10:06:43.837 | INFO | a2rl.utils:backtest:129 - Predicting row:3, curr_row_idx=3
2023-05-22 10:06:43.838 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[351]])
2023-05-22 10:06:43.849 | DEBUG | a2rl.utils:backtest:133 - reward=array([[226, 340]]), next_states=array([[28, 67]])
2023-05-22 10:06:43.850 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 22)
2023-05-22 10:06:43.851 | INFO | a2rl.utils:backtest:129 - Predicting row:4, curr_row_idx=4
2023-05-22 10:06:43.852 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[345]])
2023-05-22 10:06:43.863 | DEBUG | a2rl.utils:backtest:133 - reward=array([[231, 305]]), next_states=array([[11, 78]])
2023-05-22 10:06:43.864 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 27)
2023-05-22 10:06:43.864 | INFO | a2rl.utils:backtest:129 - Predicting row:5, curr_row_idx=5
2023-05-22 10:06:43.865 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[353]])
2023-05-22 10:06:43.876 | DEBUG | a2rl.utils:backtest:133 - reward=array([[232, 317]]), next_states=array([[ 21, 136]])
2023-05-22 10:06:43.877 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 32)
2023-05-22 10:06:43.878 | INFO | a2rl.utils:backtest:129 - Predicting row:6, curr_row_idx=6
2023-05-22 10:06:43.879 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[348]])
2023-05-22 10:06:43.890 | DEBUG | a2rl.utils:backtest:133 - reward=array([[208, 294]]), next_states=array([[17, 69]])
2023-05-22 10:06:43.891 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 37)
2023-05-22 10:06:43.891 | INFO | a2rl.utils:backtest:129 - Predicting row:7, curr_row_idx=7
2023-05-22 10:06:43.893 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[352]])
2023-05-22 10:06:43.908 | DEBUG | a2rl.utils:backtest:133 - reward=array([[156, 335]]), next_states=array([[ 35, 112]])
2023-05-22 10:06:43.909 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 42)
2023-05-22 10:06:43.910 | INFO | a2rl.utils:backtest:129 - Predicting row:8, curr_row_idx=8
2023-05-22 10:06:43.911 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[347]])
2023-05-22 10:06:43.921 | DEBUG | a2rl.utils:backtest:133 - reward=array([[158, 342]]), next_states=array([[17, 87]])
2023-05-22 10:06:43.922 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 47)
2023-05-22 10:06:43.923 | INFO | a2rl.utils:backtest:129 - Predicting row:9, curr_row_idx=9
2023-05-22 10:06:43.924 | DEBUG | a2rl.utils:backtest:131 - hist_action=array([[351]])
2023-05-22 10:06:43.934 | DEBUG | a2rl.utils:backtest:133 - reward=array([[145, 248]]), next_states=array([[ 13, 140]])
2023-05-22 10:06:43.935 | DEBUG | a2rl.utils:backtest:136 - new_context.shape=(1, 52)
2023-05-22 10:06:43.936 | DEBUG | a2rl.utils:backtest:142 - new_sequence.shape=(50,)
The number of rows returned by `backtest` is `context_rows + predict_rows`.
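As a quick, illustrative sanity check, using the arguments from the call above (context_rows=2, predict_rows=8):

assert len(pred_df) == 2 + 8          # context_rows + predict_rows
assert len(pred_df) == len(test_df)   # in this example, also equal to the 10-row test_df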
[7]:
pred_df
[7]:
| | condenser_inlet_temp | evaporator_heat_load_rt | staging | system_power_consumption | value |
|---|---|---|---|---|---|
| 0 | 28.650 | 715.5245 | 8 | 974.8800 | 712.317305 |
| 1 | 27.250 | 765.0835 | 4 | 1017.2585 | 1615.877422 |
| 2 | 28.650 | 576.6105 | 1 | 713.3945 | 581.756826 |
| 3 | 30.205 | 933.2330 | 7 | 1036.2240 | 1453.694064 |
| 4 | 29.750 | 711.0155 | 10 | 1079.8065 | 1053.205005 |
| 5 | 28.050 | 774.8440 | 9 | 1090.9105 | 1174.986647 |
| 6 | 29.050 | 1150.3535 | 4 | 930.9185 | 975.234735 |
| 7 | 28.650 | 721.4465 | 8 | 635.8025 | 1357.274873 |
| 8 | 30.304 | 952.4120 | 3 | 654.7580 | 1615.877422 |
| 9 | 28.650 | 816.0875 | 7 | 481.4470 | 660.180949 |
Now you can compare the state transitions between the simulator and the ground truth, based on the historical actions.
[8]:
fig, axes = plt.subplots(int(len(true_df.states) / 2), 2, figsize=(15, 5))
fig.suptitle("Back Testing for states", fontsize=16)

for idx, col in enumerate(true_df.states):
    true_df[col].plot(ax=axes[idx])
    pred_df[col].plot(ax=axes[idx])
    axes[idx].set_title(col)
    axes[idx].legend(["true", "pred"])
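Beyond the plots, a simple numeric summary such as a per-state mean absolute error can help quantify how far the simulated rollout drifts from the ground truth. A minimal illustrative sketch, reusing the state column names from true_df.states exactly as in the plotting loop above:

for col in true_df.states:
    mae = (true_df[col] - pred_df[col]).abs().mean()
    print(f"{col}: mean absolute error = {mae:.2f}")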