{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to do Back Testing?\n", "\n", "This notebook demonstrate how to perform back testing using `backtest` utility." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "import a2rl as wi\n", "from a2rl.nbtools import pprint, print # Enable color outputs when rich is installed.\n", "from a2rl.utils import backtest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "- Specify 0.8 ratio of data for training. The first 80% of dataframe rows starting from index 0 will be used for training.\n", "- There are 3992 rows for training, 998 rows for test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wi_df = wi.read_csv_dataset(wi.sample_dataset_path(\"chiller\"))\n", "wi_df.add_value()\n", "\n", "# Speed up training for demo purpose\n", "wi_df = wi_df.iloc[:1000]\n", "tokenizer = wi.AutoTokenizer(wi_df, block_size_row=2, train_ratio=0.8)\n", "print(f\"Train: {len(tokenizer.train_dataset)}, Test: {len(tokenizer.test_dataset)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the model. In this example, we are going to train the model using `1 epoch` to speed up, you may need to adjust training configuration for your own use case." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_dir = \"model-backtest\"\n", "\n", "config = dict(\n", " epochs=1,\n", " batch_size=512,\n", " embedding_dim=512,\n", " gpt_n_layer=1,\n", " gpt_n_head=1,\n", " learning_rate=6e-4,\n", " num_workers=0,\n", " lr_decay=True,\n", ")\n", "config = {\"train_config\": config}\n", "\n", "builder = wi.GPTBuilder(tokenizer, model_dir, config)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "model = builder.fit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Back Test\n", "\n", "- Prepare backtest data using a subset of test data. In this case rows with index `-910:-900` which fall within test dataset.\n", "- Let's create a new dataframe assuming it is come hold out set, and then tokenized the dataframe using existing tokenizer.\n", "- Since you have trained the model, you can access your tokenizer from `tokenizer` directly. Alternatively, you can get from `simulator.tokenizer`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "simulator = wi.Simulator(tokenizer, model)\n", "test_df = wi_df.iloc[-910:-900].reset_index(drop=True)\n", "display(test_df)\n", "\n", "test_df_tokenized = tokenizer.field_tokenizer.transform(test_df)\n", "display(test_df_tokenized)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Let's use the first 2 rows as context, and have `backtest` function predict the next 8 rows.\n", "- `true_df` is a convenient groundtruth dataframe returned to be used for comparison." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pred_df, true_df = backtest(\n", " test_df, simulator, start_row=0, context_rows=2, predict_rows=8, return_groudtruth=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of rows returned by backtest is `context_rows + predict_rows`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pred_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can compare the states transitoin between simulator and groundtruth based on historical actions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(int(len(true_df.states) / 2), 2, figsize=(15, 5))\n", "fig.suptitle(\"Back Testing for states\", fontsize=16)\n", "\n", "for idx, col in enumerate(true_df.states):\n", " true_df[col].plot(ax=axes[idx])\n", " pred_df[col].plot(ax=axes[idx])\n", " axes[idx].set_title(col)\n", " axes[idx].legend([\"true\", \"pred\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "22f92e4608f34d3393fc5e7884f8906c6794e2d0198ea9b43992c442775a4328" } } }, "nbformat": 4, "nbformat_minor": 4 }