{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Properties\n", "\n", "\n", "For many sequential decision making problems we look for some key patterns in the data\n", "\n", "* Markov property\n", "* A consistent reward or cost\n", "* Actions being effective in contributing to the reward or affecting the Environment\n", "* Seeing if there is a consistent way that actions are picked\n", "\n", "We have a few helper visualisations to help these are `markovian_matrix` and\n", "`normalized_markovian_matrix`.\n", "\n", "**Pre-requisite**: this example requires stable-baselines3. To quickly install this library, you may\n", "uncoment and execute the next cell. Note that the term `'gym>=...'` prevents `stable-baselines3`\n", "from downgrading `gym` to a version incompatible with `a2rl`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# %pip install stable-baselines3 'gym>=0.23.1,<0.26.0'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import my_nb_path # isort: skip\n", "import a2rl as wi # isort: skip\n", "import os\n", "\n", "import gym\n", "from IPython.display import Markdown\n", "from stable_baselines3 import A2C, DQN, SAC\n", "from stable_baselines3.common.base_class import BaseAlgorithm\n", "\n", "import a2rl.nbtools # Enable color outputs when rich is installed.\n", "from a2rl.utils import NotMDPDataError, assert_mdp, data_generator_simple, plot_information" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Helper Functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def discretized_sample_dataset(dataset_name: str, n_bins=50) -> wi.WiDataFrame:\n", " \"\"\"Discretized a sample dataset.\n", "\n", " Args:\n", " dataset_name: name of the sample dataset.\n", "\n", " Returns:\n", " Whatif dataframe.\n", "\n", " See Also\n", " --------\n", " list_sample_datasets\n", " \"\"\"\n", " dirname = wi.sample_dataset_path(dataset_name)\n", " tokeniser = wi.DiscreteTokenizer(n_bins=n_bins)\n", " df = tokeniser.fit_transform(wi.read_csv_dataset(dirname))\n", " return df\n", "\n", "\n", "def data_generator_gym(\n", " env_name: str = \"Taxi-v3\",\n", " trainer: type[BaseAlgorithm] = A2C,\n", " training_steps: int = 10000,\n", " capture_steps: int = 1000,\n", ") -> wi.WiDataFrame:\n", " \"\"\"Generate a :class:`a2rl.WiDataFrame` from any well-defined OpenAi gym.\n", " An agent is trained first for ``training_steps``. Then, capture ``capture_steps`` from the\n", " trained agent.\n", " Args:\n", " env_name: Name of the gym environment.\n", " trainer: An underlying generator algorithm that supports discrete actions, such as\n", " :class:`stable_baselines3.dqn.DQN` or :class:`stable_baselines3.a2c.A2C`. 
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lags = 10  # Same as assert_mdp()'s default.\n",
    "\n",
    "################################################################################\n",
    "# To run in fast mode, set env var NOTEBOOK_FAST_RUN=1 prior to starting Jupyter\n",
    "################################################################################\n",
    "if os.environ.get(\"NOTEBOOK_FAST_RUN\", \"0\") != \"0\":\n",
    "    lags = 5\n",
    "    display(\n",
    "        Markdown(\n",
    "            \"NOTE: notebook runs in fast mode, using only 5 lags. Results may differ.\"\n",
    "        )\n",
    "    )\n",
    "################################################################################"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Inspection\n",
    "\n",
    "In the offline setting we are restricted to data alone. A2RL offers three ways to generate some:\n",
    "\n",
    "1. The load-and-discretize workflow <- the main one. See `discretized_sample_dataset()` and the sketch after this list.\n",
    "\n",
    "2. `data_generator_gym` to capture the interactions between a trained agent and a gym environment <- for testing and research.\n",
    "\n",
    "3. `data_generator_simple` to generate sample data with different properties <- also for testing and research."
   ]
  },
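  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below enumerates the sample datasets available to the load-and-discretize workflow. It assumes `list_sample_datasets` (referenced in the `See Also` of `discretized_sample_dataset()` above) is exposed at the package top level; adjust the call if your `a2rl` version places it elsewhere."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": { "tags": [] },
   "outputs": [],
   "source": [
    "# Sketch (assumed top-level export): list the sample datasets bundled with a2rl.\n",
    "wi.list_sample_datasets()"
   ]
  },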
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Synthetic Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create data with the Markov property, then add random actions (a random policy) that affect the states."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": { "tags": [] },
   "outputs": [],
   "source": [
    "offline_data = data_generator_simple(\n",
    "    markov_order=1,\n",
    "    reward_function=False,\n",
    "    action_effect=True,\n",
    "    policy=False,\n",
    ")\n",
    "\n",
    "try:\n",
    "    assert_mdp(offline_data, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(offline_data, lags=lags);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use a higher-order Markov property and effective actions, and add a reward function that depends on the state and action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": { "tags": [ "nbsphinx-thumbnail" ] },
   "outputs": [],
   "source": [
    "offline_data = data_generator_simple(\n",
    "    markov_order=2,\n",
    "    reward_function=True,\n",
    "    action_effect=True,\n",
    "    policy=False,\n",
    ")\n",
    "\n",
    "try:\n",
    "    assert_mdp(offline_data, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(offline_data, lags=lags);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### OpenAI Gym Environment with Known MDP\n",
    "\n",
    "Use an agent that is only lightly trained on the `Taxi-v3` environment and see what the data looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": { "tags": [] },
   "outputs": [],
   "source": [
    "%%time\n",
    "from stable_baselines3 import DQN\n",
    "\n",
    "gym_data = data_generator_gym(\n",
    "    env_name=\"Taxi-v3\",\n",
    "    trainer=DQN,\n",
    "    training_steps=10000,\n",
    "    capture_steps=100,\n",
    ")\n",
    "\n",
    "try:\n",
    "    assert_mdp(gym_data, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(gym_data, lags=lags);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chiller Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": { "tags": [] },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "df_chiller = discretized_sample_dataset(\"chiller\", n_bins=10)\n",
    "try:\n",
    "    assert_mdp(df_chiller, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(df_chiller, lags=lags);"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "toc-autonumbering": true
 },
 "nbformat": 4,
 "nbformat_minor": 4
}