{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Properties\n",
    "\n",
    "\n",
    "For many sequential decision making problems we look for some key patterns in the data\n",
    "\n",
    "* Markov property\n",
    "* A consistent reward or cost\n",
    "* Actions being effective in contributing to the reward or affecting the Environment\n",
    "* Seeing if there is a consistent way that actions are picked\n",
    "\n",
    "We have a few helper visualisations to help these are `markovian_matrix` and\n",
    "`normalized_markovian_matrix`.\n",
    "\n",
    "**Pre-requisite**: this example requires stable-baselines3. To quickly install this library, you may\n",
    "uncoment and execute the next cell. Note that the term `'gym>=...'` prevents `stable-baselines3`\n",
    "from downgrading `gym` to a version incompatible with `a2rl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# %pip install stable-baselines3 'gym>=0.23.1,<0.26.0'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "import my_nb_path  # isort: skip\n",
    "import a2rl as wi  # isort: skip\n",
    "import os\n",
    "\n",
    "import gym\n",
    "from IPython.display import Markdown\n",
    "from stable_baselines3 import A2C, DQN, SAC\n",
    "from stable_baselines3.common.base_class import BaseAlgorithm\n",
    "\n",
    "import a2rl.nbtools  # Enable color outputs when rich is installed.\n",
    "from a2rl.utils import NotMDPDataError, assert_mdp, data_generator_simple, plot_information"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Helper Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def discretized_sample_dataset(dataset_name: str, n_bins=50) -> wi.WiDataFrame:\n",
    "    \"\"\"Discretized a sample dataset.\n",
    "\n",
    "    Args:\n",
    "        dataset_name: name of the sample dataset.\n",
    "\n",
    "    Returns:\n",
    "        Whatif dataframe.\n",
    "\n",
    "    See Also\n",
    "    --------\n",
    "    list_sample_datasets\n",
    "    \"\"\"\n",
    "    dirname = wi.sample_dataset_path(dataset_name)\n",
    "    tokeniser = wi.DiscreteTokenizer(n_bins=n_bins)\n",
    "    df = tokeniser.fit_transform(wi.read_csv_dataset(dirname))\n",
    "    return df\n",
    "\n",
    "\n",
    "def data_generator_gym(\n",
    "    env_name: str = \"Taxi-v3\",\n",
    "    trainer: type[BaseAlgorithm] = A2C,\n",
    "    training_steps: int = 10000,\n",
    "    capture_steps: int = 1000,\n",
    ") -> wi.WiDataFrame:\n",
    "    \"\"\"Generate a :class:`a2rl.WiDataFrame` from any well-defined OpenAi gym.\n",
    "    An agent is trained first for ``training_steps``. Then, capture ``capture_steps`` from the\n",
    "    trained agent.\n",
    "    Args:\n",
    "        env_name: Name of the gym environment.\n",
    "        trainer: An underlying generator algorithm that supports discrete actions, such as\n",
    "            :class:`stable_baselines3.dqn.DQN` or :class:`stable_baselines3.a2c.A2C`. Raise an error\n",
    "            when passing a trainer that does not support discrete actions, such as\n",
    "            :class:`stable_baselines3.sac.SAC`.\n",
    "        training_steps: The number of steps to train the generator.\n",
    "        capture_steps: The number of steps to capture.\n",
    "    Returns:\n",
    "        A2RL data frame.\n",
    "    \"\"\"\n",
    "    env = gym.make(env_name, render_mode=None)\n",
    "    model = trainer(policy=\"MlpPolicy\", env=env, verbose=False)\n",
    "    model.learn(total_timesteps=training_steps)\n",
    "\n",
    "    cap_env = wi.TransitionRecorder(env)\n",
    "    model.set_env(cap_env)\n",
    "    model.learn(total_timesteps=capture_steps)\n",
    "\n",
    "    tokeniser = wi.DiscreteTokenizer(n_bins=50)\n",
    "    df = tokeniser.fit_transform(cap_env.df)\n",
    "\n",
    "    return df\n",
    "\n",
    "\n",
    "def test_gym_generator():\n",
    "    import pytest\n",
    "\n",
    "    gym_data = data_generator_gym(env_name=\"Taxi-v3\", trainer=DQN)\n",
    "    assert isinstance(gym_data, wi.WiDataFrame)\n",
    "\n",
    "    with pytest.raises(AssertionError, match=r\"Discrete(.*) was provided\"):\n",
    "        gym_data = data_generator_gym(env_name=\"MountainCar-v0\", trainer=SAC)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lags = 10  # Same as assert_mdp()'s default.\n",
    "\n",
    "################################################################################\n",
    "# To run in fast mode, set env var NOTEBOOK_FAST_RUN=1 prior to starting Jupyter\n",
    "################################################################################\n",
    "if os.environ.get(\"NOTEBOOK_FAST_RUN\", \"0\") != \"0\":\n",
    "    lags = 5\n",
    "    display(\n",
    "        Markdown(\n",
    "            '<p style=\"color:firebrick; background-color:yellow; font-weight:bold\">'\n",
    "            \"NOTE: notebook runs in fast mode. Use only 5 lags. Results may differ.\"\n",
    "        )\n",
    "    )\n",
    "################################################################################"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Inspection\n",
    "\n",
    "In the offline setting we are restricted only to data. A2RL offers three ways to generate some:\n",
    "\n",
    "1. The load-and-discretize workflow <- The main one. See `discretized_sample_dataset()`.\n",
    "\n",
    "2. `data_generator_gym` to load data interations between a trained agent and a gym environment <- This is for testing and research\n",
    "\n",
    "3. `data_generator_simple` to generate sample data with different properties <- Also for testing and research"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Synthetic Data\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create Markov property and then add random actions (random policy) that affect the states. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "offline_data = data_generator_simple(\n",
    "    markov_order=1,\n",
    "    reward_function=False,\n",
    "    action_effect=True,\n",
    "    policy=False,\n",
    ")\n",
    "\n",
    "try:\n",
    "    assert_mdp(offline_data, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(offline_data, lags=lags);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use higher order Markov property and effective actions, and add a reward function that is related to the state and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbsphinx-thumbnail"
    ]
   },
   "outputs": [],
   "source": [
    "offline_data = data_generator_simple(\n",
    "    markov_order=2,\n",
    "    reward_function=True,\n",
    "    action_effect=True,\n",
    "    policy=False,\n",
    ")\n",
    "\n",
    "try:\n",
    "    assert_mdp(offline_data, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(offline_data, lags=lags);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### OpenAi gym environment with known MDP\n",
    "\n",
    "Use an agent that is not trained very much on Taxi dataset and see how's the data looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "from stable_baselines3 import DQN\n",
    "\n",
    "gym_data = data_generator_gym(\n",
    "    env_name=\"Taxi-v3\",\n",
    "    trainer=DQN,\n",
    "    training_steps=10000,\n",
    "    capture_steps=100,\n",
    ")\n",
    "\n",
    "try:\n",
    "    assert_mdp(offline_data, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(gym_data, lags=lags);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Chiller Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "df_chiller = discretized_sample_dataset(\"chiller\", n_bins=10)\n",
    "try:\n",
    "    assert_mdp(df_chiller, lags=lags)\n",
    "except NotMDPDataError as e:\n",
    "    print(\"Continue this example despite MDP check errors:\\n\", e)\n",
    "\n",
    "plot_information(df_chiller, lags=lags);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "toc-autonumbering": true
 },
 "nbformat": 4,
 "nbformat_minor": 4
}