{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loon Example\n", "\n", "## Background\n", "\n", "This example shows how `Whatif` might be used to control flying high-altitude balloons for wireless communications. Please check details of the public [Loon Stratospheric Sensor Data](https://zenodo.org/record/5119968). Previously, Google and Loon researchers have used Reinforcement Learning to [efficiently navigate balloons in the stratosphere](https://www.nature.com/articles/s41586-020-2939-8). The basic idea is that the wind fields vary at different altitude, and by moving the ballon **vertically** between winds at different altitudes, one can keep the balloon at its desirable **horizontal** location for effective communications.\n", "\n", "For the purpose of this demo, we extracted 128 hours of the flight `LN-191` data from the file [loon-flights-2021-Q2.csv.gz](https://zenodo.org/record/5119968/files/loon-flights-2021Q2.csv.gz?download=1) to train `Whatif`. In particular, this example shows how easy it is to use `Whatif` to:\n", "\n", "- Behaviour cloning (altitude control actions) from data so we keep balloons on the current trajectory\n", "- Speculate counter-factual trajectories given alternative actions (thus, answering the \"**What if**\" question)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import my_nb_path # isort: skip\n", "\n", "import os\n", "import random\n", "from pathlib import Path\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import smopy\n", "import torch\n", "from IPython.display import Markdown\n", "from tqdm.autonotebook import tqdm\n", "\n", "import a2rl as wi\n", "from a2rl.nbtools import print # Enable color outputs when rich is installed.\n", "\n", "# Default to fast mode for faster runs.\n", "os.environ[\"NOTEBOOK_FAST_RUN\"] = \"1\"\n", "\n", "# Misc. settings\n", "plt.rcParams[\"figure.figsize\"] = [10, 8]\n", "RAN_SEED = 42\n", "random.seed(RAN_SEED)\n", "np.random.seed(RAN_SEED)\n", "_ = torch.manual_seed(RAN_SEED)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optional: Download and Preprocess Dataset\n", "\n", "The next cell creates `loon/metadata.yaml` and `loon/data.csv` **only if** they are unavailable on\n", "your disk. 
It extracts the first 10,000 rows representing 128 hours of the flight LN-191 data from\n", "[loon-flights-2021Q2.csv.gz](https://zenodo.org/record/5119968/files/loon-flights-2021Q2.csv.gz?download=1).\n", "Depending on your internet connection, it may take a while for `pandas` to read the 35.9MB source\n", "`.csv.gz` file over HTTPS.\n", "\n", "- Dataset homepage: <https://zenodo.org/record/5119968>\n", "- Source data file: <https://zenodo.org/record/5119968/files/loon-flights-2021Q2.csv.gz>\n", "- Name: Loon Stratospheric Sensor Data\n", "- License (for files): Creative Commons Attribution 4.0 International" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "loon_dataset = Path(\"loon\")\n", "loon_dataset.mkdir(exist_ok=True)\n", "\n", "if not (loon_dataset / \"metadata.yaml\").exists():\n", "    metadata = wi.Metadata(\n", "        states=[\n", "            \"altitude\",\n", "            \"temperature\",\n", "            \"pressure\",\n", "            \"earth_ir\",\n", "            \"velocity_u\",\n", "            \"velocity_v\",\n", "            \"omega\",\n", "            \"acceleration\",\n", "            \"solar_elevation\",\n", "            \"solar_azimuth\",\n", "        ],\n", "        actions=[\"acs\", \"is_daytime\"],\n", "        rewards=[\"dist_to_station\"],\n", "    )\n", "    wi.save_metadata(metadata, loon_dataset / \"metadata.yaml\")\n", "    del metadata\n", "\n", "if not (loon_dataset / \"data.csv\").exists():\n", "\n", "    def create_loon_data_csv(output_path=loon_dataset / \"data.csv\"):\n", "        from sklearn.metrics.pairwise import haversine_distances\n", "\n", "        usecols = [\n", "            \"time\",\n", "            \"latitude\",\n", "            \"longitude\",\n", "            \"altitude\",\n", "            \"temperature\",\n", "            \"pressure\",\n", "            \"earth_ir\",\n", "            \"earth_ir_sensor_config\",\n", "            \"acs\",\n", "            \"velocity_u\",\n", "            \"velocity_v\",\n", "            \"omega\",\n", "            \"acceleration\",\n", "            \"solar_elevation\",\n", "            \"solar_azimuth\",\n", "            \"is_daytime\",\n", "        ]\n", "        df = pd.read_csv(\n", "            \"https://zenodo.org/record/5119968/files/loon-flights-2021Q2.csv.gz?download=1\",\n", "            usecols=usecols,\n", "            dtype=str, # So that the saved output uses the same formats as the source data.\n", "            low_memory=False,\n", "            compression=\"gzip\",\n", "        )\n", "        df.rename(columns={\"time\": \"timestamp\"}, inplace=True)\n", "        df = df.head(10000)\n", "        df.fillna(method=\"ffill\", inplace=True)\n", "\n", "        # Convert \"2021-04-01T00:00:15.000Z\" to timestamp (in milliseconds).\n", "        df[\"timestamp\"] = pd.to_datetime(df[\"timestamp\"]).apply(lambda x: x.timestamp() * 1000)\n", "\n", "        # Add dist_to_station, which is the distance between each row and the initial position (row 0).\n", "        # haversine_distances() returns a distance matrix, but we just use the 1st row, which contains\n", "        # the distances of each row to the 1st row.\n", "        earth_radius_km = 6371\n", "        locs = df[[\"latitude\", \"longitude\"]].values.astype(float)\n", "        df[\"dist_to_station\"] = haversine_distances(np.radians(locs))[0, :] * earth_radius_km\n", "\n", "        df.to_csv(output_path, index=False)\n", "\n", "    create_loon_data_csv()\n", "\n", "del loon_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "BLOCK_SIZE_ROW = 4 # block_size (measured by # of rows) as the context to train GPT\n", "wi_df = wi.read_csv_dataset(\"loon\")\n", "wi_df.add_value()\n", "\n", "################################################################################\n", "# To run in fast mode, set env var NOTEBOOK_FAST_RUN=1 prior to starting Jupyter\n", "################################################################################\n",
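"# In fast mode, keep only the first 450 rows so that tokenization and training finish quickly.\n",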
"if os.environ.get(\"NOTEBOOK_FAST_RUN\", \"0\") != \"0\":\n", " wi_df = wi_df.head(450)\n", "\n", " display(\n", " Markdown(\n", " '
<p style=\"color:firebrick; background-color:yellow; font-weight:bold\">
'\n", " \"NOTE: notebook runs in fast mode. Use only a fraction of data. Results may differ.\"\n", " )\n", " )\n", "################################################################################\n", "\n", "# Instantiate a tokenier given the selected dataset.\n", "field_tokenizer = wi.DiscreteTokenizer(num_bins_strategy=\"uniform\")\n", "tokenizer = wi.AutoTokenizer(wi_df, block_size_row=BLOCK_SIZE_ROW, field_tokenizer=field_tokenizer)\n", "tokenizer.df.head(2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tokenizer.df_tokenized.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check states, actions and rewards" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tokenizer.df.sar_d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following column definitions are adaped from the dataset [README file](https://zenodo.org/record/5119968/files/README.pdf?download=1)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| SAR type | Column | Description |\n", "| ----------- | ----------- | ----------- |\n", "| S | altitude | altitude in meters above mean sea level |\n", "| S | temperature | ambient temperature, in degrees Kelvin |\n", "| S | pressure | ambient atmospheric pressure, in hectopascals (hPa) |\n", "| S | earth_ir | upward long-wave radiative flux, in Watts per meter squared |\n", "| S | velocity_u | west-to-east balloon velocity, in meters per second |\n", "| S | velocity_v | south-to-north balloon velocity, in meters per second |\n", "| S | omega | Vertical velocity, in hectopascals/second |\n", "| S | acceleration | Average change in horizontal velocity in meters per second squared |\n", "| S | solar_elevation | Angle of sun, in degrees relative to horizontal |\n", "| S | solar_azimuth | Angle of sun, clockwise relative to North |\n", "| A | **acs** | Altitude Control System (ACS) command: 1 if ascending, -1 if descending, 0 otherwise |\n", "| A | **is_daytime** | 1 if the sun is not occluded by the horizon, otherwise 0 |\n", "| R | *dist_to_station* | distance (km) to the station, which is assumed to be the starting point of the flight |\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note we treat column `is_daytime` as an action even though it is not directly controllable by `Whatif`. This is a convenient way of incorporating exogenous variables into `Whatif`. Column `dist_to_station` is not in the original dataset, but derived based on the distance between the ballon's current position and its initial position. \n", "\n", "The optimization goal is to keep ballon within a short distance (e.g. 50 kilometres) from its station so that it can effectively communicate with a ground device. Therefore, the less the cost (the higher the negative reward) the better." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nbsphinx-thumbnail" ] }, "outputs": [], "source": [ "ac = wi_df[[\"longitude\", \"latitude\"]].values\n", "min_lat = ac[:, 1].min()\n", "min_lon = ac[:, 0].min() - 25 # extra margin on the west side of the map\n", "max_lat = ac[:, 1].max()\n", "max_lon = ac[:, 0].max()\n", "smopy_map = smopy.Map((min_lat, min_lon, max_lat, max_lon), z=5)\n", "ax = smopy_map.show_mpl(figsize=(10, 8), dpi=150)\n", "\n", "wi_xs, wi_ys = smopy_map.to_pixels(wi_df.latitude, wi_df.longitude)\n", "\n", "plt.scatter(wi_xs, wi_ys, s=5, color=\"gray\", alpha=1, label=\"Trajectory\")\n", "\n", "plt.plot(\n", "    wi_xs[:1],\n", "    wi_ys[:1],\n", "    marker=\"o\",\n", "    markersize=20,\n", "    markerfacecolor=\"none\",\n", "    markeredgecolor=\"green\",\n", "    label=\"Start\",\n", ")\n", "plt.plot(\n", "    wi_xs[-1:],\n", "    wi_ys[-1:],\n", "    marker=\"o\",\n", "    markersize=20,\n", "    markerfacecolor=\"none\",\n", "    markeredgecolor=\"red\",\n", "    label=\"End\",\n", ")\n", "\n", "plt.legend(fontsize=14)\n", "plt.title(\"Loon trajectory over the Indian Ocean\", fontsize=16)\n", "plt.xlabel(\"Longitude\", fontsize=12)\n", "_ = plt.ylabel(\"Latitude\", fontsize=12)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Load or Train the GPT model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_dir = \"model-loon\"\n", "epochs = 5\n", "\n", "################################################################################\n", "# To run in fast mode, set env var NOTEBOOK_FAST_RUN=1 prior to starting Jupyter\n", "################################################################################\n", "if os.environ.get(\"NOTEBOOK_FAST_RUN\", \"0\") != \"0\":\n", "    epochs = 1\n", "\n", "    display(\n", "        Markdown(\n", "            '
<p style=\"color:firebrick; background-color:yellow; font-weight:bold\">
'\n", " \"NOTE: notebook runs in fast mode. Train for 1x epoch only. Results may differ.\"\n", " )\n", " )\n", "################################################################################\n", "\n", "config = {\n", " \"train_config\": {\n", " \"epochs\": epochs,\n", " \"batch_size\": 512,\n", " \"embedding_dim\": 512,\n", " \"gpt_n_layer\": 1,\n", " \"gpt_n_head\": 1,\n", " \"learning_rate\": 6e-4,\n", " \"num_workers\": 0,\n", " \"lr_decay\": True,\n", " }\n", "}\n", "builder = wi.GPTBuilder(tokenizer, model_dir, config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start GPT model training.\n", "\n", "Default hyperparam is located at `src/a2rl/config.yaml`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# %%time\n", "model_fname = os.path.join(model_dir, builder.model_name)\n", "if os.path.exists(model_fname):\n", " print(f\"Will load the GPT model from {model_fname}\")\n", " builder.load_model()\n", "else:\n", " print(\"Training the GPT model\")\n", " builder.fit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instantiate the Simulator Instance\n", "To create a simulator, we need pass in the tokenzier and the GPT model wrapped inside `whatif.Simulator.GPTBuilder`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "simulator = wi.Simulator(tokenizer, builder.model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Planning Ballon Trajectory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparation\n", "First we will find out the column names of SARS'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col_names = tokenizer.df.sar_d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dim_states = len(col_names[\"states\"])\n", "dim_states" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rewards_cols = tokenizer.df.sar_d[\"rewards\"]\n", "rewards_cols" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "action_cols = tokenizer.df.sar_d[\"actions\"]\n", "action_cols" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nb_actions = len(tokenizer.df_tokenized[action_cols[0]].unique())\n", "nb_actions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the total number of dataframe tokens per SAR" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sar_row_len = dim_states + len(rewards_cols) + len(action_cols)\n", "sar_row_len" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "block_size = BLOCK_SIZE_ROW * sar_row_len\n", "block_size" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def recover_raw_reward(custom_context, my_action, reward):\n", " \"\"\"\n", " return the untokenized (raw) values of both immediate_reward and reward_to_go (Q-value or SARSA)\n", " \"\"\"\n", " prev_states = custom_context[-dim_states:]\n", " seq = np.hstack([prev_states, my_action, reward])\n", " dft = tokenizer.from_seq_to_dataframe(seq)\n", " ttt = dft[rewards_cols[:]].loc[0].values\n", " return ttt[0], ttt[1], dft # 0 is immediate reward, 1 is the reward-to-go" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"def create_new_context(custom_context, my_action, reward, next_states):\n", " \"\"\"\n", " Append new actions, reward and next_states to the current context\n", " in order to generate the new context\n", " \"\"\"\n", " return np.hstack([custom_context, my_action, reward, next_states])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Behaviour Cloning\n", "First we use the `a2rl.Simulator.sample` API obtain \"recommended\" actions based on the distribution learned from the dataset.\n", "We then apply selected actions to rollout the next step using the `a2rl.Simulator.lookahead` API.\n", "We do this Rollout for each step throughout the entire trajectory.\n", "\n", "We set the prediction horizon to 100 in order to limit the computation time of this example. If you run this notebook on a GPU machine, feel free to increase the horizon." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "horizon = 50 # set the planning horizon\n", "nb_runs = 1\n", "start_step = 2000\n", "\n", "################################################################################\n", "# To run in fast mode, set env var NOTEBOOK_FAST_RUN=1 prior to starting Jupyter\n", "################################################################################\n", "if os.environ.get(\"NOTEBOOK_FAST_RUN\", \"0\") != \"0\":\n", " start_step = 350\n", "\n", " display(\n", " Markdown(\n", " '
<p style=\"color:firebrick; background-color:yellow; font-weight:bold\">
'\n", " \"NOTE: notebook runs in fast mode. Use only a fraction of data. Results may differ.\"\n", " )\n", " )\n", "################################################################################\n", "\n", "start_step = min(start_step, wi_df.shape[0] - 1)\n", "initial_simulation_context = tokenizer.df_tokenized.iloc[start_step : start_step + 1].values[0][:10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "accum_cost_list = []\n", "non_accum_cost_list = []\n", "raw_state_list = []\n", "context_block_size = (\n", " block_size * BLOCK_SIZE_ROW\n", ") # make the context length longter than the block_size length\n", "for _ in range(nb_runs):\n", " accum_cost = [0]\n", " non_accum_cost = [0]\n", " custom_context = initial_simulation_context\n", " raw_states = []\n", " for i in tqdm(range(horizon)):\n", " # obtain a valid \"random\" action\n", " if len(custom_context) > block_size:\n", " truncated_custom_context = custom_context[-block_size:]\n", " else:\n", " truncated_custom_context = custom_context\n", " recommendation_df = simulator.sample(truncated_custom_context, max_size=1, as_token=True)\n", " my_action = recommendation_df[action_cols].loc[0].values\n", "\n", " # use lookahead to build up the context\n", " if len(custom_context) > context_block_size:\n", " truncated_custom_context = custom_context[-context_block_size:]\n", " else:\n", " truncated_custom_context = custom_context\n", " # print('len(truncated_custom_context) = ', len(truncated_custom_context))\n", " reward, next_states = simulator.lookahead(truncated_custom_context, list(my_action))\n", " immediate_cost, _, raw_state = recover_raw_reward(\n", " truncated_custom_context, my_action, reward\n", " )\n", " accum_cost.append(accum_cost[-1] + immediate_cost)\n", " non_accum_cost.append(immediate_cost)\n", " raw_states.append(raw_state)\n", " custom_context = create_new_context(custom_context, my_action, reward, next_states)\n", " # print('len(custom_context) = ', len(custom_context))\n", " accum_cost_list.append(accum_cost)\n", " non_accum_cost_list.append(non_accum_cost)\n", " raw_state_list.append(raw_states)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reconstruct the predicted trajectory " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample = pd.concat(raw_state_list[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute coordinates based on predicted velocities\n", "The current ballon locations (latitude and longitude coordinates) are not part of the states, but they can be calculated from previous location and the two velocities, which are states predicted by `Whatif simulator`. Doing this auto-regressively will reconstruct locations starting from the initial position til the end of the prediction horizon." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "avg_time_resolution = (\n", "    (wi_df[\"timestamp\"][wi_df.shape[0] - 1] - wi_df[\"timestamp\"][0]) / wi_df.shape[0] / 1000\n", ") # millisecs to secs\n", "meters_in_degree = 0.001 / 111 # degrees per meter, assuming ~111 km per degree (great circle)\n", "init_lon = wi_df[\"longitude\"][start_step] # initial location\n", "init_lat = wi_df[\"latitude\"][start_step]\n", "lons = [init_lon]\n", "lats = [init_lat]\n", "\n", "for speed_lon, speed_lat in zip(sample.velocity_u, sample.velocity_v):\n", "    lons.append(\n", "        lons[-1] + speed_lon * avg_time_resolution * meters_in_degree\n", "    ) # convert distance from meters to degrees\n", "    lats.append(lats[-1] + speed_lat * avg_time_resolution * meters_in_degree)\n", "\n", "sample[\"dist_u\"] = lons[1:]\n", "sample[\"dist_v\"] = lats[1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot trajectories" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(\n", "    wi_df[\"longitude\"][start_step - 500 : start_step],\n", "    wi_df[\"latitude\"][start_step - 500 : start_step],\n", "    s=15,\n", "    color=\"green\",\n", "    alpha=1,\n", "    label=\"Past trajectory\",\n", ")\n", "\n", "plt.scatter(\n", "    wi_df[\"longitude\"][start_step : start_step + 200],\n", "    wi_df[\"latitude\"][start_step : start_step + 200],\n", "    s=120,\n", "    color=\"lightgray\",\n", "    alpha=1,\n", "    label=\"Future trajectory (ground-truth)\",\n", ")\n", "\n", "plt.scatter(\n", "    sample[\"dist_u\"],\n", "    sample[\"dist_v\"],\n", "    s=15,\n", "    color=\"none\",\n", "    edgecolor=\"red\",\n", "    label=\"Predicted trajectory (Whatif)\",\n", ")\n", "\n", "plt.legend(fontsize=14)\n", "plt.title(\"Loon trajectory on the map\", fontsize=16)\n", "plt.xlabel(\"Longitude\", fontsize=12)\n", "_ = plt.ylabel(\"Latitude\", fontsize=12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot states (temperature and altitude) and action (ACS)\n", "\n", "We compare the states between reality and the Whatif simulation, and we find them to be largely consistent. In particular, the actions are identical." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(3)\n", "axs[0].plot(wi_df[\"temperature\"][start_step : start_step + 100].values, label=\"Ground truth\")\n", "axs[0].plot(sample.temperature.values, label=\"Whatif\")\n", "axs[0].set_ylabel(\"Temperature\", fontsize=14)\n", "axs[0].legend(fontsize=14)\n", "axs[0].grid(ls=\"--\")\n", "axs[1].plot(wi_df[\"altitude\"][start_step : start_step + 100].values, label=\"Ground truth\")\n", "_ = axs[1].plot(sample.altitude.values, label=\"Whatif\")\n", "# axs[1].set_xlabel(\"Step\", fontsize=14)\n", "axs[1].set_ylabel(\"Altitude\", fontsize=14)\n", "axs[1].legend(fontsize=14)\n", "axs[1].grid(ls=\"--\")\n", "axs[2].plot(wi_df[\"acs\"][start_step : start_step + 100].values, label=\"Ground truth\")\n", "_ = axs[2].plot(sample.acs.values, label=\"Whatif\")\n", "_ = axs[2].set_ylim([-1, 1])\n", "axs[2].set_xlabel(\"Step\", fontsize=14)\n", "axs[2].set_ylabel(\"ACS control\", fontsize=14)\n", "axs[2].legend(fontsize=14)\n", "axs[2].grid(ls=\"--\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Counter-factual (Whatif?)\n", "From the above plot, we understand that the actual control action during the 100-step horizon was always 0 (neither ascending nor descending). But **what if** we send the \"descending\" control to the balloon? 
To explore this counter-factual effect, we force the control to `-1` (i.e. always descending), which corresponds to the token `1000`, and re-run the `Whatif` simulator as before (thus using the `lookahead` API to roll out the trajectory over the planning horizon)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "accum_cost_list = []\n", "non_accum_cost_list = []\n", "raw_state_list = []\n", "context_block_size = block_size * BLOCK_SIZE_ROW\n", "hardcoded_action = np.array([1000, 1100]) # hardcoded action tokens: 1000 is acs=-1 (descend), 1100 is the is_daytime slot\n", "for _ in range(nb_runs):\n", "    accum_cost = [0]\n", "    non_accum_cost = [0]\n", "    custom_context = initial_simulation_context\n", "    raw_states = []\n", "    for i in tqdm(range(horizon)):\n", "        # use lookahead to build up the context\n", "        if len(custom_context) > context_block_size:\n", "            truncated_custom_context = custom_context[-context_block_size:]\n", "        else:\n", "            truncated_custom_context = custom_context\n", "        # print('len(truncated_custom_context) = ', len(truncated_custom_context))\n", "        reward, next_states = simulator.lookahead(truncated_custom_context, list(hardcoded_action))\n", "        immediate_cost, _, raw_state = recover_raw_reward(\n", "            truncated_custom_context, hardcoded_action, reward\n", "        )\n", "        accum_cost.append(accum_cost[-1] + immediate_cost)\n", "        non_accum_cost.append(immediate_cost)\n", "        raw_states.append(raw_state)\n", "        custom_context = create_new_context(custom_context, hardcoded_action, reward, next_states)\n", "        # print('len(custom_context) = ', len(custom_context))\n", "    accum_cost_list.append(accum_cost)\n", "    non_accum_cost_list.append(non_accum_cost)\n", "    raw_state_list.append(raw_states)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample02 = pd.concat(raw_state_list[0])\n", "lons02 = [init_lon]\n", "lats02 = [init_lat]\n", "\n", "for speed_lon, speed_lat in zip(sample02.velocity_u, sample02.velocity_v):\n", "    lons02.append(lons02[-1] + speed_lon * avg_time_resolution * meters_in_degree)\n", "    lats02.append(lats02[-1] + speed_lat * avg_time_resolution * meters_in_degree)\n", "\n", "sample02[\"dist_u\"] = lons02[1:]\n", "sample02[\"dist_v\"] = lats02[1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot counter-factual (Whatif) trajectories" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(\n", "    wi_df[\"longitude\"][start_step - 500 : start_step],\n", "    wi_df[\"latitude\"][start_step - 500 : start_step],\n", "    s=15,\n", "    color=\"green\",\n", "    alpha=1,\n", "    label=\"Past trajectory\",\n", ")\n", "\n", "plt.scatter(\n", "    wi_df[\"longitude\"][start_step : start_step + 200],\n", "    wi_df[\"latitude\"][start_step : start_step + 200],\n", "    s=120,\n", "    color=\"lightgray\",\n", "    alpha=1,\n", "    label=\"Future trajectory (reality)\",\n", ")\n", "\n", "plt.scatter(\n", "    sample02[\"dist_u\"],\n", "    sample02[\"dist_v\"],\n", "    s=15,\n", "    color=\"none\",\n", "    edgecolor=\"red\",\n", "    label=\"Counter-factual trajectory (Whatif)\",\n", ")\n", "\n", "plt.legend(fontsize=14)\n", "plt.title(\"Loon trajectory\", fontsize=16)\n", "plt.xlabel(\"Longitude\", fontsize=14)\n", "_ = plt.ylabel(\"Latitude\", fontsize=14)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown 
in the plot above, the counter-factual trajectory (red) predicted by Whatif eventually deviates from the trajectory that would have been expected had the balloon followed the original control actions. This is interesting because we did not change anything but the altitude control, yet its consequence shows up in the horizontal (longitude-latitude) plane. This suggests that the simulator **may have learned something consistent with the underlying physics** from the data. That is, winds at different altitudes may blow the balloon in different directions horizontally, thus deviating from the trajectory at its original altitude." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot counter-factual states (temperature and altitude) and action (ACS)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(3)\n", "axs[0].plot(wi_df[\"temperature\"][start_step : start_step + 100].values, label=\"Reality\")\n", "axs[0].plot(sample02.temperature.values, label=\"Counter-factual (Whatif)\")\n", "axs[0].set_ylabel(\"Temperature\", fontsize=14)\n", "axs[0].legend(fontsize=14)\n", "axs[0].grid(ls=\"--\")\n", "axs[1].plot(wi_df[\"altitude\"][start_step : start_step + 100].values, label=\"Reality\")\n", "_ = axs[1].plot(sample02.altitude.values, label=\"Counter-factual (Whatif)\")\n", "# axs[1].set_xlabel(\"Step\", fontsize=14)\n", "axs[1].set_ylabel(\"Altitude\", fontsize=14)\n", "axs[1].legend(fontsize=14)\n", "axs[1].grid(ls=\"--\")\n", "axs[2].plot(wi_df[\"acs\"][start_step : start_step + 100].values, label=\"Reality\")\n", "_ = axs[2].plot(sample02.acs.values, label=\"Counter-factual (Whatif)\")\n", "_ = axs[2].set_ylim([-1.1, 1])\n", "axs[2].set_xlabel(\"Step\", fontsize=14)\n", "axs[2].set_ylabel(\"ACS control\", fontsize=14)\n", "axs[2].legend(fontsize=14)\n", "axs[2].grid(ls=\"--\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above plot shows that when we set the control to \"*continuously descend*\" throughout the planning horizon, the simulator reacted by reducing the altitude of the balloon. This again shows that the simulator may have learned correct physics." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "interpreter": { "hash": "ff232c58d1a97beb926a6d37608f9435885b0ee7bcf3f3d7f45e4c9b09443efa" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "toc-autonumbering": true, "toc-showtags": true }, "nbformat": 4, "nbformat_minor": 4 }