=============================== Welcome to Amazon Accessible RL =============================== Amazon Accessible RL (A2RL) provides everything a data scientist needs to develop a solution to a `sequential decision making problem `_ working on time series data. You can install the stable version A2RL with ``pip``, preferrably to a virtual environment. .. code-block:: bash pip install a2rl We aim to bring you a low-code package to apply `offline RL `_, starting from problem formulation, :doc:`initial data analysis ` to see if a solution is possible, train a :doc:`simulator ` (aka *digital twin*) based on the data, and providing :doc:`recommended actions `. At the core of A2RL is a state-of-the-art generative transformer -- the same technology behind `GATO `_, `trajectory transformer `_ and `decision transformer `_. You should start by formulating your problem into *states*, *actions*, and *rewards* (see real-world examples below), then :doc:`prepare a dataset ` that reflects the formulation. .. collapse:: Manufacturing: Building HVAC system .. list-table:: :widths: 50 50 :header-rows: 1 * - Description - Formulation * - Consider a manufacturing facility with an `HVAC `_ system that manages temperature of various equipment, with multiple water `chillers `_ and roof-top `cooling towers `_. The building management wants to maximize the chillers efficiency, `COP `_ (Coefficient of Performance, in KW/RT), by dynamically `staging the chillers `_ (i.e., deciding which chillers to run) to meet dynamic heating load on various weather condition. - To frame the problem into "given *states*, decide *actions* that optimize *rewards*", the operator would just need to pull these information from their historical data which their cooling system collected at an hourly interval. .. code-block:: yaml states: - ECWT (Entering Condenser Water Temperature) - evaporator thermal power output (KW or RT) actions: - identifiers of running chillers rewards: - power consumption (KW) to minimize .. collapse:: Manufacturing: Industrial-scale hot stamping `Hot stamping `_ equipment needs to maintain constant temperature on dies. Blanks are pre-heated and then stamped and cooled between the die. The change in temperature is critical for the part to be annealed to the right hardness. This brings opportunities to increase the efficiency of production line, in particular to prevent over-heating in the hotstamping line using just minimum amount of energy. While the exact historical data varies across the systems, in-principle they consist of historical staging activities, thermal load, flow rate, temperature, and power consumption. .. list-table:: :widths: 50 50 :header-rows: 1 * - Description - Formulation * - Water `chiller `_ circuit `rejects heat `_ generated in the die to the atmosphere. - Stage chillers (*actions*) based on heat load and ambient temperature (*states*) to maximize efficiency (rewards). Since the efficency (`COP `_) is inversely proposional to power consumption (KW) which is directly measureable, we can recast the objective as minimizing power consumption (*rewards*). * - `Furnaces `_ heat metal blanks to the right temperature for stamping. - An optimization problem statement could be to stage furnaces (*actions*) depending on utilization and thermal mass of metal blanks (*states*) to minimize the energy consumption (*rewards*). * - `Compressor `_ supplies `compressed air `_ for many industrial processes. - An optimization problem statement could be to staging compressors (*actions*) based on air demand (*states*) to minimize energy consumption (*rewards*). .. collapse:: Utility: Smart building HVAC economizer .. list-table:: :widths: 50 50 :header-rows: 1 * - Description - Formulation * - HVAC `economizer `_ optimization is a common use case in smart building management. Building operator uses `rooftop unit (RTU) `_ economizer setpoint to control the amount of outside air intake based on weather condition, to maintain an acceptable level of comfort in a zone while minimizing power consumption. The target temperature and humidity of the zone depends on the actual activity. Power is consumed by mechanical cooling to cool the mixture of outside air and returned air to the zone's target temperature and humidity. Power consumption depends on occupancy rate, outdoor temperature and humidity, returned air temperature and humidity, target zone's temperature and humidity, date and time of day, and the activities in the zone. - To frame this into sequential decision problem, we can define the *states*, *actions*, *rewards* as follows: .. code-block:: yaml states: - occupancy rate - outdoor temperature - outdoor humidity - returned air temperature - returned air humidity actions: - RTU economiser setpoints rewards: - power consumption .. collapse:: Utility: Energy arbitrage .. list-table:: :widths: 50 50 :header-rows: 1 * - Description - Formulation * - Energy storage system (ESS) can benefit the grid in many ways such as to balance and maintain the grid, or to store electricity for later use during peak demand, outage or emergency period. Energy storage has also created new opportunity for energy storage owner to generate profit via arbitrage, the difference between revenue received from energy sale (discharge) and the charging cost. Profit generation from arbitrage depends on energy price uncertainty in real time market, and also utility energy battery storage level, cost of energy generation. Utility operator need to decide whether to sell, buy or hold on to available energy at the right time in order to maximize profit. - To frame this into sequential decision problem, we can set the *states*, *actions*, *rewards* as follows: .. code-block:: yaml states: - electric price - battery storage level - battery capacity - charging/discharging efficiency - wear and tear cost - energy demand actions: - buy, sell or hold rewards: - profit (= energy_sales - charging cost) See also a `solution that predates Amazon Whatif `_ for this exact problem. .. collapse:: Utility: District heating .. list-table:: :widths: 50 50 :header-rows: 1 * - Description - Formulation * - In a district heating network, the heat generated by producing plant is distributed to consumer via heated supply water for floor heating purpose. After the heat has been transferred to floor heating, the cold return water is circulated back to the district heating plant. The water circulates in a closed pipeline. In this use case, we are looking at controlling room temperature using underfloor heating by adjusting the amount of heated water flowing into the underfloor pipe. The amount of heated water depends on external condition such as outside air temperature and humidity, date and time of day. - To frame this into sequential decision problem, we can set the *states*, *actions*, *rewards* as follows: .. code-block:: yaml states: - outside air temperature and humidity - room temperature - supply water temperature - occupancy rate actions: - return water temperature setpoint rewards: - temperature differences where ``temperature differences`` is the discrepancy between actual and target room temperature. | Once the dataset is ready, the end-to-end workflow is as concise as follows: .. code-block:: python import a2rl as wi from a2rl.utils import plot_information # Load a sample dataset which contains historical states, actions, and rewards. wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller")).trim().add_value() wi_df = wi_df.iloc[:1000] # Reduce data size for demo purpose # Checks and analysis plot_information(wi_df) # Train a simulator tokenizer = wi.AutoTokenizer(wi_df, block_size_row=2) builder = wi.GPTBuilder(tokenizer, model_dir="my-model", ) model = builder.fit() simulator = wi.Simulator(tokenizer, model, max_steps=100, reset_coldstart=2) # Get recommended actions given an input context (s,a,r,v,...s). # Context must end with states, and its members must be tokenized. custom_context = simulator.tokenizer.df_tokenized.sequence[:7] recommendation_df = simulator.sample(custom_context, 3) # Show recommendations (i.e., trajectory) recommendation_df Contents ======== .. toctree:: :hidden: Overview .. toctree:: :maxdepth: 1 start_here example changelog reference developer Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * :ref:`search`