Welcome to Amazon Accessible RL#

Amazon Accessible RL (A2RL) provides everything a data scientist needs to develop a solution to a sequential decision-making problem using time-series data.

You can install the stable version of A2RL with pip, preferably into a virtual environment.

pip install a2rl
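
For example, to create a fresh virtual environment and install into it (a minimal sketch; adjust the path and Python interpreter to your platform):

python -m venv .venv
source .venv/bin/activate
pip install a2rl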

We aim to bring you a low-code package for applying offline RL, covering the whole journey: formulating the problem, analyzing the initial data to see whether a solution is possible, training a simulator (aka digital twin) from the data, and recommending actions. At the core of A2RL is a state-of-the-art generative transformer, the same technology behind Gato, Trajectory Transformer, and Decision Transformer.

You should start by formulating your problem in terms of states, actions, and rewards (see the real-world examples below), then prepare a dataset that reflects that formulation.
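
For example, assuming a plain pandas DataFrame with the hypothetical columns below, the formulation can be declared with wi.WiDataFrame:

import pandas as pd
import a2rl as wi

# Hypothetical historical data; use your own columns and formulation.
df = pd.DataFrame(
    {
        "temperature": [25.0, 26.5, 24.8],
        "load_rt": [400.0, 420.0, 395.0],
        "running_chillers": [2, 3, 2],
        "power_kw": [520.0, 560.0, 505.0],
    }
)

# Declare which columns play the roles of states, actions, and rewards.
wi_df = wi.WiDataFrame(
    df,
    states=["temperature", "load_rt"],
    actions=["running_chillers"],
    rewards=["power_kw"],
)
print(wi_df.sar_d)  # {'states': [...], 'actions': [...], 'rewards': [...]}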

Manufacturing: Building HVAC system

Description

Consider a manufacturing facility with an HVAC system that manages the temperature of various equipment, with multiple water chillers and roof-top cooling towers.

The building management wants to maximize the chillers' efficiency, measured as COP (Coefficient of Performance, in kW/RT), by dynamically staging the chillers (i.e., deciding which chillers to run) to meet a dynamic heat load under varying weather conditions.

Formulation

To frame the problem as "given states, decide actions that optimize rewards", the operator just needs to pull the following information from the historical data that the cooling system collects at an hourly interval (a code snippet follows the list):

states:
  - ECWT (Entering Condenser Water Temperature)
  - evaporator thermal power output (kW or RT)

actions:
  - identifiers of the running chillers

rewards:
  - power consumption (kW), to be minimized
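
A2RL ships a chiller sample dataset for this use case; loading it and printing its sar_d attribute shows the state/action/reward columns it declares:

import a2rl as wi

wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller"))
print(wi_df.sar_d)
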
Manufacturing: Industrial-scale hot stamping

Description

Hot stamping equipment needs to maintain a constant temperature on the dies. Blanks are pre-heated, then stamped and cooled between the dies. The change in temperature is critical for the part to be annealed to the right hardness. This brings opportunities to increase the efficiency of the production line, in particular to prevent over-heating in the hot-stamping line using the minimum amount of energy.

While the exact historical data varies across systems, in principle it consists of historical staging activities, thermal load, flow rate, temperature, and power consumption.

Formulation

The water chiller circuit rejects heat generated in the die to the atmosphere.

Stage the chillers (actions) based on heat load and ambient temperature (states) to maximize efficiency (rewards). Since efficiency (COP) at a given heat load is inversely proportional to power consumption (kW), which is directly measurable, we can recast the objective as minimizing power consumption (rewards).
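
As a quick sanity check of that recast (a sketch, assuming efficiency is tracked as power per ton of cooling in kW/RT, where lower is better):

def kw_per_rt(power_kw: float, heat_load_rt: float) -> float:
    """Chiller efficiency as power per ton of cooling (lower is better)."""
    return power_kw / heat_load_rt

# At a fixed heat load, the staging that draws less power is the more
# efficient one, so minimizing power also optimizes efficiency.
assert kw_per_rt(450.0, 500.0) < kw_per_rt(480.0, 500.0)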

Furnaces heat metal blanks to the right temperature for stamping.

An optimization problem statement could be to stage the furnaces (actions) depending on the utilization and thermal mass of the metal blanks (states) to minimize energy consumption (rewards).

Compressors supply compressed air for many industrial processes.

An optimization problem statement could be to stage the compressors (actions) based on air demand (states) to minimize energy consumption (rewards).

Utility: Smart building HVAC economizer

Description

HVAC economizer optimization is a common use case in smart building management. The building operator uses the rooftop unit (RTU) economizer setpoint to control the amount of outside-air intake based on weather conditions, to maintain an acceptable level of comfort in a zone while minimizing power consumption. The zone's target temperature and humidity depend on the actual activity in the zone.

Power is consumed by mechanical cooling to bring the mixture of outside air and return air down to the zone's target temperature and humidity. Power consumption depends on the occupancy rate, outdoor temperature and humidity, return air temperature and humidity, the zone's target temperature and humidity, the date and time of day, and the activities in the zone.

Formulation

To frame this as a sequential decision problem, we can define the states, actions, and rewards as follows (a dataset declaration in code follows the list):

states:
  - occupancy rate
  - outdoor temperature
  - outdoor humidity
  - return air temperature
  - return air humidity

actions:
  - RTU economizer setpoints

rewards:
  - power consumption
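
Expressed as an A2RL dataset declaration, this formulation might look like the sketch below (all column names and values are hypothetical):

import pandas as pd
import a2rl as wi

# Hypothetical records; real deployments would export these from the
# building management system.
economizer_df = wi.WiDataFrame(
    pd.DataFrame(
        {
            "occupancy_rate": [0.60, 0.70],
            "outdoor_temperature": [30.1, 31.0],
            "outdoor_humidity": [0.70, 0.72],
            "return_air_temperature": [24.0, 24.5],
            "return_air_humidity": [0.50, 0.52],
            "economizer_setpoint": [12.0, 14.0],
            "power_consumption": [80.0, 86.0],
        }
    ),
    states=[
        "occupancy_rate",
        "outdoor_temperature",
        "outdoor_humidity",
        "return_air_temperature",
        "return_air_humidity",
    ],
    actions=["economizer_setpoint"],
    rewards=["power_consumption"],
)
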
Utility: Energy arbitrage

Description

An energy storage system (ESS) can benefit the grid in many ways, such as balancing and maintaining the grid, or storing electricity for later use during peak demand, outages, or emergencies. Energy storage has also created a new opportunity for storage owners to generate profit via arbitrage: the difference between the revenue received from energy sales (discharging) and the charging cost.

Profit from arbitrage depends on energy-price uncertainty in the real-time market, as well as the battery storage level and the cost of energy generation. The utility operator needs to decide whether to buy, sell, or hold the available energy at the right time in order to maximize profit.

Formulation

To frame this as a sequential decision problem, we can define the states, actions, and rewards as follows (a reward sketch follows the list):

states:
  - electricity price
  - battery storage level
  - battery capacity
  - charging/discharging efficiency
  - wear and tear cost
  - energy demand

actions:
  - buy, sell or hold

rewards:
  - profit (= energy sales - charging cost)
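
The per-step reward might be sketched like this (the sign convention and the efficiency handling are illustrative assumptions, not part of A2RL):

# Positive energy_mwh means selling (discharging); negative means buying (charging).
def arbitrage_reward(price: float, energy_mwh: float, discharge_efficiency: float = 0.9) -> float:
    if energy_mwh >= 0:
        return price * energy_mwh * discharge_efficiency  # revenue from energy sales
    return price * energy_mwh  # charging cost (a negative reward)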

See also a solution that predates Amazon Whatif for this exact problem.

Utility: District heating

Description

In a district heating network, the heat generated by the producing plant is distributed to consumers via heated supply water, for example for floor heating. After the heat has been transferred to the floor heating, the cold return water is circulated back to the district heating plant; the water circulates in a closed pipeline. In this use case, we look at controlling room temperature with underfloor heating by adjusting the amount of heated water flowing into the underfloor pipe. The required amount of heated water depends on external conditions such as the outside air temperature and humidity, and the date and time of day.

Formulation

To frame this as a sequential decision problem, we can define the states, actions, and rewards as follows (a reward sketch follows the list):

states:
  - outside air temperature and humidity
  - room temperature
  - supply water temperature
  - occupancy rate

actions:
  - return water temperature setpoint

rewards:
  - temperature difference

where the temperature difference is the discrepancy between the actual and target room temperature (to be minimized).
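
For illustration, the reward might be computed with a hypothetical helper like this (A2RL itself only requires the reward column to be present in the dataset):

def heating_reward(room_temp_c: float, target_temp_c: float) -> float:
    """Higher is better; zero when the room is exactly at the target temperature."""
    return -abs(room_temp_c - target_temp_c)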

Once the dataset is ready, the end-to-end workflow is as concise as the following:

import a2rl as wi
from a2rl.utils import plot_information

# Load a sample dataset which contains historical states, actions, and rewards.
wi_df = wi.read_csv_dataset(wi.sample_dataset_path("chiller")).trim().add_value()
wi_df = wi_df.iloc[:1000]  # Reduce data size for demo purpose

# Checks and analysis
plot_information(wi_df)

# Train a simulator: AutoTokenizer discretizes the data into tokens,
# then GPTBuilder fits a GPT model on the token sequences.
tokenizer = wi.AutoTokenizer(wi_df, block_size_row=2)
builder = wi.GPTBuilder(tokenizer, model_dir="my-model")
model = builder.fit()
simulator = wi.Simulator(tokenizer, model, max_steps=100, reset_coldstart=2)

# Get recommended actions given an input context (s,a,r,v,...s).
# Context must end with states, and its members must be tokenized.
custom_context = simulator.tokenizer.df_tokenized.sequence[:7]
recommendation_df = simulator.sample(custom_context, 3)

# Show recommendations (i.e., trajectory)
recommendation_df

Contents#

Indices and tables#