.. _install:
Install
=========================
ML for Simulation Toolkit requires Python >= 3.9, < 3.13 and Pip. It is tested on Ubuntu 22.04 with Cuda 12.1. Running on MacOS and Windows is not tested but may work because there are no OS-specific codes or dependencies.
A GPU is recommended for following the tutorials but you can get started with the packaged sample data using CPU-only.
To get started on AWS, we recommend using the `AWS Deep Learning Base GPU AMI (Ubuntu 22.04) `_ with Cuda and NVIDIA pre-installed on Amazon Elastic Compute Cloud (AWS EC2) instance. The `G5 instance types `_ with one GPU such as ``g5.xlarge`` or ``g5.2xlarge`` are suitable to complete the MLSimKit tutorials. Follow `these steps to launch an AWS EC2 Linux instance `_. We recommend 500G+ storage for the tutorials.
MLSimKit supports multi-GPU training to accelerate performance. Once you are familiar with the toolchain and workflows, we recommend training on larger real-world sized datasets such as :ref:`DrivAerML ` using a ``g5.12xlarge`` or ``g5.48xlarge`` instance that have four and eight GPUs respectively. This will signficantly speed up training time by utilizing all available GPUs.
.. _install-from-source:
Install from Source
-------------------------------
Extract the source distributable (e.g, ``.tar.gz`` or ``.zip``) to a directory e.g, ``mlsimkit``.
First, check your Python version via: ``python3 --version`` and upgrade if possible (up to Python 3.12).
We use `pip `_ to install Python dependencies. Check if pip is already installed: ``python3 -m pip --version``. If not, we recommend installing pip via their `get-pip steps `_:
#. Download the script, from https://bootstrap.pypa.io/get-pip.py
#. Run ``python3 get-pip.py``
We recommend using a `virtual environment `_ to contain the dependencies. We use the standard ``venv`` module packaged with Python on MacOS but installed separately on Linux/Ubuntu::
sudo apt install python3-venv # Linux/Ubuntu only (not MacOS)
Next, create and activate the virtual environment within the ``mlsimkit`` directory:
.. code-block:: shell
cd mlsimkit
python3 -m pip install --upgrade pip # ensure latest pip
python3 -m venv .venv # create a .venv directory
source .venv/bin/activate # activate the virtual environment
Install ``mlsimkit`` and dependencies via pip (we recommend ``--edit`` if you want to edit source files):
.. code-block:: shell
pip3 install --edit .
You now have ``mlsimkit-learn`` installed (versions may be different):
.. code-block:: shell
% mlsimkit-learn --version
ML for Simulation Toolkit, version 0.1.0b1.dev35+gc33bc75.d20240508
See the :ref:`Troubleshooting ` guide if the install did not work.
Follow :ref:`install-next-steps` to start training.
Install from Source to a Remote Machine (via .whl)
--------------------------------------------------
If you want to install the ``mlsimkit`` python package outside of a virtual environment, you can use a pre-built wheel file (``.whl``). Below, we use the source install to
build the wheel file, copy the wheel file to the remote machine and then install via ``pip``.
.. note::
Currently the tutorials are not included in the ``.whl`` Python package. Please copy the tutorials separately. Hint: run ``make sdist`` from the ``mlsimkit`` directory to output a source ``.tar.gz`` file including the tutorials to the ``dist/`` folder.
Follow these steps to build the wheel file and install the ``mlsimkit`` python package:
1. Ensure you have the :ref:`source installation ` working.
2. From your ``mlsimkit`` source install directory, run the following command to build the wheel file from current source:
.. code-block:: shell
cd
make wheel
This command will create a ``.whl`` file in the ``dist/`` directory. The name of the ``.whl`` file created by this command is printed to the terminal.
3. Copy the newly created ``.whl`` file in ``dist/`` to the remote machine.
4. On the remote machine, install using ``pip``:
.. code-block:: shell
pip install mlsimkit-.whl --prefix=/opt/mlsimkit
export PATH=$PATH:/opt/mlsimkit/bin
export PYTHONPATH=$PYTHONPATH:/opt/mlsimkit/lib/python3.11/site-packages
Replace ``mlsimkit-.whl`` with the filename from step (2). e.g, ``mlsimkit-0.1.0b0-py3-none-any.whl``.
Replace ``/opt/mlsimkit`` with your desired installation directory, and update the ``PYTHONPATH`` with the appropriate Python version e.g., ``python3.11``.
After following these steps, the package will be installed on the remote machine, and you can use it without the need for a virtual environment.
.. _install-next-steps:
Next Steps
----------
After installing the ML for Simulation Toolkit, proceed to running training pipelines and make predictions:
1. **Quickstart:** Follow :doc:`KPI `, :doc:`Slice Prediction ` or :doc:`Surface Prediction ` quickstart guides and train a model and make predictions on sample data in 15 minutes. You will familiarize yourself with the CLI and configuration tools.
2. **Tutorials:** Reproduce results on a real dataset for one of the use cases by following the :ref:`tutorial-kpi-windsor` or :ref:`tutorial-slices-windsor` that use the :ref:`datasets-windsor`; or follow :ref:`tutorial-surface-ahmed` that uses the :ref:`datasets-ahmed`. See codes in ``tutorials/`` in the source code for walkthroughs on other datasets.
3. **Customize a use case**: Once you have reproduced results following the tutorials, explore in detail how to use your own datasets by diving into the :doc:`KPI prediction `, :doc:`Slice prediction `, and :doc:`Surface variable prediction ` users guides. You will be ready to experiment with your data and customize model codes for your own use cases.
4. **Running on AWS ParallelCluster (coming soom)**: Train at scale on AWS ParallelCluster.
5. **Running inside a SageMaker Notebook (coming soom)**: A guide to setting up MLSimKit with a your SageMaker Notebook for interactive development.
6. **Use the MLSimKit SDK in your Python code**: For example, you may want to integrate model codes from other libraries and utilize the MLSimKit CLI/Configuration framework. Please refer to :ref:`api-index` to learn more.