.. _quickstart-slices:

Quickstart with Slice Prediction
===================================

To get started, first make sure that:

* MLSimKit is :ref:`installed`

We'll use a sample dataset to quickly show you an end-to-end workflow.

Slice prediction is accessed via the ``mlsimkit-learn slices`` command:

.. code-block:: text

    [MLSimKit] Learning Tools

    Usage: mlsimkit-learn slices [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...

      Use Case: Slice Prediction

      Slice Prediction is used to predict parameters from slices for 3D geometry meshes...

    Options:
      --help  Show this message and exit.

    Commands:
      preprocess             Step (1): Process input data and prepare manifests
      train-image-encoder    Step (2): Train image encoding model
      process-mesh-data      Step (3): Link the geometry and image training data
      train-prediction       Step (4): Train prediction using encoder outputs
      predict                Step (5): Predict results and evaluate performance
      inspect-image-encoder  (Debug): Evaluate image encoder performance

There are five steps to run the pipeline through preprocessing, training, and then prediction. You may run commands separately. For convenience, you can run one or more commands, including the entire pipeline, using YAML config files. You may provide a config by running:

.. code-block:: shell

    mlsimkit-learn --config <config-file> slices ...

You can use the ``--help`` option to see options and the definitions of hyperparameters. For example, the following command will display all hyperparameters for a training step:

.. code-block:: shell

    mlsimkit-learn slices train-image-encoder --help

Sample Dataset
------------------------

There is a sample config and a very small sample dataset called "ahmed-sample" so you can run end-to-end quickly:

.. code-block:: shell

    src/mlsimkit/conf
    └── slices
        └── sample.yaml

    src/mlsimkit/datasets
    ├── ...
    └── ahmed-sample
        ├── downsampled_stls
        ├── slice_images
        └── slices.manifest

External Datasets
------------------------

In addition to the sample dataset, there are tutorials to get started with publicly available datasets::

    tutorials/
    └── slices
        ├── ahmed/
        ├── drivaer/
        ├── sample/
        └── windsor/

Run the Sample
------------------------

First, make a folder for all the outputs. Replace ``--output-dir quickstart/slices`` in the command below with your own folder location.

Second, run the entire train-predict pipeline to make predictions on the sample data:

.. code-block:: shell

    mlsimkit-learn --output-dir quickstart/slices \
        --config src/mlsimkit/conf/slices/sample.yaml \
        slices preprocess \
        train-image-encoder --device cpu \
        inspect-image-encoder \
        process-mesh-data \
        train-prediction --device cpu \
        predict

Also, note that commands can be chained together. For example, the above runs ``preprocess``, ``train-image-encoder``, ``inspect-image-encoder``, ``process-mesh-data``, ``train-prediction``, and then ``predict``.

Running on GPU
~~~~~~~~~~~~~~

MLSimKit automatically uses a GPU by default. To use your GPU, remove ``--device cpu`` from the previous command and run again:

.. code-block:: shell

    mlsimkit-learn --output-dir quickstart/slices \
        --config src/mlsimkit/conf/slices/sample.yaml \
        slices preprocess train-image-encoder inspect-image-encoder \
        process-mesh-data train-prediction predict

.. note::

   On older macOS hardware, you may see the error ``Cannot convert a MPS Tensor to float64 dtype``. If so, force CPU by specifying ``--device cpu`` for the train commands.

In general, please see the :ref:`Troubleshooting` guide for possible errors if commands do not work.

All artifacts are written into the output directory ``--output-dir``.
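For instance, after the sample pipeline completes, ``quickstart/slices`` contains the preprocessed slice images and the split manifests. The tree below is an illustrative sketch assembled from the preprocessing log shown later on this page; your folder will also hold model and prediction artifacts not listed here:

.. code-block:: text

    quickstart/slices
    ├── slices
    │   ├── slice-group-0.npy
    │   ├── ...
    │   └── slice-group-6.npy
    ├── slices-copy.manifest
    ├── train.manifest
    ├── validate.manifest
    └── test.manifest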
You may also set the output directory in the config file. Commands automatically share paths to the output artifacts, such as the trained model path.

The sample configuration below sets some input options but most options use defaults. There are many options, which we go into in detail after the quickstart. The sample configuration ``conf/slices/sample.yaml`` looks like this:

.. code-block:: yaml

    slices:
      preprocess:
        # path is relative to mlsimkit/datasets, which is the default search path
        manifest-uri: ahmed-sample/slices.manifest

        # split the dataset into three
        train-size: 0.6
        valid-size: 0.2
        test-size: 0.2

      train-image-encoder:
        batch-size: 1    # small for quickstart sample, use larger normally. See user guide.

      train-prediction:
        epochs: 10       # low number for sample quickstart
        batch-size: 1    # small for quickstart sample, use larger normally. See user guide.

.. note::

   A "manifest" describes the paths to a dataset and is used to share data between tasks for a particular prediction use case like Slices. For now, know that ``ahmed-sample/slices.manifest`` references a small dataset packaged with MLSimKit.

You will see console logging something like this:

.. code-block:: shell

    [INFO] [MLSimKit] Learning Tools
    [INFO] Package Version: 0.2.3.dev3+gaf49957.d20240808
    [INFO] Running command 'preprocess'
    [INFO] Preprocessing manifest '/home/ubuntu/mlsimkit/src/mlsimkit/datasets/ahmed-sample/slices.manifest'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-0.npy'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-1.npy'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-2.npy'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-3.npy'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-4.npy'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-5.npy'
    [INFO] Image data written to '/home/ubuntu/mlsimkit/quickstart/slices/slices/slice-group-6.npy'
    [INFO] Manifest '/home/ubuntu/mlsimkit/quickstart/slices/slices-copy.manifest' written (7 records)
    [INFO] Splitting manifest into train-size=0.6 valid-size=0.2 test-size=0.2
    [INFO] Manifest '/home/ubuntu/mlsimkit/quickstart/slices/train.manifest' written (4 records)
    [INFO] Manifest '/home/ubuntu/mlsimkit/quickstart/slices/validate.manifest' written (1 records)
    [INFO] Manifest '/home/ubuntu/mlsimkit/quickstart/slices/test.manifest' written (2 records)
    [INFO] Running command 'train-image-encoder'
    [INFO] Training state configuration: {"Distributed": "no", "Num processes": 1, "Process index": 0, "Local process index": 0, "Device": "cuda", "Mixed precision": "no"}
    [INFO] Training started for 'model'
    [INFO] Train dataset size: 4
    [INFO] Validation dataset size: 1
    [INFO] Training:   0%|          | 0/5 [00:00<?, ?it/s]
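Because commands share artifact paths through the output directory, you can also re-run a single step by itself. For example, assuming the pipeline above has already populated ``quickstart/slices``, the following re-runs only the final prediction step against the saved artifacts:

.. code-block:: shell

    mlsimkit-learn --output-dir quickstart/slices \
        --config src/mlsimkit/conf/slices/sample.yaml \
        slices predict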