`mlsimkit.learn.common` – MLSimKit Learn Common¶

This part of the documentation covers the common interfaces shared across use cases like KPI and Surface prediction models.

Training¶

The Training module contains the main training loop function train() and related helpers.

class mlsimkit.learn.common.training.ModelIO(*args, **kwargs)¶

Protocol for defining the interface for creating, loading, and saving models.

This protocol defines the methods that must be implemented by any class that conforms to the ModelIO interface. The ModelIO interface is used to abstract the model creation, loading, and saving processes, allowing different model architectures to be used without modifying the core training logic.

new()¶: Create a new instance of the model.

load(config)¶: Load a saved model checkpoint.

save(model, model_path, train_loss, validation_loss, optimizer, epoch)¶: Save the model checkpoint.

__init__(*args, **kwargs)¶

_abc_impl = <_abc._abc_data object>¶

_is_protocol = True¶

load(config)¶

Load a saved model checkpoint.

Parameters:

config (mlsimkit.learn.common.schema.training.BaseTrainSettings) – The training configuration settings.

Returns:

model: The loaded model.
optimizer: The loaded optimizer.
start_epoch: The starting epoch for training.
train_losses: A list of training losses.
validation_losses: A list of validation losses.
best_validation_loss: The best validation loss value.
best_validation_loss_epoch: The epoch with the best validation loss.
best_model: The model with the best validation loss.
losses_df: A DataFrame containing the training and validation losses.

Return type:

Tuple[torch.nn.Module, torch.optim.Optimizer, int, list, list, float, int, torch.nn.Module, pd.DataFrame]

new()¶

Create a new instance of the model.

Returns: The new model instance.
Return type: torch.nn.Module

save(model, model_path, train_loss, validation_loss, optimizer, epoch)¶

Save the model checkpoint.

Parameters:

model (torch.nn.Module) – The model to save.
model_path (str or Path) – The path to save the model checkpoint.
train_loss (torch.Tensor) – The training loss.
validation_loss (torch.Tensor) – The validation loss.
optimizer (torch.optim.Optimizer) – The optimizer used during training.
epoch (int) – The current epoch number.
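
The protocol idea behind ModelIO can be illustrated with a small, dependency-free sketch. Everything below is hypothetical: ModelIOLike, DictModelIO, and run are illustrative names, the "model" is a plain dict rather than a torch.nn.Module, and load() returns a single dict instead of the nine-element tuple documented above. The point is only that code written against the protocol's three methods accepts any object that provides them.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class ModelIOLike(Protocol):
    """Hypothetical, simplified stand-in for the ModelIO protocol."""

    def new(self) -> Any: ...

    def load(self, config: Any) -> Any: ...

    def save(self, model, model_path, train_loss, validation_loss, optimizer, epoch) -> None: ...


class DictModelIO:
    """Toy implementation that keeps 'checkpoints' in a dict instead of on disk."""

    def __init__(self) -> None:
        self.checkpoints: Dict[str, dict] = {}

    def new(self) -> dict:
        # A real implementation would construct a torch.nn.Module here.
        return {"weight": 0.0}

    def load(self, config: dict) -> dict:
        # A real implementation returns the full tuple documented for load().
        return self.checkpoints[config["checkpoint_path"]]

    def save(self, model, model_path, train_loss, validation_loss, optimizer, epoch) -> None:
        self.checkpoints[str(model_path)] = {
            "model": dict(model),
            "train_loss": train_loss,
            "validation_loss": validation_loss,
            "epoch": epoch,
        }


def run(modelio: ModelIOLike) -> dict:
    """Training-style code that relies only on the protocol methods."""
    model = modelio.new()
    model["weight"] = 1.5  # pretend one training step happened
    modelio.save(model, "ckpt/epoch0", train_loss=0.2, validation_loss=0.3, optimizer=None, epoch=0)
    return modelio.load({"checkpoint_path": "ckpt/epoch0"})


io = DictModelIO()
restored = run(io)
```

DictModelIO never inherits from ModelIOLike; it conforms structurally, which is what lets different model architectures plug into the same training code.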

mlsimkit.learn.common.training.fmt_state(accelerator)¶

Format the state of the Accelerator instance.

Parameters: accelerator (Accelerator) – The Accelerator instance.
Returns: A dictionary containing the formatted state information.
Return type: dict

mlsimkit.learn.common.training.initialize(config: BaseTrainSettings, accelerator)¶

Initialize the training environment based on the provided configuration and accelerator.

Parameters:

config (BaseTrainSettings) – The training configuration settings.
accelerator (Accelerator) – The Accelerator instance.

Returns:

The device to be used for training.

Return type:

torch.device

mlsimkit.learn.common.training.is_distributed()¶

Check if the current execution environment is distributed.

Returns: True if the execution environment is distributed, False otherwise.
Return type: bool

mlsimkit.learn.common.training.load_checkpoint_model(modelio, config)¶

Load a checkpoint model from the provided configuration.

Parameters:

modelio (ModelIO) – The ModelIO instance for creating and loading models.
config (BaseTrainSettings) – The training configuration settings.

Raises:

Exception – If any of the required checkpoint paths are None.

Returns:

model: The loaded model.
optimizer: The loaded optimizer.
start_epoch: The starting epoch for training.
train_losses: A list of training losses.
validation_losses: A list of validation losses.
best_validation_loss: The best validation loss value.
best_validation_loss_epoch: The epoch with the best validation loss.
best_model: The model with the best validation loss.
losses_df: A DataFrame containing the training and validation losses.

Return type:

Tuple[torch.nn.Module, torch.optim.Optimizer, int, list, list, float, int, torch.nn.Module, pd.DataFrame]

mlsimkit.learn.common.training.make_accelerator(config: BaseTrainSettings)¶

Create an instance of the HuggingFace Accelerator for efficient training on various hardware configurations.

The HuggingFace Accelerator is a utility that simplifies the process of training deep learning models on different hardware configurations, including CPUs, GPUs, and multi-GPU setups. It handles device management, distributed training, mixed precision, and other performance optimizations automatically.

By using the Accelerator, MLSimKit can leverage efficient training on a wide range of hardware setups without the need for extensive manual configuration and optimization.

Parameters: config (BaseTrainSettings) – The training configuration settings.
Returns: The HuggingFace Accelerator instance.
Return type: Accelerator

mlsimkit.learn.common.training.train(modelio: ModelIO, train_loader: DataLoader, validation_loader: DataLoader, calc_loss: Callable[[Tensor, Data], Tensor], device: device, config: BaseTrainSettings, model_name: str, data_scaler, accelerator: Accelerator) → Tuple[List[float], List[float], Module, Module, float, DataLoader]¶

Train a model using the provided data loaders, loss function, and configuration.

This function encapsulates the core training loop for various machine learning models in MLSimKit. It is designed to be generic and reusable across different use cases, such as KPI prediction, surface variable prediction, and slice prediction.

The train function operates in conjunction with the ModelIO interface, which abstracts the creation, loading, and saving of models. The ModelIO interface allows different model architectures to be used for training without modifying the core training logic.

The training data and validation data are provided as PyTorch data loaders, which abstract the data loading and preprocessing steps. This design allows for different data types and preprocessing pipelines to be used for training, as long as they conform to the data loader interface.

The training process involves the following steps:

Prepare the training and validation data loaders.
Initialize or load a model using the ModelIO interface.
Perform the training loop, iterating over epochs and updating the model weights.
Validate the model on the validation data loader after each epoch.
Save the best model and checkpoint models using the ModelIO interface.
Return the training and validation losses, the best model, and other relevant information.

By leveraging the ModelIO interface, the train function can be used with different model architectures without modifying its core implementation. The specific model architecture is provided through the modelio argument, which must conform to the ModelIO protocol.
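
The steps listed above can be sketched with a toy, dependency-free loop. This is not MLSimKit's implementation: toy_train, mse, and the one-weight "model" are hypothetical, plain lists stand in for data loaders, and the return value is not meant to mirror the exact tuple returned by train(). The sketch only shows the general shape of iterating over epochs, validating after each one, and keeping a copy of the best model.

```python
import copy
from typing import Callable, List, Tuple

Batch = Tuple[float, float]  # (x, y) pairs for the toy model y = w * x


def mse(w: float, data: List[Batch]) -> float:
    """Mean squared error of the one-weight model on a list of pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def toy_train(
    new_model: Callable[[], dict],
    train_data: List[Batch],
    validation_data: List[Batch],
    epochs: int,
    learning_rate: float,
):
    model = new_model()  # initialize a model
    train_losses, validation_losses = [], []
    best_model, best_validation_loss = None, float("inf")

    for _ in range(epochs):  # training loop over epochs
        for x, y in train_data:
            grad = 2 * (model["w"] * x - y) * x  # d/dw of the squared error
            model["w"] -= learning_rate * grad
        train_losses.append(mse(model["w"], train_data))

        validation_loss = mse(model["w"], validation_data)  # validate after each epoch
        validation_losses.append(validation_loss)

        if validation_loss < best_validation_loss:  # keep the best model so far
            best_validation_loss = validation_loss
            best_model = copy.deepcopy(model)

    return train_losses, validation_losses, best_model, model, best_validation_loss


train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
validation_data = [(4.0, 8.0)]
train_losses, validation_losses, best_model, model, best_loss = toy_train(
    lambda: {"w": 0.0}, train_data, validation_data, epochs=20, learning_rate=0.05
)
```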

Parameters:

modelio (ModelIO) – The ModelIO instance for creating and loading models.
train_loader (torch.utils.data.DataLoader) – The data loader for training data.
validation_loader (torch.utils.data.DataLoader) – The data loader for validation data.
calc_loss (Callable[[torch.Tensor, torch_geometric.data.Data], torch.Tensor]) – The function to calculate the loss.
device (torch.device) – The device to use for training.
config (mlsimkit.learn.common.schema.training.BaseTrainSettings) – The training configuration settings.
model_name (str) – The name of the model.
data_scaler (DataScaler) – The data scaler for normalizing data.
accelerator (accelerate.Accelerator) – The HuggingFace Accelerator instance.

Returns:

validation_losses (list): A list of validation losses for each epoch.
train_losses (list): A list of training losses for each epoch.
best_model (torch.nn.Module): The model with the best validation loss.
model (torch.nn.Module): The final trained model.
best_validation_loss (float): The best validation loss achieved during training.
validation_loader (torch.utils.data.DataLoader): The data loader for validation data.

Return type:

Tuple[list, list, torch.nn.Module, torch.nn.Module, float, torch.utils.data.DataLoader]

mlsimkit.learn.common.training.validate(loader, device, model, data_scaler, calc_loss)¶

Calculate the validation loss of the model on the given data loader.

Parameters:

loader (torch.utils.data.DataLoader) – The data loader for validation data.
device (torch.device) – The device to use for validation.
model (torch.nn.Module) – The model to validate.
data_scaler (DataScaler) – The data scaler for normalizing data.
calc_loss (callable) – The function to calculate the loss.

Returns:

The total validation loss.

Return type:

float

mlsimkit.learn.common.training.validate_training_settings(config: BaseTrainSettings, ctx: Context)¶

Validate the training settings configuration.

Parameters:

config (BaseTrainSettings) – The training configuration settings.
ctx (click.Context) – The Click context object.

Raises:

click.UsageError – If the configuration is invalid (e.g., using CPU with accelerate launch).

Tracking¶

The Tracking module wraps MLFlow to make it easier to track and report experiments.

mlsimkit.learn.common.tracking.configure(ctx)¶

Configure MLFlow tracking for the given context.

This function configures MLFlow tracking for the main process and worker processes. For the main process, it sets up the MLFlow experiment and run, and writes the run ID to the project file. For worker processes, it waits for the main process to write the run ID, and then configures MLFlow with the same run ID.

Warning: This assumes accelerate.Accelerator() has been initialized. If it has not, the Accelerate settings fall back to their defaults, because the function reads the PartialState() instance. This currently works because the train CLI functions are written with care, but it is brittle.

Parameters: ctx (click.Context) – The click context object.

mlsimkit.learn.common.tracking.configure_mlflow(settings: MLFlowConfig)¶

Configure MLFlow with the given settings.

This function sets up the MLFlow experiment and run based on the provided settings. If an experiment with the given name does not exist, it creates a new experiment. If a run is not already active, it starts a new run with the specified run ID (if provided).

Parameters: settings (MLFlowConfig) – The MLFlow configuration settings.
Returns: The updated MLFlow configuration settings.
Return type: MLFlowConfig

mlsimkit.learn.common.tracking.context(ctx, artifact_root, metric_root)¶

Context manager for configuring MLFlow tracking.

This context manager sets the global ARTIFACT_ROOT and METRIC_ROOT variables, configures MLFlow tracking, and restores the variables to their original values when the context is exited.

Example usage:

# MLFlow is configured with the tracking context
with tracking.context(ctx, artifact_root='kpi/train', metric_root='kpi.train'):
    run_train(...)

Parameters:

ctx (click.Context) – The click context object.
artifact_root (str) – The root path for artifact logging.
metric_root (str) – The root path for metric logging.

Yields:

None
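
The set-and-restore behaviour described for this context manager can be sketched as follows. This is a minimal illustration, not the module's code: tracking_context is a hypothetical name, the two module-level variables stand in for the tracking module's globals, and the MLFlow configuration step is omitted.

```python
from contextlib import contextmanager

# Module-level settings, standing in for the tracking module's globals.
ARTIFACT_ROOT = ""
METRIC_ROOT = ""


@contextmanager
def tracking_context(artifact_root: str, metric_root: str):
    """Temporarily override the two roots and restore them on exit."""
    global ARTIFACT_ROOT, METRIC_ROOT
    previous = (ARTIFACT_ROOT, METRIC_ROOT)
    ARTIFACT_ROOT, METRIC_ROOT = artifact_root, metric_root
    try:
        # The real context manager additionally configures MLFlow tracking.
        yield
    finally:
        # Restore the previous values even if the body raises.
        ARTIFACT_ROOT, METRIC_ROOT = previous


with tracking_context("kpi/train", "kpi.train"):
    inside = (ARTIFACT_ROOT, METRIC_ROOT)
after = (ARTIFACT_ROOT, METRIC_ROOT)
```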

mlsimkit.learn.common.tracking.log_artifact_wrapper(func)¶

Wrap the mlflow.log_artifact or mlflow.log_artifacts function.

This decorator wraps the provided function to conditionally log artifacts based on the should_log() function and generates a default artifact path.

For instance, when artifact path is NOT specified, the parents of the local path root are retained relative to the project output root. By keeping the parents, the artifact path is unique just like local output files. This depends on the Project output directory (OUTPUT_ROOT) and the tracking context’s ARTIFACT_ROOT.

For example:

OUTPUT_ROOT = "outputs/training"
ARTIFACT_ROOT = "kpi/train"

# Log a single file with a custom artifact path
log_artifact("/path/to/local/file.txt", "custom/artifact/path/file.txt")

# Log a directory with an automatically generated artifact path
log_artifact("outputs/training/path/to/local/dir")  # Artifact path will be "kpi/train/path/to/local/dir"

# Log a file with an automatically generated artifact path
log_artifact("outputs/training/path/to/local/file.txt")  # Artifact path will be "kpi/train/path/to/local"
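
The default artifact paths in the examples above are consistent with the following path arithmetic. The helper default_artifact_path and its is_dir flag are hypothetical and exist only to reproduce the documented results; the actual wrapper may derive the path differently.

```python
from pathlib import PurePosixPath


def default_artifact_path(local_path: str, output_root: str, artifact_root: str, is_dir: bool) -> str:
    """Hypothetical helper reproducing the documented default artifact paths."""
    # Keep the parents of the local path, relative to the project output root.
    relative = PurePosixPath(local_path).relative_to(output_root)
    # For a single file the artifact path is the file's parent directory;
    # for a directory it is the directory itself.
    if not is_dir:
        relative = relative.parent
    return str(PurePosixPath(artifact_root) / relative)


dir_path = default_artifact_path(
    "outputs/training/path/to/local/dir", "outputs/training", "kpi/train", is_dir=True
)
file_path = default_artifact_path(
    "outputs/training/path/to/local/file.txt", "outputs/training", "kpi/train", is_dir=False
)
```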

Parameters: func (callable) – The function to wrap (mlflow.log_artifact or mlflow.log_artifacts).
Returns: The wrapped function.
Return type: callable

mlsimkit.learn.common.tracking.log_metric_wrapper(func)¶

Wrap the mlflow.log_metric and mlflow.log_param functions.

This decorator wraps the mlflow.log_metric/mlflow.log_param function to conditionally log metrics/params based on the should_log() function and the global METRIC_ROOT setting.

Parameters: func (callable) – The mlflow.log_metric or mlflow.log_param functions.
Returns: The wrapped function.
Return type: callable
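
A conditional, prefixing decorator of this kind might look like the sketch below. Several details are assumptions rather than documented behaviour: the name prefixing_log_wrapper, the dot used to join METRIC_ROOT to the key, and the placeholder should_log() that always returns True. The log_metric function here is a stand-in that writes to a dict, not mlflow.log_metric.

```python
import functools

METRIC_ROOT = "kpi.train"  # assumed to be set by the tracking context
logged = {}                # stands in for the MLFlow backend


def should_log() -> bool:
    # Placeholder: the real function inspects MLFlow and Accelerate state.
    return True


def prefixing_log_wrapper(func):
    """Wrap a logging function so it is skipped or given a prefixed key."""

    @functools.wraps(func)
    def wrapper(key, value, *args, **kwargs):
        if not should_log():
            return None
        # Assumption: METRIC_ROOT is prepended to the key with a dot.
        full_key = f"{METRIC_ROOT}.{key}" if METRIC_ROOT else key
        return func(full_key, value, *args, **kwargs)

    return wrapper


@prefixing_log_wrapper
def log_metric(key, value):
    """Stand-in for mlflow.log_metric."""
    logged[key] = value


log_metric("validation_loss", 0.25)
```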

mlsimkit.learn.common.tracking.log_metrics_wrapper(func)¶

Wrap the mlflow.log_metrics and mlflow.log_params functions.

This decorator wraps the mlflow.log_metrics/mlflow.log_params function to conditionally log metrics/params based on the should_log() function and the global METRIC_ROOT setting.

Parameters: func (callable) – The mlflow.log_metrics or mlflow.log_params functions.
Returns: The wrapped function.
Return type: callable

mlsimkit.learn.common.tracking.should_log()¶

Determine if logging should be performed.

Logging should be performed in the following cases:

1. If MLFlow has been configured on start.
2. If the Accelerate library is not initialized (i.e., not during training).
3. If the Accelerate library is initialized and it’s the main process (to avoid duplicate logging during training).

The Accelerate library is used for distributed training, and during training, only the main process should log to MLflow to avoid duplicate logging from worker processes.

Returns: True if logging should be performed, False otherwise.
Return type: bool
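
One plausible reading of these rules, written as a pure function, is sketched below. It assumes that MLFlow being configured is a precondition and that the Accelerate checks then decide between main and worker processes; the source does not spell out how the cases combine, so treat this as an interpretation. The name should_log_sketch and its boolean arguments are hypothetical; the real function takes no arguments and inspects library state.

```python
def should_log_sketch(mlflow_configured: bool, accelerate_initialized: bool, is_main_process: bool) -> bool:
    """Interpretation of the logging rules as a decision over three flags."""
    if not mlflow_configured:
        # Assumed precondition: nothing is logged without MLFlow configured.
        return False
    if not accelerate_initialized:
        # Outside of training there is only one process, so log.
        return True
    # During training, only the main process logs to avoid duplicates.
    return is_main_process


outside_training = should_log_sketch(True, False, False)
main_process = should_log_sketch(True, True, True)
worker_process = should_log_sketch(True, True, False)
not_configured = should_log_sketch(False, True, True)
```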

Mesh Utilities¶

The Mesh module contains helpers for third-parties to process meshes such as down-sampling and format conversions.

mlsimkit.learn.common.mesh.as_torch_data(mesh, save_cell_data: bool = False, normalize_node_positions: bool = False, label: int | None = None, global_condition: int | None = None) → Data¶

Convert mesh to PyTorch Geometric Data format.

Parameters:

mesh – The mesh to convert.
label (int, optional) – Label for the mesh data. Defaults to None.

Returns:

PyTorch Geometric Data object containing the mesh data.

Return type:

torch_geometric.data.Data

Example

>>> mesh_paths = ['mesh1.stl', 'mesh2.stl']
>>> data = as_torch_data(mesh_paths, label=1)
>>> print(data)
Data(x=[2048, 7], edge_attr=[4096, 4], edge_index=[2, 4096], y=[1])

mlsimkit.learn.common.mesh.construct_processed_data(mesh_paths: List[str], labels: List[int] | None = None, global_conditions: List[int] | None = None, output_dir: str = '.', downsample_perc: float | None = None, num_processes: int = 1) → Generator[Data, None, None]¶

Construct a generator that yields processed mesh data in PyTorch Geometric Data format.

Parameters:

mesh_paths (List[str]) – List of paths to mesh files.
labels (List[int], optional) – List of labels for the mesh data. If not provided, labels will be set to None.
global_conditions (List[int], optional) – List of global conditions for the mesh data. If not provided, it will be set to None.
output_dir (str, optional) – Directory to save the converted and downsampled mesh files. Defaults to the current directory.
downsample_perc (float, optional) – Percentage of vertices to keep after downsampling (between 0 and 100). If not provided, no downsampling will be performed.
num_processes (int, optional) – Number of processes to use for parallel processing. Defaults to 1 (single process).

Yields:

torch_geometric.data.Data – PyTorch Geometric Data object containing the mesh data.

Example

>>> mesh_paths = ['mesh1.stl', 'mesh2.stl', 'mesh3.stl']
>>> labels = [0, 1, 0]
>>> data_generator = construct_processed_data(mesh_paths, labels, output_dir='output', downsample_perc=50.0, num_processes=4)
>>> for data in data_generator:
...     print(data)
Data(x=[1024, 7], edge_attr=[2048, 4], edge_index=[2, 2048], y=[0], mesh_path=['output/50perc_ds/mesh1_50perc_ds.stl'], downsample_perc=50.0)
Data(x=[2048, 7], edge_attr=[4096, 4], edge_index=[2, 4096], y=[1], mesh_path=['output/50perc_ds/mesh2_50perc_ds.stl'], downsample_perc=50.0)
Data(x=[1536, 7], edge_attr=[3072, 4], edge_index=[2, 3072], y=[0], mesh_path=['output/50perc_ds/mesh3_50perc_ds.stl'], downsample_perc=50.0)

mlsimkit.learn.common.mesh.convert(filepath: str, output_dir: str) → Path¶

Convert a mesh file to an STL file if the file extension is not supported.

Parameters:

filepath (str) – Path to the input mesh file.
output_dir (str) – Directory to save the converted STL file.

Raises:

RuntimeError – If the file extension is not supported and there is no converter available.

Returns:

Path to the converted STL file, or the original file if it is already an STL.

Return type:

Path

Example

>>> convert('mesh.stl', 'output')
Path('mesh.stl')
>>> convert('mesh.vtp', 'output')
Path('output/stl/mesh.stl')

mlsimkit.learn.common.mesh.convert_vtp_to_stl(vtp_file_path: Path, stl_file_path: Path) → Path¶

Convert a VTP file to an STL file.

Parameters:

vtp_file_path (Path) – Path to the input VTP file.
stl_file_path (Path) – Path to save the output STL file.

Returns:

Path to the output STL file.

Return type:

Path

Example

>>> convert_vtp_to_stl(Path('mesh.vtp'), Path('mesh.stl'))
Path('mesh.stl')

mlsimkit.learn.common.mesh.downsample(original_mesh_filepath: str, output_dir: str, downsample_perc: float | None) → Path¶

Wrapper to downsample a mesh file to the specified percentage and save the downsampled mesh to the output directory. If downsample_perc is None, the original mesh file is returned.

Parameters:

original_mesh_filepath (str) – Path to the input mesh file.
output_dir (str) – Directory to save the downsampled mesh file.
downsample_perc (float, optional) – Percentage of vertices to keep after downsampling (between 0 and 100).

Raises:

RuntimeError – If downsample_perc is not between 0 and 100.

Returns:

Path to the downsampled mesh file, or the original mesh file if downsample_perc is None.

Return type:

Path

Example

>>> downsample('mesh.stl', 'output', 50.0)
Path('output/50perc_ds/mesh_50perc_ds.stl')
>>> downsample('mesh.stl', 'output', None)
Path('mesh.stl')

mlsimkit.learn.common.mesh.downsample_mesh_file(mesh_filepath: str, output_path: str, downsample_perc: float) → Path¶

Downsample a mesh file to the specified percentage and save the downsampled mesh to the output path.

Parameters:

mesh_filepath (str) – Path to the input mesh file.
output_path (str) – Path to save the downsampled mesh file.
downsample_perc (float) – Percentage of vertices to keep after downsampling (between 0 and 100).

Returns:

Path to the downsampled mesh file.

Return type:

Path

Example

>>> downsample_mesh_file('mesh.stl', 'downsampled_mesh.stl', 50.0)
Path('downsampled_mesh.stl')

mlsimkit.learn.common.mesh.get_edges(mesh, cells: Tensor) → Tensor¶

Get two-way mesh edges via mesh cell data.

Parameters: cells (torch.Tensor) – Tensor of mesh cell data (faces).
Returns: Tensor of two-way mesh edges.
Return type: torch.Tensor

Example

>>> cells = torch.tensor([[0, 1, 2], [2, 3, 4]])
>>> get_edges(cells)
tensor([[0, 1],
        [1, 2],
        [2, 0],
        [2, 3],
        [3, 4],
        [4, 2]])
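
For intuition, deriving edges from triangular faces can be sketched in plain Python. This is an illustration under assumptions, not the library's code: triangle_edges is a hypothetical name, faces are given as tuples rather than a torch.Tensor, and the optional two_way flag appends each edge's reverse. With two_way left off, the sketch reproduces the edge list shown in the example above.

```python
from typing import List, Sequence, Tuple


def triangle_edges(cells: Sequence[Tuple[int, int, int]], two_way: bool = False) -> List[Tuple[int, int]]:
    """Collect the three directed edges of every triangular face."""
    edges: List[Tuple[int, int]] = []
    for a, b, c in cells:
        edges.extend([(a, b), (b, c), (c, a)])
    if two_way:
        # Append the reverse of every edge so each connection appears in both directions.
        edges = edges + [(j, i) for i, j in edges]
    return edges


one_way = triangle_edges([(0, 1, 2), (2, 3, 4)])
both_ways = triangle_edges([(0, 1, 2), (2, 3, 4)], two_way=True)
```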

mlsimkit.learn.common.mesh.load_mesh(path: str) → Trimesh¶

Load a mesh from file. Always flattens if the file is a scene.

Parameters: path (str) – Path to the input mesh file.
Returns: Loaded mesh.
Return type: trimesh.Trimesh

Example

>>> mesh = load_mesh('mesh.stl')

mlsimkit.learn.common.mesh.load_mesh_files(*paths: str)¶

mlsimkit.learn.common.mesh.load_mesh_files_pyvista(*paths: str)¶

mlsimkit.learn.common.mesh.normalize_points(original_node_positions)¶

Normalize a numpy array of 3D points so that they fit within the cube [-1, 1] in all dimensions, while maintaining the shape and aspect ratio of the original distribution.

Parameters: original_node_positions (np.ndarray) – A 2D numpy array where each row represents a 3D point (x, y, z).
Returns: A 2D numpy array of normalized 3D points.
Return type: np.ndarray

Example

>>> original_node_positions = np.array([
...     [2, 5, -3],
...     [4, -2, 1],
...     [5, 3, -1],
...     [-1, -4, 2]
... ])
>>> normalize_points(original_node_positions)
array([[ 0.        ,  1.        , -0.55555556],
       [ 0.44444444, -0.55555556,  0.33333333],
       [ 0.66666667,  0.55555556, -0.11111111],
       [-0.66666667, -1.        ,  0.55555556]])
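
The output above is consistent with centering each axis on the midpoint of its range and dividing every axis by half of the largest range, which preserves the aspect ratio. The dependency-free sketch below reproduces those numbers; normalize_points_sketch is a hypothetical name, it works on nested lists rather than numpy arrays, and it is not necessarily how normalize_points is implemented. It assumes the points are not all identical, since the scale would otherwise be zero.

```python
from typing import List, Sequence


def normalize_points_sketch(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale points into [-1, 1] on every axis using one shared scale factor."""
    dims = range(len(points[0]))
    mins = [min(p[d] for p in points) for d in dims]
    maxs = [max(p[d] for p in points) for d in dims]
    # Midpoint of the bounding box along each axis.
    centers = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    # Half of the largest axis range; shared by all axes to keep the aspect ratio.
    half_extent = max(hi - lo for lo, hi in zip(mins, maxs)) / 2
    return [[(p[d] - centers[d]) / half_extent for d in dims] for p in points]


normalized = normalize_points_sketch([
    [2, 5, -3],
    [4, -2, 1],
    [5, 3, -1],
    [-1, -4, 2],
])
```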

mlsimkit.learn.common.mesh.process_mesh(args: Tuple[int, List[str], int | None, int | None, str, int, float | None]) → Data¶

Pre-process a list of mesh files by converting and downsampling them, and convert them to PyTorch Geometric Data format.

Parameters: args (Tuple[int, List[str], Optional[int], str, int, Optional[float]]) – A tuple containing:

- path_num (int): Index of the current file being processed.
- path_list (List[str]): List of paths to mesh files.
- label (Optional[int]): Label for the mesh data.
- output_dir (str): Directory to save the converted and downsampled mesh files.
- total_files (int): Total number of files to be processed.
- downsample_perc (Optional[float]): Percentage of vertices to keep after downsampling (between 0 and 100).

Returns: PyTorch Geometric Data object containing the mesh data.
Return type: torch_geometric.data.Data

Example

>>> mesh_paths = ['mesh1.stl', 'mesh2.stl']
>>> args = (1, mesh_paths, 1, 'output', 2, 50.0)
>>> data = process_mesh(args)
>>> print(data)
Data(x=[1024, 7], edge_attr=[2048, 4], edge_index=[2, 2048], y=[1], mesh_path=['output/50perc_ds/mesh1_50perc_ds.stl', 'output/50perc_ds/mesh2_50perc_ds.stl'], downsample_perc=50.0)

Miscellaneous Utilities¶

The Utils module collects mostly unrelated functions that should eventually be moved into specifically-named modules.

mlsimkit.learn.common.utils.calculate_directional_correctness(prediction_results_df, kpi_idx)¶

mlsimkit.learn.common.utils.calculate_error_metrics(actual, pred, kpi_idx, label, model_name)¶

mlsimkit.learn.common.utils.calculate_mean_stddev(dataset: Dataset, keys: Tuple[str] = 'x', dims: Dict[str, int] | None = None, shapes: Dict[str, Tuple[int, ...]] | None = None, device: str = 'cpu') → Tuple[Dict[str, Tensor], Dict[str, Tensor]]¶

Calculate the mean and standard deviation for each specified key in a dataset using an online algorithm.

This function avoids keeping the entire dataset in memory by computing the cumulative mean and standard deviation for each key using the “Parallel algorithm” described at https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm.

By default, the first dimension is used for calculations and the second dimension determines the output shape. You may override the default behavior for a specific key by setting dims and/or shapes respectively.
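
The parallel algorithm referenced here merges the running count, mean, and sum of squared deviations of two groups without revisiting earlier data. A minimal scalar sketch follows; the function names are hypothetical, it operates on plain Python lists rather than tensors, it assumes every batch is non-empty, and it returns the population standard deviation, which may differ from the convention the library uses.

```python
import math
from typing import Iterable, List, Tuple

Stats = Tuple[int, float, float]  # (count, mean, sum of squared deviations)


def batch_stats(batch: List[float]) -> Stats:
    """Count, mean, and sum of squared deviations for one batch."""
    n = len(batch)
    mean = sum(batch) / n
    m2 = sum((v - mean) ** 2 for v in batch)
    return n, mean, m2


def combine(a: Stats, b: Stats) -> Stats:
    """Merge two partial results using the parallel variance formulas."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def streaming_mean_std(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Accumulate statistics batch by batch, never holding all data at once."""
    running: Stats = (0, 0.0, 0.0)
    for batch in batches:
        running = combine(running, batch_stats(batch))
    n, mean, m2 = running
    return mean, math.sqrt(m2 / n)  # population standard deviation


mean, std = streaming_mean_std([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]])
```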

Parameters:

dataset (torch.utils.data.Dataset) – The dataset over which to compute the statistics.
keys (Tuple[str], optional) – The keys for which to compute statistics. Defaults to "x".
dims (Dict[str, int], optional) – Optional dictionary specifying the dimension along which to compute the statistics for each key. If not provided or if a key is not present in the dictionary, the default dimension is 0.
shapes (Dict[str, Tuple[int, ...]], optional) – Optional dictionary specifying the target shape for each key after computing the statistics. The default shape is the size of the second dimension (i.e., dataset[0][key].shape[1]). If an empty tuple () or an empty list [] is specified for a key, the statistics for that key are flattened to a scalar (a shape of length 0) before being returned.
device (str, optional) – The torch device on which to run the calculations. Defaults to "cpu".

Returns:

A tuple containing two dictionaries:

The cumulative means for each key, with the specified target shape.
The cumulative standard deviations for each key, with the specified target shape.

Return type:

Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]]

Examples

>>> # Compute mean and std dev for 'x' key with default dimensions and shapes
>>> means, stds = calculate_mean_stddev(dataset, keys=('x',))

>>> # Compute mean and std dev for 'y' along the third dimension (index 2)
>>> means, stds = calculate_mean_stddev(dataset, keys=('x', 'y'), dims={'y': 2})

>>> # Flatten 'y' before computing mean and std dev
>>> means, stds = calculate_mean_stddev(dataset, keys=('x', 'y'), shapes={'y': []})

>>> # Compute mean and std dev for 'y' along the third dimension (index 2), and return a flattened tensor
>>> means, stds = calculate_mean_stddev(dataset, keys=('x', 'y'), dims={'y': 2}, shapes={'y': []})

mlsimkit.learn.common.utils.get_lr_scheduler(optimizer, config)¶

mlsimkit.learn.common.utils.get_optimizer(model_parameters, config)¶

mlsimkit.learn.common.utils.save_dataset(dataset, path, indices=None)¶

mlsimkit.learn.common.utils.save_loss_plots(config, train_losses, validation_losses, model_name, plot_log=False)¶

mlsimkit.learn.common.utils.save_pred_vs_actual_plot(actuals, preds, labels, kpi_idx, plot_path, model_name='')¶

mlsimkit.learn.common.utils.save_prediction_results(kpi_indices, predictions_dir, mesh_path_lists, actual_dicts, pred_dicts, labels, model_name='', ground_truth_exist=True)¶

Schemas¶

The learning schemas implement common data classes used across the codebase.

class mlsimkit.learn.common.schema.optimizer.LearningRateScheduler(value)

An enumeration.

REDUCDE_LR_ON_PLATEAU = 'reducelronplateau'

STEP = 'step'

class mlsimkit.learn.common.schema.optimizer.OptimizerAlgorithm(value)

An enumeration.

ADAGRAD = 'adagrad'

ADAM = 'adam'

ADAMW = 'adamw'

RMSPROP = 'rmsprop'

SGD = 'sgd'

class mlsimkit.learn.common.schema.optimizer.OptimizerSettings(*, algorithm: OptimizerAlgorithm = OptimizerAlgorithm.ADAMW, weight_decay: Annotated[float, Ge(ge=0)] = 0.01, learning_rate: Annotated[float, Gt(gt=0)] = 0.001, momentum: Annotated[float, Ge(ge=0), Le(le=1)] = 0.9, lr_scheduler: LearningRateScheduler | None = None, decay_rate: Annotated[float, Gt(gt=0), Le(le=1)] = 0.7, step_size: Annotated[int, Ge(ge=1)] = 1, tracking_metric: TrackingMetric = TrackingMetric.MIN, patience_epochs: Annotated[int, Ge(ge=0)] = 100, min_lr: Annotated[float, Ge(ge=0)] = 5e-05)

class Config

title: str = 'Training Optimizer Settings'

_abc_impl = <_abc._abc_data object>

algorithm: OptimizerAlgorithm

decay_rate: float

learning_rate: float

lr_scheduler: LearningRateScheduler | None

min_lr: float

model_config: ClassVar[ConfigDict] = {'title': 'Training Optimizer Settings'}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

momentum: float

patience_epochs: int

step_size: int

tracking_metric: TrackingMetric

weight_decay: float

class mlsimkit.learn.common.schema.optimizer.TrackingMetric(value)

An enumeration.

MAX = 'max'

MIN = 'min'

class mlsimkit.learn.common.schema.project.BaseProjectContext(*, outdir: str | None = None, run_id: str | None = None)

Data class for persisting state to disk between mlsimkit-learn commands. Used to chain commands. Safe for multi-processing via 'accelerate launch'.

_abc_impl = <_abc._abc_data object>

classmethod get(ctx)

classmethod init(ctx, output_dir)

classmethod load(ctx)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

outdir: str | None

run_id: str | None

save(ctx, exist_ok=True)

class mlsimkit.learn.common.schema.training.BaseTrainSettings(*, training_output_dir: str | None = None, epochs: Annotated[int, Ge(ge=1)] = 5, batch_size: Annotated[int, Ge(ge=1)] = 4, drop_last: bool = True, seed: int = 0, shuffle_data_each_epoch: bool = True, device: Device = Device.AUTO, checkpoint_save_interval: Annotated[int, Ge(ge=1)] = 10, validation_loss_save_interval: Annotated[int, Ge(ge=1)] = 1, deterministic: bool = False, mixed_precision: MixedPrecision = MixedPrecision.NO, load_checkpoint: LoadCheckpointSettings = LoadCheckpointSettings(checkpoint_path=None, best_checkpoint_path=None, loss_path=None), optimizer: OptimizerSettings = OptimizerSettings(algorithm=<OptimizerAlgorithm.ADAMW: 'adamw'>, weight_decay=0.01, learning_rate=0.001, momentum=0.9, lr_scheduler=None, decay_rate=0.7, step_size=1, tracking_metric=<TrackingMetric.MIN: 'min'>, patience_epochs=100, min_lr=5e-05), empty_cache: bool = False, node_encoder_hidden_size: Annotated[int | None, Ge(ge=1)] = None, edge_encoder_hidden_size: Annotated[int | None, Ge(ge=1)] = None, node_message_passing_mlp_hidden_size: Annotated[int | None, Ge(ge=1)] = None, edge_message_passing_mlp_hidden_size: Annotated[int | None, Ge(ge=1)] = None, node_decoder_hidden_size: Annotated[int | None, Ge(ge=1)] = None)

_abc_impl = <_abc._abc_data object>

batch_size: int

checkpoint_save_interval: int

deterministic: bool

device: Device

drop_last: bool

edge_encoder_hidden_size: int | None

edge_message_passing_mlp_hidden_size: int | None

empty_cache: bool

epochs: int

load_checkpoint: LoadCheckpointSettings

mixed_precision: MixedPrecision

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

node_decoder_hidden_size: int | None

node_encoder_hidden_size: int | None

node_message_passing_mlp_hidden_size: int | None

optimizer: OptimizerSettings

seed: int

classmethod set_defaults(values)

shuffle_data_each_epoch: bool

training_output_dir: str | None

validation_loss_save_interval: int

class mlsimkit.learn.common.schema.training.Device(value)

An enumeration.

AUTO = 'auto'

CPU = 'cpu'

class mlsimkit.learn.common.schema.training.GlobalConditionMethod(value)

An enumeration.

MODEL = 'model'

NODE_FEATURES = 'node_features'

class mlsimkit.learn.common.schema.training.LoadCheckpointSettings(*, checkpoint_path: str | None = None, best_checkpoint_path: str | None = None, loss_path: str | None = None)

class Config

title: str = 'Checkpoint Loading Settings'

_abc_impl = <_abc._abc_data object>

best_checkpoint_path: str | None

checkpoint_path: str | None

loss_path: str | None

model_config: ClassVar[ConfigDict] = {'title': 'Checkpoint Loading Settings'}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class mlsimkit.learn.common.schema.training.LossMetric(value)

An enumeration.

MSE = 'mse'

RMSE = 'rmse'

class mlsimkit.learn.common.schema.training.MixedPrecision(value)

An enumeration.

BF16 = 'bf16'

FP16 = 'fp16'

NO = 'no'

class mlsimkit.learn.common.schema.training.PoolingType(value)

An enumeration.

MAX = 'max'

MEAN = 'mean'
