Experimental
Whatif provides an alternate GPT model called LightGPT that leverages PyTorch Lightning to simplify model training at scale.
Call for Action
We welcome and encourage your feedback to help us refine this experimental API.
Standing on the shoulders of PyTorch Lightning allows Whatif to be used out-of-the-box in an easy, flexible, scalable, and robust manner. At the same time, Whatif automatically benefits from new improvements in future PyTorch Lightning releases.
Non-Exhaustive List of New Capabilities
Here are some of the capabilities that you can immediately access with LightGPT, by virtue of the transitive chain a2rl -> pytorch-lightning -> capabilities.
You can run LightGPT on a wide range of accelerators, including (as of PyTorch Lightning v1.6.5) CUDA GPUs (i.e., the P* and G* Amazon EC2 instances), HPUs (i.e., the Amazon EC2 DL1 instances powered by Gaudi accelerators from Intel's Habana Labs), Google's TPUs, etc.
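For illustration, here is a minimal sketch of accelerator selection with the PyTorch Lightning v1.6.x Trainer. The `model` placeholder stands for any LightningModule (for example, a LightGPT instance) and is not defined in the snippet, and each accelerator choice assumes the corresponding hardware is actually attached.

```python
import pytorch_lightning as pl

# Single CUDA GPU, e.g. on a P* or G* EC2 instance.
trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=5)

# Switching hardware is a matter of changing one argument:
#   pl.Trainer(accelerator="hpu", devices=8)  # DL1 instances (Habana Gaudi)
#   pl.Trainer(accelerator="tpu", devices=8)  # Google TPU
#   pl.Trainer(accelerator="cpu")             # local smoke tests

# trainer.fit(model)  # `model` is your LightningModule, e.g. a LightGPT instance
```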
You can also train LightGPT models on a wide spectrum of system configurations, ranging from a single node to multiple nodes, and choose a distributed training algorithm from a wide range of available options: Horovod, DeepSpeed, FairScale, Bagua, etc.
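As a hedged sketch against PyTorch Lightning v1.6.x, the distributed algorithm is selected with a single `strategy` argument; each alternative below assumes its extra dependency (Horovod, DeepSpeed, FairScale, or Bagua) is installed and that the job is launched on matching hardware.

```python
import pytorch_lightning as pl

# Single node, 4 GPUs, DistributedDataParallel.
trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")

# Two nodes with 8 GPUs each.
trainer = pl.Trainer(accelerator="gpu", devices=8, num_nodes=2, strategy="ddp")

# Other strategies shipped with v1.6.x (pick one, install its dependency first):
#   strategy="horovod"            # Horovod
#   strategy="deepspeed_stage_2"  # DeepSpeed ZeRO stage 2
#   strategy="ddp_sharded"        # FairScale sharded data parallel
#   strategy="bagua"              # Bagua

# trainer.fit(model)  # `model` is any LightningModule, e.g. a LightGPT instance
```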
You also get instant access to many training techniques such as gradient accumulation, gradient clipping, stochastic weight averaging, the batch-size finder, the learning-rate finder, 16-bit precision, etc.
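Most of these techniques are exposed as plain Trainer arguments or callbacks. The following is a rough sketch against PyTorch Lightning v1.6.x; it assumes a single GPU is available, and the specific values are illustrative rather than recommended settings.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import StochasticWeightAveraging

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=10,
    accumulate_grad_batches=4,       # gradient accumulation
    gradient_clip_val=1.0,           # gradient clipping
    precision=16,                    # 16-bit (mixed) precision
    callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)],  # stochastic weight averaging
    auto_lr_find=True,               # learning-rate finder (v1.6.x API)
    auto_scale_batch_size="power",   # batch-size finder (v1.6.x API)
)

# trainer.tune(model)  # runs the LR and batch-size finders before training
# trainer.fit(model)   # `model` is any LightningModule, e.g. a LightGPT instance
```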
LightGPT has first-class support for logging training metrics to a plethora of experiment trackers such as TensorBoard (the default behavior), Wandb, Comet, Neptune, etc.
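For example, switching experiment trackers is a one-argument change on the Trainer. This sketch assumes the optional tracker packages (e.g. wandb) and their credentials are already set up; the project name is purely illustrative.

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger

# Default behavior: metrics land in ./lightning_logs for TensorBoard.
trainer = pl.Trainer(logger=TensorBoardLogger(save_dir="lightning_logs"))

# One-line swap to Weights & Biases; the Comet and Neptune loggers follow the same pattern.
trainer = pl.Trainer(logger=WandbLogger(project="lightgpt-experiments"))

# trainer.fit(model)  # `model` is any LightningModule, e.g. a LightGPT instance
```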
There are many more capabilities that LightGPT automatically inherits from PyTorch Lightning. For a comprehensive tour of those capabilities, please visit the PyTorch Lightning documentation.
High-level APIs to train and evaluate a Lightning-based GPT model based on the data loaded in |