Experimental
Whatif provides an alternate GPT model called LightGPT that leverages PyTorch Lightning to simplify model training at scale.
Call for Action
We welcome and encourage your feedback to help us refine this experimental API.
Standing on the shoulders of PyTorch Lightning allows Whatif to be used out-of-the-box in an easy, flexible, scalable, and robust manner. At the same time, Whatif automatically benefits from new improvements in future PyTorch Lightning releases.
Non-Exhaustive List of New Capabilities
Here are some of the capabilities that you can immediately access with LightGPT, by virtue of the transitive chain a2rl -> pytorch-lightning -> capabilities.
You can run LightGPT on a wide range of accelerators, including (as of PyTorch Lightning v1.6.5) CUDA GPUs (i.e., the P* and G* Amazon EC2 instances), HPUs (i.e., the Amazon EC2 DL1 instances powered by Gaudi accelerators from Intel's Habana Labs), Google's TPUs, etc.
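For illustration, here is a minimal sketch of accelerator selection with the PyTorch Lightning v1.6.x Trainer. The `model` placeholder stands for any LightningModule (for example, a LightGPT instance) and is not defined in the snippet, and each accelerator choice assumes the corresponding hardware is actually attached.

```python
import pytorch_lightning as pl

# Single CUDA GPU, e.g. on a P* or G* EC2 instance.
trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=5)

# Switching hardware is a matter of changing one argument:
#   pl.Trainer(accelerator="hpu", devices=8)  # DL1 instances (Habana Gaudi)
#   pl.Trainer(accelerator="tpu", devices=8)  # Google TPU
#   pl.Trainer(accelerator="cpu")             # local smoke tests

# trainer.fit(model)  # `model` is your LightningModule, e.g. a LightGPT instance
```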
You can also train LightGPT models on a wide spectrum of system configurations, ranging from a single node to multiple nodes, and choose a distributed training algorithm from a wide range of available options: Horovod, DeepSpeed, FairScale, Bagua, etc.
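As a hedged sketch against PyTorch Lightning v1.6.x, the distributed algorithm is selected with a single `strategy` argument; each alternative below assumes its extra dependency (Horovod, DeepSpeed, FairScale, or Bagua) is installed and that the job is launched on matching hardware.

```python
import pytorch_lightning as pl

# Single node, 4 GPUs, DistributedDataParallel.
trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")

# Two nodes with 8 GPUs each.
trainer = pl.Trainer(accelerator="gpu", devices=8, num_nodes=2, strategy="ddp")

# Other strategies shipped with v1.6.x (pick one, install its dependency first):
#   strategy="horovod"            # Horovod
#   strategy="deepspeed_stage_2"  # DeepSpeed ZeRO stage 2
#   strategy="ddp_sharded"        # FairScale sharded data parallel
#   strategy="bagua"              # Bagua

# trainer.fit(model)  # `model` is any LightningModule, e.g. a LightGPT instance
```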
You also get instant access to many training techniques such as gradient accumulation, gradient clipping, stochastic weight averaging, the batch-size finder, the learning-rate finder, 16-bit precision, etc.
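Most of these techniques are exposed as plain Trainer arguments or callbacks. The following is a rough sketch against PyTorch Lightning v1.6.x; it assumes a single GPU is available, and the specific values are illustrative rather than recommended settings.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import StochasticWeightAveraging

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=10,
    accumulate_grad_batches=4,       # gradient accumulation
    gradient_clip_val=1.0,           # gradient clipping
    precision=16,                    # 16-bit (mixed) precision
    callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)],  # stochastic weight averaging
    auto_lr_find=True,               # learning-rate finder (v1.6.x API)
    auto_scale_batch_size="power",   # batch-size finder (v1.6.x API)
)

# trainer.tune(model)  # runs the LR and batch-size finders before training
# trainer.fit(model)   # `model` is any LightningModule, e.g. a LightGPT instance
```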
LightGPT has first-class support for logging training metrics to a plethora of experiment trackers such as TensorBoard (the default behavior), Wandb, Comet, Neptune, etc.
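For example, switching experiment trackers is a one-argument change on the Trainer. This sketch assumes the optional tracker packages (e.g. wandb) and their credentials are already set up; the project name is purely illustrative.

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger

# Default behavior: metrics land in ./lightning_logs for TensorBoard.
trainer = pl.Trainer(logger=TensorBoardLogger(save_dir="lightning_logs"))

# One-line swap to Weights & Biases; the Comet and Neptune loggers follow the same pattern.
trainer = pl.Trainer(logger=WandbLogger(project="lightgpt-experiments"))

# trainer.fit(model)  # `model` is any LightningModule, e.g. a LightGPT instance
```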
There are many more capabilities that LightGPT automatically inherits from PyTorch Lightning. For a comprehensive tour of those capabilities, please visit the PyTorch Lightning documentation.
High-level APIs to train and evaluate a Lightning-based GPT model based on the data loaded in |