slapo.random

Random seed and RNG state management.

Classes:

CudaRNGStatesTracker()

Tracker for the CUDA RNG states.

Functions:

get_cuda_rng_tracker()

Get the CUDA RNG tracker.

model_parallel_cuda_manual_seed(seed, ...)

Initialize the model parallel CUDA seed.

is_random_seed_set()

Check if random seed is set.

set_random_seed([seed, dp_rank, pp_rank, ...])

Set random seed for reproducibility.

class slapo.random.CudaRNGStatesTracker[source]

Tracker for the CUDA RNG states. Using the add method, a CUDA RNG state is initialized from the given seed and registered under name. Later, by forking that RNG state, we can perform operations and return to our starting CUDA state.

Methods:

reset()

Set to the initial state (no tracker).

get_states()

Get the RNG states.

set_states(states)

Set the RNG states.

add(name, seed)

Track the RNG state.

fork([name])

Fork the CUDA RNG state, perform operations, and exit with the original state restored.

reset()[source]

Set to the initial state (no tracker).

get_states()[source]

Get the RNG states. The dictionary is copied so that callers hold direct references to the states rather than a reference to the internal dictionary.

set_states(states)[source]

Set the RNG states. For efficiency, the size of the seed is not checked for compatibility.

add(name, seed)[source]

Track the RNG state.

fork(name='model-parallel-rng')[source]

Fork the CUDA RNG state, perform operations, and exit with the original state restored.
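
A minimal usage sketch, assuming a CUDA device is available and that fork is used as a context manager; the state name "example-rng" is an arbitrary illustration:

    import torch

    from slapo.random import CudaRNGStatesTracker

    tracker = CudaRNGStatesTracker()
    tracker.add("example-rng", seed=1234)

    before = torch.cuda.get_rng_state()
    with tracker.fork("example-rng"):
        # Random ops inside the block draw from the "example-rng" state.
        x = torch.rand(4, device="cuda")
    after = torch.cuda.get_rng_state()

    # On exit, the original CUDA RNG state is restored.
    assert torch.equal(before, after)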

slapo.random.get_cuda_rng_tracker()[source]

Get the CUDA RNG tracker.
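
A sketch of the common pattern, assuming model_parallel_cuda_manual_seed has already registered the "model-parallel-rng" state that fork() uses by default:

    import torch

    from slapo.random import get_cuda_rng_tracker

    x = torch.rand(8, 8, device="cuda")
    with get_cuda_rng_tracker().fork():
        # Ops here (e.g., dropout in a tensor-model-parallel region) draw
        # from the tensor-model-parallel RNG state, so each TP rank gets a
        # different mask.
        y = torch.nn.functional.dropout(x, p=0.1, training=True)
    # Outside the block, the default CUDA RNG state is active again.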

slapo.random.model_parallel_cuda_manual_seed(seed, tp_rank, always_enable_tp_seed)[source]

Initialize the model parallel CUDA seed. This function should be called after model parallel initialization. No torch.cuda.manual_seed call should be made after this function; it is a replacement for that function. Two sets of RNG states are tracked:

  • default state: This is for data parallelism and is the same among a set of model parallel GPUs, but different across different model parallel groups. This is used, for example, for dropout in the non-tensor-model-parallel regions.

  • tensor-model-parallel state: This state is different among a set of model parallel GPUs, but the same across data parallel groups. This is used, for example, for dropout in model parallel regions.

Parameters
  • seed (int) – Random seed.

  • tp_rank (int) – Tensor model parallel rank.

  • always_enable_tp_seed (bool) – Always enable the tensor model parallel seed. This is used when sequence parallelism is enabled and all dropouts should use different seeds even if they are in the same TP group. Default is False, meaning that the tensor model parallel seed is only enabled within get_cuda_rng_tracker().fork().

Returns

Tensor model parallel seed of this rank.

Return type

int
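
A per-rank initialization sketch; tp_rank=0 is a placeholder here, and in practice it would be this process's rank within its tensor model parallel group:

    from slapo.random import model_parallel_cuda_manual_seed

    # tp_rank is a placeholder; obtain it from your model parallel runtime.
    tp_seed = model_parallel_cuda_manual_seed(
        2013, tp_rank=0, always_enable_tp_seed=False
    )
    # tp_seed is the tensor-model-parallel seed assigned to this rank.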

slapo.random.is_random_seed_set()[source]

Check if random seed is set.
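
A typical guard sketch, seeding the process only if it has not been seeded yet:

    from slapo.random import is_random_seed_set, set_random_seed

    if not is_random_seed_set():
        set_random_seed(2013)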

slapo.random.set_random_seed(seed=2013, dp_rank=None, pp_rank=None, tp_rank=None, always_enable_tp_seed=False)[source]

Set random seed for reproducibility.

Parameters
  • seed (int) – Random seed. Default is 2013.

  • dp_rank (Optional[int]) – Data parallel rank. Default is None, meaning no data parallelism.

  • pp_rank (Optional[int]) – Pipeline parallel rank. Default is None, meaning no pipeline parallelism.

  • tp_rank (Optional[int]) – Tensor model parallel rank. Default is None, meaning no tensor parallelism.

  • always_enable_tp_seed (bool) – Always enable the tensor model parallel seed. This is used when sequence parallelism is enabled and all dropouts should use different seeds even if they are in the same TP group. Default is False, meaning that the tensor model parallel seed is only enabled within get_cuda_rng_tracker().fork().

Returns

Random seed of this rank.

Return type

int
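
A sketch of seeding under parallelism; the mapping from the global rank to dp_rank and tp_rank below is an illustrative placeholder for whatever your topology defines, and torch.distributed is assumed to be initialized:

    import torch.distributed as dist

    from slapo.random import set_random_seed

    world_rank = dist.get_rank()
    seed = set_random_seed(
        seed=2013,
        dp_rank=world_rank // 4,  # placeholder: 4-way tensor parallelism
        pp_rank=None,             # no pipeline parallelism in this sketch
        tp_rank=world_rank % 4,   # placeholder tensor parallel rank
    )
    # The returned value is the seed actually used on this rank.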