API references
- class pyrannc.RaNNCModule(model, optimizer=None, gather_inputs=True, load_deployment=None, enable_apex_amp=False, allreduce_amp_master_params=False, enable_zero=False, check_unused_values=True, offload_params=False)
Computes a PyTorch model on multiple GPUs with hybrid parallelism (a usage sketch follows the parameter list).
- Parameters
model – Model to distribute.
optimizer – Optimizer that should work with RaNNC.
gather_inputs – Set False if the model uses inputs given on rank 0.
enable_apex_amp – Set True if the model is processed by Apex AMP.
allreduce_amp_master_params – Set True to allreduce gradients of master parameters of Apex AMP.
enable_zero – Set True to remove the redundancy of optimizer states, following the approach of DeepSpeed.
check_unused_values – If True, RaNNC throws an exception when it finds unused values in a computation graph.
offload_params – If True, parameters are moved to host memory until they are used.
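A minimal training-loop sketch. The model, sizes, loss, and input placement below are illustrative assumptions, not prescribed by the API:

```python
import torch
import pyrannc

net = torch.nn.Linear(128, 10)                     # any torch.nn.Module (placeholder)
opt = torch.optim.SGD(net.parameters(), lr=0.01)

# Wrap the model; RaNNC controls device placement and partitioning.
model = pyrannc.RaNNCModule(net, opt)

x = torch.randn(64, 128).cuda()   # assumed: inputs placed on the local device
y = model(x)                      # the first forward call triggers partitioning
loss = y.sum()                    # placeholder loss
loss.backward()                   # by default, gradient allreduce follows backward
opt.step()
```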
- buffers(*args, **kwargs)
Returns buffers. Note that buffers are not synchronized among ranks.
- clip_grad_norm(max_grad_norm)
Clips gradients according to the norm. Use this method to clip gradients instead of torch.nn.utils.clip_grad_norm_ because each local process has only a part of the parameters/gradients; this method calculates the norm of all distributed gradients and clips them.
- Parameters
max_grad_norm – Max value of the gradients' norm.
Note
This method must be called from all ranks.
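A sketch of where the call fits in a training step, reusing model/opt/loss from the constructor sketch above; every rank runs the same lines:

```python
loss.backward()
model.clip_grad_norm(1.0)   # collective call on all ranks; replaces torch.nn.utils.clip_grad_norm_
opt.step()
```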
- cuda(*args, **kwargs)
This does not work because the device placement of a RaNNCModule is controlled by RaNNC.
- enable_dropout(enable)
Enables/disables dropout layers. This method is useful for evaluation because model.eval() does not work for a RaNNCModule.
- Parameters
enable – Set True to enable and False to disable dropout layers.
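A sketch of evaluating with dropout disabled, since model.eval() does not take effect on a RaNNCModule (x_eval is a placeholder evaluation batch):

```python
model.enable_dropout(False)     # turn dropout off for evaluation
with torch.no_grad():
    preds = model(x_eval)
model.enable_dropout(True)      # turn dropout back on before training resumes
```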
- eval()
Sets the training mode to False (i.e. evaluation mode).
- get_param(name, amp_master_param=False)
Gets a parameter tensor specified by name.
- Parameters
name – Name of the parameter.
amp_master_param – Gets the Apex AMP master parameter if True.
- get_param_grad(name, amp_master_param=False)
Gets the gradient of a parameter tensor specified by name.
- Parameters
name – Name of the parameter.
amp_master_param – Gets the Apex AMP master gradient if True.
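A sketch of inspecting a parameter and its gradient by name, reusing the model from the constructor sketch (the name "fc.weight" is a placeholder):

```python
w = model.get_param("fc.weight")        # parameter tensor
g = model.get_param_grad("fc.weight")   # its gradient
# With Apex AMP enabled, the FP32 master copies can be requested instead:
w32 = model.get_param("fc.weight", amp_master_param=True)
g32 = model.get_param_grad("fc.weight", amp_master_param=True)
```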
- load_state_dict(*args, **kwargs)
Loads state_dict into the model. This works only before the first call of the forward pass.
- Parameters
args – Passed to the original model.
kwargs – Passed to the original model.
- Returns
Return value of the original model's load_state_dict.
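A restore sketch; because loading works only before the first forward pass, load the checkpoint immediately after constructing the module (net, opt, and the path are placeholders from the earlier sketches):

```python
model = pyrannc.RaNNCModule(net, opt)
model.load_state_dict(torch.load("checkpoint.pt"))  # must precede the first forward call
out = model(x)                                      # partitioning happens here, after loading
```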
- named_buffers(*args, **kwargs)
Returns buffers with their names. Note that buffers are not synchronized among ranks.
- named_parameters(*args, **kwargs)
Returns parameters with their names. Note that parameters are not synchronized among ranks.
- parameters(*args, **kwargs)
Returns parameters. Note that parameters are not synchronized among ranks.
- save_deployment(file)
Saves a deployment state (graph partitioning) to a file.
- Parameters
file – File path.
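A sketch of saving the deployment once and reusing it on a later run via the constructor's load_deployment argument, so the partitioning is not recomputed (the file name is a placeholder):

```python
model.save_deployment("deployment.bin")
# On a later run, reuse the saved partitioning:
model = pyrannc.RaNNCModule(net, opt, load_deployment="deployment.bin")
```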
- state_dict(*args, no_hook=False, amp_master_params=True, rank0_only=True, **kwargs)
Returns state_dict of the model.
- Parameters
no_hook – If True, hooks on state_dict of the original model are ignored.
amp_master_params – Set True to get Apex AMP master params.
Note
This method must be called from all ranks.
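A checkpointing sketch. Every rank must call state_dict; assuming rank0_only=True (the default in the signature) gathers the full result on rank 0, only that rank writes it (the path is a placeholder):

```python
sd = model.state_dict()                # collective: call on all ranks
if pyrannc.get_rank() == 0:
    torch.save(sd, "checkpoint.pt")
pyrannc.barrier()                      # keep ranks in step after the save
```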
- to(*args, **kwargs)
This does not work because the device placement of a RaNNCModule is controlled by RaNNC.
- train(mode=True)
Outputs a warning because a RaNNCModule cannot change the grad mode.
- Parameters
mode – Training mode.
- undeploy()
Undeploys a model distributed on GPUs. This frees GPU memory used for the model.
Note
This method must be called from all ranks.
- zero_grad()
Sets the gradients of all model parameters to zero.
- pyrannc.barrier()
Blocks until all ranks reach the call of this method.
- pyrannc.clear()
Clears RaNNC's state, including all RaNNCModules and buffers.
- pyrannc.delay_grad_allreduce(delay)
By default, RaNNC performs allreduce of gradients soon after backward. If True is given, however, it skips the allreduce; the application can then use allreduce_grads to perform the allreduce explicitly. This is useful when gradient accumulation is used.
- Parameters
delay – If True, the allreduce after backward is skipped.
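A gradient-accumulation sketch. The text above names allreduce_grads but does not show its location or signature, so that call below is an assumption; batches, criterion, and accum_steps are placeholders:

```python
pyrannc.delay_grad_allreduce(True)           # skip the allreduce right after backward

for step, (x, t) in enumerate(batches):
    loss = criterion(model(x), t)
    loss.backward()                          # gradients accumulate locally
    if (step + 1) % accum_steps == 0:
        pyrannc.allreduce_grads(model)       # assumed signature; see note above
        opt.step()
        model.zero_grad()
```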
- pyrannc.get_rank()
Gets the rank of the running process in COMM_WORLD.
- Returns
rank
- pyrannc.get_world_size()
Gets the size of COMM_WORLD.
- Returns
world size
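A sketch combining the rank utilities with pyrannc.barrier(), e.g. to let rank 0 finish writing a file before the other ranks proceed (results and the path are placeholders):

```python
if pyrannc.get_rank() == 0:
    torch.save(results, "results.pt")   # only rank 0 writes
pyrannc.barrier()                       # everyone waits for rank 0's write
print(f"rank {pyrannc.get_rank()} of {pyrannc.get_world_size()}")
```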
- pyrannc.keep_graph(keep)
The flag is passed to retain_graph of PyTorch's backward. This is useful when you perform multiple backward passes after one forward pass.
- Parameters
keep – Set True to keep the graph after backward.
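A sketch of two backward passes over a single forward pass, reusing model and x from the earlier sketches (the losses are placeholders):

```python
pyrannc.keep_graph(True)        # backward will pass retain_graph=True
y = model(x)
y.sum().backward()              # first backward keeps the graph alive
(y ** 2).sum().backward()       # second backward reuses the same graph
pyrannc.keep_graph(False)       # restore the default for later steps
```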
- pyrannc.recreate_all_communicators()
Destroys and recreates all communicators.
- pyrannc.show_deployment(path, batch_size)
Shows a deployment (subgraphs and micro-batch sizes in pipeline parallelism) saved in a file. This is useful for debugging.
- Parameters
path – Path to a deployment file.
batch_size – Global batch size.
- pyrannc.sync_params_on_init(sync)
By default, RaNNC synchronizes model parameters on initialization. This ensures the same initial parameter values on all ranks, but often takes a long time. You can skip the synchronization by passing False to this method when you use the same random seed or other libraries to synchronize parameters.
- Parameters
sync – Set False to skip parameter synchronization.
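A sketch of skipping the initial synchronization when a shared seed already makes initial parameters identical on every rank (build_model is a placeholder):

```python
torch.manual_seed(42)                  # same seed on every rank → identical initial weights
pyrannc.sync_params_on_init(False)     # assumed safe because initial params already match
net = build_model()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
model = pyrannc.RaNNCModule(net, opt)
```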