API references

class pyrannc.RaNNCModule(model, optimizer=None, gather_inputs=True, load_deployment=None, enable_apex_amp=False, allreduce_amp_master_params=False, enable_zero=False, check_unused_values=True, offload_params=False)

Computes a PyTorch model on multiple GPUs using hybrid parallelism. A minimal usage sketch follows the parameter list below.

Parameters
  • model – Model to distribute.

  • optimizer – Optimizer that should work with RaNNC.

  • gather_inputs – Set False if the model uses the inputs given on rank 0 instead of inputs gathered from all ranks.

  • enable_apex_amp – Set True if model is processed by Apex AMP.

  • allreduce_amp_master_params – Set True to allreduce gradients of master parameters of Apex AMP.

  • enable_zero – Set True to remove the redundancy of optimizer states following the approach of DeepSpeed.

  • check_unused_values – If True, RaNNC throws an exception when it finds unused values in a computation graph.

  • offload_params – If True, parameters are moved to host memory until they are used.
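
Example (a minimal sketch; Net, x, t, and loss_fn are placeholders, not part of pyrannc):

    import torch
    import pyrannc

    model = Net()                                   # placeholder torch.nn.Module
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    model = pyrannc.RaNNCModule(model, optimizer)   # partitions and distributes the model

    out = model(x)                                  # forward pass runs with hybrid parallelism
    loss = loss_fn(out, t)
    loss.backward()                                 # gradients are allreduced after backward
    optimizer.step()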

buffers(*args, **kwargs)

Returns buffers. Note that buffers are not synchronized among ranks.

clip_grad_norm(max_grad_norm)

Clips gradients according to their norm. Use this method instead of torch.nn.utils.clip_grad_norm_ because each local process holds only a part of the parameters/gradients; this method computes the norm over all distributed gradients and clips them.

Parameters

max_grad_norm – Maximum norm of the gradients.

Note

This method must be called from all ranks.
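
Example (a sketch of one training step; loss and optimizer are placeholders, and every rank must make this call):

    loss.backward()
    model.clip_grad_norm(1.0)   # computes the norm over all distributed gradients and clips them
    optimizer.step()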

cuda(*args, **kwargs)

This does not work because the device placement of a RaNNCModule is controlled by RaNNC.

enable_dropout(enable)

Enables/disables dropout layers. This method is useful for evaluation because model.eval() does not work for a RaNNCModule.

Parameters

enable – Set True to enable and False to disable dropout layers.
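
Example (a sketch of switching dropout off for evaluation and back on; x is a placeholder input):

    model.enable_dropout(False)    # disable dropout layers for evaluation
    with torch.no_grad():
        out = model(x)
    model.enable_dropout(True)     # re-enable dropout layers for training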

eval()

Sets the training mode to False (i.e. evaluation mode).

get_param(name, amp_master_param=False)

Gets a parameter tensor specified by name.

Parameters
  • name – Name of the parameter.

  • amp_master_param – Returns the Apex AMP master parameter if True.

get_param_grad(name, amp_master_param=False)

Gets the gradient of a parameter tensor specified by name.

Parameters
  • name – Name of the parameter.

  • amp_master_param – Returns the gradient of the Apex AMP master parameter if True.
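
Example (a sketch; the parameter name "encoder.weight" is a placeholder):

    w = model.get_param("encoder.weight")         # parameter tensor
    g = model.get_param_grad("encoder.weight")    # its gradient
    # With Apex AMP, the FP32 master copies can be requested instead:
    w_fp32 = model.get_param("encoder.weight", amp_master_param=True)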

load_state_dict(*args, **kwargs)

Loads a state_dict into the model. This works only before the first call of the forward pass.

Parameters
  • args – Passed to the original model.

  • kwargs – Passed to the original model.

Returns

Return value of the original model’s load_state_dict.
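
Example (a sketch of restoring a checkpoint before the first forward pass; the file path is a placeholder):

    state = torch.load("checkpoint.pt")    # placeholder path
    model.load_state_dict(state)           # only valid before the first forward pass
    out = model(x)                         # first forward pass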

named_buffers(*args, **kwargs)

Returns buffers with their names. Note that buffers are not synchronized among ranks.

named_parameters(*args, **kwargs)

Returns parameters with their names. Note that parameters are not synchronized among ranks.

parameters(*args, **kwargs)

Returns parameters. Note that parameters are not synchronized among ranks.

save_deployment(file)

Saves a deployment state (graph partitioning) to file.

Parameters

file – File path.
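
Example (a sketch; the file path and raw_model are placeholders, and reusing the saved file via the constructor’s load_deployment argument is an assumption based on the constructor signature):

    model.save_deployment("deployment.bin")    # placeholder path
    # A later run can reuse the saved partitioning instead of repartitioning:
    model = pyrannc.RaNNCModule(raw_model, optimizer, load_deployment="deployment.bin")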

state_dict(*args, no_hook=False, amp_master_params=True, rank0_only=True, **kwargs)

Returns state_dict of the model.

Parameters
  • no_hook – If True, hooks registered on state_dict of the original model are ignored.

  • amp_master_params – Set True to get Apex AMP master parameters.

Note

This method must be called from all ranks.
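
Example (a sketch of checkpointing; the file path is a placeholder):

    state = model.state_dict()               # must be called from all ranks
    if pyrannc.get_rank() == 0:
        torch.save(state, "checkpoint.pt")   # placeholder path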

to(*args, **kwargs)

This does not work because the device placement of a RaNNCModule is controlled by RaNNC.

train(mode=True)

Outputs a warning because a RaNNCModule cannot change its training mode.

Parameters

mode – Training mode.

undeploy()

Undeploys a model distributed on GPUs. This frees GPU memory used for the model.

Note

This method must be called from all ranks.

zero_grad()

Sets the gradients of all model parameters to zero.

pyrannc.barrier()

Blocks until all ranks reach this call.

pyrannc.clear()

Clears RaNNC’s state, including all RaNNCModules and buffers.

pyrannc.delay_grad_allreduce(delay)

By default, RaNNC performs allreduce of gradients immediately after the backward pass. If True is given, the allreduce is skipped; the application can then call allreduce_grads to perform it explicitly. This is useful for gradient accumulation; see the sketch below.

Parameters

delay – If True, allreduce after backward is skipped.
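
Example (a sketch of gradient accumulation; loader, loss_fn, and accumulation_steps are placeholders, and calling allreduce_grads as a method of the module is an assumption based on the description above):

    pyrannc.delay_grad_allreduce(True)            # skip the allreduce after each backward
    for step, (x, t) in enumerate(loader):
        loss = loss_fn(model(x), t)
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            model.allreduce_grads()               # assumed call form; performs the delayed allreduce
            optimizer.step()
            model.zero_grad()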

pyrannc.get_rank()

Gets the rank of the running process in COMM_WORLD.

Returns

rank

pyrannc.get_world_size()

Gets the size of COMM_WORLD.

Returns

world size

pyrannc.keep_graph(keep)

The given flag is passed as retain_graph to PyTorch’s backward. This is useful when you perform multiple backward passes after one forward pass; see the sketch below.

Parameters

keep – Set True to keep graph after backward.
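
Example (a sketch of two backward passes after one forward pass; criterion1, criterion2, x, and t are placeholders):

    pyrannc.keep_graph(True)        # backward will pass retain_graph=True
    out = model(x)
    criterion1(out, t).backward()   # first backward; the graph is retained
    criterion2(out, t).backward()   # second backward over the same graph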

pyrannc.recreate_all_communicators()

Destroys and recreates all communicators.

pyrannc.show_deployment(path, batch_size)

Shows a deployment (subgraphs and micro-batch sizes used for pipeline parallelism) saved in a file. This is useful for debugging.

Parameters
  • path – Path to a deployment file.

  • batch_size – Global batch size.
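
Example (a sketch; the path and batch size are placeholders):

    pyrannc.show_deployment("deployment.bin", 64)   # placeholder path and global batch size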

pyrannc.sync_params_on_init(sync)

By default, RaNNC synchronizes model parameters among ranks on initialization. This ensures that all ranks start from the same initial parameter values, but it often takes a long time. You can skip the synchronization by passing False to this method when you use the same random seed on all ranks or synchronize parameters with another library.

Parameters

sync – Set False to skip parameter synchronization.
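
Example (a sketch; calling this before constructing the RaNNCModule is assumed):

    torch.manual_seed(0)                  # same seed on every rank gives identical initial parameters
    pyrannc.sync_params_on_init(False)    # skip parameter synchronization
    model = pyrannc.RaNNCModule(model, optimizer)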