API references

class pyrannc.RaNNCModule(model, optimizer=None, gather_inputs=True, load_deployment=None, enable_apex_amp=False, allreduce_amp_master_params=False, enable_zero=False, check_unused_values=True, offload_params=False)

Computes a PyTorch model on multiple GPUs using hybrid parallelism. A minimal usage sketch follows the parameter list below.

Parameters
  • model – Model to distribute.

  • optimizer – Optimizer that should work with RaNNC.

  • gather_inputs – Set False if the model uses the inputs given on rank 0 instead of inputs gathered from all ranks.

  • enable_apex_amp – Set True if model is processed by Apex AMP.

  • allreduce_amp_master_params – Set True to allreduce gradients of master parameters of Apex AMP.

  • enable_zero – Set True to remove the redundancy of optimizer states following the approach of DeepSpeed.

  • check_unused_values – If True, RaNNC throws an exception when it finds unused values in a computation graph.

  • offload_params – If True, parameters are moved to host memory until they are used.
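
Example (a minimal sketch; Net, x, t, and loss_fn are placeholders, not part of pyrannc):

    import torch
    import pyrannc

    model = Net()                                   # placeholder torch.nn.Module
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    model = pyrannc.RaNNCModule(model, optimizer)   # partitions and distributes the model

    out = model(x)                                  # forward pass runs with hybrid parallelism
    loss = loss_fn(out, t)
    loss.backward()                                 # gradients are allreduced after backward
    optimizer.step()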

buffers(*args, **kwargs)

Returns buffers. Note that buffers are not synchronized among ranks.

clip_grad_norm(max_grad_norm)

Clips gradients according to their norm. Use this method instead of torch.nn.utils.clip_grad_norm_ because each local process holds only a part of the parameters/gradients; this method computes the norm over all distributed gradients and clips them.

Parameters

max_grad_norm – Maximum norm of the gradients.

Note

This method must be called from all ranks.
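
Example (a sketch of one training step; loss and optimizer are placeholders, and every rank must make this call):

    loss.backward()
    model.clip_grad_norm(1.0)   # computes the norm over all distributed gradients and clips them
    optimizer.step()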

cuda(*args, **kwargs)

This does not work because the device placement of a RaNNCModule is controlled by RaNNC.

enable_dropout(enable)

Enables/disables dropout layers. This method is useful for evaluation because model.eval() does not work for a RaNNCModule.

Parameters

enable – Set True to enable and False to disable dropout layers.
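
Example (a sketch of switching dropout off for evaluation and back on; x is a placeholder input):

    model.enable_dropout(False)    # disable dropout layers for evaluation
    with torch.no_grad():
        out = model(x)
    model.enable_dropout(True)     # re-enable dropout layers for training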

eval()

Sets the training mode to False (i.e. evaluation mode).

get_param(name, amp_master_param=False)

Gets a parameter tensor specified by name.

Parameters
  • name – Name of the parameter.

  • amp_master_param – Returns the Apex AMP master parameter if True.

get_param_grad(name, amp_master_param=False)

Gets the gradient of a parameter tensor specified by name.

Parameters
  • name – Name of the parameter.

  • amp_master_param – Returns the gradient of the Apex AMP master parameter if True.
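
Example (a sketch; the parameter name "encoder.weight" is a placeholder):

    w = model.get_param("encoder.weight")         # parameter tensor
    g = model.get_param_grad("encoder.weight")    # its gradient
    # With Apex AMP, the FP32 master copies can be requested instead:
    w_fp32 = model.get_param("encoder.weight", amp_master_param=True)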

load_state_dict(*args, **kwargs)

Loads a state_dict into the model. This works only before the first call of the forward pass.

Parameters
  • args – Passed to the original model.

  • kwargs – Passed to the original model.

Returns

Return value of the original model’s load_state_dict.
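
Example (a sketch of restoring a checkpoint before the first forward pass; the file path is a placeholder):

    state = torch.load("checkpoint.pt")    # placeholder path
    model.load_state_dict(state)           # only valid before the first forward pass
    out = model(x)                         # first forward pass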

named_buffers(*args, **kwargs)

Returns buffers with their names. Note that buffers are not synchronized among ranks.

named_parameters(*args, **kwargs)

Returns parameters with their names. Note that parameters are not synchronized among ranks.

parameters(*args, **kwargs)

Returns parameters. Note that parameters are not synchronized among ranks.

save_deployment(file)

Saves a deployment state (graph partitioning) to file.

Parameters

file – File path.
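
Example (a sketch; the file path and raw_model are placeholders, and reusing the saved file via the constructor’s load_deployment argument is an assumption based on the constructor signature):

    model.save_deployment("deployment.bin")    # placeholder path
    # A later run can reuse the saved partitioning instead of repartitioning:
    model = pyrannc.RaNNCModule(raw_model, optimizer, load_deployment="deployment.bin")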

state_dict(*args, no_hook=False, amp_master_params=True, rank0_only=True, **kwargs)

Returns state_dict of the model.

Parameters
  • no_hook – If True, hooks registered on state_dict of the original model are ignored.

  • amp_master_params – Set True to get Apex AMP master parameters.

Note

This method must be called from all ranks.
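
Example (a sketch of checkpointing; the file path is a placeholder):

    state = model.state_dict()               # must be called from all ranks
    if pyrannc.get_rank() == 0:
        torch.save(state, "checkpoint.pt")   # placeholder path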

to(*args, **kwargs)

This does not work because the device placement of a RaNNCModule is controlled by RaNNC.

train(mode=True)

Outputs a warning because a RaNNCModule cannot change its training mode.

Parameters

mode – Training mode.

undeploy()

Undeploys a model distributed on GPUs. This frees GPU memory used for the model.

Note

This method must be called from all ranks.

zero_grad()

Sets the gradients of all model parameters to zero.

pyrannc.barrier()

Blocks until all ranks reach this call.

pyrannc.clear()

Clears RaNNC’s state, including all RaNNCModules and buffers.

pyrannc.delay_grad_allreduce(delay)

By default, RaNNC performs allreduce of gradients immediately after the backward pass. If True is given, the allreduce is skipped; the application can then call allreduce_grads to perform it explicitly. This is useful for gradient accumulation; see the sketch below.

Parameters

delay – If True, allreduce after backward is skipped.
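
Example (a sketch of gradient accumulation; loader, loss_fn, and accumulation_steps are placeholders, and calling allreduce_grads as a method of the module is an assumption based on the description above):

    pyrannc.delay_grad_allreduce(True)            # skip the allreduce after each backward
    for step, (x, t) in enumerate(loader):
        loss = loss_fn(model(x), t)
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            model.allreduce_grads()               # assumed call form; performs the delayed allreduce
            optimizer.step()
            model.zero_grad()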

pyrannc.get_rank()

Gets the rank of the running process in COMM_WORLD.

Returns

rank

pyrannc.get_world_size()

Gets the size of COMM_WORLD.

Returns

world size

pyrannc.keep_graph(keep)

The given flag is passed as retain_graph to PyTorch’s backward. This is useful when you perform multiple backward passes after one forward pass; see the sketch below.

Parameters

keep – Set True to keep graph after backward.
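
Example (a sketch of two backward passes after one forward pass; criterion1, criterion2, x, and t are placeholders):

    pyrannc.keep_graph(True)        # backward will pass retain_graph=True
    out = model(x)
    criterion1(out, t).backward()   # first backward; the graph is retained
    criterion2(out, t).backward()   # second backward over the same graph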

pyrannc.recreate_all_communicators()

Destroys and recreates all communicators.

pyrannc.show_deployment(path, batch_size)

Shows a deployment (subgraphs and micro-batch sizes used for pipeline parallelism) saved in a file. This is useful for debugging.

Parameters
  • path – Path to a deployment file.

  • batch_size – Global batch size.
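
Example (a sketch; the path and batch size are placeholders):

    pyrannc.show_deployment("deployment.bin", 64)   # placeholder path and global batch size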

pyrannc.sync_params_on_init(sync)

By default, RaNNC synchronizes model parameters among ranks on initialization. This ensures that all ranks start from the same initial parameter values, but it often takes a long time. You can skip the synchronization by passing False to this method when you use the same random seed on all ranks or synchronize parameters with another library.

Parameters

sync – Set False to skip parameter synchronization.
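
Example (a sketch; calling this before constructing the RaNNCModule is assumed):

    torch.manual_seed(0)                  # same seed on every rank gives identical initial parameters
    pyrannc.sync_params_on_init(False)    # skip parameter synchronization
    model = pyrannc.RaNNCModule(model, optimizer)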