DistModel

DistModel is the static-graph model converted from a paddle.nn.Layer whose parameters are distributed tensors (constructed with paddle.distributed.shard_tensor). It holds the converted static graph and provides the APIs for training, evaluation and prediction with that graph.

It is recommended to create a DistModel through paddle.distributed.to_static rather than by calling paddle.distributed.DistModel directly.

First set the DistModel to “train”, “eval” or “predict” mode with the train(), eval() or predict() method, then use the __call__ method for training, evaluation or prediction respectively.

For more details of the usage, please refer to the sample code in paddle.distributed.to_static.

Parameters

layer (paddle.nn.Layer) – The layer in dygraph mode, whose parameters are distributed tensors generated by shard_tensor.
loader (ShardDataLoader|paddle.io.DataLoader) – The data loader used in dygraph mode. It is used to infer inputs_spec and labels_spec.
loss (Loss|Callable|None, optional) – The loss function for training or evaluating the model. Can be a paddle.nn.Layer instance or any callable function. If loss is not None, DistModel is set by default to “train” mode (when optimizer is also not None) or “eval” mode (when optimizer is None). If loss is None, DistModel is set to “predict” mode by default. Default: None.
optimizer (paddle.optimizer.Optimizer|None, optional) – The optimizer for training. If both optimizer and loss are set, DistModel is set to “train” mode by default. Default: None.
strategy (paddle.distributed.Strategy|None, optional) – Configs for parallel strategies and optimization settings (e.g. sharding, pipeline parallelism). Default: None.
input_spec (list[list[paddle.distributed.DistributedInputSpec]]|None, optional) – Custom input specs that specify the shape, dtype and name of the model inputs and labels. If it is not None, the input specs and label specs are inferred from it. It should be a list containing two sublists: the first sublist holds the input specs and the second sublist holds the label specs. Default: None.
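
The default-mode rule described for loss and optimizer can be summarized in a few lines. The following is a minimal sketch of that rule only, not Paddle's implementation; default_mode and mse are hypothetical names introduced for illustration, and no Paddle API is called.

```python
from typing import Callable, Optional


def default_mode(loss: Optional[Callable], optimizer: Optional[object]) -> str:
    """Return the mode a DistModel is described as starting in.

    Hypothetical helper: it only restates the rule from the parameter
    descriptions and does not call Paddle.
    """
    if loss is None:
        # Without a loss, the documented default is "predict".
        return "predict"
    if optimizer is None:
        # A loss without an optimizer defaults to "eval".
        return "eval"
    # Both loss and optimizer are set.
    return "train"


def mse(pred, label):
    # Placeholder loss; any callable counts as "loss is not None".
    return (pred - label) ** 2


print(default_mode(mse, object()))  # loss and optimizer both set
print(default_mode(mse, None))      # loss only
print(default_mode(None, None))     # neither loss nor optimizer
```

Whatever the default is, the mode can still be switched explicitly with train(), eval() or predict().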

train ( ) → None

Set the DistModel to “train” mode. In “train” mode, executing __call__ method will update the parameters of the model and return the loss.

eval ( ) → None

Set the mode of DistModel to “eval”. In “eval” mode, executing __call__ will return the loss.

predict ( ) → None

Set the mode of DistModel to “predict”. In “predict” mode, executing __call__ returns a dict that contains the outputs of the model.
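
To make the three-mode contract concrete, here is a toy stand-in written in plain Python. It is a sketch under stated assumptions, not Paddle code: ToyModeModel, its scalar weight, the squared-error loss and the "out" dictionary key are all hypothetical. It only mirrors the behavior described above: train mode updates parameters and returns the loss, eval mode returns the loss without updating, and predict mode returns a dict of outputs.

```python
from typing import Optional


class ToyModeModel:
    """Hypothetical stand-in that mimics the three-mode contract.

    It models y = weight * x with a squared-error loss and plain gradient
    descent; none of this is Paddle code.
    """

    def __init__(self, weight: float = 0.0, lr: float = 0.1) -> None:
        self.weight = weight
        self.lr = lr
        self._mode: Optional[str] = None

    def train(self) -> None:
        self._mode = "train"

    def eval(self) -> None:
        self._mode = "eval"

    def predict(self) -> None:
        self._mode = "predict"

    def __call__(self, x: float, label: Optional[float] = None):
        if self._mode is None:
            raise RuntimeError("call train(), eval() or predict() first")
        out = self.weight * x
        if self._mode == "predict":
            # Predict mode returns a dict of outputs; the key name is made up.
            return {"out": out}
        loss = (out - label) ** 2
        if self._mode == "train":
            # Only train mode updates the parameter (one gradient step).
            self.weight -= self.lr * 2.0 * (out - label) * x
        return loss


model = ToyModeModel()

model.train()
train_loss = model(1.0, 1.0)   # returns the loss and updates weight

model.eval()
weight_before = model.weight
eval_loss = model(1.0, 1.0)    # returns the loss, weight is unchanged

model.predict()
outputs = model(1.0)           # returns a dict of outputs, no label needed
```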

dist_main_program ( mode: _Mode | None = None ) → Program

Get the distributed main program of the specified mode. Each mode has its own distributed main program; dist_main_program returns the one that corresponds to mode.

Parameters: mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the distributed main program for training. ‘eval’: Return the distributed main program for evaluation. ‘predict’: Return the distributed main program for prediction. None: The current mode of the DistModel will be used. Default: None.
Returns: The distributed main program of mode.
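
The mode argument follows the same pattern in all four program getters: an explicit mode selects that mode's program, and None falls back to the current mode. A minimal sketch of that lookup is shown here, assuming a hypothetical resolve_mode helper and placeholder strings in place of real Program objects. Rejecting unknown mode names with ValueError is an assumption of the sketch, not documented behavior.

```python
from typing import Dict, Optional

VALID_MODES = ("train", "eval", "predict")


def resolve_mode(mode: Optional[str], current_mode: str) -> str:
    """Hypothetical helper: pick the mode whose program should be returned."""
    if mode is None:
        # None means "use the current mode of the DistModel".
        return current_mode
    if mode not in VALID_MODES:
        # Assumption of this sketch; Paddle's own error handling may differ.
        raise ValueError(f"unknown mode: {mode!r}")
    return mode


# Placeholder strings stand in for the per-mode Program objects.
programs: Dict[str, str] = {m: f"<{m} program>" for m in VALID_MODES}

current = "eval"
print(programs[resolve_mode(None, current)])     # program of the current mode
print(programs[resolve_mode("train", current)])  # program of an explicit mode
```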

dist_startup_program ( mode: _Mode | None = None ) → Program

Get the corresponding distributed startup program of mode, which is used for initializing the parameters.

Parameters: mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the distributed startup program for training. ‘eval’: Return the distributed startup program for evaluation. ‘predict’: Return the distributed startup program for prediction. None: The current mode of the DistModel will be used. Default: None.
Returns: The distributed startup program of mode.

serial_main_program ( mode: _Mode | None = None ) → Program

Get the corresponding serial main program of mode, containing all the variables and operators of the given layer.

Parameters: mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the serial main program for training. ‘eval’: Return the serial main program for evaluation. ‘predict’: Return the serial main program for prediction. None: The current mode of the DistModel will be used. Default: None.
Returns: The serial main program of mode.

serial_startup_program ( mode: _Mode | None = None ) → Program

Get the corresponding serial startup program of mode.

Parameters: mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the serial startup program for training. ‘eval’: Return the serial startup program for evaluation. ‘predict’: Return the serial startup program for prediction. None: The current mode of the DistModel will be used. Default: None.
Returns: The serial startup program of mode.

state_dict ( mode: Literal['opt', 'param', 'all'] = 'all', split_fusion: bool = True ) → dict[str, Tensor]

Get the state dict of the model and the optimizer.

Parameters: mode (str, optional) – Can be ‘opt’, ‘param’ or ‘all’. ‘opt’: The return value contains only the variables in the optimizer. ‘param’: The return value contains only the variables in the network, not the variables in the optimizer. ‘all’: The return value contains the variables in both the network and the optimizer. Default: ‘all’.
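
The effect of the three mode values can be pictured as a filter over two groups of variables. The sketch below uses plain dicts and a hypothetical select_state helper; the key names are invented for illustration and are not the names Paddle generates. It does not cover split_fusion.

```python
from typing import Any, Dict


def select_state(
    params: Dict[str, Any],
    opt_state: Dict[str, Any],
    mode: str = "all",
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the 'opt' / 'param' / 'all' selection."""
    if mode == "param":
        # Network variables only.
        return dict(params)
    if mode == "opt":
        # Optimizer variables only.
        return dict(opt_state)
    if mode == "all":
        # Network and optimizer variables together.
        return {**params, **opt_state}
    raise ValueError(f"unknown mode: {mode!r}")


# Invented variable names and values, for illustration only.
params = {"linear_0.weight": [0.1, 0.2]}
opt_state = {"linear_0.weight_moment": [0.0, 0.0]}

print(sorted(select_state(params, opt_state, "param")))
print(sorted(select_state(params, opt_state, "opt")))
print(sorted(select_state(params, opt_state)))
```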
