DistModel

class paddle.distributed.DistModel(layer: Layer, loader: ShardDataloader | DataLoader, loss: Layer | Callable[..., Any] | None = None, optimizer: Optimizer | None = None, strategy: Strategy | None = None, metrics: list[Metric] | None = None, input_spec: list[list[DistributedInputSpec]] | None = None)

DistModel is the static-graph model converted from a paddle.nn.Layer whose parameters are distributed tensors (constructed with paddle.distributed.shard_tensor). It provides the APIs for training, evaluation and prediction with that static graph.

Create a DistModel with paddle.distributed.to_static rather than by calling the paddle.distributed.DistModel constructor directly.

Before calling the model, set it to “train”, “eval” or “predict” mode with the train(), eval() or predict() method. The __call__ method then performs training, evaluation or prediction according to the current mode.

For more details of the usage, please refer to the sample code in paddle.distributed.to_static.

Parameters
  • layer (paddle.nn.Layer) – The layer in dygraph mode, whose parameters are distributed tensors generated by shard_tensor.

  • loader (ShardDataloader|paddle.io.DataLoader) – The data loader used in dygraph mode. It is used to infer inputs_spec and labels_spec.

  • loss (Loss|Callable|None, optional) – The loss function for training or evaluating the model. Can be a paddle.nn.Layer instance or any callable. If loss is not None, DistModel is set by default to “train” mode (when optimizer is also not None) or “eval” mode (when optimizer is None). If loss is None, DistModel is set to “predict” mode by default. Default: None.

  • optimizer (paddle.optimizer.Optimizer|None, optional) – The optimizer for training. If both optimizer and loss are set, DistModel is set to “train” mode by default. Default: None.

  • strategy (paddle.distributed.Strategy|None, optional) – Configs for parallel strategies and optimization settings (e.g. sharding, pipeline parallelism). Default: None.

  • input_spec (list[list[paddle.distributed.DistributedInputSpec]]|None, optional) – Custom input specs that give the shape, dtype and name of the model inputs and labels. If it is not None, the input specs and label specs are inferred from it. It should be a list containing two sublists: the first sublist holds the input specs and the second holds the label specs. Default: None.
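The default-mode rule described for loss and optimizer can be summarized in a few lines. The sketch below is plain Python and does not call Paddle; `default_mode` is a hypothetical helper written only to restate the rule, not part of the DistModel API or its implementation.

```python
from typing import Any, Optional


def default_mode(loss: Optional[Any], optimizer: Optional[Any]) -> str:
    """Hypothetical helper restating the documented default-mode rule."""
    if loss is None:
        # Without a loss, only the forward outputs can be produced.
        return "predict"
    if optimizer is None:
        # A loss without an optimizer can be computed but not minimized.
        return "eval"
    # Both a loss and an optimizer are available.
    return "train"
```

Whatever the default is, the mode can still be switched explicitly with train(), eval() or predict().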

train() -> None

Set the mode of DistModel to “train”. In “train” mode, executing __call__ updates the parameters of the model and returns the loss.

eval() -> None

Set the mode of DistModel to “eval”. In “eval” mode, executing __call__ will return the loss.

predict() -> None

Set the mode of DistModel to “predict”. In “predict” mode, executing __call__ returns a dict that contains the outputs of the model.
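The mode-then-call pattern can be sketched as a small helper. It is duck-typed and does not import Paddle: `train_then_eval` is a hypothetical name, `dist_model` is assumed to be an existing DistModel (for example one returned by paddle.distributed.to_static), and each batch is assumed to be an (input, label) pair accepted by __call__.

```python
from typing import Any, Iterable, List, Tuple


def train_then_eval(
    dist_model: Any,
    train_batches: Iterable[Tuple[Any, Any]],
    eval_batches: Iterable[Tuple[Any, Any]],
) -> Tuple[List[Any], List[Any]]:
    """Run one pass in "train" mode, then one pass in "eval" mode."""
    train_losses = []
    dist_model.train()  # __call__ now updates parameters and returns the loss
    for inputs, labels in train_batches:
        train_losses.append(dist_model(inputs, labels))

    eval_losses = []
    dist_model.eval()  # __call__ now only returns the loss
    for inputs, labels in eval_batches:
        eval_losses.append(dist_model(inputs, labels))

    return train_losses, eval_losses
```

Switching modes explicitly before each loop keeps the behavior independent of whichever default mode the model was created in.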

dist_main_program(mode: _Mode | None = None) -> Program

Get the distributed main program of the specified mode. Each mode has its own distributed main program, and dist_main_program returns the one that corresponds to mode.

Parameters
  • mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: return the distributed main program for training. ‘eval’: return the distributed main program for evaluation. ‘predict’: return the distributed main program for prediction. None: use the current mode of the DistModel. Default: None.

Returns

The distributed main program of mode.

dist_startup_program(mode: _Mode | None = None) -> Program

Get the corresponding distributed startup program of mode, which is used for initializing the parameters.

Parameters
  • mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: return the distributed startup program for training. ‘eval’: return the distributed startup program for evaluation. ‘predict’: return the distributed startup program for prediction. None: use the current mode of the DistModel. Default: None.

Returns

The distributed startup program of mode.

serial_main_program(mode: _Mode | None = None) -> Program

Get the corresponding serial main program of mode, which contains all the variables and operators of the given layer.

Parameters
  • mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: return the serial main program for training. ‘eval’: return the serial main program for evaluation. ‘predict’: return the serial main program for prediction. None: use the current mode of the DistModel. Default: None.

Returns

The serial main program of mode.

serial_startup_program(mode: _Mode | None = None) -> Program

Get the corresponding serial startup program of mode.

Parameters
  • mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: return the serial startup program for training. ‘eval’: return the serial startup program for evaluation. ‘predict’: return the serial startup program for prediction. None: use the current mode of the DistModel. Default: None.

Returns

The serial startup program of mode.
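The four program accessors share the same mode argument, so they can be gathered together when inspecting a model. The sketch below is duck-typed and does not import Paddle: `collect_programs` is a hypothetical helper, and `dist_model` is assumed to be an existing DistModel.

```python
from typing import Any, Dict, Optional


def collect_programs(dist_model: Any, mode: Optional[str] = None) -> Dict[str, Any]:
    """Return the distributed and serial programs of one mode as a dict.

    mode may be "train", "eval", "predict" or None; None means the
    current mode of the model is used, as documented for each accessor.
    """
    return {
        "dist_main": dist_model.dist_main_program(mode=mode),
        "dist_startup": dist_model.dist_startup_program(mode=mode),
        "serial_main": dist_model.serial_main_program(mode=mode),
        "serial_startup": dist_model.serial_startup_program(mode=mode),
    }
```

For instance, `collect_programs(dist_model, "train")["dist_main"]` would be the program to print when checking how the training graph was partitioned.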

state_dict(mode: Literal['opt', 'param', 'all'] = 'all', split_fusion: bool = True) -> dict[str, Tensor]

Get the state dict of model and optimizer.

Parameters
  • mode (str) – Can be ‘opt’, ‘param’ or ‘all’. ‘opt’: the return value contains only the variables in the optimizer. ‘param’: the return value contains only the variables in the network, not those in the optimizer. ‘all’: the return value contains the variables in both the network and the optimizer. Default: ‘all’.
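As a small illustration of the mode argument, the network and optimizer states can be fetched separately. The sketch below is duck-typed and does not import Paddle: `split_state` is a hypothetical helper, and `dist_model` is assumed to be an existing DistModel.

```python
from typing import Any, Dict, Tuple


def split_state(dist_model: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Fetch network variables and optimizer variables as two dicts."""
    # 'param' returns only the variables in the network.
    params = dist_model.state_dict(mode="param")
    # 'opt' returns only the variables in the optimizer.
    opt_state = dist_model.state_dict(mode="opt")
    return params, opt_state
```

Keeping the two dicts apart is convenient when only the weights are needed, for example to export a model for prediction without its optimizer state.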