DistModel

class paddle.distributed.DistModel ( layer, loader, loss=None, optimizer=None, strategy=None, metrics=None )

DistModel is a model converted from a paddle.nn.Layer whose parameters are distributed tensors (constructed with paddle.distributed.shard_tensor). It holds the static graph converted from that layer and provides the APIs for training, evaluation, and prediction with the static graph.

It is recommended to create a DistModel with paddle.distributed.to_static rather than by calling paddle.distributed.DistModel directly.

First set the DistModel to “train”, “eval”, or “predict” mode with the train(), eval(), or predict() method, then use the __call__ method to run training, evaluation, or prediction respectively.

For more details of the usage, please refer to the sample code in paddle.distributed.to_static.
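As a minimal sketch of the recommended construction path, the helper below forwards its arguments to paddle.distributed.to_static. The helper name build_dist_model is hypothetical, and the layer, loader, loss, and optimizer objects are assumed to be prepared by the caller as described under Parameters. The return value is passed through unchanged, since its exact shape may differ between Paddle versions.

```python
def build_dist_model(layer, loader, loss=None, optimizer=None, strategy=None):
    """Hypothetical helper: convert a dygraph layer into a DistModel via to_static.

    `layer` is assumed to be a paddle.nn.Layer whose parameters were already
    sharded with paddle.distributed.shard_tensor.
    """
    # Imported lazily so the helper can be defined without a distributed runtime.
    import paddle.distributed as dist

    # Forward everything to to_static and return its result unchanged.
    return dist.to_static(
        layer,
        loader=loader,
        loss=loss,
        optimizer=optimizer,
        strategy=strategy,
    )
```

Calling this helper only makes sense inside a script started by a distributed launcher, because the sharded parameters refer to a process mesh.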

Parameters
  • layer (paddle.nn.Layer) – The layer in dygraph mode, whose parameters are distributed tensors generated by shard_tensor.

  • loader (ShardDataLoader|paddle.io.DataLoader) – The data loader used in dygraph mode. It is used to infer inputs_spec and labels_spec.

  • loss (Loss|Callable|None, optional) – The loss function for training or evaluating the model. Can be a paddle.nn.Layer instance or any callable. If loss is not None, DistModel is set by default to “train” mode (when optimizer is also not None) or “eval” mode (when optimizer is None). If loss is None, DistModel is set to “predict” mode by default. Default: None.

  • optimizer (paddle.optimizer.Optimizer|None, optional) – The optimizer for training. If both optimizer and loss are set, DistModel is set to “train” mode by default. Default: None.

  • strategy (paddle.distributed.Strategy|None, optional) – Configs for parallel strategies and optimization settings (e.g. sharding, pipeline parallelism). Default: None.
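The default-mode rule stated for loss and optimizer can be summarized as a small decision function. This is only an illustration of the documented rule, not Paddle's actual code; the function name default_mode is hypothetical.

```python
def default_mode(loss=None, optimizer=None):
    """Return the default DistModel mode implied by the documented rule."""
    if loss is None:
        # Without a loss, only forward outputs can be produced.
        return "predict"
    if optimizer is None:
        # A loss without an optimizer allows evaluation only.
        return "eval"
    # Both loss and optimizer are set, so training is possible.
    return "train"
```

For example, default_mode(loss=my_loss) gives "eval", where my_loss stands for any non-None loss object.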

train ( )

Set the DistModel to “train” mode. In “train” mode, calling __call__ updates the parameters of the model and returns the loss.

eval ( )

Set the mode of DistModel to “eval”. In “eval” mode, executing __call__ will return the loss.

predict ( )

Set the mode of DistModel to “predict”. In “predict” mode, executing __call__ returns a dict that contains the outputs of the model.
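The three modes can be exercised with one loop, sketched below. The helper run_epoch is hypothetical and only uses train(), eval(), predict(), and __call__. It assumes that __call__ takes the inputs and labels in “train” and “eval” mode and only the inputs in “predict” mode, following the pattern of the to_static sample code; adjust this if your model takes different arguments.

```python
def run_epoch(dist_model, batches, mode="train"):
    """Hypothetical helper: set the mode, then call the model once per batch."""
    if mode == "train":
        dist_model.train()
    elif mode == "eval":
        dist_model.eval()
    elif mode == "predict":
        dist_model.predict()
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    results = []
    for inputs, labels in batches:
        if mode == "predict":
            # Assumption: prediction needs only the inputs and returns the outputs.
            results.append(dist_model(inputs))
        else:
            # Assumption: training and evaluation take inputs and labels and return the loss.
            results.append(dist_model(inputs, labels))
    return results
```

Here batches is any iterable of (inputs, labels) pairs, such as the data loader passed when the DistModel was created.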

dist_main_program ( mode=None )

Get the distributed main program of the specified mode. Each mode has its own distributed main program, and dist_main_program returns the one that corresponds to mode.

Parameters

mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’, or None. ‘train’: Return the distributed main program for training. ‘eval’: Return the distributed main program for evaluation. ‘predict’: Return the distributed main program for prediction. None: The current mode of the DistModel will be used. Default: None.

Returns

The distributed main program of mode.

dist_startup_program ( mode=None )

Get the corresponding distributed startup program of mode, which is used for initializing the parameters.

Parameters

mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’, or None. ‘train’: Return the distributed startup program for training. ‘eval’: Return the distributed startup program for evaluation. ‘predict’: Return the distributed startup program for prediction. None: The current mode of the DistModel will be used. Default: None.

Returns

The distributed startup program of mode.

serial_main_program ( mode=None )

Get the corresponding serial main program of mode, containing all the variables and operators of the given layer.

Parameters

mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’, or None. ‘train’: Return the serial main program for training. ‘eval’: Return the serial main program for evaluation. ‘predict’: Return the serial main program for prediction. None: The current mode of the DistModel will be used. Default: None.

Returns

The serial main program of mode.

serial_startup_program ( mode=None )

Get the corresponding serial startup program of mode.

Parameters

mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’, or None. ‘train’: Return the serial startup program for training. ‘eval’: Return the serial startup program for evaluation. ‘predict’: Return the serial startup program for prediction. None: The current mode of the DistModel will be used. Default: None.

Returns

The serial startup program of mode.
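To compare the serial and distributed programs of one mode side by side, the four getters can be gathered into a dictionary. This is a sketch; the helper name collect_programs and the dictionary keys are hypothetical, and only the four documented methods are called.

```python
def collect_programs(dist_model, mode=None):
    """Hypothetical helper: gather the four programs of `mode` in one dict.

    With mode=None, each getter uses the current mode of the DistModel.
    """
    return {
        "dist_main": dist_model.dist_main_program(mode),
        "dist_startup": dist_model.dist_startup_program(mode),
        "serial_main": dist_model.serial_main_program(mode),
        "serial_startup": dist_model.serial_startup_program(mode),
    }
```

Printing an entry, for example print(collect_programs(dist_model)["dist_main"]), is one way to inspect what the distributed conversion added relative to the serial program.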

state_dict ( mode='all' )

Get the state dict of model and optimizer.

Parameters

mode (str) – Can be ‘opt’, ‘param’, or ‘all’. ‘opt’: The return value contains only the variables in the optimizer. ‘param’: The return value contains only the variables in the network, not those in the optimizer. ‘all’: The return value contains the variables in both the network and the optimizer. Default: ‘all’.
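The ‘param’ and ‘opt’ modes make it possible to handle network and optimizer state separately, for instance when only the weights should be exported. A minimal sketch, with the hypothetical helper name split_state:

```python
def split_state(dist_model):
    """Hypothetical helper: fetch network and optimizer state separately."""
    # Variables that belong to the network only.
    params = dist_model.state_dict(mode="param")
    # Variables that belong to the optimizer only.
    opt_state = dist_model.state_dict(mode="opt")
    return params, opt_state
```

Calling dist_model.state_dict() with the default mode ‘all’ returns the network and optimizer variables together.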