DistModel
- class paddle.distributed.DistModel(layer, loader, loss=None, optimizer=None, strategy=None, metrics=None)
-
DistModel is the model converted from a paddle.nn.Layer with distributed tensors as its parameters. It contains the static graph converted from a paddle.nn.Layer whose parameters are distributed tensors (constructed with paddle.distributed.shard_tensor), and provides APIs for training, evaluation and prediction with the static graph.
It is recommended to create a DistModel with paddle.distributed.to_static rather than by calling paddle.distributed.DistModel directly.
First set the DistModel to “train”, “eval” or “predict” mode with the train(), eval() or predict() method, then use the __call__ method for training, evaluation or prediction respectively.
For more details on usage, refer to the sample code in paddle.distributed.to_static.
- Parameters
-
layer (paddle.nn.Layer) – The layer in dygraph mode, whose parameters are distributed tensors generated by shard_tensor.
loader (ShardDataLoader|paddle.io.DataLoader) – The data loader used in dygraph mode; it is used to infer inputs_spec and labels_spec.
loss (Loss|Callable|None, optional) – The loss function for training or evaluating the model. Can be a paddle.nn.Layer instance or any callable. If loss is not None, DistModel is set by default to “train” mode (when optimizer is also not None) or “eval” mode (when optimizer is None). If loss is None, DistModel is set to “predict” mode by default. Default: None.
optimizer (paddle.optimizer.Optimizer|None, optional) – The optimizer for training. If both optimizer and loss are set, DistModel is set to “train” mode by default. Default: None.
strategy (paddle.distributed.Strategy|None, optional) – Configs for parallel strategies and optimization settings (e.g. sharding, pipeline parallelism). Default: None.
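The default-mode rule described for loss and optimizer can be written out as a small decision function. This is only a sketch of the documented rule in plain Python; default_mode is a hypothetical helper name, not part of Paddle.

```python
def default_mode(loss=None, optimizer=None):
    """Return the mode the parameter descriptions say DistModel starts in.

    Hypothetical helper for illustration only; it does not call Paddle.
    """
    if loss is None:
        # No loss function: only prediction is possible.
        return "predict"
    if optimizer is None:
        # A loss without an optimizer: the loss can be evaluated, not minimized.
        return "eval"
    # Both loss and optimizer are given.
    return "train"


if __name__ == "__main__":
    # Any non-None object stands in for a real loss or optimizer here.
    print(default_mode())
    print(default_mode(loss=object()))
    print(default_mode(loss=object(), optimizer=object()))
```

The default only decides the starting mode; the mode can still be changed afterwards with train(), eval() or predict().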
-
train()
Set the DistModel to “train” mode. In “train” mode, calling the __call__ method updates the parameters of the model and returns the loss.
-
eval()
Set the DistModel to “eval” mode. In “eval” mode, calling the __call__ method returns the loss.
-
predict()
Set the DistModel to “predict” mode. In “predict” mode, calling the __call__ method returns a dict that contains the outputs of the model.
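The set-the-mode-then-call pattern shared by train(), eval() and predict() may be easier to see on a toy stand-in. The class below is a hypothetical one-parameter model in plain Python, not Paddle code; the "outputs" dict key, the learning rate and the squared-error loss are all assumptions made for illustration.

```python
class ToyModeModel:
    """Hypothetical stand-in that mimics the mode-then-call pattern."""

    def __init__(self, weight=0.0, lr=0.1):
        self.weight = weight
        self.lr = lr
        self._mode = "predict"

    def train(self):
        self._mode = "train"

    def eval(self):
        self._mode = "eval"

    def predict(self):
        self._mode = "predict"

    def __call__(self, x, y=None):
        if self._mode == "predict":
            # Predict mode: return a dict holding the model outputs.
            return {"outputs": self.weight * x}
        # Squared error of the one-parameter model y_hat = weight * x.
        loss = (self.weight * x - y) ** 2
        if self._mode == "train":
            # One gradient-descent step: d(loss)/d(weight) = 2 * (w*x - y) * x.
            grad = 2 * (self.weight * x - y) * x
            self.weight -= self.lr * grad
        return loss


model = ToyModeModel(weight=0.0, lr=0.1)
model.train()
train_loss = model(1.0, 2.0)  # returns the loss computed before the update
model.eval()
eval_loss = model(1.0, 2.0)   # returns the loss, leaves the weight unchanged
model.predict()
outputs = model(1.0)          # returns a dict of outputs
```

With a real DistModel the same three calls would operate on batches from the data loader; see the sample code in paddle.distributed.to_static for the actual workflow.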
-
dist_main_program(mode=None)
Get the distributed main program of the specified mode. Each mode has its own distributed main program; dist_main_program returns the one that corresponds to mode.
- Parameters
-
mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the distributed main program for training. ‘eval’: Return the distributed main program for evaluation. ‘predict’: Return the distributed main program for prediction. None: The current mode of the DistModel will be used. Default: None.
- Returns
-
The distributed main program of mode.
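The way the mode argument falls back to the current mode when it is None can be sketched as follows. resolve_mode and the placeholder program strings are hypothetical, and raising ValueError for an unknown name is an assumption; the source does not say how Paddle handles that case.

```python
VALID_MODES = ("train", "eval", "predict")


def resolve_mode(mode, current_mode):
    """Pick the mode whose program should be returned (illustrative only)."""
    if mode is None:
        # None means: use whatever mode the model is currently in.
        return current_mode
    if mode not in VALID_MODES:
        # Assumption: unknown names are rejected.
        raise ValueError(f"mode must be one of {VALID_MODES} or None, got {mode!r}")
    return mode


# Placeholder strings stand in for the per-mode program objects.
programs = {m: f"<{m} program>" for m in VALID_MODES}
selected = programs[resolve_mode(None, current_mode="eval")]
print(selected)  # prints "<eval program>"
```

The same fallback applies to every method on this page that takes mode=None.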
-
dist_startup_program(mode=None)
Get the distributed startup program of the specified mode, which is used for initializing the parameters.
- Parameters
-
mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the distributed startup program for training. ‘eval’: Return the distributed startup program for evaluation. ‘predict’: Return the distributed startup program for prediction. None: The current mode of the DistModel will be used. Default: None.
- Returns
-
The distributed startup program of mode.
-
serial_main_program(mode=None)
Get the serial main program of the specified mode, which contains all the variables and operators of the given layer.
- Parameters
-
mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the serial main program for training. ‘eval’: Return the serial main program for evaluation. ‘predict’: Return the serial main program for prediction. None: The current mode of the DistModel will be used. Default: None.
- Returns
-
The serial main program of mode.
-
serial_startup_program(mode=None)
Get the serial startup program of the specified mode.
- Parameters
-
mode (str|None, optional) – Can be ‘train’, ‘eval’, ‘predict’ or None. ‘train’: Return the serial startup program for training. ‘eval’: Return the serial startup program for evaluation. ‘predict’: Return the serial startup program for prediction. None: The current mode of the DistModel will be used. Default: None.
- Returns
-
The serial startup program of mode.
-
state_dict(mode='all')
-
Get the state dict of model and optimizer.
- Parameters
-
mode (str) – Can be ‘opt’, ‘param’ or ‘all’. ‘opt’: The return value contains only the variables in the optimizer. ‘param’: The return value contains only the variables in the network, not those in the optimizer. ‘all’: The return value contains the variables in both the network and the optimizer. Default: ‘all’.
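The three values of mode amount to selecting one or both of two groups of variables. The sketch below shows that selection on plain dicts; select_state, the variable names and the values are hypothetical, and it does not reflect how Paddle stores or names its state internally.

```python
def select_state(param_state, opt_state, mode="all"):
    """Return the variables chosen by mode (illustrative only)."""
    if mode == "param":
        # Network variables only.
        return dict(param_state)
    if mode == "opt":
        # Optimizer variables only.
        return dict(opt_state)
    if mode == "all":
        # Network and optimizer variables together.
        return {**param_state, **opt_state}
    # Assumption: any other value is rejected.
    raise ValueError(f"mode must be 'opt', 'param' or 'all', got {mode!r}")


# Made-up variable names and values for the example.
params = {"linear_0.weight": [0.1, 0.2]}
opt_vars = {"linear_0.weight_moment": [0.0, 0.0]}

only_params = select_state(params, opt_vars, mode="param")
everything = select_state(params, opt_vars)
```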