Strategy
- class paddle.distributed.Strategy(config=None) [source]
-
The Strategy object is used to configure the parallelization and optimization strategies for static graph. It currently supports configuring sharding, fused_passes, gradient_merge and pipeline. More strategies will be supported in the future.

sharding is used to configure the sharding states of the optimizer, for saving GPU memory.

fused_passes is used to configure the fusion of the computation in the model.

gradient_merge is used to configure the gradient merge strategy in training.

pipeline is used to configure the pipeline parallelism strategy.

- Parameters
-
config (dict|None, optional) – If config is None, the default configurations will be set. If it is a dict, the items inside the dict will be used to set the configurations, and the others remain the default values.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()

>>> strategy.sharding.enable = True
>>> strategy.sharding.stage = 2
>>> strategy.sharding.degree = 2

>>> strategy.gradient_merge.enable = True
>>> strategy.gradient_merge.k_steps = 2
>>> strategy.gradient_merge.avg = False

>>> strategy.pipeline.enable = True
>>> strategy.pipeline.schedule_mode = "1F1B"  # default is "1F1B"
>>> strategy.pipeline.micro_batch_size = 2
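The same settings can also be passed as a dict through the config argument. A minimal sketch, assuming the dict keys mirror the strategy and option names above (any option left out keeps its default value):

>>> import paddle.distributed as dist

>>> # Assumed dict layout mirroring the attribute names above;
>>> # omitted options keep their default values.
>>> strategy = dist.Strategy({"sharding": {"enable": True, "stage": 2, "degree": 2}})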
- property sharding [source]
-
sharding is used to configure the sharding states of the optimizer, containing the following configs:

enable (bool): whether to enable sharding. Default: False.

stage (int): can be set to 1, 2 or 3. 1 indicates segmentation of the optimizer states, 2 indicates segmentation of the optimizer states and gradients, and 3 indicates segmentation of the optimizer states, gradients and parameters. Default: 1.

degree (int): the number of segmentation pieces. Default: 8.

Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()

>>> strategy.sharding.enable = True
>>> strategy.sharding.stage = 2
>>> strategy.sharding.degree = 2
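To make the degree config concrete, the plain-Python sketch below (an illustration only, not Paddle's actual partitioning scheme) splits a set of optimizer-state tensors into degree shards:

>>> def shard(states, degree):
...     # Round-robin split; illustration only, not Paddle's scheme.
...     shards = [[] for _ in range(degree)]
...     for i, s in enumerate(states):
...         shards[i % degree].append(s)
...     return shards

>>> shard(["m_w1", "v_w1", "m_w2", "v_w2"], degree=2)
[['m_w1', 'm_w2'], ['v_w1', 'v_w2']]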
- property gradient_merge
-
gradient_merge is used to configure the gradient merge strategy in training, containing the following configs:

enable (bool): whether to enable gradient merge. Default: False.

k_steps (int): the number of steps for merging gradients. Default: 1.

avg (bool): whether to average the gradients of each step. Default: True.

Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()

>>> strategy.gradient_merge.enable = True
>>> strategy.gradient_merge.k_steps = 2
>>> strategy.gradient_merge.avg = True
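For intuition, here is a plain-Python illustration (not Paddle code) of what the k_steps and avg configs mean: gradients from k_steps steps are accumulated into a single update, and divided by k_steps when avg is True:

>>> k_steps = 2
>>> grads = [1.0, 3.0]           # one gradient per step, k_steps in total
>>> merged = sum(grads)          # accumulated gradient used for the update
>>> averaged = merged / k_steps  # used instead when avg is True
>>> print(merged, averaged)
4.0 2.0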
- property fused_passes
-
fused_passes is used to configure the fusion of the computation in the model, containing the following configs:

enable (bool): whether to enable fused passes. Default: False.

gemm_epilogue (bool): whether to fuse matmul and add computation in the Linear layer. Default: False.

dropout_add (bool): whether to fuse dropout and add computation. Default: False.

Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()

>>> strategy.fused_passes.enable = True
>>> strategy.fused_passes.gemm_epilogue = True
>>> strategy.fused_passes.dropout_add = True
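For reference, the computation that gemm_epilogue targets is the matmul followed by the bias add inside a Linear layer. A minimal sketch of the unfused form (the pass itself rewrites the static graph; this only shows the two operations it merges):

>>> import paddle

>>> x = paddle.randn([4, 8])
>>> linear = paddle.nn.Linear(8, 16)
>>> # Unfused form: a matmul kernel followed by a separate add kernel;
>>> # the gemm_epilogue pass fuses them into a single kernel.
>>> y = paddle.matmul(x, linear.weight) + linear.bias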
- property pipeline
-
pipeline is used to configure the pipeline parallelism in training, containing the following configs:

enable (bool): whether to enable pipeline parallelism. Default: False.

schedule_mode (str): the scheduling mode of pipeline parallelism. Default: "1F1B".

micro_batch_size (int): the size of each micro-batch inside a mini-batch. Default: 1.

accumulate_steps (int): number of steps for accumulating. Default: 1.

Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> strategy = dist.Strategy()

>>> strategy.pipeline.enable = True
>>> strategy.pipeline.micro_batch_size = 2
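As a concrete illustration of how micro_batch_size relates to a mini-batch (plain Python; the actual scheduling is handled by the chosen schedule_mode): a mini-batch of 8 samples with micro_batch_size = 2 is split into 4 micro-batches that flow through the pipeline stages:

>>> mini_batch_size = 8
>>> micro_batch_size = 2
>>> num_micro_batches = mini_batch_size // micro_batch_size
>>> print(num_micro_batches)
4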