shard_scaler
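Converts a GradScaler defined from a single-card perspective into one with a distributed view.
Parameters
scaler (paddle.amp.GradScaler) - The GradScaler defined from a single-card perspective.
Returns
GradScaler: A GradScaler object with a distributed view.
Code Example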
>>> import paddle
>>> import paddle.distributed as dist
>>> mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
>>> class MLP(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = paddle.nn.Linear(8, 8)
...         self.fc2 = paddle.nn.Linear(8, 8)
...
...     def forward(self, input):
...         return self.fc2(self.fc1(input))
>>> layer = MLP()
>>> batch = paddle.rand(shape=[8, 8])
>>> opt = paddle.optimizer.AdamW(parameters=layer.parameters())
>>> # Decorate the layer and optimizer for O2 mixed precision
>>> layer, opt = paddle.amp.decorate(layer, opt, level='O2')
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> # Convert the scaler and the optimizer to their distributed views
>>> scaler = dist.shard_scaler(scaler)
>>> opt = dist.shard_optimizer(opt)
>>> for _ in range(5):
...     with paddle.amp.auto_cast(True):
...         loss = layer(batch)
...     scaled = scaler.scale(loss)
...     scaled.backward()
...     scaler.step(opt)
...     scaler.update()
...     opt.clear_grad()
>>> # This example must be run in a multi-card environment:
>>> # python -m paddle.distributed.launch --gpus=0,1 {test_case}.py
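Why a scaler might need a distributed view can be illustrated without Paddle. The block below is a minimal pure-Python sketch, not Paddle's implementation. It assumes that dynamic loss scaling skips the optimizer step and shrinks the scale whenever a gradient is non-finite, and that the point of the distributed view is for every rank to act on the same overflow flag. The names `ToyScaler`, `local_found_inf`, `global_found_inf`, and `run` are hypothetical, and the logical OR over simulated ranks stands in for real cross-rank communication.

```python
import math


def local_found_inf(grads):
    """Return True if any gradient value held by one rank is inf or NaN."""
    return any(not math.isfinite(g) for g in grads)


def global_found_inf(per_rank_grads):
    """Simulate a logical-OR reduction of the per-rank overflow flags."""
    return any(local_found_inf(grads) for grads in per_rank_grads)


class ToyScaler:
    """Hypothetical dynamic loss scaler (NOT paddle.amp.GradScaler)."""

    def __init__(self, init_loss_scaling=1024.0, incr_ratio=2.0,
                 decr_ratio=0.5, incr_every_n_steps=2):
        self.scale = init_loss_scaling
        self.incr_ratio = incr_ratio
        self.decr_ratio = decr_ratio
        self.incr_every_n_steps = incr_every_n_steps
        self.good_steps = 0

    def update(self, found_inf):
        """Adjust the scale; return True if the optimizer step would be applied."""
        if found_inf:
            # Overflow: shrink the scale and skip this step.
            self.scale *= self.decr_ratio
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps == self.incr_every_n_steps:
            # Enough clean steps in a row: grow the scale.
            self.scale *= self.incr_ratio
            self.good_steps = 0
        return True


def run(steps, synchronize):
    """Run one ToyScaler per simulated rank and return the final scales."""
    scalers = [ToyScaler() for _ in steps[0]]
    for per_rank_grads in steps:
        shared_flag = global_found_inf(per_rank_grads)
        for scaler, grads in zip(scalers, per_rank_grads):
            flag = shared_flag if synchronize else local_found_inf(grads)
            scaler.update(flag)
    return [s.scale for s in scalers]


# Two simulated ranks, two steps; rank 1 overflows on the first step only.
STEPS = [
    [[0.1, 0.2], [float("inf"), 0.3]],
    [[0.1, 0.2], [0.4, 0.3]],
]

print(run(STEPS, synchronize=True))   # [512.0, 512.0]
print(run(STEPS, synchronize=False))  # [2048.0, 512.0]
```

With a shared flag, both simulated ranks skip the same step and end with the same scale. With purely local flags, rank 0 applies a step that rank 1 skips, and their loss scales drift apart.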