reduce_scatter

paddle.distributed.reduce_scatter(tensor: Tensor, tensor_list: list[Tensor], op: _ReduceOp = 0, group: Group | None = None, sync_op: bool = True) → task

Reduces a list of tensors across all processes in a group, then scatters the result so that each process receives one reduced tensor.

Parameters

tensor (Tensor) – The output tensor on each rank. The result overwrites this tensor after communication. Supported data types: float16, float32, float64, int32, int64, int8, uint8, and bool.
tensor_list (List[Tensor]) – List of tensors to reduce and scatter. Every element in the list must be a Tensor whose data type is float16, float32, float64, int32, int64, int8, uint8, bool, or bfloat16.
op (ReduceOp.SUM|ReduceOp.MAX|ReduceOp.MIN|ReduceOp.PROD|ReduceOp.AVG, optional) – The reduction operation to apply. Defaults to ReduceOp.SUM.
group (Group, optional) – The group in which to communicate. Defaults to the global group.
sync_op (bool, optional) – Whether the communication is synchronous. Defaults to True.
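
To make the reduce-then-scatter behavior concrete, here is a minimal single-process sketch in plain Python. It does not call Paddle. The helper `simulate_reduce_scatter` and the `REDUCTIONS` table are hypothetical names introduced only for illustration, and tensors are modeled as flat lists of numbers. It assumes that slot i of every rank's tensor_list is reduced elementwise and that the result is delivered to rank i, which matches the output shown in the Examples section.

```python
from functools import reduce
import operator

# Hypothetical stand-ins for ReduceOp.SUM / MAX / MIN / PROD (not Paddle objects).
REDUCTIONS = {
    "sum": operator.add,
    "max": max,
    "min": min,
    "prod": operator.mul,
}


def simulate_reduce_scatter(inputs_per_rank, op="sum"):
    """Model reduce_scatter in one process.

    inputs_per_rank[r][i] is the i-th entry of the tensor_list passed by
    rank r, written as a flat list of numbers. Returns one output per rank.
    """
    world_size = len(inputs_per_rank)
    combine = REDUCTIONS[op]
    outputs = []
    for dst in range(world_size):
        # Collect slot `dst` from every rank's list ...
        chunks = [inputs_per_rank[src][dst] for src in range(world_size)]
        # ... reduce it elementwise; the result belongs to rank `dst`.
        outputs.append([reduce(combine, column) for column in zip(*chunks)])
    return outputs


ranks = [
    [[0, 1], [2, 3]],  # tensor_list on rank 0
    [[4, 5], [6, 7]],  # tensor_list on rank 1
]
print(simulate_reduce_scatter(ranks))         # [[4, 6], [8, 10]]
print(simulate_reduce_scatter(ranks, "max"))  # [[4, 5], [6, 7]]
```

In this model the list passed by each rank has one entry per rank in the group, and changing `op` only changes how the entries collected for a slot are combined.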

Returns

A task object.

Warning

This API only supports the dygraph mode.

Examples

>>> import paddle
>>> import paddle.distributed as dist

>>> dist.init_parallel_env()
>>> if dist.get_rank() == 0:
...     data1 = paddle.to_tensor([0, 1])
...     data2 = paddle.to_tensor([2, 3])
... else:
...     data1 = paddle.to_tensor([4, 5])
...     data2 = paddle.to_tensor([6, 7])
>>> dist.reduce_scatter(data1, [data1, data2])
>>> print(data1)
>>> # [4, 6] (2 GPUs, out for rank 0)
>>> # [8, 10] (2 GPUs, out for rank 1)