alltoall_single¶
paddle.distributed.alltoall_single(in_tensor, out_tensor, in_split_sizes=None, out_split_sizes=None, group=None, sync_op=True) [source]
Scatter a single input tensor to all participators and gather the received tensors into out_tensor.
Note
alltoall_single is only supported in eager mode.
Parameters
- in_tensor (Tensor) – Input tensor. The data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.
- out_tensor (Tensor) – Output tensor. The data type should be the same as that of the input tensor.
- in_split_sizes (list[int], optional) – Split sizes of in_tensor along dim[0]. If not given, dim[0] of in_tensor must be divisible by the group size and in_tensor will be scattered evenly to all participators. Default: None.
- out_split_sizes (list[int], optional) – Split sizes of out_tensor along dim[0]. If not given, dim[0] of out_tensor must be divisible by the group size and out_tensor will be gathered evenly from all participators. Default: None.
- group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
- sync_op (bool, optional) – Whether this op is a sync op. Default: True.
Returns
Return a task object.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> dist.init_parallel_env()
>>> rank = dist.get_rank()
>>> size = dist.get_world_size()

>>> # case 1 (2 GPUs)
>>> data = paddle.arange(2, dtype='int64') + rank * 2
>>> # data for rank 0: [0, 1]
>>> # data for rank 1: [2, 3]
>>> output = paddle.empty([2], dtype='int64')
>>> dist.alltoall_single(data, output)
>>> print(output)
>>> # output for rank 0: [0, 2]
>>> # output for rank 1: [1, 3]

>>> # case 2 (2 GPUs)
>>> in_split_sizes = [i + 1 for i in range(size)]
>>> # in_split_sizes for rank 0: [1, 2]
>>> # in_split_sizes for rank 1: [1, 2]
>>> out_split_sizes = [rank + 1 for i in range(size)]
>>> # out_split_sizes for rank 0: [1, 1]
>>> # out_split_sizes for rank 1: [2, 2]
>>> data = paddle.ones([sum(in_split_sizes), size], dtype='float32') * rank
>>> # data for rank 0: [[0., 0.], [0., 0.], [0., 0.]]
>>> # data for rank 1: [[1., 1.], [1., 1.], [1., 1.]]
>>> output = paddle.empty([(rank + 1) * size, size], dtype='float32')
>>> group = dist.new_group([0, 1])
>>> task = dist.alltoall_single(data,
...                             output,
...                             in_split_sizes,
...                             out_split_sizes,
...                             sync_op=False,
...                             group=group)
>>> task.wait()
>>> print(output)
>>> # output for rank 0: [[0., 0.], [1., 1.]]
>>> # output for rank 1: [[0., 0.], [0., 0.], [1., 1.], [1., 1.]]
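For intuition, the data-movement rule can be stated without launching multiple GPUs: rank j's i-th input chunk becomes rank i's j-th output chunk. The following single-process sketch (the helper simulate_alltoall_single is purely illustrative, not part of the paddle.distributed API, and assumes every rank passes the same in_split_sizes) reproduces the outputs of case 1 above with plain slicing:

>>> import paddle
>>> def simulate_alltoall_single(inputs, in_split_sizes=None):
...     # inputs: one tensor per simulated rank; returns one output tensor per rank.
...     world_size = len(inputs)
...     chunked = []
...     for x in inputs:
...         sizes = in_split_sizes or [x.shape[0] // world_size] * world_size
...         chunked.append(paddle.split(x, sizes, axis=0))
...     # rank i concatenates the i-th chunk taken from every rank, in rank order
...     return [paddle.concat([chunked[j][i] for j in range(world_size)], axis=0)
...             for i in range(world_size)]
>>> data = [paddle.arange(2, dtype='int64') + r * 2 for r in range(2)]
>>> outs = simulate_alltoall_single(data)
>>> # outs[0]: [0, 2], outs[1]: [1, 3] -- matches case 1 above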