alltoall_single
- paddle.distributed.alltoall_single(in_tensor, out_tensor, in_split_sizes=None, out_split_sizes=None, group=None, sync_op=True)
-
Scatter a single input tensor to all participators and gather the received tensors in out_tensor.
Note
alltoall_single is only supported in eager mode.
- Parameters
-
in_tensor (Tensor) – Input tensor. The data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.
out_tensor (Tensor) – Output Tensor. The data type should be the same as the data type of the input Tensor.
in_split_sizes (list[int], optional) – Split sizes of in_tensor for dim[0]. If not given, dim[0] of in_tensor must be divisible by the group size and in_tensor will be scattered evenly to all participators. Default: None.
out_split_sizes (list[int], optional) – Split sizes of out_tensor for dim[0]. If not given, dim[0] of out_tensor must be divisible by the group size and out_tensor will be gathered evenly from all participators. Default: None. (The shape constraints implied by the split sizes are sketched before the Examples below.)
group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
sync_op (bool, optional) – Whether this op is a sync op. The default value is True.
- Returns
-
Return a task object.
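The split-size arguments must be consistent with dim[0] of the corresponding tensors. Below is a minimal sketch of those constraints; the helper check_split_sizes is hypothetical and not part of the Paddle API:

def check_split_sizes(in_tensor, out_tensor, in_split_sizes, out_split_sizes, group_size):
    if in_split_sizes is None:
        # Even split: dim[0] must be divisible by the group size.
        assert in_tensor.shape[0] % group_size == 0
    else:
        # Uneven split: the per-rank chunks must cover dim[0] exactly.
        assert sum(in_split_sizes) == in_tensor.shape[0]
    if out_split_sizes is None:
        assert out_tensor.shape[0] % group_size == 0
    else:
        assert sum(out_split_sizes) == out_tensor.shape[0]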
Examples
# required: distributed
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
rank = dist.get_rank()
size = dist.get_world_size()

# case 1 (2 GPUs)
data = paddle.arange(2, dtype='int64') + rank * 2
# data for rank 0: [0, 1]
# data for rank 1: [2, 3]
output = paddle.empty([2], dtype='int64')
dist.alltoall_single(data, output)
print(output)
# output for rank 0: [0, 2]
# output for rank 1: [1, 3]

# case 2 (2 GPUs)
in_split_sizes = [i + 1 for i in range(size)]
# in_split_sizes for rank 0: [1, 2]
# in_split_sizes for rank 1: [1, 2]
out_split_sizes = [rank + 1 for i in range(size)]
# out_split_sizes for rank 0: [1, 1]
# out_split_sizes for rank 1: [2, 2]
data = paddle.ones([sum(in_split_sizes), size], dtype='float32') * rank
# data for rank 0: [[0., 0.], [0., 0.], [0., 0.]]
# data for rank 1: [[1., 1.], [1., 1.], [1., 1.]]
output = paddle.empty([(rank + 1) * size, size], dtype='float32')
group = dist.new_group([0, 1])
task = dist.alltoall_single(data,
                            output,
                            in_split_sizes,
                            out_split_sizes,
                            sync_op=False,
                            group=group)
task.wait()
print(output)
# output for rank 0: [[0., 0.], [1., 1.]]
# output for rank 1: [[0., 0.], [0., 0.], [1., 1.], [1., 1.]]
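In case 2 the receive sizes are known up front because every rank uses the same formulas. When they are not known in advance, one common pattern (shown here as a sketch, not as part of this API's documented usage) is to exchange the per-peer row counts first with an evenly split alltoall_single, then allocate the output. This assumes data, in_split_sizes and size from case 2 are still in scope:

# Sketch: derive out_split_sizes at runtime by exchanging row counts.
in_sizes = paddle.to_tensor(in_split_sizes, dtype='int64')  # rows we send to each rank
out_sizes = paddle.empty([size], dtype='int64')
dist.alltoall_single(in_sizes, out_sizes)  # entry i: rows rank i will send to us
out_split_sizes = out_sizes.tolist()
output = paddle.empty([sum(out_split_sizes), size], dtype='float32')
dist.alltoall_single(data, output, in_split_sizes, out_split_sizes)
print(output)
# same result as case 2 above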