alltoall_single¶
paddle.distributed.alltoall_single(in_tensor, out_tensor, in_split_sizes=None, out_split_sizes=None, group=None, sync_op=True) [source]
Scatter a single input tensor to all participators and gather the received tensors into out_tensor.
Note
alltoall_single is only supported in eager mode.
Parameters
- in_tensor (Tensor) – Input tensor. The data type should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.
- out_tensor (Tensor) – Output tensor. The data type should be the same as that of the input tensor.
- in_split_sizes (list[int], optional) – Split sizes of in_tensor along dim[0]. If not given, dim[0] of in_tensor must be divisible by the group size and in_tensor will be scattered evenly to all participators. Default: None.
- out_split_sizes (list[int], optional) – Split sizes of out_tensor along dim[0]. If not given, dim[0] of out_tensor must be divisible by the group size and out_tensor will be gathered evenly from all participators. Default: None.
- group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
- sync_op (bool, optional) – Whether this op is a sync op. Default: True.
Returns
Return a task object.
Examples
>>> import paddle
>>> import paddle.distributed as dist

>>> dist.init_parallel_env()
>>> rank = dist.get_rank()
>>> size = dist.get_world_size()

>>> # case 1 (2 GPUs)
>>> data = paddle.arange(2, dtype='int64') + rank * 2
>>> # data for rank 0: [0, 1]
>>> # data for rank 1: [2, 3]
>>> output = paddle.empty([2], dtype='int64')
>>> dist.alltoall_single(data, output)
>>> print(output)
>>> # output for rank 0: [0, 2]
>>> # output for rank 1: [1, 3]

>>> # case 2 (2 GPUs)
>>> in_split_sizes = [i + 1 for i in range(size)]
>>> # in_split_sizes for rank 0: [1, 2]
>>> # in_split_sizes for rank 1: [1, 2]
>>> out_split_sizes = [rank + 1 for i in range(size)]
>>> # out_split_sizes for rank 0: [1, 1]
>>> # out_split_sizes for rank 1: [2, 2]
>>> data = paddle.ones([sum(in_split_sizes), size], dtype='float32') * rank
>>> # data for rank 0: [[0., 0.], [0., 0.], [0., 0.]]
>>> # data for rank 1: [[1., 1.], [1., 1.], [1., 1.]]
>>> output = paddle.empty([(rank + 1) * size, size], dtype='float32')
>>> group = dist.new_group([0, 1])
>>> task = dist.alltoall_single(data,
...                             output,
...                             in_split_sizes,
...                             out_split_sizes,
...                             sync_op=False,
...                             group=group)
>>> task.wait()
>>> print(output)
>>> # output for rank 0: [[0., 0.], [1., 1.]]
>>> # output for rank 1: [[0., 0.], [0., 0.], [1., 1.], [1., 1.]]
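For intuition, the data-movement rule can be stated without launching multiple GPUs: rank j's i-th input chunk becomes rank i's j-th output chunk. The following single-process sketch (the helper simulate_alltoall_single is purely illustrative, not part of the paddle.distributed API, and assumes every rank passes the same in_split_sizes) reproduces the outputs of case 1 above with plain slicing:

>>> import paddle
>>> def simulate_alltoall_single(inputs, in_split_sizes=None):
...     # inputs: one tensor per simulated rank; returns one output tensor per rank.
...     world_size = len(inputs)
...     chunked = []
...     for x in inputs:
...         sizes = in_split_sizes or [x.shape[0] // world_size] * world_size
...         chunked.append(paddle.split(x, sizes, axis=0))
...     # rank i concatenates the i-th chunk taken from every rank, in rank order
...     return [paddle.concat([chunked[j][i] for j in range(world_size)], axis=0)
...             for i in range(world_size)]
>>> data = [paddle.arange(2, dtype='int64') + r * 2 for r in range(2)]
>>> outs = simulate_alltoall_single(data)
>>> # outs[0]: [0, 2], outs[1]: [1, 3] -- matches case 1 above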