alltoall
paddle.distributed.alltoall(in_tensor_list, out_tensor_list, group=None, sync_op=True) [source]
Scatter the tensors in in_tensor_list evenly to all participators and gather the resulting tensors in out_tensor_list. For example, suppose the in_tensor_list on GPU0 contains 0_0 and 0_1, and that on GPU1 contains 1_0 and 1_1. Through the alltoall operator, 0_0 on GPU0 is sent to GPU0 and 0_1 to GPU1, while 1_0 on GPU1 is sent to GPU0 and 1_1 to GPU1. Finally, the out_tensor_list on GPU0 contains 0_0 and 1_0, and that on GPU1 contains 0_1 and 1_1.
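The exchange can be read as an index relation: out_tensor_list[i] on rank j ends up holding in_tensor_list[j] sent from rank i. The following pure-Python sketch only simulates that relation locally for the two-GPU case above; no communication is performed and the string labels are placeholders:

>>> # Local simulation of the alltoall index relation (no communication).
>>> world_size = 2
>>> in_lists = [
...     ["0_0", "0_1"],   # in_tensor_list on GPU0
...     ["1_0", "1_1"],   # in_tensor_list on GPU1
... ]
>>> out_lists = [
...     [in_lists[src][dst] for src in range(world_size)]
...     for dst in range(world_size)
... ]
>>> print(out_lists)
[['0_0', '1_0'], ['0_1', '1_1']]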
Parameters
in_tensor_list (List[Tensor]) – List of tensors to scatter, one per rank. The data type of each tensor should be float16, float32, float64, int32, int64, int8, uint8, bool or bfloat16.
out_tensor_list (List[Tensor]) – List of tensors to be gathered, one per rank. The data type of each tensor should be the same as that of the input tensors.
group (Group, optional) – The group instance returned by new_group, or None for the global default group. Default: None.
sync_op (bool, optional) – Whether this op is a sync op. The default value is True.
Returns
Return a task object.
Examples
>>> import paddle
>>> import paddle.distributed as dist
>>> dist.init_parallel_env()
>>> out_tensor_list = []
>>> if dist.get_rank() == 0:
...     data1 = paddle.to_tensor([[1, 2, 3], [4, 5, 6]])
...     data2 = paddle.to_tensor([[7, 8, 9], [10, 11, 12]])
>>> else:
...     data1 = paddle.to_tensor([[13, 14, 15], [16, 17, 18]])
...     data2 = paddle.to_tensor([[19, 20, 21], [22, 23, 24]])
>>> dist.alltoall([data1, data2], out_tensor_list)
>>> print(out_tensor_list)
>>> # [[[1, 2, 3], [4, 5, 6]], [[13, 14, 15], [16, 17, 18]]] (2 GPUs, out for rank 0)
>>> # [[[7, 8, 9], [10, 11, 12]], [[19, 20, 21], [22, 23, 24]]] (2 GPUs, out for rank 1)
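When sync_op is False, the call returns immediately and the returned task can be waited on before the results are read. The sketch below is illustrative only: it assumes a 2-GPU launch, the tensor shapes are arbitrary, and the task is assumed to expose wait() as with Paddle's other communication APIs.

>>> # Asynchronous sketch: issue the exchange, then wait before reading results.
>>> import paddle
>>> import paddle.distributed as dist
>>> dist.init_parallel_env()
>>> world_size = dist.get_world_size()
>>> rank = dist.get_rank()
>>> # One tensor per peer rank; every rank sends identically shaped tensors.
>>> in_tensor_list = [paddle.full([2, 3], float(rank)) for _ in range(world_size)]
>>> out_tensor_list = []
>>> task = dist.alltoall(in_tensor_list, out_tensor_list, sync_op=False)
>>> task.wait()  # block until out_tensor_list has been filled
>>> print(out_tensor_list)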