shard_index

paddle.shard_index(input: Tensor, index_num: int, nshards: int, shard_id: int, ignore_value: int = -1) → Tensor

Reset the values of input according to the shard they belong to. Every value in input must be a non-negative integer, and index_num must be greater than the maximum value of input, so all values lie in the range [0, index_num). Each value can be regarded as an offset from the beginning of this range. The range is further split into nshards shards. Specifically, we first compute shard_size, the number of integers each shard can hold, using the following formula. The i-th shard then holds the values in the range [i*shard_size, (i+1)*shard_size).

           shard_size = (index_num + nshards - 1) // nshards
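As a quick check of the shard_size formula, the plain-Python sketch below lists the range each shard covers. shard_ranges is a hypothetical helper, not a Paddle API. Because the formula rounds up, the last shard's range can extend past index_num when index_num is not a multiple of nshards.

```python
def shard_ranges(index_num: int, nshards: int) -> list[tuple[int, int]]:
    """Return the half-open [start, end) range of every shard (hypothetical helper)."""
    # Ceiling division, as in the formula above: enough slots per shard
    # so that nshards shards together cover [0, index_num).
    shard_size = (index_num + nshards - 1) // nshards
    return [(i * shard_size, (i + 1) * shard_size) for i in range(nshards)]


print(shard_ranges(20, 2))  # [(0, 10), (10, 20)]
print(shard_ranges(10, 3))  # [(0, 4), (4, 8), (8, 12)]
```

For index_num = 10 and nshards = 3, shard_size is 4, so shard 2 covers [8, 12) although only 8 and 9 can appear in input.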

For each value v in input, we reset it to a new value according to the following formula:

           v = v - shard_id * shard_size if shard_id * shard_size <= v < (shard_id + 1) * shard_size else ignore_value

That is, if v falls within the range of shard shard_id, it is replaced by its offset from the start of that range; otherwise, it is set to ignore_value.
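The rule above can be restated as a short plain-Python reference, shown below as a sketch. shard_index_reference is a hypothetical name; it is not Paddle's implementation and works on nested lists instead of Tensors. The ValueError check is an assumption added here to enforce the documented precondition that every value lies in [0, index_num).

```python
def shard_index_reference(values, index_num, nshards, shard_id, ignore_value=-1):
    """Plain-Python sketch of the shard_index rule on nested lists (hypothetical helper)."""
    shard_size = (index_num + nshards - 1) // nshards
    lo = shard_id * shard_size  # first value owned by this shard
    hi = lo + shard_size        # one past the last value owned by this shard

    def remap(v):
        # Assumed precondition check: values must lie in [0, index_num).
        if not 0 <= v < index_num:
            raise ValueError(f"value {v} is outside [0, {index_num})")
        # Offset within this shard if v belongs to it, otherwise the sentinel.
        return v - lo if lo <= v < hi else ignore_value

    def walk(x):
        # Recurse through nested lists until reaching scalar values.
        return [walk(e) for e in x] if isinstance(x, list) else remap(x)

    return walk(values)


label = [[16], [1]]
print(shard_index_reference(label, index_num=20, nshards=2, shard_id=0))
# [[-1], [1]]
print(shard_index_reference(label, index_num=20, nshards=2, shard_id=1, ignore_value=-100))
# [[6], [-100]]
```

With shard_id = 1 and ignore_value = -100, the value 16 falls in shard 1's range [10, 20) and becomes 16 - 10 = 6, while 1 becomes -100.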

In the example below, a 2-D tensor of shape [2, 1] is updated with the shard_index operation. Given index_num = 20, nshards = 2, and shard_id = 0, the shard size is shard_size = (20 + 2 - 1) // 2 = 10. Each label element in [0, 10) is replaced by its offset; for example, 1 becomes 1 - 0 * 10 = 1. Every other element is set to the default ignore_value of -1; for example, 16 becomes -1.

Parameters

input (Tensor) – Input tensor with data type int64 or int32. Its last dimension must be 1.
index_num (int) – An integer greater than the maximum value of input.
nshards (int) – The number of shards.
shard_id (int) – The index of the current shard.
ignore_value (int, optional) – The value assigned to elements that fall outside the current shard's range. The default value is -1.

Returns

Tensor. The sharded index tensor, with the same shape and data type as input.

Examples

>>> import paddle
>>> label = paddle.to_tensor([[16], [1]], "int64")
>>> shard_label = paddle.shard_index(input=label,
...                                  index_num=20,
...                                  nshards=2,
...                                  shard_id=0)
>>> print(shard_label.numpy())
[[-1]
 [ 1]]
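Because the shards' ranges do not overlap, each in-range value is kept by exactly one shard, so the original values can be rebuilt from the outputs of all shards as shard_id * shard_size + offset. The plain-Python sketch below illustrates this. recover_labels is a hypothetical helper, not a Paddle API, and it assumes a negative ignore_value so the sentinel can never be confused with a valid offset.

```python
def recover_labels(shard_outputs, shard_size, ignore_value=-1):
    """Rebuild original values from the flattened outputs of every shard (hypothetical helper).

    shard_outputs[k] is the shard_index result for shard_id=k, flattened to a list.
    Assumes ignore_value is negative, so it never equals a valid offset.
    """
    recovered = [None] * len(shard_outputs[0])
    for shard_id, outputs in enumerate(shard_outputs):
        for i, offset in enumerate(outputs):
            if offset != ignore_value:
                # Only the owning shard keeps a value; map its offset back to the global value.
                recovered[i] = shard_id * shard_size + offset
    return recovered


# Flattened outputs for label [[16], [1]] with index_num=20, nshards=2 (shard_size=10).
shard0 = [-1, 1]  # shard_id=0, matching the example output above
shard1 = [6, -1]  # shard_id=1: 16 -> 16 - 10 = 6, 1 -> -1
print(recover_labels([shard0, shard1], shard_size=10))  # [16, 1]
```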