sequence_pad

paddle.static.nn.sequence_pad(x, pad_value, maxlen=None, name=None)

This layer pads the sequences in the same batch to a common length (determined by maxlen). The padding value is given by pad_value and is appended to the tail of each sequence. The result is a Python tuple (Out, Length): the Tensor Out holds the padded sequences, and the Tensor Length holds the original length of each input sequence. To remove the padding again (the unpadding operation), see sequence_unpad.

Note

The input x should be a Tensor with 1-level LoD (level-of-detail) information, which marks where each sequence begins and ends.

Case 1:
Given a 1-level input Tensor x:
    x.lod = [[0,  2,   5]]
    x.data = [[a],[b],[c],[d],[e]]
pad_value:
    pad_value.data = [0]
maxlen = 4

the output tuple (Out, Length):
    Out.data = [[[a],[b],[0],[0]],[[c],[d],[e],[0]]]
    Length.data = [2, 3]      # original sequence lengths

Case 2:
Given a 1-level input Tensor x:
    x.lod =  [[0,             2,                     5]]
    x.data = [[a1,a2],[b1,b2],[c1,c2],[d1,d2],[e1,e2]]
pad_value:
    pad_value.data = [0]
maxlen = None (default; the effective value is 3, the length of the longest sequence)

the output tuple (Out, Length):
    Out.data = [[[a1,a2],[b1,b2],[0,0]],[[c1,c2],[d1,d2],[e1,e2]]]
    Length.data = [2, 3]

Case 3:
Given a 1-level input Tensor x:
    x.lod =  [[0,             2,                     5]]
    x.data = [[a1,a2],[b1,b2],[c1,c2],[d1,d2],[e1,e2]]
pad_value:
    pad_value.data = [p1,p2]
maxlen = None (default; the effective value is 3, the length of the longest sequence)

the output tuple (Out, Length):
    Out.data = [[[a1,a2],[b1,b2],[p1,p2]],[[c1,c2],[d1,d2],[e1,e2]]]
    Length.data = [2, 3]
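The three cases can be reproduced with a small NumPy sketch of the padding semantics. It is not Paddle's implementation: the helper name `sequence_pad_reference` is hypothetical, the LoD is passed as a plain list of offsets, and the letters from the cases are replaced by numbers (a..e become 1..5, p1, p2 become -1, -2).

```python
import numpy as np


def sequence_pad_reference(data, lod, pad_value, maxlen=None):
    """Hypothetical NumPy sketch of sequence_pad's semantics (not Paddle's code).

    data:      array of shape [M, K]; all sequences concatenated row by row
    lod:       1-level offsets, e.g. [0, 2, 5] for two sequences of lengths 2 and 3
    pad_value: array of shape [1] or [K]
    maxlen:    padded length; None means "length of the longest sequence"
    Returns (out, length) with shapes [batch_size, maxlen, K] and [batch_size].
    """
    data = np.asarray(data)
    offsets = np.asarray(lod)
    length = np.diff(offsets).astype(np.int64)  # original length of each sequence
    if maxlen is None:
        maxlen = int(length.max())
    batch_size, k = len(length), data.shape[1]
    # Fill every position with pad_value first; shape [1] broadcasts across all K features.
    out = np.empty((batch_size, maxlen, k), dtype=data.dtype)
    out[...] = np.asarray(pad_value, dtype=data.dtype)
    # Copy each sequence into the head of its row, leaving the tail padded.
    for i in range(batch_size):
        start, end = offsets[i], offsets[i + 1]
        out[i, : end - start] = data[start:end]
    return out, length


# Case 1: K = 1, maxlen = 4, pad value 0.
case1 = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]], dtype=np.float32)
out1, len1 = sequence_pad_reference(case1, [0, 2, 5], np.array([0.0], dtype=np.float32), maxlen=4)
# out1[:, :, 0] is [[1, 2, 0, 0], [3, 4, 5, 0]] and len1 is [2, 3]

# Case 3: K = 2, maxlen = None, per-feature pad value [p1, p2] = [-1, -2].
case3 = np.arange(1, 11, dtype=np.float32).reshape(5, 2)
out3, len3 = sequence_pad_reference(case3, [0, 2, 5], np.array([-1.0, -2.0], dtype=np.float32))
# out3 has shape (2, 3, 2); out3[0, 2] is the pad row [-1, -2]
```

Filling the whole output with the pad value first and then overwriting the head of each row keeps the broadcast rule for a shape-[1] pad value in one place.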

Parameters
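  • x (Tensor) – Input 1-level Tensor with dims [M, K], where M is the total number of time steps across all sequences and K is the size of each time step. The batch size (the number of sequences) is given by the LoD information. The data type should be float32, float64, int8, int32 or int64.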

  • pad_value (Tensor) – Padding value. It can be a scalar (a Tensor of shape [1]) or a 1D Tensor of length K. A scalar is automatically broadcast across all K features of every padded position. Its data type must be the same as that of x.

  • maxlen (int, optional) – The length of the padded sequences, None by default. When it is None, all sequences are padded to the length of the longest one among them; when it is a positive value, it must be greater than or equal to the length of the longest original sequence.

  • name (str, optional) – For detailed information, please refer to Name. Usually there is no need to set name, and it is None by default.
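The rules in the parameter list can be collected into a small validation routine. This is a hypothetical NumPy helper, `check_sequence_pad_args`, written only to illustrate those rules; Paddle performs its own checks with its own error messages. The sketch assumes that an explicit maxlen equal to the longest length is accepted, as it is in the default case.

```python
import numpy as np

ALLOWED_DTYPES = (np.float32, np.float64, np.int8, np.int32, np.int64)


def check_sequence_pad_args(data, lod, pad_value, maxlen=None):
    """Hypothetical helper illustrating the parameter rules; returns the effective maxlen."""
    data = np.asarray(data)
    pad_value = np.asarray(pad_value)
    offsets = np.asarray(lod)
    if data.ndim != 2:
        raise ValueError("x must have dims [M, K]")
    if data.dtype not in ALLOWED_DTYPES:
        raise TypeError("x must be float32, float64, int8, int32 or int64")
    # The last LoD offset marks the end of the last sequence, so it must equal M.
    if offsets[0] != 0 or offsets[-1] != data.shape[0]:
        raise ValueError("LoD offsets must start at 0 and end at M")
    if pad_value.dtype != data.dtype:
        raise TypeError("pad_value must have the same data type as x")
    # A scalar (size 1) is broadcast; otherwise one value per feature is required.
    if pad_value.size not in (1, data.shape[1]):
        raise ValueError("pad_value must be a scalar or a 1D tensor of length K")
    longest = int(np.diff(offsets).max())  # the batch size is len(offsets) - 1
    if maxlen is None:
        return longest  # pad every sequence up to the longest one
    if maxlen < longest:
        raise ValueError(f"maxlen={maxlen} is shorter than the longest sequence ({longest})")
    return maxlen


x = np.zeros((5, 2), dtype=np.float32)  # M = 5 time steps, K = 2 features
print(check_sequence_pad_args(x, [0, 2, 5], np.zeros(1, dtype=np.float32)))  # 3
print(check_sequence_pad_args(x, [0, 2, 5], np.zeros(2, dtype=np.float32), maxlen=4))  # 4
```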

Returns

The first element is Out, a 0-level Tensor with shape [batch_size, maxlen, K]. The second element is Length, a 0-level 1D Tensor holding the original length of each sequence; its size equals the batch size and its data type is int64.

Return type

tuple: a Python tuple (Out, Length)
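Because Length records where the real data ends in each row of Out, the two returned Tensors together are enough to recover the original input, which is the job of sequence_unpad. The NumPy sketch below illustrates that round trip; `sequence_unpad_reference` is a hypothetical name, not Paddle's implementation, and it returns the LoD as a plain offset array.

```python
import numpy as np


def sequence_unpad_reference(out, length):
    """Hypothetical NumPy sketch of the inverse of padding (not Paddle's sequence_unpad).

    out:    padded array of shape [batch_size, maxlen, K]
    length: original sequence lengths, shape [batch_size]
    Returns the flat [M, K] data and the 1-level LoD offsets.
    """
    out = np.asarray(out)
    length = np.asarray(length, dtype=np.int64)
    # Keep only the first length[i] steps of row i; the rest is padding.
    pieces = [out[i, : length[i]] for i in range(len(length))]
    data = np.concatenate(pieces, axis=0)
    # Offsets are the running sum of the lengths, starting at 0.
    lod = np.concatenate([[0], np.cumsum(length)]).astype(np.int64)
    return data, lod


# Out and Length from Case 1, with a..e written as 1..5.
out = np.array([[[1.0], [2.0], [0.0], [0.0]],
                [[3.0], [4.0], [5.0], [0.0]]], dtype=np.float32)
data, lod = sequence_unpad_reference(out, [2, 3])
# data[:, 0] is [1, 2, 3, 4, 5] and lod is [0, 2, 5]
```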

Examples

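>>> import numpy
>>> import paddle
>>> paddle.enable_static()

>>> # x holds 1-level LoD data: rows of 5 features, grouped into sequences by its LoD.
>>> x = paddle.static.data(name='x', shape=[10, 5], dtype='float32', lod_level=1)
>>> # A shape-[1] pad value is broadcast to all 5 features of every padded step.
>>> pad_value = paddle.assign(
...     numpy.array([0.0], dtype=numpy.float32))
>>> # maxlen is omitted, so each sequence is padded to the longest one in the batch.
>>> out, length = paddle.static.nn.sequence_pad(x=x, pad_value=pad_value)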