ctc_greedy_decoder

paddle.fluid.layers.nn.ctc_greedy_decoder(input, blank, input_length=None, padding_value=0, name=None)

This op decodes sequences using a greedy policy in the following steps:

1. Get the index of the maximum value for each row of input, i.e. numpy.argmax(input, axis=-1).
2. For each sequence in the result of step 1, merge consecutive repeated tokens (repeats separated by a blank are kept) and then delete all blanks.
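The two steps can be sketched in NumPy for a single sequence. This is only an illustration of the decoding rule, not Paddle's implementation; the helper name ctc_greedy_decode and the one-hot test input are assumptions made for this example.

```python
import numpy as np

def ctc_greedy_decode(probs, blank):
    """Greedy CTC decoding of one sequence (hypothetical helper).

    probs has shape [T, num_classes + 1]: one row of class scores per time step.
    """
    # Step 1: pick the most likely class at every time step.
    best_path = np.argmax(probs, axis=-1).tolist()

    # Step 2: drop blanks and drop tokens that repeat the previous time step.
    decoded = []
    prev = None
    for token in best_path:
        if token != blank and token != prev:
            decoded.append(token)
        prev = token  # updated for blanks too, so a blank separates repeats
    return decoded

# Illustrative input: one-hot rows whose argmax path is [1, 1, 0, 1, 2, 2].
probs = np.eye(3)[[1, 1, 0, 1, 2, 2]]
result = ctc_greedy_decode(probs, blank=0)
print(result)  # [1, 1, 2]
```

The first two 1s merge into one token, while the third 1 survives because a blank sits between it and the earlier run.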

This op is implemented in two modes, lod and padding, and the input type selects the mode: a LoDTensor input uses lod mode and a Tensor input uses padding mode.

A simple example is given below:

Given:
(1) for lod mode:

input.data = [[0.6, 0.1, 0.3, 0.1],
              [0.3, 0.2, 0.4, 0.1],
              [0.1, 0.5, 0.1, 0.3],
              [0.5, 0.1, 0.3, 0.1],

              [0.5, 0.1, 0.3, 0.1],
              [0.2, 0.2, 0.2, 0.4],
              [0.2, 0.2, 0.1, 0.5],
              [0.5, 0.1, 0.3, 0.1]]

input.lod = [[4, 4]]

Computation:

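step1: Apply argmax to the first input sequence, which is input.data[0:4]. Then we get:
       [[0], [2], [1], [0]]
step2: Merge repeated tokens and remove the blank, which is 0. Then we get the first output sequence:
       [[2], [1]]
The second input sequence, input.data[4:8], is processed the same way: argmax gives
       [[0], [3], [3], [0]], which decodes to [[3]].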

Finally:

output.data = [[2],
               [1],
               [3]]

output.lod = [[2, 1]]
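The lod-mode example above can be reproduced with a short NumPy sketch. The function decode_lod is a hypothetical helper written for this illustration; it takes the per-sequence lengths from input.lod as a plain list and does not model the special case where every output sequence is empty.

```python
import numpy as np

def decode_lod(data, seq_lengths, blank):
    """Decode concatenated sequences (hypothetical helper, not Paddle's code).

    data has shape [Lp, num_classes + 1]; seq_lengths lists each sequence's length.
    Returns the decoded tokens as rows of shape [1] plus the output lengths.
    """
    out_data, out_lengths = [], []
    start = 0
    for length in seq_lengths:
        # Argmax over the rows that belong to this sequence.
        path = np.argmax(data[start:start + length], axis=-1).tolist()
        prev, count = None, 0
        for token in path:
            if token != blank and token != prev:
                out_data.append([token])
                count += 1
            prev = token
        out_lengths.append(count)
        start += length
    return out_data, [out_lengths]

data = np.array([[0.6, 0.1, 0.3, 0.1],
                 [0.3, 0.2, 0.4, 0.1],
                 [0.1, 0.5, 0.1, 0.3],
                 [0.5, 0.1, 0.3, 0.1],
                 [0.5, 0.1, 0.3, 0.1],
                 [0.2, 0.2, 0.2, 0.4],
                 [0.2, 0.2, 0.1, 0.5],
                 [0.5, 0.1, 0.3, 0.1]])

out_data, out_lod = decode_lod(data, seq_lengths=[4, 4], blank=0)
print(out_data)  # [[2], [1], [3]]
print(out_lod)   # [[2, 1]]
```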

(2) for padding mode:

 input.data = [[[0.6, 0.1, 0.3, 0.1],
                [0.3, 0.2, 0.4, 0.1],
                [0.1, 0.5, 0.1, 0.3],
                [0.5, 0.1, 0.3, 0.1]],

               [[0.5, 0.1, 0.3, 0.1],
                [0.2, 0.2, 0.2, 0.4],
                [0.2, 0.2, 0.1, 0.5],
                [0.5, 0.1, 0.3, 0.1]]]

input_length.data = [[4], [4]]
input.shape = [2, 4, 4]

step1: Apply argmax to each input sequence. For the first sequence, input.data[0], we get
       [[0], [2], [1], [0]]; for the second sequence, input.data[1], we get [[0], [3], [3], [0]]. The result shape is [2, 4, 1].
step2: Convert the argmax result to padding mode. The argmax result is then
        [[0, 2, 1, 0], [0, 3, 3, 0]], shape is [2, 4], lod is [], input_length is [[4], [4]]
step3: Apply ctc_align to the padded argmax result, with padding_value set to 0.

Finally:
output.data = [[2, 1, 0, 0],
               [3, 0, 0, 0]]
output_length.data = [[2], [1]]
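The padding-mode example can be checked the same way. The sketch below is a NumPy approximation under the assumption that positions beyond each output sequence's length are filled with padding_value; decode_padded is a hypothetical name, not part of Paddle.

```python
import numpy as np

def decode_padded(probs, input_length, blank, padding_value=0):
    """Decode a padded batch (hypothetical helper, not Paddle's code).

    probs has shape [batch_size, N, num_classes + 1];
    input_length has shape [batch_size, 1] and gives each sequence's true length.
    """
    batch_size, max_len, _ = probs.shape
    paths = np.argmax(probs, axis=-1)  # shape [batch_size, N]
    output = np.full((batch_size, max_len), padding_value, dtype=np.int64)
    output_length = np.zeros((batch_size, 1), dtype=np.int64)
    for b in range(batch_size):
        prev, n = None, 0
        # Only the first input_length[b] time steps are valid.
        for token in paths[b, :int(input_length[b, 0])].tolist():
            if token != blank and token != prev:
                output[b, n] = token
                n += 1
            prev = token
        output_length[b, 0] = n
    return output, output_length

probs = np.array([[[0.6, 0.1, 0.3, 0.1],
                   [0.3, 0.2, 0.4, 0.1],
                   [0.1, 0.5, 0.1, 0.3],
                   [0.5, 0.1, 0.3, 0.1]],
                  [[0.5, 0.1, 0.3, 0.1],
                   [0.2, 0.2, 0.2, 0.4],
                   [0.2, 0.2, 0.1, 0.5],
                   [0.5, 0.1, 0.3, 0.1]]])
input_length = np.array([[4], [4]], dtype=np.int64)

output, output_length = decode_padded(probs, input_length, blank=0)
print(output.tolist())         # [[2, 1, 0, 0], [3, 0, 0, 0]]
print(output_length.tolist())  # [[2], [1]]
```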

Parameters

input (Variable) – the probabilities of variable-length sequences. In lod mode, it is a 2-D LoDTensor with LoD information. Its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences’ lengths and num_classes is the true number of classes (not including the blank label). In padding mode, it is a 3-D Tensor with padding. Its shape is [batch_size, N, num_classes + 1]. The data type can be float32 or float64.
blank (int) – the blank label index of Connectionist Temporal Classification (CTC) loss, which is in the half-open interval [0, num_classes + 1).
input_length (Variable, optional) – 2-D LoDTensor, shape is [batch_size, 1], data type is int64. It is used only in padding mode. In lod mode, input_length is None.
padding_value (int) – the value used to pad the output in padding mode. The default value is 0.
name (str, optional) – The default value is None. Normally there is no need for the user to set this property. For more information, please refer to Name.

Returns

For lod mode, returns the result of the CTC greedy decoder, a 2-D LoDTensor with shape [Lp, 1] and data type int64, where Lp is the sum of all output sequences’ lengths. If all the sequences in the result are empty, the result LoDTensor is [-1] with empty LoD [[]].

For padding mode, returns a tuple of (output, output_length), described below:

output, 2-D Tensor, shape is [batch_size, N], data type is int64.

output_length, 2-D Tensor, shape is [batch_size, 1], data type is int64. It is the length of each sequence in output.

Return type:

For lod mode: Variable

For padding mode: tuple of two Variables (output, output_length).

Examples

# for lod mode: x is a LoDTensor of concatenated sequences with 8 classes (blank included)
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[None, 8], dtype='float32', lod_level=1)
# the op returns the decoded sequences, not a loss
decoded = fluid.layers.ctc_greedy_decoder(input=x, blank=0)

# for padding mode: a batch of 10 sequences padded to length 4, plus their true lengths
x_pad = fluid.data(name='x_pad', shape=[10, 4, 8], dtype='float32')
x_pad_len = fluid.data(name='x_pad_len', shape=[10, 1], dtype='int64')
out, out_len = fluid.layers.ctc_greedy_decoder(input=x_pad, blank=0,
                input_length=x_pad_len)