ctc_greedy_decoder
paddle.fluid.layers.nn.ctc_greedy_decoder(input, blank, input_length=None, padding_value=0, name=None)
This op decodes sequences with a greedy policy, in two steps:

1. Get the index of the maximum value for each row of input, i.e. numpy.argmax(input, axis=1).
2. For each sequence in the result of step 1, merge repeated tokens between two blanks and delete all blanks.
This op is implemented in two modes, lod and padding; either can be used. The input is a LoDTensor in lod mode and a Tensor in padding mode.
A simple example:

Given:

    (1) for lod mode:

        input.data = [[0.6, 0.1, 0.3, 0.1],
                      [0.3, 0.2, 0.4, 0.1],
                      [0.1, 0.5, 0.1, 0.3],
                      [0.5, 0.1, 0.3, 0.1],

                      [0.5, 0.1, 0.3, 0.1],
                      [0.2, 0.2, 0.2, 0.4],
                      [0.2, 0.2, 0.1, 0.5],
                      [0.5, 0.1, 0.3, 0.1]]

        input.lod = [[4, 4]]

    Computation:

        step 1: Apply argmax to the first input sequence, input.data[0:4]. Then we get:
                [[0], [2], [1], [0]]
        step 2: Merge repeated tokens and remove the blank, which is 0. Then we get the first output sequence:
                [[2], [1]]

    Finally:

        output.data = [[2], [1], [3]]
        output.lod = [[2, 1]]

    (2) for padding mode:

        input.data = [[[0.6, 0.1, 0.3, 0.1],
                       [0.3, 0.2, 0.4, 0.1],
                       [0.1, 0.5, 0.1, 0.3],
                       [0.5, 0.1, 0.3, 0.1]],

                      [[0.5, 0.1, 0.3, 0.1],
                       [0.2, 0.2, 0.2, 0.4],
                       [0.2, 0.2, 0.1, 0.5],
                       [0.5, 0.1, 0.3, 0.1]]]

        input_length.data = [[4], [4]]
        input.shape = [2, 4, 4]

        step 1: Apply argmax to the first input sequence, input.data[0:4]; we get [[0], [2], [1], [0]].
                For input.data[4:8] we get [[0], [3], [3], [0]]. The shape is [2, 4, 1].
        step 2: Change the argmax result to padding mode; the argmax result becomes
                [[0, 2, 1, 0], [0, 3, 3, 0]], shape is [2, 4], lod is [], input_length is [[4], [4]].
        step 3: Apply ctc_align to the padding argmax result, with padding_value 0.

    Finally:

        output.data = [[2, 1, 0, 0], [3, 0, 0, 0]]
        output_length.data = [[2], [1]]
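The lod-mode computation above can be reproduced in plain NumPy. This is a hedged sketch, not Paddle code; decode_greedy is a hypothetical helper introduced only for illustration:

```python
import numpy as np

def decode_greedy(probs, blank=0):
    """Argmax per timestep, then merge repeats and drop blanks."""
    ids = np.argmax(probs, axis=1)      # step 1: argmax of each row
    out, prev = [], None
    for t in ids:
        if t != prev and t != blank:    # step 2: merge repeats, drop blanks
            out.append(int(t))
        prev = t
    return out

data = np.array([[0.6, 0.1, 0.3, 0.1],
                 [0.3, 0.2, 0.4, 0.1],
                 [0.1, 0.5, 0.1, 0.3],
                 [0.5, 0.1, 0.3, 0.1],
                 [0.5, 0.1, 0.3, 0.1],
                 [0.2, 0.2, 0.2, 0.4],
                 [0.2, 0.2, 0.1, 0.5],
                 [0.5, 0.1, 0.3, 0.1]])
lod = [4, 4]                            # two sequences of length 4

results, start = [], 0
for length in lod:
    results.append(decode_greedy(data[start:start + length]))
    start += length
print(results)  # [[2, 1], [3]]  -> flattened: output.data = [[2], [1], [3]], lod [[2, 1]]
```

Flattening the per-sequence results and recording their lengths gives exactly the output.data and output.lod shown above.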
Parameters
input (Variable) – the probabilities of variable-length sequences. In lod mode it is a 2-D LoDTensor with LoD information; its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences' lengths and num_classes is the true number of classes (not including the blank label). In padding mode it is a 3-D padded Tensor of shape [batch_size, N, num_classes + 1]. The data type can be float32 or float64.
blank (int) – the blank label index for Connectionist Temporal Classification (CTC) loss, which lies in the half-open interval [0, num_classes + 1).
input_length (Variable, optional) – a 2-D LoDTensor of shape [batch_size, 1], data type int64. It is required in padding mode; in lod mode, input_length is None. Default is None.
padding_value (int) – the value used to pad the output in padding mode. Default is 0.
name (str, optional) – the default value is None. Normally there is no need for the user to set this property. For more information, please refer to Name.
Returns
For lod mode, returns the result of the CTC greedy decoder: a 2-D LoDTensor of shape [Lp, 1], data type int64, where Lp is the sum of all output sequences' lengths. If all sequences in the result are empty, the result LoDTensor is [-1] with empty LoD [[]].
For padding mode, returns a tuple (output, output_length), described below:
output: a 2-D Tensor of shape [batch_size, N], data type int64.
output_length: a 2-D Tensor of shape [batch_size, 1], data type int64. It is the length of each output sequence in padding mode.
Return type
For lod mode: Variable
For padding mode: tuple of two Variables (output, output_length).
Examples
# for lod mode
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[None, 8], dtype='float32', lod_level=1)
cost = fluid.layers.ctc_greedy_decoder(input=x, blank=0)

# for padding mode
x_pad = fluid.data(name='x_pad', shape=[10, 4, 8], dtype='float32')
x_pad_len = fluid.data(name='x_pad_len', shape=[10, 1], dtype='int64')
out, out_len = fluid.layers.ctc_greedy_decoder(input=x_pad, blank=0,
                                               input_length=x_pad_len)
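To see what the padding-mode call computes without building a Paddle program, the same decoding can be sketched in NumPy. This is a hedged illustration, assuming the three-step procedure described above; ctc_greedy_decode_padded is a hypothetical helper, not part of the Paddle API:

```python
import numpy as np

def ctc_greedy_decode_padded(x, lengths, blank=0, padding_value=0):
    """Greedy CTC decode of a padded batch: argmax per timestep,
    merge repeats, drop blanks, pad results with padding_value."""
    batch, max_t, _ = x.shape
    out = np.full((batch, max_t), padding_value, dtype=np.int64)
    out_len = np.zeros((batch, 1), dtype=np.int64)
    ids = np.argmax(x, axis=2)                  # shape [batch, max_t]
    for b in range(batch):
        prev, k = None, 0
        for t in range(int(lengths[b, 0])):     # only decode the valid prefix
            cur = ids[b, t]
            if cur != prev and cur != blank:    # merge repeats, drop blanks
                out[b, k] = cur
                k += 1
            prev = cur
        out_len[b, 0] = k
    return out, out_len

# The padding-mode example data from above:
x = np.array([[[0.6, 0.1, 0.3, 0.1], [0.3, 0.2, 0.4, 0.1],
               [0.1, 0.5, 0.1, 0.3], [0.5, 0.1, 0.3, 0.1]],
              [[0.5, 0.1, 0.3, 0.1], [0.2, 0.2, 0.2, 0.4],
               [0.2, 0.2, 0.1, 0.5], [0.5, 0.1, 0.3, 0.1]]])
lengths = np.array([[4], [4]], dtype=np.int64)

out, out_len = ctc_greedy_decode_padded(x, lengths, blank=0, padding_value=0)
print(out.tolist())      # [[2, 1, 0, 0], [3, 0, 0, 0]]
print(out_len.tolist())  # [[2], [1]]
```

The printed values match output.data and output_length.data in the padding-mode example above.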