warpctc

paddle.fluid.layers.loss.warpctc(input, label, blank=0, norm_by_times=False, input_length=None, label_length=None) [source]

An operator integrating the open-source Warp-CTC library (https://github.com/baidu-research/warp-ctc) to compute Connectionist Temporal Classification (CTC) loss. It can be regarded as softmax fused with CTC, since a native softmax activation is integrated into the Warp-CTC library to normalize each row of the input tensor.
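
For reference, this is the standard CTC formulation (Graves et al.), summarized here rather than quoted from the Warp-CTC sources: given the softmax outputs y for an input sequence x of length T and a label sequence l, the loss is the negative log-likelihood over all alignment paths \pi that collapse to l,

    loss(x, l) = -\log p(l \mid x) = -\log \sum_{\pi \in \mathcal{B}^{-1}(l)} \prod_{t=1}^{T} y^{t}_{\pi_t}

where y^{t}_{k} is the probability of class k at time-step t and \mathcal{B} removes blank and repeated labels.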

Parameters
  • input (Variable) – The unscaled probabilities of variable-length sequences, which is a 2-D Tensor with LoD information, or a 3-D Tensor without LoD information. When it is a 2-D LoDTensor, its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences’ lengths and num_classes is the true number of classes (not including the blank label). When it is a 3-D Tensor, its shape is [max_logit_length, batch_size, num_classes + 1], where max_logit_length is the length of the longest input logit sequence. The data type should be float32 or float64.

  • label (Variable) – The ground truth of variable-length sequences, which must be a 2-D Tensor with LoD information or a 3-D Tensor without LoD information, and must be consistent with the corresponding input. When it is a 2-D LoDTensor, its shape is [Lg, 1], where Lg is the sum of all labels’ lengths. When it is a 3-D Tensor, its shape is [batch_size, max_label_length], where max_label_length is the length of the longest label sequence. The data type must be int32.

  • blank (int, default 0) – The blank label index of Connectionist Temporal Classification (CTC) loss, which lies in the half-open interval [0, num_classes + 1). The data type must be int32.

  • norm_by_times (bool, default False) – Whether to normalize the gradients by the number of time-steps, i.e. the sequence length. There is no need to normalize the gradients if the warpctc layer is followed by a mean op.

  • input_length (Variable) – The length of each input sequence; required when the input is a 3-D Tensor (without LoD information). It should have shape [batch_size] and data type int64.

  • label_length (Variable) – The length of each label sequence; required when the label is a 3-D Tensor (without LoD information). It should have shape [batch_size] and data type int64.

Returns

The Connectionist Temporal Classification (CTC) loss, which is a 2-D Tensor with shape [batch_size, 1]. The data type is the same as the input.

Return type

Variable
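
Since the returned loss has one value per sequence, it is usually reduced to a scalar before optimization (as noted above, a mean op often follows warpctc). A minimal sketch, assuming cost is the output of fluid.layers.warpctc as in the examples below; the SGD optimizer here is an illustrative assumption only:

# reduce the [batch_size, 1] per-sequence loss to a scalar training loss
avg_cost = fluid.layers.mean(cost)
# the optimizer choice is an assumption for illustration, not part of this API
fluid.optimizer.SGD(learning_rate=1e-3).minimize(avg_cost)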

Examples

# using LoDTensor
import paddle
import paddle.fluid as fluid
import numpy as np

# lengths of logit sequences
seq_lens = [2,6]
# lengths of label sequences
label_lens = [2,3]
# class num
class_num = 5

paddle.enable_static()
logits = fluid.data(name='logits', shape=[None, class_num+1],
                    dtype='float32', lod_level=1)
label = fluid.data(name='label', shape=[None, 1],
                   dtype='int32', lod_level=1)
cost = fluid.layers.warpctc(input=logits, label=label)
place = fluid.CPUPlace()
x = fluid.create_lod_tensor(
         np.random.rand(np.sum(seq_lens), class_num+1).astype("float32"),
         [seq_lens], place)
y = fluid.create_lod_tensor(
         np.random.randint(0, class_num, [np.sum(label_lens), 1]).astype("int32"),
         [label_lens], place)
exe = fluid.Executor(place)
output = exe.run(fluid.default_main_program(),
                 feed={"logits": x, "label": y},
                 fetch_list=[cost.name])
print(output)

# using Tensor
import paddle
import paddle.fluid as fluid
import numpy as np

# length of the longest logit sequence
max_seq_length = 5
#length of the longest label sequence
max_label_length = 3
# number of logit sequences
batch_size = 16
# class num
class_num = 5
paddle.enable_static()
logits = fluid.data(name='logits',
               shape=[max_seq_length, batch_size, class_num+1],
               dtype='float32')
logits_length = fluid.data(name='logits_length', shape=[None],
                 dtype='int64')
label = fluid.data(name='label', shape=[batch_size, max_label_length],
               dtype='int32')
label_length = fluid.data(name='labels_length', shape=[None],
                 dtype='int64')
cost = fluid.layers.warpctc(input=logits, label=label,
                input_length=logits_length,
                label_length=label_length)
place = fluid.CPUPlace()
x = np.random.rand(max_seq_length, batch_size, class_num+1).astype("float32")
y = np.random.randint(0, class_num, [batch_size, max_label_length]).astype("int32")
exe = fluid.Executor(place)
output = exe.run(fluid.default_main_program(),
                 feed={"logits": x,
                       "label": y,
                       "logits_length": np.array([max_seq_length]*batch_size).astype("int64"),
                       "labels_length": np.array([max_label_length]*batch_size).astype("int64")},
                 fetch_list=[cost.name])
print(output)
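
In dynamic-graph (imperative) mode, newer Paddle (2.x) releases expose the same computation as paddle.nn.functional.ctc_loss; the sketch below assumes such an installation and reuses the padded-Tensor shapes from the second example, so treat the exact call as an assumption rather than part of this fluid API:

import paddle
import numpy as np

max_seq_length, max_label_length, batch_size, class_num = 5, 3, 16, 5

paddle.disable_static()  # dynamic-graph mode (assumed Paddle 2.x)
# unscaled logits, padded labels, and their real lengths, same layout as above
logits = paddle.to_tensor(
    np.random.rand(max_seq_length, batch_size, class_num + 1).astype("float32"))
labels = paddle.to_tensor(
    np.random.randint(0, class_num, [batch_size, max_label_length]).astype("int32"))
input_lengths = paddle.to_tensor(np.array([max_seq_length] * batch_size).astype("int64"))
label_lengths = paddle.to_tensor(np.array([max_label_length] * batch_size).astype("int64"))

loss = paddle.nn.functional.ctc_loss(logits, labels, input_lengths, label_lengths, blank=0)
print(loss.numpy())  # mean CTC loss over the batch (default reduction)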