warpctc
- paddle.fluid.layers.loss.warpctc(input, label, blank=0, norm_by_times=False, input_length=None, label_length=None) [source]
-
An operator integrating the open source Warp-CTC library (https://github.com/baidu-research/warp-ctc) to compute Connectionist Temporal Classification (CTC) loss. It can be aliased as softmax with CTC, since a native softmax activation is integrated into the Warp-CTC library to normalize the values in each row of the input tensor.
- Parameters
-
input (Variable) – The unscaled probabilities of variable-length sequences, which is a 2-D Tensor with LoD information, or a 3-D Tensor without LoD information. When it is a 2-D LoDTensor, its shape is [Lp, num_classes + 1], where Lp is the sum of all input sequences’ lengths and num_classes is the true number of classes (not including the blank label). When it is a 3-D Tensor, its shape is [max_logit_length, batch_size, num_classes + 1], where max_logit_length is the length of the longest input logit sequence. The data type should be float32 or float64.
label (Variable) – The ground truth of variable-length sequences, which must be a 2-D Tensor with LoD information or a 3-D Tensor without LoD information, consistent with the corresponding input. When it is a 2-D LoDTensor, its shape is [Lg, 1], where Lg is the sum of all labels’ lengths. When it is a 3-D Tensor, its shape is [batch_size, max_label_length], where max_label_length is the length of the longest label sequence. The data type must be int32.
blank (int, default 0) – The blank label index of the Connectionist Temporal Classification (CTC) loss, which lies in the half-open interval [0, num_classes + 1). The data type must be int32.
norm_by_times (bool, default False) – Whether to normalize the gradients by the number of time-steps, i.e. the sequence length. There is no need to normalize the gradients if the warpctc layer is followed by a mean op.
input_length (Variable) – The length of each input sequence, required when the input is a 3-D Tensor. It should have shape [batch_size] and dtype int64.
label_length (Variable) – The length of each label sequence, required when the label is a 3-D Tensor. It should have shape [batch_size] and dtype int64.
- Returns
-
The Connectionist Temporal Classification (CTC) loss, which is a 2-D Tensor with shape [batch_size, 1]. The data type is the same as the input.
- Return type
-
Variable
Examples
# using LoDTensor
import paddle
import paddle.fluid as fluid
import numpy as np

# lengths of logit sequences
seq_lens = [2, 6]
# lengths of label sequences
label_lens = [2, 3]
# class num
class_num = 5

paddle.enable_static()
logits = fluid.data(name='logits', shape=[None, class_num+1],
                    dtype='float32', lod_level=1)
label = fluid.data(name='label', shape=[None, 1], dtype='int32', lod_level=1)
cost = fluid.layers.warpctc(input=logits, label=label)

place = fluid.CPUPlace()
x = fluid.create_lod_tensor(
    np.random.rand(np.sum(seq_lens), class_num+1).astype("float32"),
    [seq_lens], place)
y = fluid.create_lod_tensor(
    np.random.randint(0, class_num, [np.sum(label_lens), 1]).astype("int32"),
    [label_lens], place)
exe = fluid.Executor(place)
output = exe.run(fluid.default_main_program(),
                 feed={"logits": x, "label": y},
                 fetch_list=[cost.name])
print(output)
# using Tensor
import paddle
import paddle.fluid as fluid
import numpy as np

# length of the longest logit sequence
max_seq_length = 5
# length of the longest label sequence
max_label_length = 3
# number of logit sequences
batch_size = 16
# class num
class_num = 5

paddle.enable_static()
logits = fluid.data(name='logits',
                    shape=[max_seq_length, batch_size, class_num+1],
                    dtype='float32')
logits_length = fluid.data(name='logits_length', shape=[None], dtype='int64')
label = fluid.data(name='label', shape=[batch_size, max_label_length],
                   dtype='int32')
label_length = fluid.data(name='labels_length', shape=[None], dtype='int64')
cost = fluid.layers.warpctc(input=logits, label=label,
                            input_length=logits_length,
                            label_length=label_length)

place = fluid.CPUPlace()
x = np.random.rand(max_seq_length, batch_size, class_num+1).astype("float32")
y = np.random.randint(0, class_num, [batch_size, max_label_length]).astype("int32")
exe = fluid.Executor(place)
output = exe.run(fluid.default_main_program(),
                 feed={"logits": x,
                       "label": y,
                       "logits_length": np.array([max_seq_length]*batch_size).astype("int64"),
                       "labels_length": np.array([max_label_length]*batch_size).astype("int64")},
                 fetch_list=[cost.name])
print(output)
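For reference, the quantity warpctc computes for a single sequence can be sketched in plain NumPy using the standard CTC forward (alpha) recursion. This is an illustrative sketch, not the library implementation: the function name ctc_loss_numpy is hypothetical, batching and gradient computation are omitted, and softmax is applied inside the function to mirror the operator's integrated softmax.

```python
import numpy as np

def ctc_loss_numpy(logits, labels, blank=0):
    """CTC negative log-likelihood for one sequence (illustrative sketch).

    logits: (T, num_classes + 1) unscaled scores; softmax is applied here,
            mirroring the softmax integrated into Warp-CTC.
    labels: 1-D int array of length L, containing no blank symbols.
    """
    # Row-wise softmax over the class dimension.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)

    # Extended label sequence with blanks interleaved: length 2L + 1.
    ext = np.full(2 * len(labels) + 1, blank, dtype=int)
    ext[1::2] = labels

    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    # Paths may start with a blank or with the first label.
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a += alpha[t - 1, s - 1]
            # Skipping a blank is allowed only between distinct labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]

    # Paths may end on the last label or on the trailing blank.
    likelihood = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(likelihood)
```

For a single time-step and a single label, this reduces to the cross-entropy -log(softmax(logits)[0, label]), which makes the integrated-softmax behavior easy to check by hand.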