ctc_loss
- paddle.nn.functional.ctc_loss(log_probs, labels, input_lengths, label_lengths, blank=0, reduction='mean', norm_by_times=False)
-
An operator that integrates the open source Warp-CTC library (https://github.com/baidu-research/warp-ctc) to compute the Connectionist Temporal Classification (CTC) loss. It is also known as softmax with CTC, since a native softmax activation is integrated into the Warp-CTC library to normalize the values in each row of the input tensor.
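As an illustration of what "softmax with CTC" means, the following is a minimal pure-Python sketch for a single sequence: each row of unnormalized scores is passed through a log-softmax, and the standard CTC forward recursion is then run over the label extended with blanks. The names `log_softmax`, `logsumexp`, and `ctc_loss_single` are hypothetical helpers introduced here. They are not part of Paddle or Warp-CTC, and the sketch is not claimed to match the library's numerical output.

```python
import math


def log_softmax(row):
    """Normalize one row of unscaled scores into log-probabilities."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss_single(logits, label, blank=0):
    """Hypothetical reference: CTC negative log-likelihood for one sequence.

    logits: list of T rows, each holding num_classes + 1 unscaled scores.
    label:  list of class indices (without blanks).
    """
    log_probs = [log_softmax(row) for row in logits]

    # Extended label: a blank before, between, and after every symbol.
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S, T = len(ext), len(logits)

    # alpha[s] = log-probability of all partial alignments ending in ext[s].
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])          # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])          # skip the blank in between
            total = logsumexp(cands)
            if total != -math.inf:
                new[s] = total + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last symbol or on the trailing blank.
    tails = [alpha[S - 1]]
    if S > 1:
        tails.append(alpha[S - 2])
    return -logsumexp(tails)
```

For example, with two time steps, two classes (blank and one symbol), and uniform scores, three alignments of the label `[1]` are valid, each with probability 0.25, so `ctc_loss_single([[0.0, 0.0], [0.0, 0.0]], [1])` evaluates to `-log(0.75)`.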
- Parameters
-
log_probs (Tensor) – The unscaled probability sequence with padding, which is a 3-D Tensor. The tensor shape is [max_logit_length, batch_size, num_classes + 1], where max_logit_length is the longest length of input logit sequence. The data type should be float32 or float64.
labels (Tensor) – The ground truth sequence with padding, which must be a 2-D Tensor. The tensor shape is [batch_size, max_label_length], where max_label_length is the length of the longest label sequence. The data type must be int32.
input_lengths (Tensor) – The length of each input sequence. It should have shape [batch_size] and dtype int64.
label_lengths (Tensor) – The length of each label sequence. It should have shape [batch_size] and dtype int64.
blank (int, optional) – The blank label index of the Connectionist Temporal Classification (CTC) loss, which lies in the half-open interval [0, num_classes + 1). The data type must be int32. Default: 0.
reduction (str, optional) – Indicates how to reduce the loss. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', the output loss is divided by label_lengths and the mean of the quotients is returned; if reduction is 'sum', the sum of the loss is returned; if reduction is 'none', no reduction is applied. Default: 'mean'.
norm_by_times (bool, optional) – Whether to normalize the gradients by the number of time steps, which is also the sequence's length. There is no need to normalize the gradients if the reduction mode is 'mean'. Default: False.
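The three reduction modes described above can be sketched in plain Python. The helper `reduce_ctc_loss` below is hypothetical (it is not a Paddle function) and only mirrors the wording of the `reduction` parameter, assuming the per-sample losses have already been computed.

```python
def reduce_ctc_loss(per_sample_loss, label_lengths, reduction="mean"):
    """Hypothetical helper mirroring the documented reduction modes."""
    if reduction == "none":
        # No reduction: one loss value per sequence in the batch.
        return list(per_sample_loss)
    if reduction == "sum":
        return sum(per_sample_loss)
    if reduction == "mean":
        # Divide each loss by its label length, then average the quotients.
        quotients = [loss / n for loss, n in zip(per_sample_loss, label_lengths)]
        return sum(quotients) / len(quotients)
    raise ValueError("reduction must be 'none', 'mean', or 'sum'")


# Per-sample losses and label lengths taken from the Examples section.
per_sample = [3.91798496, 2.90765190]
print(reduce_ctc_loss(per_sample, [3, 3], "mean"))
```

With these inputs, the 'mean' result agrees with the value 1.13760614 printed in the Examples section: (3.91798496 / 3 + 2.90765190 / 3) / 2.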
- Returns
-
Tensor, the Connectionist Temporal Classification (CTC) loss between log_probs and labels. If reduction is 'none', the shape of loss is [batch_size]; otherwise, the shape of loss is []. Data type is the same as log_probs.
- Return type
-
Tensor
Examples
>>> # declarative mode
>>> import paddle.nn.functional as F
>>> import paddle
>>> import numpy as np

>>> # length of the longest logit sequence
>>> max_seq_length = 5
>>> # length of the longest label sequence
>>> max_label_length = 3
>>> # number of logit sequences
>>> batch_size = 2
>>> # class num
>>> class_num = 3

>>> # shape: [max_seq_length, batch_size, class_num]
>>> log_probs = paddle.to_tensor(np.array([
...     [[4.17021990e-01, 7.20324516e-01, 1.14374816e-04],
...      [3.02332580e-01, 1.46755889e-01, 9.23385918e-02]],
...     [[1.86260208e-01, 3.45560730e-01, 3.96767467e-01],
...      [5.38816750e-01, 4.19194520e-01, 6.85219526e-01]],
...     [[2.04452246e-01, 8.78117442e-01, 2.73875929e-02],
...      [6.70467496e-01, 4.17304814e-01, 5.58689833e-01]],
...     [[1.40386939e-01, 1.98101491e-01, 8.00744593e-01],
...      [9.68261600e-01, 3.13424170e-01, 6.92322612e-01]],
...     [[8.76389146e-01, 8.94606650e-01, 8.50442126e-02],
...      [3.90547849e-02, 1.69830427e-01, 8.78142476e-01]]
... ]), dtype="float32")
>>> labels = paddle.to_tensor([[1, 2, 2],
...                            [1, 2, 2]], dtype="int32")
>>> input_lengths = paddle.to_tensor([5, 5], dtype="int64")
>>> label_lengths = paddle.to_tensor([3, 3], dtype="int64")

>>> loss = F.ctc_loss(log_probs, labels,
...                   input_lengths,
...                   label_lengths,
...                   blank=0,
...                   reduction='none')
>>> print(loss)
Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=True,
[3.91798496, 2.90765190])

>>> loss = F.ctc_loss(log_probs, labels,
...                   input_lengths,
...                   label_lengths,
...                   blank=0,
...                   reduction='mean')
>>> print(loss)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
1.13760614)
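The shape constraints listed under Parameters can be checked before the loss is computed. The sketch below is a hypothetical pre-flight helper, `check_ctc_args`, which is not part of Paddle. It works on plain shape tuples and Python lists, and it encodes only the constraints stated in this page.

```python
def check_ctc_args(log_probs_shape, labels_shape, input_lengths, label_lengths, blank=0):
    """Hypothetical helper: raise ValueError if a documented constraint is violated."""
    if len(log_probs_shape) != 3:
        raise ValueError("log_probs must be 3-D: [max_logit_length, batch_size, num_classes + 1]")
    if len(labels_shape) != 2:
        raise ValueError("labels must be 2-D: [batch_size, max_label_length]")

    max_logit_length, batch_size, num_classes_plus_one = log_probs_shape
    if labels_shape[0] != batch_size:
        raise ValueError("labels and log_probs disagree on batch_size")
    max_label_length = labels_shape[1]

    if len(input_lengths) != batch_size or len(label_lengths) != batch_size:
        raise ValueError("input_lengths and label_lengths must have shape [batch_size]")
    if any(n > max_logit_length for n in input_lengths):
        raise ValueError("an input length exceeds max_logit_length")
    if any(n > max_label_length for n in label_lengths):
        raise ValueError("a label length exceeds max_label_length")
    if not 0 <= blank < num_classes_plus_one:
        raise ValueError("blank must lie in [0, num_classes + 1)")
    return True


# Shapes and lengths from the example above.
check_ctc_args((5, 2, 3), (2, 3), [5, 5], [3, 3], blank=0)
```

With real tensors, the shape tuples could be obtained from the tensors' `shape` attribute and the lengths converted to Python lists first.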