softmax_with_cross_entropy
- paddle.fluid.layers.loss.softmax_with_cross_entropy(logits, label, soft_label=False, ignore_index=-100, numeric_stable_mode=True, return_softmax=False, axis=-1) [source]
This operator implements the cross entropy loss function with softmax. It combines the softmax operation and the cross entropy loss function in a single step to provide a more numerically stable gradient.
Because this operator performs a softmax on logits internally, it expects unscaled logits. It should not be used with the output of the softmax operator, since that would produce incorrect results.
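As a quick illustration of that warning, the following minimal sketch (assuming only the public paddle.nn.functional entry point used in the example at the end of this page) compares passing raw logits with passing already-softmaxed values; the second call applies softmax twice and yields a different, incorrect loss:

import paddle
import paddle.nn.functional as F

logits = paddle.to_tensor([[2.0, 0.5, -1.0]])   # unscaled log probabilities
label = paddle.to_tensor([[0]], dtype="int64")  # hard label: class 0

# Correct: pass raw logits; softmax is applied internally.
loss_ok = F.softmax_with_cross_entropy(logits=logits, label=label)

# Incorrect: softmax has already been applied, so it effectively runs twice.
loss_bad = F.softmax_with_cross_entropy(logits=F.softmax(logits), label=label)

print(loss_ok, loss_bad)  # the two loss values differ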
When the attribute soft_label is set to False, this operator expects mutually exclusive hard labels: each sample in a batch belongs to exactly one class with a probability of 1.0, so each sample in the batch has a single label. The equations are as follows:
Hard label (one-hot label, so every sample has exactly one class):

\[loss_j = -\text{logits}_{label_j} + \log\left(\sum_{i=0}^{K}\exp(\text{logits}_i)\right), \quad j = 1, ..., K\]

Soft label (each sample can have a distribution over all classes):

\[loss_j = -\sum_{i=0}^{K}\text{label}_i \left(\text{logits}_i - \log\left(\sum_{i=0}^{K} \exp(\text{logits}_i)\right)\right), \quad j = 1, ..., K\]
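The two formulas can be checked with a short NumPy reference. This is only a minimal sketch of the math above, not Paddle's actual kernel; the helper name softmax_ce_reference is made up for illustration:

import numpy as np

def softmax_ce_reference(logits, label, soft_label=False):
    # log softmax: logits_i - log(sum_k exp(logits_k)), computed naively.
    log_sum_exp = np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    log_softmax = logits - log_sum_exp
    if soft_label:
        # Soft label: -sum_i label_i * (logits_i - log sum exp)
        return -np.sum(label * log_softmax, axis=-1, keepdims=True)
    # Hard label: pick -log softmax at the labelled class index.
    return -np.take_along_axis(log_softmax, label, axis=-1)

logits = np.array([[2.0, 0.5, -1.0]])
print(softmax_ce_reference(logits, np.array([[0]])))                # hard label
print(softmax_ce_reference(logits, np.array([[0.7, 0.2, 0.1]]),
                           soft_label=True))                        # soft label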
If numeric_stable_mode is True, softmax is calculated first by:

\[\begin{aligned}max_j &= \max_{i=0}^{K}{\text{logits}_i}\\log\_max\_sum_j &= \log\sum_{i=0}^{K}\exp(\text{logits}_i - max_j)\\softmax_j &= \exp(\text{logits}_j - max_j - log\_max\_sum_j)\end{aligned}\]

and then cross entropy loss is calculated by softmax and label.
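The point of the max subtraction is that the exponential never sees a large argument. A minimal NumPy sketch of these three equations (the helper name stable_softmax is hypothetical):

import numpy as np

def stable_softmax(logits):
    # max_j: per-row maximum, subtracted before exponentiating.
    max_j = np.max(logits, axis=-1, keepdims=True)
    # log_max_sum_j: log of the shifted sum of exponentials.
    log_max_sum = np.log(np.sum(np.exp(logits - max_j), axis=-1, keepdims=True))
    # softmax_j = exp(logits_j - max_j - log_max_sum_j)
    return np.exp(logits - max_j - log_max_sum)

big = np.array([[1000.0, 999.0, 998.0]])
print(stable_softmax(big))  # finite result; naive exp(big)/exp(big).sum() overflows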
- Parameters
  - logits (Tensor) – A multi-dimensional Tensor with data type float32 or float64, holding the unscaled log probabilities.
  - label (Tensor) – The ground truth Tensor. If soft_label is set to True, label is a Tensor with the same shape and data type as logits, holding a probability distribution over the classes. If soft_label is set to False, label is an integer Tensor (int64 in the example below) with the same shape as logits except that the dimension axis has size 1.
  - soft_label (bool, optional) – A flag indicating whether to interpret the given labels as soft labels; see the sketch after this list. Default: False.
  - ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. Only valid when soft_label is set to False. Default: kIgnoreIndex (-100).
  - numeric_stable_mode (bool, optional) – A flag indicating whether to use the more numerically stable algorithm described above. Only effective when soft_label is False and a GPU is used; when soft_label is True or a CPU is used, the algorithm is always numerically stable. Note that the stable algorithm may be slower. Default: True.
  - return_softmax (bool, optional) – A flag indicating whether to return the softmax tensor along with the cross entropy loss. Default: False.
  - axis (int, optional) – The index of the dimension along which to perform the softmax calculation. It should be in the range \([-1, rank - 1]\), where \(rank\) is the rank of the input logits. Default: -1.
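To make the two label layouts concrete, here is a small sketch, assuming the paddle.nn.functional entry point from the example below is available:

import paddle
import paddle.nn.functional as F

logits = paddle.rand([4, 10])  # batch of 4 samples, 10 classes

# Hard labels: same shape as logits except the class axis, which is 1.
hard = paddle.randint(0, 10, [4, 1], dtype="int64")
loss_hard = F.softmax_with_cross_entropy(logits, hard)

# Soft labels: same shape and dtype as logits, one distribution per row.
soft = F.softmax(paddle.rand([4, 10]), axis=-1)
loss_soft = F.softmax_with_cross_entropy(logits, soft, soft_label=True)

print(loss_hard.shape, loss_soft.shape)  # both [4, 1]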
- Returns
  - The cross entropy loss if return_softmax is False; otherwise the tuple (loss, softmax). softmax has the same shape as the input logits, and the cross entropy loss has the same shape as the input logits except that the dimension axis has size 1.
- Return type
  - Tensor, or a tuple of two Tensors
Examples
import paddle
import numpy as np

data = np.random.rand(128).astype("float32")
# Draw a real class index in [0, 100); casting rand() output straight
# to int64 would always produce label 0.
label = np.random.randint(0, 100, (1,)).astype("int64")
data = paddle.to_tensor(data)
label = paddle.to_tensor(label)
linear = paddle.nn.Linear(128, 100)
x = linear(data)
out = paddle.nn.functional.softmax_with_cross_entropy(logits=x, label=label)
print(out)
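If the softmax output is needed as well, return_softmax=True makes the call return a tuple, as described under Returns above. A minimal variant of the example:

import paddle
import paddle.nn.functional as F

logits = paddle.rand([4, 10])
label = paddle.randint(0, 10, [4, 1], dtype="int64")

# With return_softmax=True the call returns (loss, softmax).
loss, softmax = F.softmax_with_cross_entropy(logits, label, return_softmax=True)
print(loss.shape)     # [4, 1]: logits shape with the class axis reduced to 1
print(softmax.shape)  # [4, 10]: same shape as logits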