CrossEntropyLoss
- class paddle.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean', soft_label=False, axis=-1, use_softmax=True, name=None) [source]
By default, this operator implements the cross entropy loss function with softmax. It combines the softmax operation and the cross entropy loss function to provide more numerically stable computation.
This operator calculates the cross entropy loss function without softmax when use_softmax=False.
By default, this operator calculates the mean of the result, and you can change this behavior with the reduction parameter. Please refer to the parameters section for details.
This operator can be used to calculate the softmax cross entropy loss with both soft and hard labels. Hard labels are the actual class indices (0, 1, 2, etc.), while soft labels are the probabilities of the classes (e.g. 0.6, 0.8, 0.2), as illustrated in the sketch below.
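As a quick illustration (the tensors below are made up for this sketch, not taken from the official examples further down), hard labels are class indices while soft labels are per-class probability distributions:

    import paddle

    logits = paddle.to_tensor([[2.0, 0.5, 0.3],
                               [0.1, 1.5, 0.2]])          # [N=2, C=3] unscaled logits

    # Hard labels: one class index per sample, each value in [0, C-1].
    hard_label = paddle.to_tensor([0, 1], dtype='int64')
    loss_hard = paddle.nn.functional.cross_entropy(logits, hard_label)

    # Soft labels: one probability distribution over the C classes per sample.
    soft_label = paddle.to_tensor([[0.8, 0.1, 0.1],
                                   [0.2, 0.6, 0.2]])
    loss_soft = paddle.nn.functional.cross_entropy(logits, soft_label, soft_label=True)

    print(loss_hard.numpy(), loss_soft.numpy())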
The calculation of this operator includes the following two steps.
I. Softmax cross entropy

Hard label (each sample can only be assigned to one category)

1.1. when use_softmax=True

    loss_j = -logits_{label_j} + \log\left(\sum_{i=0}^{C-1} \exp(logits_i)\right), \quad j = 1, ..., N

where N is the number of samples and C is the number of categories.

1.2. when use_softmax=False

    loss_j = -\log\left(P_{label_j}\right), \quad j = 1, ..., N

where N is the number of samples, C is the number of categories, and P is the input (the output of softmax).

Soft label (each sample is assigned to multiple categories with a certain probability, and the probabilities sum to 1)

2.1. when use_softmax=True

    loss_j = -\sum_{i=0}^{C-1} label_i \left(logits_i - \log\left(\sum_{k=0}^{C-1} \exp(logits_k)\right)\right), \quad j = 1, ..., N

where N is the number of samples and C is the number of categories.

2.2. when use_softmax=False

    loss_j = -\sum_{i=0}^{C-1} label_i \log(P_i), \quad j = 1, ..., N

where N is the number of samples, C is the number of categories, and P is the input (the output of softmax).
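As a sanity check on 1.1 and 1.2 (a sketch of my own, not part of the original reference), passing raw logits with use_softmax=True should match applying softmax yourself and then calling with use_softmax=False:

    import paddle

    paddle.seed(0)
    N, C = 4, 5
    logits = paddle.rand([N, C], dtype='float64')          # unscaled logits
    label = paddle.randint(0, C, shape=[N], dtype='int64')

    loss_a = paddle.nn.functional.cross_entropy(
        logits, label, use_softmax=True, reduction='none')

    probs = paddle.nn.functional.softmax(logits, axis=-1)  # P, the output of softmax
    loss_b = paddle.nn.functional.cross_entropy(
        probs, label, use_softmax=False, reduction='none')

    print(paddle.allclose(loss_a, loss_b))                 # expected: True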
II. Weight and reduction processing

1. Weight

If the weight parameter is None, go to the next step directly.

If the weight parameter is not None, the cross entropy of each sample is weighted by weight according to soft_label = False or True as follows.

1.1. Hard labels (soft_label = False)

    loss_j = weight[label_j] \cdot loss_j, \quad j = 1, ..., N

1.2. Soft labels (soft_label = True)

    loss_j = loss_j \cdot \sum_{i=0}^{C-1} \left(label_i \cdot weight_i\right), \quad j = 1, ..., N
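To make 1.1 concrete, here is a hedged sketch (mine, not from the reference) checking that for hard labels each per-sample loss is scaled by weight[label_j]:

    import paddle

    paddle.seed(0)
    N, C = 4, 3
    logits = paddle.rand([N, C], dtype='float64')
    label = paddle.randint(0, C, shape=[N], dtype='int64')
    weight = paddle.rand([C], dtype='float64')

    unweighted = paddle.nn.functional.cross_entropy(logits, label, reduction='none')
    weighted = paddle.nn.functional.cross_entropy(
        logits, label, weight=weight, reduction='none')

    # Manually rescale each sample's loss by the weight of its target class.
    manual = unweighted * paddle.gather(weight, label).reshape(unweighted.shape)
    print(paddle.allclose(weighted, manual))               # expected: True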
2. Reduction

2.1. If the reduction parameter is 'none'

Return the previous result directly.

2.2. If the reduction parameter is 'sum'

Return the sum of the previous results:

    loss = \sum_{j=1}^{N} loss_j

2.3. If the reduction parameter is 'mean', it will be processed according to the weight parameter as follows.

2.3.1. If the weight parameter is None

Return the average value of the previous results:

    loss = \frac{1}{N} \sum_{j=1}^{N} loss_j

where N is the number of samples and C is the number of categories.

2.3.2. If the weight parameter is not None, the weighted average value of the previous results is returned.

Hard labels (soft_label = False)

    loss = \frac{\sum_{j=1}^{N} loss_j}{\sum_{j=1}^{N} weight[label_j]}

Soft labels (soft_label = True)

    loss = \frac{\sum_{j=1}^{N} loss_j}{\sum_{j=1}^{N} \left(\sum_{i=0}^{C-1} label_i \cdot weight_i\right)}
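The following sketch (assumptions mine, not taken from the reference) relates the three reduction modes, including the weighted 'mean' divisor from 2.3.2:

    import paddle

    paddle.seed(0)
    N, C = 6, 4
    logits = paddle.rand([N, C], dtype='float64')
    label = paddle.randint(0, C, shape=[N], dtype='int64')
    weight = paddle.rand([C], dtype='float64')

    none_loss = paddle.nn.functional.cross_entropy(
        logits, label, weight=weight, reduction='none')
    sum_loss = paddle.nn.functional.cross_entropy(
        logits, label, weight=weight, reduction='sum')
    mean_loss = paddle.nn.functional.cross_entropy(
        logits, label, weight=weight, reduction='mean')

    # 'sum' adds the weighted per-sample losses.
    print(abs(sum_loss.item() - none_loss.sum().item()) < 1e-8)      # expected: True
    # Weighted 'mean' divides by the sum of per-sample weights, not by N.
    denom = paddle.gather(weight, label).sum()
    print(abs(mean_loss.item() - (none_loss.sum() / denom).item()) < 1e-8)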
- Parameters

  - weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C whose data type is float32 or float64. Default is None.

  - ignore_index (int64, optional) – Specifies a target value that is ignored and does not contribute to the loss. A negative value means that no label value needs to be ignored. Only valid when soft_label = False (see the sketch after this parameter list). Default is -100.

  - reduction (str, optional) – Indicates how to reduce the loss over the batch; the candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', the reduced mean loss is returned; if reduction is 'sum', the reduced sum loss is returned; if reduction is 'none', the unreduced loss is returned. Default is 'mean'.

  - soft_label (bool, optional) – Indicates whether the label is soft. If soft_label=False, the label is hard; if soft_label=True, the label is soft. Default is False.

  - axis (int, optional) – The index of the dimension along which to perform softmax calculations. It should be in the range [-1, rank-1], where rank is the number of dimensions of input. Default is -1.

  - use_softmax (bool, optional) – Indicates whether to compute softmax before the cross entropy. Default is True.

  - name (str, optional) – The name of the operator. Default is None. For more information, please refer to Name.
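The sketch referenced above for ignore_index (values are illustrative only): a sample whose hard label equals ignore_index contributes nothing to the loss.

    import paddle

    logits = paddle.to_tensor([[1.0, 2.0, 0.5],
                               [0.3, 0.2, 1.5],
                               [2.0, 0.1, 0.1]])
    label = paddle.to_tensor([0, -100, 2], dtype='int64')  # the second sample is ignored

    loss = paddle.nn.functional.cross_entropy(
        logits, label, ignore_index=-100, reduction='none')
    print(loss.numpy())    # the entry for the ignored sample is expected to be 0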
Shape:

- input (Tensor): the input tensor, with data type float32 or float64. Shape is [N_1, N_2, ..., N_k, C], where C is the number of classes and k >= 1.

  Note:

  1. When use_softmax=True, it expects unscaled logits. This operator should not be used with the output of the softmax operator, which would produce incorrect results.
  2. When use_softmax=False, it expects the output of the softmax operator.

- label (Tensor):

  1. If soft_label=False, the shape is [N_1, N_2, ..., N_k] or [N_1, N_2, ..., N_k, 1], k >= 1. The data type is int32, int64, float32, or float64, and each value is in [0, C-1].
  2. If soft_label=True, the shape and data type should be the same as input, and the sum of the labels for each sample should be 1.

- output (Tensor): returns the softmax cross_entropy loss of input and label. The data type is the same as input.

  - If reduction is 'mean' or 'sum', the dimension of the return value is 1.
  - If reduction is 'none':
    - If soft_label = False, the dimension of the return value is the same as label.
    - If soft_label = True, the dimension of the return value is [N_1, N_2, ..., N_k, 1].
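Before the full examples below, a quick shape-only sketch (dimensions chosen arbitrarily for this check) of the k > 1 case with hard labels and reduction='none':

    import paddle

    paddle.seed(0)
    N1, N2, C = 2, 3, 5
    logits = paddle.rand([N1, N2, C], dtype='float64')     # [N_1, N_2, C]
    label = paddle.randint(0, C, shape=[N1, N2], dtype='int64')

    loss = paddle.nn.functional.cross_entropy(logits, label, reduction='none')
    print(loss.shape)    # expected to match the label shape, i.e. [2, 3]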
Example 1 (hard labels):

    import paddle

    paddle.seed(99999)
    N = 100
    C = 200
    reduction = 'mean'
    input = paddle.rand([N, C], dtype='float64')
    label = paddle.randint(0, C, shape=[N], dtype='int64')
    weight = paddle.rand([C], dtype='float64')

    cross_entropy_loss = paddle.nn.loss.CrossEntropyLoss(
        weight=weight, reduction=reduction)
    dy_ret = cross_entropy_loss(input, label)
    print(dy_ret.numpy())  # [5.41993642]
Example 2 (soft labels):

    import paddle

    paddle.seed(99999)
    axis = -1
    ignore_index = -100
    N = 4
    C = 3
    shape = [N, C]
    reduction = 'mean'
    weight = None
    logits = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)
    labels = paddle.uniform(shape, dtype='float64', min=0.1, max=1.0)
    labels /= paddle.sum(labels, axis=axis, keepdim=True)

    paddle_loss_mean = paddle.nn.functional.cross_entropy(
        logits, labels, soft_label=True, axis=axis,
        weight=weight, reduction=reduction)
    print(paddle_loss_mean.numpy())  # [1.12908343]
forward(input, label)

Defines the computation performed at every call. Should be overridden by all subclasses.

- Parameters

  - *inputs (tuple) – unpacked tuple arguments
  - **kwargs (dict) – unpacked dict arguments
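In practice forward is not called directly; calling the layer object invokes forward(input, label) under the hood. A minimal usage sketch (shapes chosen arbitrarily):

    import paddle

    loss_layer = paddle.nn.CrossEntropyLoss(reduction='mean')
    logits = paddle.rand([8, 10])                          # [N, C] unscaled logits
    label = paddle.randint(0, 10, shape=[8], dtype='int64')

    out = loss_layer(logits, label)    # equivalent to loss_layer.forward(logits, label)
    print(out.numpy())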