BCEWithLogitsLoss
- class paddle.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None, name=None)
-
This operator combines the sigmoid layer and the BCELoss layer. It can also be seen as the combination of the sigmoid_cross_entropy_with_logits layer and a reduce operation.
This measures the element-wise probability error in classification tasks in which each class is independent. This can be thought of as predicting labels for a data point, where labels are not mutually exclusive. For example, a news article can be about politics, technology, or sports at the same time, or about none of these.
First, this operator calculates the loss as follows:
\[Out = -Labels * \log(\sigma(Logit)) - (1 - Labels) * \log(1 - \sigma(Logit))\]
We know that \(\sigma(Logit) = \frac{1}{1 + e^{-Logit}}\). By substituting this we get:
\[Out = Logit - Logit * Labels + \log(1 + e^{-Logit})\]
For stability and to prevent overflow of \(e^{-Logit}\) when Logit < 0, we reformulate the loss as follows:
\[Out = \max(Logit, 0) - Logit * Labels + \log(1 + e^{-|Logit|})\]
Then, if weight or pos_weight is not None, this operator applies the corresponding weight tensor to the loss Out. The weight tensor assigns a different weight to each item in the batch. The pos_weight tensor assigns a different weight to the positive label of each class.
Finally, this operator applies a reduce operation to the loss. If reduction is set to 'none', the operator returns the original loss Out. If reduction is set to 'mean', the reduced mean loss is \(Out = MEAN(Out)\). If reduction is set to 'sum', the reduced sum loss is \(Out = SUM(Out)\).
Note that the values of the target labels label should be numbers between 0 and 1.
- Parameters
-
weight (Tensor, optional) – A manual rescaling weight given to the loss of each batch element. If given, it must be a 1-D Tensor of size [N, ]. The data type is float32 or float64. Default is None.
reduction (str, optional) – Specifies the reduction to apply to the loss over the batch. The candidates are 'none' | 'mean' | 'sum'. If reduction is 'none', the unreduced loss is returned; if reduction is 'mean', the reduced mean loss is returned; if reduction is 'sum', the summed loss is returned. Default is 'mean'.
pos_weight (Tensor, optional) – A weight for positive examples. Must be a vector with length equal to the number of classes. The data type is float32 or float64. Default is None.
name (str, optional) – Name for the operation. Default is None. For more information, please refer to Name.
- Shapes:
-
- logit (Tensor): The input predictions tensor. 2-D tensor with shape [N, *], where N is batch_size and * means any number of additional dimensions. The logit is usually the output of a Linear layer. Available dtypes are float32 and float64.
- label (Tensor): The target labels tensor. 2-D tensor with the same shape as logit. The target label values should be numbers between 0 and 1. Available dtypes are float32 and float64.
- output (Tensor): If reduction is 'none', the shape of output is the same as logit; otherwise the output is a scalar.
- Returns
-
A callable object of BCEWithLogitsLoss.
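The loss computation described above can be cross-checked with a short NumPy sketch. This is an illustration under stated assumptions, not Paddle's implementation: the function name bce_with_logits_reference is hypothetical, pos_weight is assumed to scale only the positive-label term, and weight and pos_weight are assumed to be already broadcastable against logit.

```python
import numpy as np


def bce_with_logits_reference(logit, label, weight=None, pos_weight=None,
                              reduction="mean"):
    """Hypothetical NumPy reference, not Paddle's actual implementation."""
    logit = np.asarray(logit, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)

    # -log(sigmoid(x)) computed stably as max(-x, 0) + log(1 + exp(-|x|)).
    neg_log_sigmoid = np.maximum(-logit, 0.0) + np.log1p(np.exp(-np.abs(logit)))

    # Assumption: pos_weight scales only the positive-label term.
    pw = 1.0 if pos_weight is None else np.asarray(pos_weight, dtype=np.float64)

    # With pw == 1 this equals max(x, 0) - x * y + log(1 + exp(-|x|)).
    out = (1.0 - label) * logit + (1.0 + (pw - 1.0) * label) * neg_log_sigmoid

    # Assumption: weight is already broadcastable against the loss.
    if weight is not None:
        out = out * np.asarray(weight, dtype=np.float64)

    if reduction == "mean":
        return out.mean()
    if reduction == "sum":
        return out.sum()
    if reduction == "none":
        return out
    raise ValueError(f"unsupported reduction: {reduction!r}")


logit = np.array([5.0, 1.0, 3.0])
label = np.array([1.0, 0.0, 1.0])
print(bce_with_logits_reference(logit, label))  # approximately 0.4562

# A large negative logit stays finite thanks to the stable form.
print(bce_with_logits_reference(np.array([-1000.0]), np.array([1.0])))  # 1000.0
```

The second call shows why the reformulated expression matters: evaluating \(e^{-Logit}\) directly for Logit = -1000 would overflow, while the stable form returns a finite loss.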
Examples
-
forward(logit, label)
-
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters
-
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments