kl_div
paddle.nn.functional.kl_div(input, label, reduction='mean', log_target=False, name=None) [source]
Calculate the Kullback-Leibler divergence loss between Input(X) and Input(Target). Note that Input(X) is the log-probability and Input(Target) is the probability.
KL divergence loss is calculated as follows:
If log_target is False:
$$l(x, y) = y \cdot (\log(y) - x)$$
If log_target is True:
$$l(x, y) = \exp(y) \cdot (y - x)$$
Here \(x\) is the input and \(y\) is the label.
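As an illustration, here is a minimal sketch (not part of the original page; the softmax-generated tensors and shapes are assumptions) checking the log_target=False formula against the 'none' reduction:
>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)
>>> x = F.log_softmax(paddle.randn((3, 4)), axis=1)   # log-probabilities (input)
>>> y = F.softmax(paddle.randn((3, 4)), axis=1)       # probabilities (label)
>>> # elementwise l(x, y) = y * (log(y) - x), matching reduction='none'
>>> manual = y * (paddle.log(y) - x)
>>> print(paddle.allclose(manual, F.kl_div(x, y, reduction='none')))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)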
- If reduction is 'none', the output loss has the same shape as the input, the loss at each point is calculated separately, and no reduction is applied to the result.
- If reduction is 'mean', the output loss has shape [] and is the average of all losses.
- If reduction is 'sum', the output loss has shape [] and is the sum of all losses.
- If reduction is 'batchmean', the output loss has shape [] and is the sum of all losses divided by the batch size.
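A minimal sketch (not part of the original example set; the shape (5, 20) and softmax-generated label are assumptions) of how the reductions relate to the 'none' output:
>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)
>>> x = F.log_softmax(paddle.randn((5, 20)), axis=1)
>>> y = F.softmax(paddle.randn((5, 20)), axis=1)
>>> none_loss = F.kl_div(x, y, reduction='none')            # shape [5, 20]
>>> sum_loss = F.kl_div(x, y, reduction='sum')              # equals none_loss.sum()
>>> mean_loss = F.kl_div(x, y, reduction='mean')            # equals none_loss.mean()
>>> batchmean_loss = F.kl_div(x, y, reduction='batchmean')  # equals sum_loss / batch size
>>> print(paddle.allclose(batchmean_loss, sum_loss / 5))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)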
- Parameters
input (Tensor) – The input tensor. Its shape is [N, *], where N is the batch size and * means any number of additional dimensions. Its data type should be float32 or float64.
label (Tensor) – The label tensor. Its shape is [N, *], the same shape as input. Its data type should be float32 or float64.
reduction (str, optional) – Indicates how to reduce the loss; the candidates are 'none' | 'batchmean' | 'mean' | 'sum'. If reduction is 'mean', the reduced mean loss is returned; if reduction is 'batchmean', the sum of the losses divided by the batch size is returned; if reduction is 'sum', the reduced sum loss is returned; if reduction is 'none', no reduction is applied. Default is 'mean'.
log_target (bool, optional) – Indicates whether label is passed in log space. Default is False.
name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
- Returns
The KL divergence loss. The data type is the same as that of the input tensor.
- Return type
Tensor
Examples
>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)

>>> shape = (5, 20)

>>> # input(x) should be a distribution in the log space
>>> x = F.log_softmax(paddle.randn(shape), axis=1).astype('float32')

>>> target = paddle.uniform(shape, min=-10, max=10).astype('float32')

>>> # 'batchmean' reduction, loss shape will be [], which is a 0-D Tensor
>>> pred_loss = F.kl_div(x, target, reduction='batchmean')
>>> print(pred_loss.shape)
[]

>>> # 'mean' reduction, loss shape will be [], which is a 0-D Tensor
>>> pred_loss = F.kl_div(x, target, reduction='mean')
>>> print(pred_loss.shape)
[]

>>> # 'sum' reduction, loss shape will be [], which is a 0-D Tensor
>>> pred_loss = F.kl_div(x, target, reduction='sum')
>>> print(pred_loss.shape)
[]

>>> # 'none' reduction, loss shape is the same as the input shape
>>> pred_loss = F.kl_div(x, target, reduction='none')
>>> print(pred_loss.shape)
[5, 20]

>>> # if label is in the log space, set log_target = True
>>> target = paddle.uniform(shape, min=0, max=10).astype('float32')
>>> log_target = paddle.log(target)
>>> pred_loss_1 = F.kl_div(x, target, reduction='none')
>>> pred_loss_2 = F.kl_div(x, log_target, reduction='none', log_target=True)
>>> print(paddle.allclose(pred_loss_1, pred_loss_2))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)
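A further hedged sketch (not part of the original example set): the class-style counterpart paddle.nn.KLDivLoss is assumed here to wrap the same computation as this functional form for a given reduction:
>>> import paddle
>>> import paddle.nn as nn
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)
>>> x = F.log_softmax(paddle.randn((5, 20)), axis=1)
>>> target = F.softmax(paddle.randn((5, 20)), axis=1)
>>> # class-style criterion, assumed to match the functional call for the same reduction
>>> criterion = nn.KLDivLoss(reduction='batchmean')
>>> class_loss = criterion(x, target)
>>> func_loss = F.kl_div(x, target, reduction='batchmean')
>>> print(paddle.allclose(class_loss, func_loss))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)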