kl_div
paddle.nn.functional.kl_div(input, label, reduction='mean', log_target=False, name=None) [source]
Calculate the Kullback-Leibler divergence loss between Input(X) and Input(Target). Note that Input(X) is the log-probability and Input(Target) is the probability.
KL divergence loss is calculated as follows:
If log_target is False:
$$l(x, y) = y \cdot (\log(y) - x)$$
If log_target is True:
$$l(x, y) = \exp(y) \cdot (y - x)$$
Here \(x\) is the input and \(y\) is the label.
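As an illustration, here is a minimal sketch (not part of the original page; the softmax-generated tensors and shapes are assumptions) checking the log_target=False formula against the 'none' reduction:
>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)
>>> x = F.log_softmax(paddle.randn((3, 4)), axis=1)   # log-probabilities (input)
>>> y = F.softmax(paddle.randn((3, 4)), axis=1)       # probabilities (label)
>>> # elementwise l(x, y) = y * (log(y) - x), matching reduction='none'
>>> manual = y * (paddle.log(y) - x)
>>> print(paddle.allclose(manual, F.kl_div(x, y, reduction='none')))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)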
- If reduction is 'none', the output loss has the same shape as the input, the loss at each point is calculated separately, and no reduction is applied to the result.
- If reduction is 'mean', the output loss has shape [] and is the average of all losses.
- If reduction is 'sum', the output loss has shape [] and is the sum of all losses.
- If reduction is 'batchmean', the output loss has shape [] and is the sum of all losses divided by the batch size.
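A minimal sketch (not part of the original example set; the shape (5, 20) and softmax-generated label are assumptions) of how the reductions relate to the 'none' output:
>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)
>>> x = F.log_softmax(paddle.randn((5, 20)), axis=1)
>>> y = F.softmax(paddle.randn((5, 20)), axis=1)
>>> none_loss = F.kl_div(x, y, reduction='none')            # shape [5, 20]
>>> sum_loss = F.kl_div(x, y, reduction='sum')              # equals none_loss.sum()
>>> mean_loss = F.kl_div(x, y, reduction='mean')            # equals none_loss.mean()
>>> batchmean_loss = F.kl_div(x, y, reduction='batchmean')  # equals sum_loss / batch size
>>> print(paddle.allclose(batchmean_loss, sum_loss / 5))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)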
- Parameters
input (Tensor) – The input tensor. Its shape is [N, *], where N is the batch size and * means any number of additional dimensions. Its data type should be float32 or float64.
label (Tensor) – The label tensor. Its shape is [N, *], the same shape as input. Its data type should be float32 or float64.
reduction (str, optional) – Indicates how to reduce the loss; the candidates are 'none' | 'batchmean' | 'mean' | 'sum'. If reduction is 'mean', the reduced mean loss is returned; if reduction is 'batchmean', the sum of the losses divided by the batch size is returned; if reduction is 'sum', the reduced sum loss is returned; if reduction is 'none', no reduction is applied. Default is 'mean'.
log_target (bool, optional) – Indicates whether label is passed in log space. Default is False.
name (str, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
- Returns
The KL divergence loss. The data type is the same as that of the input tensor.
- Return type
Tensor
Examples
>>> import paddle
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)

>>> shape = (5, 20)

>>> # input(x) should be a distribution in the log space
>>> x = F.log_softmax(paddle.randn(shape), axis=1).astype('float32')

>>> target = paddle.uniform(shape, min=-10, max=10).astype('float32')

>>> # 'batchmean' reduction, loss shape will be [], which is a 0-D Tensor
>>> pred_loss = F.kl_div(x, target, reduction='batchmean')
>>> print(pred_loss.shape)
[]

>>> # 'mean' reduction, loss shape will be [], which is a 0-D Tensor
>>> pred_loss = F.kl_div(x, target, reduction='mean')
>>> print(pred_loss.shape)
[]

>>> # 'sum' reduction, loss shape will be [], which is a 0-D Tensor
>>> pred_loss = F.kl_div(x, target, reduction='sum')
>>> print(pred_loss.shape)
[]

>>> # 'none' reduction, loss shape is the same as the input shape
>>> pred_loss = F.kl_div(x, target, reduction='none')
>>> print(pred_loss.shape)
[5, 20]

>>> # if label is in the log space, set log_target = True
>>> target = paddle.uniform(shape, min=0, max=10).astype('float32')
>>> log_target = paddle.log(target)
>>> pred_loss_1 = F.kl_div(x, target, reduction='none')
>>> pred_loss_2 = F.kl_div(x, log_target, reduction='none', log_target=True)
>>> print(paddle.allclose(pred_loss_1, pred_loss_2))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)
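A further hedged sketch (not part of the original example set): the class-style counterpart paddle.nn.KLDivLoss is assumed here to wrap the same computation as this functional form for a given reduction:
>>> import paddle
>>> import paddle.nn as nn
>>> import paddle.nn.functional as F
>>> paddle.seed(2023)
>>> x = F.log_softmax(paddle.randn((5, 20)), axis=1)
>>> target = F.softmax(paddle.randn((5, 20)), axis=1)
>>> # class-style criterion, assumed to match the functional call for the same reduction
>>> criterion = nn.KLDivLoss(reduction='batchmean')
>>> class_loss = criterion(x, target)
>>> func_loss = F.kl_div(x, target, reduction='batchmean')
>>> print(paddle.allclose(class_loss, func_loss))
Tensor(shape=[], dtype=bool, place=Place(cpu), stop_gradient=True,
       True)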