label_smooth
paddle.nn.functional.label_smooth(label, prior_dist=None, epsilon=0.1, name=None)
Label smoothing is a mechanism to regularize the classifier layer and is called label-smoothing regularization (LSR). Label smoothing is proposed to encourage the model to be less confident, since directly optimizing the log-likelihood of the correct label may cause overfitting and reduce the model's ability to adapt.
Label smoothing replaces the ground-truth label \(y\) with the weighted sum of itself and some fixed distribution \(\mu\). For class \(k\),
\[\tilde{y}_k = (1 - \epsilon) * y_k + \epsilon * \mu_k,\]
where \(1 - \epsilon\) and \(\epsilon\) are the weights respectively, and \(\tilde{y}_k\) is the smoothed label. Usually a uniform distribution is used for \(\mu\).
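For example, with \(\epsilon = 0.1\), three classes, and a uniform prior \(\mu_k = 1/3\), a one-hot entry \(y_k = 1\) becomes \(0.9 \times 1 + 0.1 \times 1/3 \approx 0.93333\), while an entry \(y_k = 0\) becomes \(0.1 \times 1/3 \approx 0.03333\); these are exactly the values produced in the example below.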
See more details about label smoothing in https://arxiv.org/abs/1512.00567.
Parameters
- label (Tensor) – The input tensor containing the label data. The label data should use one-hot representation. It's a multidimensional tensor with a shape of \([N_1, ..., Depth]\), where Depth is the number of classes. The dtype can be "float16", "float32", or "float64".
- prior_dist (Tensor, optional) – The prior distribution used to smooth labels. If not provided, a uniform distribution is used. It's a multidimensional tensor with a shape of \([1, class\_num]\). The default value is None (see the second example below for a sketch with a custom prior).
- epsilon (float, optional) – The weight used to mix up the original ground-truth distribution and the fixed distribution. The default value is 0.1.
- name (str, optional) – The default value is None. Normally there is no need for users to set this property. For more information, please refer to Name.
Returns
- The tensor containing the smoothed labels.
Return type
- Tensor
Examples
>>> import paddle
>>> paddle.disable_static()
>>> x = paddle.to_tensor([[[0, 1, 0],
...                        [1, 0, 1]]], dtype="float32", stop_gradient=False)
>>> output = paddle.nn.functional.label_smooth(x)
>>> print(output)
Tensor(shape=[1, 2, 3], dtype=float32, place=Place(cpu), stop_gradient=False,
       [[[0.03333334, 0.93333334, 0.03333334],
         [0.93333334, 0.03333334, 0.93333334]]])
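The following is an additional sketch, not part of the original example set, showing how a custom prior_dist could be passed. The prior values and the tensor y are made up for illustration; the expected result in the final comment follows directly from the formula above with epsilon=0.1.

>>> import paddle
>>> # Illustrative prior of shape [1, class_num]: put more of the smoothing mass on the last class.
>>> prior = paddle.to_tensor([[0.2, 0.2, 0.6]], dtype="float32")
>>> y = paddle.to_tensor([[0., 1., 0.]], dtype="float32")
>>> smoothed = paddle.nn.functional.label_smooth(y, prior_dist=prior, epsilon=0.1)
>>> # Per the formula: 0.9 * y + 0.1 * prior = [[0.02, 0.92, 0.06]]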