ClipGradByNorm
class paddle.nn.ClipGradByNorm(clip_norm) [source]
Limit the L2 norm of a multi-dimensional Tensor \(X\) to clip_norm. If the L2 norm of \(X\) is greater than clip_norm, \(X\) will be rescaled so that its L2 norm equals clip_norm. If the L2 norm of \(X\) is less than or equal to clip_norm, nothing will be done.

The Tensor \(X\) is not passed to this class directly; the clipping is applied to the gradients of all parameters set in optimizer. If need_clip of a specific parameter is False in its ParamAttr, the gradients of that parameter will not be clipped.

Gradient clipping takes effect only after being set in optimizer; see the documentation of optimizer (for example, SGD).

The clipping formula is:
\[\begin{split}Out = \left\{ \begin{array}{ccl} X & & if\ (norm(X) \leq clip\_norm) \\ \frac{clip\_norm * X}{norm(X)} & & if\ (norm(X) > clip\_norm) \\ \end{array} \right.\end{split}\]

where \(norm(X)\) represents the L2 norm of \(X\):

\[norm(X) = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{\frac{1}{2}}\]
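For intuition, the following is a minimal numerical sketch of the formula above (illustrative only; paddle.linalg.norm is used here just to compute \(norm(X)\) by hand, it is not part of this class's API):

>>> # When norm(x) > clip_norm, x is rescaled by clip_norm / norm(x);
>>> # otherwise it is returned unchanged.
>>> import paddle
>>> x = paddle.to_tensor([[3.0, 4.0]])              # norm(x) = 5.0
>>> clip_norm = 1.0
>>> norm = float(paddle.linalg.norm(x))
>>> out = x * (clip_norm / norm) if norm > clip_norm else x
>>> print(out)                                      # [[0.6, 0.8]], whose L2 norm is 1.0

Note that ClipGradByNorm applies this formula to each parameter's gradient independently; to clip by the combined norm of all gradients, see ClipGradByGlobalNorm.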
Note

need_clip of ClipGradByNorm HAS BEEN DEPRECATED since 2.0. Please use need_clip in ParamAttr to specify the clip scope.

Parameters
clip_norm (float) – The maximum norm value.
Examples
>>> import paddle
>>> x = paddle.uniform([10, 10], min=-1.0, max=1.0, dtype='float32')
>>> linear = paddle.nn.Linear(in_features=10, out_features=10,
...                           weight_attr=paddle.ParamAttr(need_clip=True),
...                           bias_attr=paddle.ParamAttr(need_clip=False))
>>> out = linear(x)
>>> loss = paddle.mean(out)
>>> loss.backward()
>>> clip = paddle.nn.ClipGradByNorm(clip_norm=1.0)
>>> sgd = paddle.optimizer.SGD(learning_rate=0.1, parameters=linear.parameters(), grad_clip=clip)
>>> sgd.step()