LayerNorm
- class paddle.nn.LayerNorm ( normalized_shape, epsilon=1e-05, weight_attr=None, bias_attr=None, name=None )
-
Constructs a callable object of the LayerNorm class. For more details, refer to the code examples. It implements the Layer Normalization layer and can be applied to mini-batch input data. Refer to Layer Normalization. The formula is as follows:
\[
\begin{aligned}
\mu &= \frac{1}{H}\sum_{i=1}^{H} x_i \\
\sigma &= \sqrt{\frac{1}{H}\sum_{i=1}^{H}{(x_i - \mu)^2} + \epsilon} \\
y &= f\left(\frac{g}{\sigma}(x - \mu) + b\right)
\end{aligned}
\]

\(x\): the vector representation of the summed inputs to the neurons in that layer.
\(H\): the number of hidden units in a layer.
\(\epsilon\): the small value added to the variance to prevent division by zero.
\(g\): the trainable scale parameter.
\(b\): the trainable bias parameter.
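The computation can be checked against the formula directly. Below is a minimal sketch, assuming a freshly initialized layer in which the gain \(g\) is all ones and the bias \(b\) is all zeros:

>>> import paddle
>>> paddle.seed(0)
>>> x = paddle.rand((4, 8))  # batch of 4 vectors, H = 8 hidden units

>>> # Manual layer normalization over the last dimension, per the formula
>>> mu = x.mean(axis=-1, keepdim=True)
>>> var = paddle.var(x, axis=-1, unbiased=False, keepdim=True)
>>> y_manual = (x - mu) / paddle.sqrt(var + 1e-5)  # g = 1, b = 0 at init

>>> layer_norm = paddle.nn.LayerNorm(8)  # default epsilon is also 1e-5
>>> y_layer = layer_norm(x)
>>> print(paddle.allclose(y_manual, y_layer, atol=1e-6))  # True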
- Parameters
-
normalized_shape (int|list|tuple) – Input shape from an expected input of size [*, normalized_shape[0], normalized_shape[1], ..., normalized_shape[-1]]. If it is a single integer, this module will normalize over the last dimension, which is expected to be of that specific size.
epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05.
weight_attr (ParamAttr|bool, optional) – The parameter attribute for the learnable gain \(g\). If False, the weight is None. If None, a default ParamAttr is added as the scale and initialized to 1. Default: None. For more information, please refer to ParamAttr.
bias_attr (ParamAttr|bool, optional) – The parameter attribute for the learnable bias \(b\). If False, the bias is None. If None, a default ParamAttr is added as the bias and initialized to 0. Default: None. For more information, please refer to ParamAttr. The behavior of weight_attr and bias_attr is illustrated in the sketch after this list.
name (str, optional) – Name for the LayerNorm, default is None. For more information, please refer to Name.
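A short sketch of how weight_attr and bias_attr behave, based on the description above (passing False removes the corresponding parameter):

>>> import paddle
>>> ln = paddle.nn.LayerNorm(normalized_shape=[4, 6])
>>> print(ln.weight.shape, ln.bias.shape)  # [4, 6] [4, 6]

>>> # Passing False disables the learnable gain and bias entirely
>>> ln_plain = paddle.nn.LayerNorm([4, 6], weight_attr=False, bias_attr=False)
>>> print(ln_plain.weight, ln_plain.bias)  # None None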
- Shape:
-
x: 2-D, 3-D, 4-D or 5-D tensor.
output: same shape as input x.
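For instance, a sketch of the int and list forms of normalized_shape on a 3-D input; in both cases the output shape matches the input:

>>> import paddle
>>> x = paddle.rand((2, 3, 5))  # 3-D input

>>> # An int normalizes over the last dimension only
>>> print(paddle.nn.LayerNorm(5)(x).shape)  # [2, 3, 5]

>>> # A list normalizes over the trailing dimensions it names
>>> print(paddle.nn.LayerNorm([3, 5])(x).shape)  # [2, 3, 5]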
- Returns
-
Tensor, the dimension is the same as x, but the internal values have been normalized by LayerNorm.
Examples
>>> import paddle
>>> paddle.seed(100)
>>> x = paddle.rand((2, 2, 2, 3))
>>> layer_norm = paddle.nn.LayerNorm(x.shape[1:])
>>> layer_norm_out = layer_norm(x)
>>> print(layer_norm_out)
Tensor(shape=[2, 2, 2, 3], dtype=float32, place=Place(cpu), stop_gradient=False,
       [[[[ 0.60520101, -0.67670590, -1.40020895],
          [ 0.46540466, -0.09736638, -0.47771254]],
         [[-0.74365306,  0.63718957, -1.41333175],
          [ 1.44764745, -0.25489068,  1.90842617]]],
        [[[ 1.09773350,  1.49568415, -0.45503747],
          [-1.01755989,  1.08368254, -0.38671425]],
         [[-0.62252408,  0.60490781,  0.13109133],
          [-0.81222653,  0.84285998, -1.96189952]]]])
- forward ( input )
-
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters
-
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
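In practice, forward is not called directly; invoking the layer itself dispatches to it. A minimal sketch:

>>> import paddle
>>> x = paddle.rand((2, 3))
>>> layer_norm = paddle.nn.LayerNorm(3)
>>> y = layer_norm(x)  # calling the layer invokes forward under the hood
>>> print(y.shape)  # [2, 3]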
- extra_repr ( )
-
Extra representation of this layer; you can provide a custom implementation in your own layer.
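As a hypothetical sketch, a custom layer might override extra_repr so that its configuration shows up when the layer is printed:

>>> import paddle
>>> class Scale(paddle.nn.Layer):  # hypothetical custom layer
...     def __init__(self, factor):
...         super().__init__()
...         self.factor = factor
...     def forward(self, x):
...         return x * self.factor
...     def extra_repr(self):
...         # Included in the layer's printed representation
...         return f'factor={self.factor}'
...
>>> print(Scale(2.0))  # e.g. Scale(factor=2.0)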