Model Parameters
Note
The paddle.fluid.* APIs are deprecated. Please use the latest Paddle API versions instead.
Model parameters are usually weights and bias in the model. In Paddle’s dynamic graph mode, they correspond to the EagerParamBase
class. Model parameters are learnable variables that have gradients and can be optimized. In Paddle, custom parameters can be created using create_parameter .
You can configure properties related to model parameters using ParamAttr . The configurable options include:
Initialization method
Regularization
Model averaging
Clipping
Initialization Method
Paddle initializes a single parameter by setting attributes of initializer
in ParamAttr
.Example:
import paddle param_attrs = paddle.ParamAttr(name="fc_weight", initializer=paddle.nn.initializer.Constant(5.0)) fc_layer = paddle.nn.Linear(64, 10, weight_attr=param_attrs)
The following is the initialization method supported by Paddle:
1. Constant
The constant initialization method sets parameters to a fixed value, such as initializing biases to 0.
Parameter: value specifies the initial value (default is 0.0).
API reference: Constant
2. Normal
The random normal distribution method generates values based on a normal (Gaussian) distribution, suitable for initializing most neural network parameters.
Parameters: mean (default 0.0) and std (default 1.0) define the mean and standard deviation.
API reference: Normal
3. Uniform
The random uniform distribution method samples values evenly within a specified range [low, high].
Parameters: low (default -1.0) and high (default 1.0) define the range.
API reference: Uniform
4. XavierUniform
The Xavier uniform distribution method, proposed by Xavier Glorot and Yoshua Bengio in the paper Understanding the difficulty of training deep feedforward neural networks, initializes parameters based on a uniform distribution.
Range is determined by fan_in (input dimension), fan_out (output dimension), and gain (scaling factor).
API reference: XavierUniform
5. XavierNormal
The Xavier normal distribution method, proposed in the paper Understanding the difficulty of training deep feedforward neural networks, initializes parameters with a normal distribution.
Mean is 0, and the standard deviation is determined by fan_in, fan_out, and gain .
API reference: XavierNormal
6. KaimingUniform
The Kaiming uniform distribution method, proposed by Kaiming He et al. in the paper Delving Deep into Rectifiers, is designed for networks with specific activation functions.
Range is determined by fan_in, negative_slope (default 0), and nonlinearity (default ‘relu’).
API reference: KaimingUniform
7. KaimingNormal
The Kaiming normal distribution method, proposed in the paper Delving Deep into Rectifiers, uses a normal distribution.
Mean is 0, and the standard deviation is determined by fan_in, negative_slope (default 0), and nonlinearity (default ‘relu’).
API reference: KaimingNormal
8. TruncatedNormal
The truncated normal distribution method limits the generated values from a normal distribution to a specified range [a, b].
Parameters: mean (default 0.0), std (default 1.0), and truncation bounds a and b (default -2.0 and 2.0).
API reference: TruncatedNormal
Other Initialization Methods
Paddle also supports the following initialization methods:
Assign: Initialize directly using a NumPy array, Python list, or Tensor.
Bilinear: Used for upsampling in transposed convolutions to enlarge feature maps.
Dirac: Initializes convolution kernels with a Dirac delta function to preserve input characteristics.
Orthogonal: Generates an orthogonal matrix for initialization, ensuring (semi-)orthogonality.
set_global_initializer: Sets a global initialization method, effective only for code that follows it.
Regularization
Paddle regularizes a single parameter by setting attributes of regularizer
in ParamAttr
.
import paddle param_attrs = paddle.ParamAttr(name="fc_weight", regularizer=paddle.regularizer.L1Decay(0.1)) fc_layer = paddle.nn.Linear(64, 10, weight_attr=param_attrs)
The following is the regularization approach supported by Paddle:
Model Averaging
Paddle determines whether to average a single parameter by setting attributes of do_model_average
in ParamAttr
.
Default value: True .
import paddle param_attrs = paddle.ParamAttr(name="fc_weight", do_model_average=True) fc_layer = paddle.nn.Linear(64, 10, weight_attr=param_attrs)
During mini-batch training, the model parameters are updated after each batch. Model averaging calculates the average of the parameters from the most recent k updates.
The averaged parameters are used only for testing and prediction, not for training.
API reference: ModelAverage (currently in incubation and may undergo changes).
Clipping
Note
The gradient_clip
attribute is deprecated. Use need_clip
to control whether gradient clipping is applied, and configure the clipping method when initializing the optimizer
.
Paddle determines whether gradient clipping is applied to a single parameter by setting attributes of need_clip
in ParamAttr
.
Default value: True .
import paddle param_attrs = paddle.ParamAttr(name="fc_weight", need_clip=True) fc_layer = paddle.nn.Linear(64, 10, weight_attr=param_attrs)
The following is the clipping method supported by Paddle:
1. GradientClipByGlobalNorm
Limits the sum of the L2 norms of all Tensors in a Tensor list t_list to the clip_norm
range.
API reference: ClipGradByGlobalNorm
2. GradientClipByNorm
Limits the L2 norm of a multi-dimensional input Tensor X to the clip_norm
range.
API reference: ClipGradByNorm
3. GradientClipByValue
Limits the values of a multi-dimensional input Tensor X to the range [min, max].
API reference: ClipGradByValue
For more details on gradient clipping methods, refer to: Gradient clip methods in Paddle .