FusedFeedForward
class paddle.incubate.nn.FusedFeedForward(d_model, dim_feedforward, dropout_rate=0.1, epsilon=1e-05, activation='relu', act_dropout_rate=None, normalize_before=False, linear1_weight_attr=None, linear1_bias_attr=None, linear2_weight_attr=None, linear2_bias_attr=None, ln1_scale_attr=None, ln1_bias_attr=None, ln2_scale_attr=None, ln2_bias_attr=None, nranks=1, ring_id=-1, name=None)

Parameters
d_model (int) – The expected feature size in the input and output.
dim_feedforward (int) – The hidden layer size.
dropout_rate (float, optional) – The dropout probability used in pre-process and post-process. Default: 0.1.
epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05.
activation (str, optional) – The activation function. Default: relu.
act_dropout_rate (float, optional) – The dropout probability applied after the activation. If None, the value of dropout_rate is used. Default: None.
normalize_before (bool, optional) – Indicates whether layer normalization is applied in preprocessing (True) or postprocessing (False). Default: False.
linear1_weight_attr (ParamAttr, optional) – Specifies the weight parameter property for the first FFN linear layer. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
linear1_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the first FFN linear layer. A value of False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
linear2_weight_attr (ParamAttr, optional) – Specifies the weight parameter property for the second FFN linear layer. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
linear2_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the second FFN linear layer. A value of False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
ln1_scale_attr (ParamAttr, optional) – Specifies the weight parameter property for the FFN pre_layer_norm. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
ln1_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the FFN pre_layer_norm. A value of False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
ln2_scale_attr (ParamAttr, optional) – Specifies the weight parameter property for the FFN post_layer_norm. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details.
ln2_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter property for the FFN post_layer_norm. A value of False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter property is used. See ParamAttr for usage details.
nranks (int, optional) – The number of ranks for distributed tensor model parallelism. Default: 1, which means tensor parallelism is not used.
ring_id (int, optional) – The ring ID for distributed tensor model parallelism. Default: -1, which means tensor parallelism is not used.
name (str, optional) – Default: None. Normally there is no need for the user to set this property. For more information, please refer to Name.
Examples
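>>> import paddle
>>> from paddle.incubate.nn import FusedFeedForward
>>> # Run on GPU, as in the original example.
>>> paddle.device.set_device('gpu')
>>> # d_model=8, dim_feedforward=8
>>> fused_feedforward_layer = FusedFeedForward(8, 8)
>>> # The last dimension of the input must equal d_model.
>>> x = paddle.rand((1, 8, 8))
>>> out = fused_feedforward_layer(x)
>>> print(out.shape)
[1, 8, 8]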
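The parameter list above refers to a pre_layer_norm, two linear layers, and a post_layer_norm, but does not spell out how they combine. The following is a minimal, dependency-free sketch of the unfused computation, assuming the usual Transformer feed-forward pattern with a residual connection, a relu activation, dropout disabled (as in evaluation mode), and layer-norm scale and bias left at 1 and 0. The function names (layer_norm, linear, relu, feed_forward_reference) are hypothetical and are not part of the Paddle API; the fused kernel may differ in numerical detail.

```python
import math


def layer_norm(vec, epsilon=1e-5):
    """Normalize one feature vector to zero mean and unit variance (scale=1, bias=0)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + epsilon) for v in vec]


def linear(vec, weight, bias):
    """Compute vec @ weight + bias; weight has shape [in_features, out_features]."""
    return [
        sum(vec[i] * weight[i][j] for i in range(len(vec))) + bias[j]
        for j in range(len(bias))
    ]


def relu(vec):
    return [max(0.0, v) for v in vec]


def feed_forward_reference(x, w1, b1, w2, b2, normalize_before=False, epsilon=1e-5):
    """Unfused reference for a single token vector x of length d_model.

    Assumption: dropout is omitted, so dropout_rate and act_dropout_rate
    play no role here.
    """
    residual = x
    # normalize_before=True: layer norm is applied before the linear layers.
    out = layer_norm(x, epsilon) if normalize_before else x
    out = linear(relu(linear(out, w1, b1)), w2, b2)
    out = [r + o for r, o in zip(residual, out)]
    # normalize_before=False: layer norm is applied after the residual add.
    if not normalize_before:
        out = layer_norm(out, epsilon)
    return out


# d_model=2, dim_feedforward=3; the weights are arbitrary illustrative values.
w1 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b2 = [0.0, 0.0]
x = [1.0, -1.0]

print(feed_forward_reference(x, w1, b1, w2, b2, normalize_before=False))
print(feed_forward_reference(x, w1, b1, w2, b2, normalize_before=True))
```

With normalize_before=False the output is itself layer-normalized, so its features have zero mean. With normalize_before=True the residual is added last, so the output is not normalized.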
forward(src, cache=None)
Defines the computation performed at every call. Should be overridden by all subclasses.
Parameters
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
extra_repr()
Extra representation of this layer. You can provide a custom implementation for your own layer.
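To illustrate the idea, here is a small sketch of how an extra_repr override typically feeds into a layer's printed form. SketchLayer and SketchFeedForward are hypothetical stand-ins written in plain Python; they are not Paddle classes, and the exact string Paddle produces may differ.

```python
class SketchLayer:
    """Hypothetical stand-in for a layer base class; not Paddle's implementation."""

    def extra_repr(self):
        # Subclasses override this to describe their configuration.
        return ""

    def __repr__(self):
        # The extra representation is embedded in the printed form.
        return f"{type(self).__name__}({self.extra_repr()})"


class SketchFeedForward(SketchLayer):
    def __init__(self, d_model, dim_feedforward):
        self.d_model = d_model
        self.dim_feedforward = dim_feedforward

    def extra_repr(self):
        return f"d_model={self.d_model}, dim_feedforward={self.dim_feedforward}"


print(SketchFeedForward(8, 32))  # prints: SketchFeedForward(d_model=8, dim_feedforward=32)
```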