FusedFeedForward¶
- class paddle.incubate.nn.FusedFeedForward ( d_model, dim_feedforward, dropout_rate=0.1, epsilon=1e-05, activation='relu', act_dropout_rate=None, normalize_before=False, linear1_weight_attr=None, linear1_bias_attr=None, linear2_weight_attr=None, linear2_bias_attr=None, ln1_scale_attr=None, ln1_bias_attr=None, ln2_scale_attr=None, ln2_bias_attr=None, nranks=1, ring_id=-1, name=None ) [source]
Parameters
d_model (int) – The expected feature size in the input and output.
dim_feedforward (int) – The hidden layer size.
dropout_rate (float, optional) – The dropout probability used in pre-process and post-process. Default: 0.1.
epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05.
activation (str, optional) – The activation function. Default: relu.
act_dropout_rate (float, optional) – The dropout probability after activation. If None, use the value of dropout_rate. Default: None.
normalize_before (bool, optional) – Indicates whether to apply layer normalization as preprocessing or postprocessing. Default: False.
linear1_weight_attr (ParamAttr, optional) – To specify the weight parameter property for the FFN first linear. Default: None, which means the default weight parameter property is used. See usage for details in ParamAttr.
linear1_bias_attr (ParamAttr|bool, optional) – To specify the bias parameter property for the FFN first linear. The False value means the corresponding layer would not have a trainable bias parameter. Default: None, which means the default bias parameter property is used. See usage for details in ParamAttr.
linear2_weight_attr (ParamAttr, optional) – To specify the weight parameter property for the FFN second linear. Default: None, which means the default weight parameter property is used. See usage for details in ParamAttr.
linear2_bias_attr (ParamAttr|bool, optional) – To specify the bias parameter property for the FFN second linear. The False value means the corresponding layer would not have a trainable bias parameter. Default: None, which means the default bias parameter property is used. See usage for details in ParamAttr.
ln1_scale_attr (ParamAttr, optional) – To specify the weight parameter property for the FFN pre_layer_norm. Default: None, which means the default weight parameter property is used. See usage for details in ParamAttr.
ln1_bias_attr (ParamAttr|bool, optional) – To specify the bias parameter property for the FFN pre_layer_norm. The False value means the corresponding layer would not have a trainable bias parameter. Default: None, which means the default bias parameter property is used. See usage for details in ParamAttr.
ln2_scale_attr (ParamAttr, optional) – To specify the weight parameter property for the FFN post_layer_norm. Default: None, which means the default weight parameter property is used. See usage for details in ParamAttr.
ln2_bias_attr (ParamAttr|bool, optional) – To specify the bias parameter property for the FFN post_layer_norm. The False value means the corresponding layer would not have a trainable bias parameter. Default: None, which means the default bias parameter property is used. See usage for details in ParamAttr.
nranks (int, optional) – Distributed tensor model parallel nranks. Default is 1, which means tensor model parallelism is not used.
ring_id (int, optional) – For distributed tensor model parallel. Default is -1, which means tensor model parallelism is not used.
name (str, optional) – The default value is None. Normally there is no need for the user to set this property. For more information, please refer to Name.
Examples
>>> import paddle
>>> from paddle.incubate.nn import FusedFeedForward
>>> paddle.device.set_device('gpu')
>>> fused_feedforward_layer = FusedFeedForward(8, 8)
>>> x = paddle.rand((1, 8, 8))
>>> out = fused_feedforward_layer(x)
>>> print(out.shape)
[1, 8, 8]
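Beyond the defaults, the parameter attributes described above can be customized at construction time. The snippet below is a minimal, unofficial sketch assuming a GPU build of PaddlePaddle; the initializer choices and sizes are illustrative only.

>>> import paddle
>>> from paddle.incubate.nn import FusedFeedForward
>>> paddle.device.set_device('gpu')
>>> # Xavier-initialized weights and zero-initialized biases for both linears.
>>> w1_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.XavierUniform())
>>> w2_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.XavierUniform())
>>> b_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(value=0.0))
>>> layer = FusedFeedForward(
...     d_model=8,
...     dim_feedforward=32,
...     dropout_rate=0.1,
...     activation='relu',
...     normalize_before=True,   # apply layer normalization as preprocessing
...     linear1_weight_attr=w1_attr,
...     linear1_bias_attr=b_attr,
...     linear2_weight_attr=w2_attr,
...     linear2_bias_attr=b_attr)
>>> x = paddle.rand((2, 4, 8))   # [batch_size, sequence_length, d_model]
>>> out = layer(x)
>>> print(out.shape)
[2, 4, 8]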
forward¶
- forward ( src, cache=None )
Defines the computation performed at every call. Should be overridden by all subclasses.
Parameters
*inputs (tuple) – unpacked tuple arguments.
**kwargs (dict) – unpacked dict arguments.
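As a small illustrative sketch (again assuming a GPU build of PaddlePaddle), calling the layer instance is the usual way to invoke forward; the explicit forward call below is shown only to make the dispatch visible.

>>> import paddle
>>> from paddle.incubate.nn import FusedFeedForward
>>> paddle.device.set_device('gpu')
>>> layer = FusedFeedForward(8, 8)
>>> src = paddle.rand((1, 8, 8))
>>> out = layer(src)                  # preferred: __call__ runs hooks, then forward(src)
>>> out_direct = layer.forward(src)   # same computation, invoked directly
>>> print(out.shape, out_direct.shape)
[1, 8, 8] [1, 8, 8]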
extra_repr¶
- extra_repr ( )
Extra representation of this layer; you can provide a custom implementation for your own layer.
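FusedFeedForward supplies its own extra_repr. For illustration only, here is a hedged sketch of overriding extra_repr in a hypothetical custom Layer so that printing the layer shows its configuration.

>>> import paddle
>>> # MyFeedForwardConfigured is a hypothetical layer used only to illustrate extra_repr.
>>> class MyFeedForwardConfigured(paddle.nn.Layer):
...     def __init__(self, d_model, dim_feedforward):
...         super().__init__()
...         self.d_model = d_model
...         self.dim_feedforward = dim_feedforward
...     def extra_repr(self):
...         # The returned string is embedded into the repr()/print() output of the layer.
...         return f'd_model={self.d_model}, dim_feedforward={self.dim_feedforward}'
...
>>> layer = MyFeedForwardConfigured(8, 32)
>>> print(layer)   # prints something like: MyFeedForwardConfigured(d_model=8, dim_feedforward=32)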