FusedFeedForward

class paddle.incubate.nn.FusedFeedForward(d_model, dim_feedforward, dropout_rate=0.1, epsilon=1e-05, activation='relu', act_dropout_rate=None, normalize_before=False, linear1_weight_attr=None, linear1_bias_attr=None, linear2_weight_attr=None, linear2_bias_attr=None, ln1_scale_attr=None, ln1_bias_attr=None, ln2_scale_attr=None, ln2_bias_attr=None, nranks=1, ring_id=-1, name=None)
Parameters
  • d_model (int) – The expected feature size in the input and output.

  • dim_feedforward (int) – The hidden layer size.

  • dropout_rate (float, optional) – The dropout probability used in preprocessing and postprocessing. Default: 0.1.

  • epsilon (float, optional) – The small value added to the variance to prevent division by zero. Default: 1e-05.

  • activation (str, optional) – The activation function. Default: 'relu'.

  • act_dropout_rate (float, optional) – The dropout probability applied after the activation. If None, the value of dropout_rate is used. Default: None.

  • normalize_before (bool, optional) – Indicates whether layer normalization is applied in preprocessing (True) or in postprocessing (False). Default: False.

  • linear1_weight_attr (ParamAttr, optional) – Specifies the weight parameter attribute of the first linear layer of the FFN. Default: None, which means the default weight parameter attribute is used. See ParamAttr for usage details.

  • linear1_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter attribute of the first linear layer of the FFN. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter attribute is used. See ParamAttr for usage details.

  • linear2_weight_attr (ParamAttr, optional) – Specifies the weight parameter attribute of the second linear layer of the FFN. Default: None, which means the default weight parameter attribute is used. See ParamAttr for usage details.

  • linear2_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter attribute of the second linear layer of the FFN. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter attribute is used. See ParamAttr for usage details.

  • ln1_scale_attr (ParamAttr, optional) – Specifies the scale (weight) parameter attribute of the FFN pre_layer_norm. Default: None, which means the default weight parameter attribute is used. See ParamAttr for usage details.

  • ln1_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter attribute of the FFN pre_layer_norm. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter attribute is used. See ParamAttr for usage details.

  • ln2_scale_attr (ParamAttr, optional) – Specifies the scale (weight) parameter attribute of the FFN post_layer_norm. Default: None, which means the default weight parameter attribute is used. See ParamAttr for usage details.

  • ln2_bias_attr (ParamAttr|bool, optional) – Specifies the bias parameter attribute of the FFN post_layer_norm. False means the layer has no trainable bias parameter. Default: None, which means the default bias parameter attribute is used. See ParamAttr for usage details.

  • nranks (int, optional) – The number of ranks for distributed tensor model parallelism. Default: 1, which means tensor parallelism is not used.

  • ring_id (int, optional) – The communication ring ID for distributed tensor model parallelism. Default: -1, which means tensor parallelism is not used.

  • name (str, optional) – Default: None. Normally there is no need for the user to set this property. For more information, please refer to Name.
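The nranks and ring_id parameters enable tensor model parallelism. The NumPy sketch below simulates, on a single process, the Megatron-style split that tensor-parallel feed-forward layers commonly use: each rank holds a column block of the first linear layer's weight and the matching row block of the second, computes a partial output, and an all-reduce sums the partials. That FusedFeedForward partitions dim_feedforward in exactly this way is an assumption, and the helpers ffn_hidden_out and tensor_parallel_ffn are hypothetical. The Python loop stands in for real ranks, and sum() stands in for the all-reduce over the communication ring.

```python
import numpy as np


def ffn_hidden_out(x, w1, b1, w2):
    # relu(x @ w1 + b1) @ w2, without the second linear layer's bias.
    return np.maximum(x @ w1 + b1, 0.0) @ w2


def tensor_parallel_ffn(x, w1, b1, w2, b2, nranks):
    """Hypothetical single-process simulation of a tensor-parallel FFN.

    Assumption: dim_feedforward is split evenly across nranks ranks.
    """
    dim_feedforward = w1.shape[1]
    assert dim_feedforward % nranks == 0, "dim_feedforward must split evenly"
    shard = dim_feedforward // nranks
    partials = []
    for rank in range(nranks):
        cols = slice(rank * shard, (rank + 1) * shard)
        # Each rank holds a column block of linear1 and the matching row
        # block of linear2. ReLU is elementwise, so splitting columns is exact.
        partials.append(ffn_hidden_out(x, w1[:, cols], b1[cols], w2[cols, :]))
    # sum() plays the role of the all-reduce; linear2's bias is added once.
    return sum(partials) + b2


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))
w1, b1 = rng.standard_normal((8, 16)), rng.standard_normal(16)
w2, b2 = rng.standard_normal((16, 8)), rng.standard_normal(8)
full = ffn_hidden_out(x, w1, b1, w2) + b2
print(np.allclose(tensor_parallel_ffn(x, w1, b1, w2, b2, nranks=4), full))  # True
```

Splitting the first weight by columns and the second by rows means only one reduction is needed per layer, which is why this layout is a common choice for feed-forward blocks.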

Examples

>>> import paddle
>>> from paddle.incubate.nn import FusedFeedForward
>>> paddle.device.set_device('gpu')

>>> fused_feedforward_layer = FusedFeedForward(8, 8)
>>> x = paddle.rand((1, 8, 8))
>>> out = fused_feedforward_layer(x)
>>> print(out.shape)
[1, 8, 8]
forward(src, cache=None)

Defines the computation performed at every call: applies the fused feed-forward block to src and returns a tensor with the same shape as src.

Parameters
  • src (Tensor) – The input tensor. Its last dimension must equal d_model.

  • cache (optional) – Default: None.
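For reference, the NumPy sketch below spells out an unfused version of the computation a block like this performs, assuming activation='relu': optional pre-normalization, first linear layer, activation, dropout with act_dropout_rate (falling back to dropout_rate when None), second linear layer, dropout with dropout_rate, residual add, and optional post-normalization, selected by normalize_before. The function names are hypothetical, and the exact placement of each dropout inside the fused kernel, as well as the upscale-in-train dropout convention, are assumptions rather than documented behavior.

```python
import numpy as np


def layer_norm(x, scale, bias, epsilon):
    # Normalize over the last (feature) dimension of size d_model.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    y = (x - mean) / np.sqrt(var + epsilon)
    if scale is not None:
        y = y * scale
    if bias is not None:
        y = y + bias
    return y


def dropout(x, rate, training, rng):
    # Assumed upscale-in-train convention: zero entries with probability
    # `rate` and rescale survivors by 1 / (1 - rate); identity at inference.
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def feed_forward_reference(src, w1, b1, w2, b2, ln1, ln2,
                           dropout_rate=0.1, act_dropout_rate=None,
                           epsilon=1e-05, normalize_before=False,
                           training=False, rng=None):
    """Hypothetical unfused reference; ln1 and ln2 are (scale, bias) pairs."""
    if act_dropout_rate is None:
        act_dropout_rate = dropout_rate  # fall back to dropout_rate
    rng = rng if rng is not None else np.random.default_rng(0)
    residual = src
    x = layer_norm(src, *ln1, epsilon) if normalize_before else src
    h = x @ w1 + (b1 if b1 is not None else 0.0)
    h = np.maximum(h, 0.0)  # activation='relu'
    h = dropout(h, act_dropout_rate, training, rng)
    out = h @ w2 + (b2 if b2 is not None else 0.0)
    out = residual + dropout(out, dropout_rate, training, rng)
    if not normalize_before:
        out = layer_norm(out, *ln2, epsilon)
    return out


rng = np.random.default_rng(42)
d_model, dim_feedforward = 8, 16
src = rng.standard_normal((1, 8, d_model))
w1 = rng.standard_normal((d_model, dim_feedforward))
w2 = rng.standard_normal((dim_feedforward, d_model))
ln = (np.ones(d_model), np.zeros(d_model))
# Passing None for b1 and b2 mimics linear1_bias_attr=False / linear2_bias_attr=False.
out = feed_forward_reference(src, w1, None, w2, None, ln, ln)
print(out.shape)  # (1, 8, 8)
```

With normalize_before=False the output of each position is layer-normalized, while with normalize_before=True the residual path carries src through unnormalized.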

extra_repr()

Returns the extra representation of this layer. You can provide a custom implementation for your own layer.