dynamic_lstmp

paddle.fluid.layers.dynamic_lstmp(input, size, proj_size, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', proj_activation='tanh', dtype='float32', name=None, h_0=None, c_0=None, cell_clip=None, proj_clip=None)
api_attr: Static Graph

Note:
  1. For efficiency, users must first map the input of dimension [T, hidden_size] to an input of dimension [T, 4 * hidden_size], and then pass the result to this OP.

This OP implements the LSTMP (LSTM Projected) layer, which places a separate linear mapping layer after the LSTM layer. See Sak, H., Senior, A., & Beaufays, F. (2014).

Compared with the standard LSTM layer, LSTMP has an additional linear mapping layer that maps the original hidden state h_t to the lower-dimensional state r_t. This reduces the total number of parameters and the computational cost, especially when the number of output units is relatively large.
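
As a rough illustration of the saving, the sketch below counts only the recurrent weights, using the shapes listed under Parameters ([P, 4*hidden_size] for the gate weights and [hidden_size, P] for the projection). The function names are hypothetical helpers, not part of Paddle, and biases and input-side weights are left out of the count.

```python
def lstm_recurrent_params(hidden_size):
    """Recurrent weights of a standard LSTM: h_{t-1} feeds 4 gates of width hidden_size."""
    return hidden_size * 4 * hidden_size


def lstmp_recurrent_params(hidden_size, proj_size):
    """Gate weights [P, 4*hidden_size] plus projection weight [hidden_size, P]."""
    return proj_size * 4 * hidden_size + hidden_size * proj_size


hidden_size, proj_size = 512, 256
print(lstm_recurrent_params(hidden_size))              # 1048576
print(lstmp_recurrent_params(hidden_size, proj_size))  # 655360
```

Under this counting, LSTMP has 5 * hidden_size * P recurrent weights against 4 * hidden_size^2 for the standard layer, so the projection only pays off when P is below 0.8 * hidden_size.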

By default, the OP includes diagonal/peephole connections; see Gers, F. A., & Schmidhuber, J. (2000). To disable the peephole connections, set use_peepholes to False.

This OP computes each timestep as follows:

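  i_t = \sigma(W_{ix} x_t + W_{ir} r_{t-1} + W_{ic} c_{t-1} + b_i)
  f_t = \sigma(W_{fx} x_t + W_{fr} r_{t-1} + W_{fc} c_{t-1} + b_f)
  o_t = \sigma(W_{ox} x_t + W_{or} r_{t-1} + W_{oc} c_{t-1} + b_o)
  \tilde{c}_t = act_g(W_{cx} x_t + W_{cr} r_{t-1} + b_c)
  c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
  h_t = o_t \odot act_h(c_t)
  r_t = \overline{act_h}(W_{rh} h_t)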

The symbols in the formulas have the following meanings:

  • x_t represents the input at timestep t

  • h_t represents the hidden state at timestep t

  • r_t represents the projected output of the hidden state h_t

  • h_{t-1}, c_{t-1} and r_{t-1} represent the hidden state, cell state and projected output at timestep t-1, respectively

  • \tilde{c}_t represents the candidate cell state

  • i_t, f_t and o_t represent the input gate, forget gate and output gate, respectively

  • W represents a weight (e.g., W_{ix} is the weight of the linear transformation applied to the input x_t when calculating the input gate i_t)

  • b represents a bias (e.g., b_i is the bias of the input gate)

  • \sigma represents the nonlinear activation function for the gates, sigmoid by default

  • act_g, act_h and \overline{act_h} represent the activations for the candidate cell state, the cell output and the projection output (candidate_activation, cell_activation and proj_activation), tanh by default

  • \odot represents the Hadamard product, i.e. multiplying the elements at the same position of two matrices with the same dimensions to get another matrix with the same dimensions
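
The formulas above can be sketched in NumPy for a single timestep. This is a minimal illustration, not Paddle's kernel: the function name lstmp_step is hypothetical, the default activations (sigmoid and tanh) are hard-coded, the input is assumed to be already mapped to 4*hidden_size, and the packing order (c, i, f, o) of the gate columns is an assumption borrowed from the order in which the biases are listed under bias_attr.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstmp_step(x_proj, r_prev, c_prev, W_r, W_rh, b, peep=None):
    """One LSTMP timestep for a single sequence element.

    x_proj : [4*D]    input already mapped to 4*hidden_size (the W_{*x} x_t terms)
    r_prev : [P]      previous projected output r_{t-1}
    c_prev : [D]      previous cell state c_{t-1}
    W_r    : [P, 4*D] recurrent weights, assumed column order (c, i, f, o)
    W_rh   : [D, P]   projection weight
    b      : [4*D]    gate biases, same assumed order
    peep   : optional tuple (w_ic, w_fc, w_oc), each [D] (diagonal peepholes)
    """
    D = c_prev.shape[0]
    z = x_proj + r_prev @ W_r + b
    z_c, z_i, z_f, z_o = (z[k * D:(k + 1) * D] for k in range(4))
    if peep is not None:
        # Diagonal peephole weights act elementwise on c_{t-1}.
        w_ic, w_fc, w_oc = peep
        z_i = z_i + w_ic * c_prev
        z_f = z_f + w_fc * c_prev
        z_o = z_o + w_oc * c_prev
    i_t, f_t, o_t = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)
    c_tilde = np.tanh(z_c)              # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # new cell state
    h_t = o_t * np.tanh(c_t)            # hidden state, width D
    r_t = np.tanh(h_t @ W_rh)           # projected output, width P
    return r_t, c_t
```

Only r_t and c_t are carried to the next timestep, which is why the recurrent weights have P rows rather than hidden_size rows.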

Parameters
  • input (Variable) – The input of the dynamic_lstmp layer, which supports variable-length input sequences. It is a multi-dimensional LoDTensor of shape [T, 4*hidden_size]. Data type is float32 or float64.

  • size (int) – Must be 4 * hidden_size.

  • proj_size (int) – The size of projection output.

  • param_attr (ParamAttr, optional) –

    Parameter attribute of weight. If it is None, the default weight parameter attribute is used. Please refer to ParamAttr. If the user needs to set this parameter, the dimension must be [hidden_size, 4*hidden_size]. Default: None.

    • Weights = {W_{cr}, W_{ir}, W_{fr}, W_{or}}, the shape is [P, 4*hidden_size], where P is the projection size.

    • Projection weight = {W_{rh}}, the shape is [hidden_size, P].

  • bias_attr (ParamAttr, optional) –

    The bias attribute for the learnable bias weights, which contain two parts: the input-hidden bias weights, and the peephole connection weights if use_peepholes is set to True. Please refer to ParamAttr. Default: None.

    1. use_peepholes = False

      • Biases = {b_c, b_i, b_f, b_o}.

      • The shape is [1, 4*hidden_size].

    2. use_peepholes = True

      • Biases = {b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc}}.

      • The shape is [1, 7*hidden_size].

  • use_peepholes (bool, optional) – Whether to use peephole connection or not. Default True.

  • is_reverse (bool, optional) – Whether to calculate reverse LSTM. Default False.

  • gate_activation (str, optional) – The activation for input gate, forget gate and output gate. Default “sigmoid”.

  • cell_activation (str, optional) – The activation for cell output. Default “tanh”.

  • candidate_activation (str, optional) – The activation for the candidate cell state. Default “tanh”.

  • proj_activation (str, optional) – The activation for projection output. Default “tanh”.

  • dtype (str, optional) – Data type, can be “float32” or “float64”. Default “float32”.

  • name (str, optional) – A name for this layer. Please refer to Name . Default: None.

  • h_0 (Variable, optional) – The initial hidden state is an optional input, default is zero. This is a tensor with shape [batch_size, P], where P is the projection size. Default: None.

  • c_0 (Variable, optional) – The initial cell state is an optional input, default is zero. This is a tensor with shape [batch_size, hidden_size], matching the width of the cell state returned by this OP. h_0 and c_0 can be None, but only both at the same time. Default: None.

  • cell_clip (float, optional) – If not None, the cell state is clipped by this value prior to the cell output activation. Default: None.

  • proj_clip (float, optional) – If proj_size > 0 and proj_clip is provided, the projected values are clipped elementwise to within [-proj_clip, proj_clip]. Default: None.
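
The two clipping options can be pictured with a small sketch. The helper clip below is hypothetical and works on plain Python lists; it assumes that cell_clip, like proj_clip, bounds values symmetrically to [-limit, limit], which the description of cell_clip above does not state explicitly.

```python
def clip(values, limit):
    """Clamp each element of `values` to [-limit, limit]; no-op when limit is None."""
    if limit is None:
        return list(values)
    return [max(-limit, min(limit, v)) for v in values]


cell_state = [-7.5, -0.2, 0.0, 3.1, 12.0]
print(clip(cell_state, 5.0))   # [-5.0, -0.2, 0.0, 3.1, 5.0]
print(clip(cell_state, None))  # values are left unchanged
```

In terms of the formulas, cell_clip would apply to c_t before act_h is evaluated, and proj_clip to the projected values r_t.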

Returns

The hidden state and cell state of LSTMP:

  • hidden: LoDTensor with shape [T, P]; its LoD and dtype are the same as those of the input.

  • cell: LoDTensor with shape [T, hidden_size]; its LoD and dtype are the same as those of the input.
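
To make the shapes concrete: T is the total number of timesteps over all sequences packed into the input, and a level-1 LoD records where each sequence starts as cumulative offsets. The helper below is a hypothetical, pure-Python sketch of that bookkeeping; it does not call Paddle.

```python
def lstmp_output_shapes(seq_lens, size, proj_size):
    """Shapes of (hidden, cell) for a batch of variable-length sequences.

    seq_lens : lengths of the sequences packed into the input LoDTensor
    size     : the `size` argument, i.e. 4 * hidden_size
    """
    hidden_size = size // 4
    total_steps = sum(seq_lens)  # T: all timesteps of all sequences
    # Level-1 LoD as cumulative offsets marking sequence boundaries.
    offsets = [0]
    for n in seq_lens:
        offsets.append(offsets[-1] + n)
    return {
        "lod": [offsets],
        "hidden": (total_steps, proj_size),
        "cell": (total_steps, hidden_size),
    }


print(lstmp_output_shapes([3, 1, 2], size=2048, proj_size=256))
# {'lod': [[0, 3, 4, 6]], 'hidden': (6, 256), 'cell': (6, 512)}
```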

Return type

tuple(Variable, Variable)

Examples

import paddle.fluid as fluid

dict_dim, emb_dim = 128, 64
hidden_dim, proj_dim = 512, 256

# Variable-length sequences of word ids (LoD level 1).
data = fluid.data(name='sequence', shape=[None], dtype='int64', lod_level=1)
emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])

# Map the input to 4 * hidden_dim before passing it to dynamic_lstmp.
fc_out = fluid.layers.fc(input=emb, size=hidden_dim * 4,
                         act=None, bias_attr=None)
proj_out, last_c = fluid.layers.dynamic_lstmp(input=fc_out,
                                              size=hidden_dim * 4,
                                              proj_size=proj_dim,
                                              use_peepholes=False,
                                              is_reverse=True,
                                              cell_activation="tanh",
                                              proj_activation="tanh")
proj_out.shape  # (-1, 256)
last_c.shape  # (-1, 512)