basic_lstm

paddle.fluid.contrib.layers.rnn_impl.basic_lstm ( input, init_hidden, init_cell, hidden_size, num_layers=1, sequence_length=None, dropout_prob=0.0, bidirectional=False, batch_first=True, param_attr=None, bias_attr=None, gate_activation=None, activation=None, forget_bias=1.0, dtype='float32', name='basic_lstm' ) [source]

LSTM implementation using basic operators. It supports multiple layers and bidirectional LSTMs.

\[
\begin{aligned}
i_t &= \sigma(W_{ix}x_t + W_{ih}h_{t-1} + b_i)\\
f_t &= \sigma(W_{fx}x_t + W_{fh}h_{t-1} + b_f + \text{forget\_bias})\\
o_t &= \sigma(W_{ox}x_t + W_{oh}h_{t-1} + b_o)\\
\tilde{c}_t &= \tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]
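For intuition, here is a minimal NumPy sketch of a single time step implementing the equations above. The fused weight layout and gate ordering are illustrative assumptions, not the operator's internal layout:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b, forget_bias=1.0):
    # W: [input_size + hidden_size, 4*hidden_size], b: [4*hidden_size].
    # All four gates are computed from [x_t, h_{t-1}] in one matmul
    # (assumed fused layout, for illustration only).
    gates = np.concatenate([x_t, h_prev], axis=-1) @ W + b
    i, f, c_tilde, o = np.split(gates, 4, axis=-1)
    i = sigmoid(i)                    # input gate
    f = sigmoid(f + forget_bias)      # forget gate; forget_bias is added before sigmoid
    o = sigmoid(o)                    # output gate
    c_tilde = np.tanh(c_tilde)        # candidate cell state
    c_t = f * c_prev + i * c_tilde    # new cell state
    h_t = o * np.tanh(c_t)            # new hidden state
    return h_t, c_t

In basic_lstm, sigmoid and tanh are the defaults for gate_activation and activation, matching the sigma and tanh above.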
Parameters
  • input (Variable) – LSTM input tensor. If batch_first = False, the shape should be ( seq_len x batch_size x input_size ); if batch_first = True, the shape should be ( batch_size x seq_len x input_size )

  • init_hidden (Variable|None) – The initial hidden state of the LSTM, a tensor with shape ( num_layers x batch_size x hidden_size ). If bidirectional = True, the shape should be ( num_layers*2 x batch_size x hidden_size ), which can be reshaped to ( num_layers x 2 x batch_size x hidden_size ) for per-direction access (see the shape sketch after this parameter list). If it is None, it is filled with zeros.

  • init_cell (Variable|None) – The initial cell state of the LSTM, a tensor with shape ( num_layers x batch_size x hidden_size ). If bidirectional = True, the shape should be ( num_layers*2 x batch_size x hidden_size ), which can be reshaped to ( num_layers x 2 x batch_size x hidden_size ) for per-direction access. If it is None, it is filled with zeros.

  • hidden_size (int) – Hidden size of the LSTM

  • num_layers (int) – The total number of layers of the LSTM

  • sequence_length (Variable|None) – A tensor with shape [batch_size] that stores the real length of each instance. It is converted to a mask that masks the padding ids. If it is None, there are no padding ids.

  • dropout_prob (float|0.0) – Dropout probability. Dropout is applied ONLY to the output of each layer, NOT between time steps.

  • bidirectional (bool|False) – Whether the LSTM is bidirectional

  • batch_first (bool|True) – The shape format of the input and output tensors. If True, the shape format should be [batch_size, seq_len, hidden_size]; if False, [seq_len, batch_size, hidden_size]. By default this function accepts input and emits output in batch-major form, consistent with most data formats, though it is slightly less efficient because of the extra transposes.

  • param_attr (ParamAttr|None) – The parameter attribute for the learnable weight matrix. Note: if it is set to None or an attribute of ParamAttr, basic_lstm will create a ParamAttr as param_attr. If the initializer of param_attr is not set, the parameter is initialized with Xavier. Default: None.

  • bias_attr (ParamAttr|None) – The parameter attribute for the bias of the LSTM unit. If it is set to None or an attribute of ParamAttr, basic_lstm will create a ParamAttr as bias_attr. If the initializer of bias_attr is not set, the bias is initialized to zero. Default: None.

  • gate_activation (function|None) – The activation function for gates (actGate). Default: ‘fluid.layers.sigmoid’

  • activation (function|None) – The activation function for cell (actNode). Default: ‘fluid.layers.tanh’

  • forget_bias (float|1.0) – Forget bias used to compute the forget gate

  • dtype (string) – Data type used in this unit

  • name (string) – Name used to identify parameters and biases
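To make the init_hidden/init_cell shape rules concrete, here is a minimal host-side NumPy sketch of zero initial states for a bidirectional stack (illustration only; passing None for either argument yields the same zero initialization):

import numpy as np

num_layers, batch_size, hidden_size = 2, 20, 256
num_directions = 2  # bidirectional = True

# Stacked initial states: ( num_layers * num_directions x batch_size x hidden_size ).
init_hidden = np.zeros(
    (num_layers * num_directions, batch_size, hidden_size), dtype='float32')
init_cell = np.zeros_like(init_hidden)

# The leading dimension can be viewed as (num_layers, 2) so that each
# direction's state can be addressed separately.
per_direction = init_hidden.reshape(
    num_layers, num_directions, batch_size, hidden_size)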

Returns

rnn_out(Tensor), last_hidden(Tensor), last_cell(Tensor)
  • rnn_out is the LSTM hidden output with shape ( seq_len x batch_size x hidden_size ) (or ( batch_size x seq_len x hidden_size ) if batch_first = True); if bidirectional is set to True, its shape will be ( seq_len x batch_size x hidden_size*2 )

  • last_hidden is the hidden state of the last step of the LSTM, with shape ( num_layers x batch_size x hidden_size ); if bidirectional is set to True, its shape will be ( num_layers*2 x batch_size x hidden_size ), which can be reshaped to ( num_layers x 2 x batch_size x hidden_size ) for per-direction access (see the sketch after this list).

  • last_cell is the cell state of the last step of the LSTM, with shape ( num_layers x batch_size x hidden_size ); if bidirectional is set to True, its shape will be ( num_layers*2 x batch_size x hidden_size ), which can be reshaped to ( num_layers x 2 x batch_size x hidden_size ) for per-direction access.
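For bidirectional runs, here is a sketch of unpacking the stacked final states (assuming last_hidden, last_cell, num_layers, and hidden_size as in the example below):

import paddle.fluid.layers as layers

# Split the stacked ( num_layers*2 x batch_size x hidden_size ) states into
# ( num_layers x 2 x batch_size x hidden_size ); index 0/1 on the second
# axis selects the forward/backward direction.
per_dir_hidden = layers.reshape(last_hidden, shape=[num_layers, 2, -1, hidden_size])
per_dir_cell = layers.reshape(last_cell, shape=[num_layers, 2, -1, hidden_size])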

Examples

import paddle.fluid.layers as layers
from paddle.fluid.contrib.layers import basic_lstm

batch_size = 20
input_size = 128
hidden_size = 256
num_layers = 2
dropout = 0.5
bidirectional = True
batch_first = False

# Time-major input (batch_first = False): (seq_len, batch_size, input_size).
input = layers.data(name="input", shape=[-1, batch_size, input_size], dtype='float32')
pre_hidden = layers.data(name="pre_hidden", shape=[-1, hidden_size], dtype='float32')
pre_cell = layers.data(name="pre_cell", shape=[-1, hidden_size], dtype='float32')
# Real length of each instance, used to mask the padding ids.
sequence_length = layers.data(name="sequence_length", shape=[-1], dtype='int32')

rnn_out, last_hidden, last_cell = basic_lstm(
    input, pre_hidden, pre_cell, hidden_size,
    num_layers=num_layers, sequence_length=sequence_length,
    dropout_prob=dropout, bidirectional=bidirectional,
    batch_first=batch_first)
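As a variant sketch (not part of the original example), zero initial states can be built in-graph with fill_constant instead of being fed through data layers, following the shape rules in the parameter list; passing None for init_hidden and init_cell is equivalent:

num_directions = 2 if bidirectional else 1
# Zero initial states with the documented stacked shape.
init_h = layers.fill_constant(
    shape=[num_layers * num_directions, batch_size, hidden_size],
    dtype='float32', value=0.0)
init_c = layers.fill_constant(
    shape=[num_layers * num_directions, batch_size, hidden_size],
    dtype='float32', value=0.0)

rnn_out, last_hidden, last_cell = basic_lstm(
    input, init_h, init_c, hidden_size,
    num_layers=num_layers, sequence_length=sequence_length,
    dropout_prob=dropout, bidirectional=bidirectional,
    batch_first=batch_first)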