basic_lstm
- paddle.fluid.contrib.layers.rnn_impl.basic_lstm(input, init_hidden, init_cell, hidden_size, num_layers=1, sequence_length=None, dropout_prob=0.0, bidirectional=False, batch_first=True, param_attr=None, bias_attr=None, gate_activation=None, activation=None, forget_bias=1.0, dtype='float32', name='basic_lstm') [source]
LSTM implementation built from basic operators. It supports multiple layers and bidirectional LSTM.
\[
\begin{aligned}
i_t &= \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + b_i)\\
f_t &= \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + b_f + forget\_bias)\\
o_t &= \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + b_o)\\
\tilde{c_t} &= \tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]
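For concreteness, the equations above can be written out directly in NumPy. This is a minimal sketch of a single time step; the per-gate weight and bias names (W['ix'], b['i'], ...) are illustrative assumptions, not the parameter layout basic_lstm actually creates.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b, forget_bias=1.0):
    # One LSTM time step following the equations above. W and b are
    # dicts of per-gate weights and biases (illustrative names only).
    i_t = sigmoid(x_t @ W['ix'] + h_prev @ W['ih'] + b['i'])
    f_t = sigmoid(x_t @ W['fx'] + h_prev @ W['fh'] + b['f'] + forget_bias)
    o_t = sigmoid(x_t @ W['ox'] + h_prev @ W['oh'] + b['o'])
    c_tilde = np.tanh(x_t @ W['cx'] + h_prev @ W['ch'] + b['c'])
    c_t = f_t * c_prev + i_t * c_tilde   # new cell state
    h_t = o_t * np.tanh(c_t)             # new hidden state
    return h_t, c_t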
- Parameters
input (Variable) – LSTM input tensor. If batch_first = False, the shape should be (seq_len x batch_size x input_size); if batch_first = True, the shape should be (batch_size x seq_len x input_size).
init_hidden (Variable|None) – The initial hidden state of the LSTM. This is a tensor with shape (num_layers x batch_size x hidden_size); if bidirectional = True, the shape should be (num_layers*2 x batch_size x hidden_size), which can be reshaped to (num_layers x 2 x batch_size x hidden_size) for use. If None, it will be set to all zeros.
init_cell (Variable|None) – The initial cell state of the LSTM. This is a tensor with shape (num_layers x batch_size x hidden_size); if bidirectional = True, the shape should be (num_layers*2 x batch_size x hidden_size), which can be reshaped to (num_layers x 2 x batch_size x hidden_size) for use. If None, it will be set to all zeros.
hidden_size (int) – Hidden size of the LSTM.
num_layers (int) – The total number of layers of the LSTM.
sequence_length (Variable|None) – A tensor of shape [batch_size] that stores the real length of each instance. It will be converted to a mask that masks the padding ids (see the mask sketch after this parameter list). If None, there are assumed to be no padding ids.
dropout_prob (float|0.0) – Dropout probability. Dropout is applied only to the RNN output of each layer, not between time steps.
bidirectional (bool|False) – Whether the LSTM is bidirectional.
batch_first (bool|True) – The shape format of the input and output tensors. If True, the shape format should be [batch_size, seq_len, hidden_size]; if False, it should be [seq_len, batch_size, hidden_size]. By default this function accepts input and emits output in batch-major form, consistent with most data formats, though slightly less efficient because of the extra transposes.
param_attr (ParamAttr|None) – The parameter attribute for the learnable weight matrices. Note: if it is set to None or to one attribute of ParamAttr, lstm_unit will create a ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameter is initialized with Xavier. Default: None.
bias_attr (ParamAttr|None) – The parameter attribute for the bias of the LSTM unit. If it is set to None or to one attribute of ParamAttr, lstm_unit will create a ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the bias is initialized to zero. Default: None.
gate_activation (function|None) – The activation function for the gates (actGate). Default: fluid.layers.sigmoid.
activation (function|None) – The activation function for the cell (actNode). Default: fluid.layers.tanh.
forget_bias (float|1.0) – Forget bias used to compute the forget gate.
dtype (string) – Data type used in this unit.
name (string) – Name used to identify parameters and biases.
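The padding mask mentioned for sequence_length can be pictured with a few lines of NumPy. This is an illustrative sketch of the masking idea only, not the operator basic_lstm uses internally:

import numpy as np

# Real lengths for a batch of 3 sequences padded to max_len = 5.
lengths = np.array([5, 2, 3])
max_len = 5

# Position j of instance i is valid iff j < lengths[i].
mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype('float32')
# mask:
# [[1. 1. 1. 1. 1.]
#  [1. 1. 0. 0. 0.]
#  [1. 1. 1. 0. 0.]]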
- Returns
- rnn_out (Tensor), last_hidden (Tensor), last_cell (Tensor)
rnn_out is the output of the LSTM hidden layer, with shape (seq_len x batch_size x hidden_size); if bidirectional is True, its shape will be (seq_len x batch_size x hidden_size*2).
last_hidden is the hidden state of the last step of the LSTM, with shape (num_layers x batch_size x hidden_size); if bidirectional is True, its shape will be (num_layers*2 x batch_size x hidden_size), which can be reshaped to (num_layers x 2 x batch_size x hidden_size) for use.
last_cell is the cell state of the last step of the LSTM, with shape (num_layers x batch_size x hidden_size); if bidirectional is True, its shape will be (num_layers*2 x batch_size x hidden_size), which can be reshaped to (num_layers x 2 x batch_size x hidden_size) for use (see the reshape sketch below).
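When bidirectional is True, separating the two directions of last_hidden (or last_cell) is a plain reshape. A minimal NumPy illustration of the layout, assuming num_layers = 2, batch_size = 4, hidden_size = 8 (which index on the direction axis is the forward direction is an assumption; check the concrete layout in your Paddle version):

import numpy as np

num_layers, batch_size, hidden_size = 2, 4, 8

# Bidirectional last_hidden: (num_layers*2, batch_size, hidden_size).
last_hidden = np.zeros((num_layers * 2, batch_size, hidden_size), dtype='float32')

# Split out the direction axis: (num_layers, 2, batch_size, hidden_size).
reshaped = last_hidden.reshape(num_layers, 2, batch_size, hidden_size)
forward = reshaped[:, 0]   # assumed forward states, (num_layers, batch_size, hidden_size)
backward = reshaped[:, 1]  # assumed backward states, (num_layers, batch_size, hidden_size)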
Examples
import paddle.fluid.layers as layers
from paddle.fluid.contrib.layers import basic_lstm

batch_size = 20
input_size = 128
hidden_size = 256
num_layers = 2
dropout = 0.5
bidirectional = True
batch_first = False

# Declare the graph inputs.
input = layers.data(name="input",
                    shape=[-1, batch_size, input_size],
                    dtype='float32')
pre_hidden = layers.data(name="pre_hidden",
                         shape=[-1, hidden_size],
                         dtype='float32')
pre_cell = layers.data(name="pre_cell",
                       shape=[-1, hidden_size],
                       dtype='float32')
sequence_length = layers.data(name="sequence_length",
                              shape=[-1],
                              dtype='int32')

rnn_out, last_hidden, last_cell = basic_lstm(input, pre_hidden, pre_cell,
                                             hidden_size,
                                             num_layers=num_layers,
                                             sequence_length=sequence_length,
                                             dropout_prob=dropout,
                                             bidirectional=bidirectional,
                                             batch_first=batch_first)
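To actually run the example, the usual fluid pattern applies: create an Executor, run the startup program, then feed NumPy arrays and fetch the outputs. A hedged sketch under the configuration above (seq_len = 30 and the zero initial states are arbitrary illustrative choices; exact feed shapes for pre_hidden and pre_cell may vary by Paddle version):

import numpy as np
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

seq_len = 30  # arbitrary illustrative length
feed = {
    # batch_first = False above, so input is time-major.
    "input": np.random.rand(seq_len, batch_size, input_size).astype('float32'),
    # Assumed initial-state layout for the bidirectional case.
    "pre_hidden": np.zeros((num_layers * 2, batch_size, hidden_size), dtype='float32'),
    "pre_cell": np.zeros((num_layers * 2, batch_size, hidden_size), dtype='float32'),
    "sequence_length": np.full([batch_size], seq_len, dtype='int32'),
}
out, hidden, cell = exe.run(fluid.default_main_program(),
                            feed=feed,
                            fetch_list=[rnn_out, last_hidden, last_cell])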