LSTMCell
- class paddle.nn.LSTMCell(input_size, hidden_size, weight_ih_attr=None, weight_hh_attr=None, bias_ih_attr=None, bias_hh_attr=None, proj_size=0, name=None)
-
Long Short-Term Memory (LSTM) RNN cell. Given the inputs and the previous states, it computes the outputs and updates the states.
The formula used is as follows:
\[ \begin{aligned}
i_{t} & = \sigma(W_{ii}x_{t} + b_{ii} + W_{hi}h_{t-1} + b_{hi})\\
f_{t} & = \sigma(W_{if}x_{t} + b_{if} + W_{hf}h_{t-1} + b_{hf})\\
o_{t} & = \sigma(W_{io}x_{t} + b_{io} + W_{ho}h_{t-1} + b_{ho})\\
\widetilde{c}_{t} & = \tanh (W_{ig}x_{t} + b_{ig} + W_{hg}h_{t-1} + b_{hg})\\
c_{t} & = f_{t} * c_{t-1} + i_{t} * \widetilde{c}_{t}\\
h_{t} & = o_{t} * \tanh(c_{t})\\
y_{t} & = h_{t}
\end{aligned} \]
where \(\sigma\) is the sigmoid function, and * is the elementwise multiplication operator.
If proj_size is specified, the hidden state \(h_{t}\) is projected to dimension proj_size:
\[h_{t} = h_{t}W_{proj\_size}\]
Please refer to An Empirical Exploration of Recurrent Network Architectures for more details.
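The formulas above can be traced by hand in a few lines. The following is a minimal NumPy sketch of a single step, not Paddle's implementation: the helper names `sigmoid` and `lstm_step`, the per-gate dictionaries, and the random weights are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W_i, W_h, b_i, b_h):
    """One LSTM step. W_i, W_h, b_i, b_h are dicts keyed by gate name
    ("i", "f", "g", "o"); this layout is hypothetical, chosen for clarity."""

    def pre(gate):
        # W_{i*} x_t + b_{i*} + W_{h*} h_{t-1} + b_{h*}, batched over rows.
        return x_t @ W_i[gate].T + b_i[gate] + h_prev @ W_h[gate].T + b_h[gate]

    i_t = sigmoid(pre("i"))             # input gate
    f_t = sigmoid(pre("f"))             # forget gate
    o_t = sigmoid(pre("o"))             # output gate
    c_tilde = np.tanh(pre("g"))         # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # new cell state
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, (h_t, c_t)              # y_t = h_t


rng = np.random.default_rng(0)
batch_size, input_size, hidden_size = 4, 16, 32
gates = ("i", "f", "g", "o")
W_i = {g: rng.standard_normal((hidden_size, input_size)) for g in gates}
W_h = {g: rng.standard_normal((hidden_size, hidden_size)) for g in gates}
b_i = {g: rng.standard_normal(hidden_size) for g in gates}
b_h = {g: rng.standard_normal(hidden_size) for g in gates}

x = rng.standard_normal((batch_size, input_size))
h0 = np.zeros((batch_size, hidden_size))  # zero initial hidden state
c0 = np.zeros((batch_size, hidden_size))  # zero initial cell state

y, (h, c) = lstm_step(x, h0, c0, W_i, W_h, b_i, b_h)
print(y.shape, h.shape, c.shape)
```

Because \(o_{t}\) lies in (0, 1) and \(\tanh\) lies in (-1, 1), every entry of the hidden state has magnitude at most 1, whatever the weights are.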
- Parameters
-
input_size (int) – The input size.
hidden_size (int) – The hidden size.
weight_ih_attr (ParamAttr, optional) – The parameter attribute for weight_ih. Default: None.
weight_hh_attr (ParamAttr, optional) – The parameter attribute for weight_hh. Default: None.
bias_ih_attr (ParamAttr, optional) – The parameter attribute for bias_ih. Default: None.
bias_hh_attr (ParamAttr, optional) – The parameter attribute for bias_hh. Default: None.
proj_size (int, optional) – If set to a value greater than 0, the output hidden state is projected to proj_size. proj_size must be smaller than hidden_size. Default: 0.
name (str, optional) – Name for the operation. Default: None. For more information, please refer to Name.
- Variables:
-
weight_ih (Parameter): shape (4 * hidden_size, input_size), input to hidden weight, which corresponds to the concatenation of \(W_{ii}, W_{if}, W_{ig}, W_{io}\) in the formula.
weight_hh (Parameter): shape (4 * hidden_size, hidden_size), hidden to hidden weight, which corresponds to the concatenation of \(W_{hi}, W_{hf}, W_{hg}, W_{ho}\) in the formula. If proj_size is specified, the shape is (4 * hidden_size, proj_size).
weight_ho (Parameter, optional): shape (hidden_size, proj_size), the projection weight applied to the hidden state when proj_size is specified.
bias_ih (Parameter): shape (4 * hidden_size, ), input to hidden bias, which corresponds to the concatenation of \(b_{ii}, b_{if}, b_{ig}, b_{io}\) in the formula.
bias_hh (Parameter): shape (4 * hidden_size, ), hidden to hidden bias, which corresponds to the concatenation of \(b_{hi}, b_{hf}, b_{hg}, b_{ho}\) in the formula.
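To make the concatenated layout concrete, here is a small NumPy sketch. It assumes the four gate blocks are stacked along the first axis in the order listed above (i, f, g, o); the random arrays are stand-ins with the documented shapes, not parameters read from a real cell.

```python
import numpy as np

batch_size, input_size, hidden_size = 4, 16, 32
rng = np.random.default_rng(0)

# Stand-in parameters with the documented shapes (no projection).
weight_ih = rng.standard_normal((4 * hidden_size, input_size))
weight_hh = rng.standard_normal((4 * hidden_size, hidden_size))
bias_ih = rng.standard_normal(4 * hidden_size)
bias_hh = rng.standard_normal(4 * hidden_size)

x = rng.standard_normal((batch_size, input_size))
h_prev = rng.standard_normal((batch_size, hidden_size))

# One fused matmul per weight yields all four gate pre-activations at once.
fused = x @ weight_ih.T + bias_ih + h_prev @ weight_hh.T + bias_hh
i_pre, f_pre, g_pre, o_pre = np.split(fused, 4, axis=-1)

# Slicing the second block recovers the forget-gate terms W_if, W_hf, b_if, b_hf
# (assuming the i, f, g, o stacking order).
block = slice(hidden_size, 2 * hidden_size)
f_direct = (x @ weight_ih[block].T + bias_ih[block]
            + h_prev @ weight_hh[block].T + bias_hh[block])

print(fused.shape, f_pre.shape)
```

Computing all gates in one fused product and then splitting gives the same values as four separate per-gate products, which is why the parameters are stored concatenated.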
- Inputs:
-
inputs (Tensor): shape [batch_size, input_size], the input, corresponding to \(x_t\) in the formula.
states (list|tuple, optional): a list/tuple of two tensors, each of shape [batch_size, hidden_size], the previous hidden state and cell state, corresponding to \(h_{t-1}, c_{t-1}\) in the formula. When states is None, zero states are used. Defaults to None.
- Returns
-
outputs (Tensor): shape [batch_size, hidden_size], the output, corresponding to \(h_{t}\) in the formula. If proj_size is specified, the output shape is [batch_size, proj_size].
states (tuple): a tuple of two tensors, each of shape [batch_size, hidden_size], the new states, corresponding to \(h_{t}, c_{t}\) in the formula. If proj_size is specified, the shape of \(h_{t}\) is [batch_size, proj_size].
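The effect of the projection on the returned shapes can be checked with a short NumPy sketch. It follows the formulas above rather than Paddle's source; `o_t` and `c_t` are random stand-ins for the output gate and cell state of one step, and `weight_ho` is a random matrix with the documented shape.

```python
import numpy as np

batch_size, hidden_size, proj_size = 4, 32, 8
rng = np.random.default_rng(0)

# Stand-ins for the output gate and the new cell state of one step.
o_t = rng.uniform(0.0, 1.0, (batch_size, hidden_size))
c_t = rng.standard_normal((batch_size, hidden_size))
weight_ho = rng.standard_normal((hidden_size, proj_size))

h_t = o_t * np.tanh(c_t)   # [batch_size, hidden_size]
h_t = h_t @ weight_ho      # projected to [batch_size, proj_size]

outputs, states = h_t, (h_t, c_t)
print(outputs.shape, states[0].shape, states[1].shape)
# prints: (4, 8) (4, 8) (4, 32)
```

In this sketch only \(h_{t}\) is projected; \(c_{t}\) is computed before the projection and keeps the size hidden_size.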
Notes
All the weights and biases are initialized with Uniform(-std, std) by default, where std = \(\frac{1}{\sqrt{hidden\_size}}\). For more information about parameter initialization, please refer to ParamAttr.
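As a rough illustration of the default described above, the following NumPy sketch draws stand-in parameters from Uniform(-std, std). It mimics the stated distribution only; it is not Paddle's initializer, and the array names simply mirror the documented variables.

```python
import math

import numpy as np

input_size, hidden_size = 16, 32
std = 1.0 / math.sqrt(hidden_size)  # about 0.1768 for hidden_size = 32

rng = np.random.default_rng(0)
# Stand-ins for weight_ih, weight_hh and bias_ih, each drawn from Uniform(-std, std).
weight_ih = rng.uniform(-std, std, (4 * hidden_size, input_size))
weight_hh = rng.uniform(-std, std, (4 * hidden_size, hidden_size))
bias_ih = rng.uniform(-std, std, 4 * hidden_size)

print(round(std, 4), float(np.abs(weight_ih).max()) <= std)
```

The bound depends only on hidden_size, so a larger hidden state starts from smaller weights.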
Examples
>>> import paddle
>>> # batch_size=4, input_size=16, hidden_size=32
>>> x = paddle.randn((4, 16))
>>> prev_h = paddle.randn((4, 32))
>>> prev_c = paddle.randn((4, 32))
>>> cell = paddle.nn.LSTMCell(16, 32)
>>> # One step: returns the output and the new (h, c) states.
>>> y, (h, c) = cell(x, (prev_h, prev_c))
>>> print(y.shape)
[4, 32]
>>> print(h.shape)
[4, 32]
>>> print(c.shape)
[4, 32]
- forward(inputs, states=None)
-
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters
-
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
- property state_shape
-
The state_shape of LSTMCell is a tuple with two shapes: ((hidden_size,), (hidden_size,)). (-1 for the batch size is automatically inserted into each shape.) These two shapes correspond to \(h_{t-1}\) and \(c_{t-1}\), respectively.
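One way to read state_shape is as a recipe for building states once the batch size is known. The sketch below uses NumPy and a hypothetical helper `zero_states`; it illustrates the idea only and does not show how Paddle constructs initial states internally.

```python
import numpy as np

hidden_size = 32
# Per-sample shapes of (h, c), as described for state_shape (no projection).
state_shape = ((hidden_size,), (hidden_size,))

# With -1 inserted as a placeholder for the batch dimension.
batched_shape = tuple((-1, *shape) for shape in state_shape)


def zero_states(batch_size, state_shape):
    """Hypothetical helper: one zero array per state, batch dimension first."""
    return tuple(np.zeros((batch_size, *shape)) for shape in state_shape)


h0, c0 = zero_states(4, state_shape)
print(batched_shape, h0.shape, c0.shape)
# prints: ((-1, 32), (-1, 32)) (4, 32) (4, 32)
```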
- extra_repr()
-
Extra representation of this layer. You can provide a custom implementation for your own layer.