LSTMCell

class paddle.nn.LSTMCell(input_size: int, hidden_size: int, weight_ih_attr: ParamAttrLike | None = None, weight_hh_attr: ParamAttrLike | None = None, bias_ih_attr: ParamAttrLike | None = None, bias_hh_attr: ParamAttrLike | None = None, proj_size: int = 0, name: str | None = None)

Long Short-Term Memory (LSTM) RNN cell. Given the inputs and the previous states, it computes the outputs and the updated states.

The formula used is as follows:

\[
\begin{aligned}
i_{t} & = \sigma(W_{ii}x_{t} + b_{ii} + W_{hi}h_{t-1} + b_{hi})\\
f_{t} & = \sigma(W_{if}x_{t} + b_{if} + W_{hf}h_{t-1} + b_{hf})\\
o_{t} & = \sigma(W_{io}x_{t} + b_{io} + W_{ho}h_{t-1} + b_{ho})\\
\widetilde{c}_{t} & = \tanh(W_{ig}x_{t} + b_{ig} + W_{hg}h_{t-1} + b_{hg})\\
c_{t} & = f_{t} * c_{t-1} + i_{t} * \widetilde{c}_{t}\\
h_{t} & = o_{t} * \tanh(c_{t})\\
y_{t} & = h_{t}
\end{aligned}
\]

If proj_size is specified, the hidden state \(h_{t}\) is projected to dimension proj_size:

\[h_{t} = h_{t}W_{proj\_size}\]

where \(\sigma\) is the sigmoid function, and * is the elementwise multiplication operator.

Please refer to An Empirical Exploration of Recurrent Network Architectures for more details.
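The formula above can be traced by hand with a small pure-Python sketch. This is not Paddle's implementation: the helper names (sigmoid, matvec, lstm_cell_step) are hypothetical, it processes a single sample without a batch dimension, and it leaves out the optional projection. It assumes the four gate blocks are stacked in the order i, f, g, o, as described for weight_ih and weight_hh.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lstm_cell_step(x, h_prev, c_prev, weight_ih, weight_hh, bias_ih, bias_hh):
    """One LSTM step for a single sample (hypothetical helper, no projection).

    weight_ih and weight_hh each have 4 * hidden_size rows,
    assumed to be stacked in the order i, f, g, o.
    """
    hidden_size = len(c_prev)
    ih = matvec(weight_ih, x)
    hh = matvec(weight_hh, h_prev)
    gates = [a + b + p + q for a, b, p, q in zip(ih, bias_ih, hh, bias_hh)]

    # Split the stacked pre-activations into the four gate chunks.
    i_pre, f_pre, g_pre, o_pre = (
        gates[k * hidden_size:(k + 1) * hidden_size] for k in range(4)
    )
    i = [sigmoid(v) for v in i_pre]    # input gate
    f = [sigmoid(v) for v in f_pre]    # forget gate
    g = [math.tanh(v) for v in g_pre]  # candidate cell state
    o = [sigmoid(v) for v in o_pre]    # output gate

    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, (h, c)  # y_t = h_t


# Tiny case: input_size = 2, hidden_size = 1, all parameters set to zero.
x = [1.0, -1.0]
h0, c0 = [0.0], [2.0]
W_ih = [[0.0, 0.0] for _ in range(4)]
W_hh = [[0.0] for _ in range(4)]
b = [0.0] * 4
y, (h1, c1) = lstm_cell_step(x, h0, c0, W_ih, W_hh, b, b)
```

With all parameters zero, every sigmoid gate equals 0.5 and the candidate state is 0, so the new cell state is 0.5 * 2.0 = 1.0 and the new hidden state is 0.5 * tanh(1.0).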

Parameters

input_size (int) – The input size.
hidden_size (int) – The hidden size.
weight_ih_attr (ParamAttr|None, optional) – The parameter attribute for weight_ih. Default: None.
weight_hh_attr (ParamAttr|None, optional) – The parameter attribute for weight_hh. Default: None.
bias_ih_attr (ParamAttr|None, optional) – The parameter attribute for the bias_ih. Default: None.
bias_hh_attr (ParamAttr|None, optional) – The parameter attribute for the bias_hh. Default: None.
proj_size (int, optional) – If specified, the output hidden state will be projected to proj_size. proj_size must be smaller than hidden_size. Default: 0.
name (str|None, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

Variables:

weight_ih (Parameter): shape (4 * hidden_size, input_size), input to hidden weight, which corresponds to the concatenation of \(W_{ii}, W_{if}, W_{ig}, W_{io}\) in the formula.
weight_hh (Parameter): shape (4 * hidden_size, hidden_size), hidden to hidden weight, which corresponds to the concatenation of \(W_{hi}, W_{hf}, W_{hg}, W_{ho}\) in the formula. If proj_size was specified, the shape will be (4 * hidden_size, proj_size).
weight_ho (Parameter, optional): shape (hidden_size, proj_size), project the hidden state.
bias_ih (Parameter): shape (4 * hidden_size, ), input to hidden bias, which corresponds to the concatenation of \(b_{ii}, b_{if}, b_{ig}, b_{io}\) in the formula.
bias_hh (Parameter): shape (4 * hidden_size, ), hidden to hidden bias, which corresponds to the concatenation of \(b_{hi}, b_{hf}, b_{hg}, b_{ho}\) in the formula.
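As a quick check of the shapes listed above, the following sketch computes them from the constructor arguments. The helper lstm_cell_param_shapes is hypothetical and not part of Paddle; it also assumes that "proj_size was specified" means proj_size > 0, in line with the default of 0 in the signature.

```python
def lstm_cell_param_shapes(input_size, hidden_size, proj_size=0):
    """Return the documented parameter shapes as a dict (hypothetical helper)."""
    # Assumption: projection is active only when proj_size > 0.
    # With projection, the recurrent input h_{t-1} has proj_size features.
    recurrent_size = proj_size if proj_size > 0 else hidden_size
    shapes = {
        "weight_ih": (4 * hidden_size, input_size),
        "weight_hh": (4 * hidden_size, recurrent_size),
        "bias_ih": (4 * hidden_size,),
        "bias_hh": (4 * hidden_size,),
    }
    if proj_size > 0:
        # The projection weight exists only when projection is enabled.
        shapes["weight_ho"] = (hidden_size, proj_size)
    return shapes


plain = lstm_cell_param_shapes(16, 32)
projected = lstm_cell_param_shapes(16, 32, proj_size=8)
```

Here plain["weight_hh"] is (128, 32), while projected["weight_hh"] is (128, 8) and projected["weight_ho"] is (32, 8).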

Inputs:

inputs (Tensor): shape [batch_size, input_size], the input, corresponding to \(x_t\) in the formula.
states (list|tuple, optional): a list/tuple of two tensors, each of shape [batch_size, hidden_size], the previous hidden state and cell state, corresponding to \(h_{t-1}, c_{t-1}\) in the formula. If proj_size is specified, the shape of \(h_{t-1}\) is [batch_size, proj_size]. When states is None, zero state is used. Defaults to None.

Returns

outputs (Tensor). Shape [batch_size, hidden_size], the output, corresponding to \(h_{t}\) in the formula. If proj_size is specified, output shape will be [batch_size, proj_size].
states (tuple). A tuple of two tensors, each of shape [batch_size, hidden_size], the new hidden state and cell state, corresponding to \(h_{t}, c_{t}\) in the formula. If proj_size is specified, the shape of \(h_{t}\) will be [batch_size, proj_size].

Notes

All weights and biases are initialized with Uniform(-std, std) by default, where std = \(\frac{1}{\sqrt{hidden\_size}}\). For more information about parameter initialization, please refer to ParamAttr.
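To make the default bound concrete, here is a small sketch using only the Python standard library. It does not call Paddle's initializers; default_init_bound and sample_uniform are hypothetical helpers that mimic the rule stated above.

```python
import math
import random


def default_init_bound(hidden_size):
    """std = 1 / sqrt(hidden_size), as stated in the note."""
    return 1.0 / math.sqrt(hidden_size)


def sample_uniform(shape, hidden_size, rng):
    """Draw a 2-D list of values from Uniform(-std, std) (hypothetical helper)."""
    std = default_init_bound(hidden_size)
    rows, cols = shape
    return [[rng.uniform(-std, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden_size = 32
std = default_init_bound(hidden_size)
# Same shape as weight_ih for input_size = 16, hidden_size = 32.
weight_ih = sample_uniform((4 * hidden_size, 16), hidden_size, rng)
```

For hidden_size = 32 the bound is about 0.1768, so every sampled entry lies in roughly [-0.1768, 0.1768].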

Examples

>>> import paddle

>>> x = paddle.randn((4, 16))
>>> prev_h = paddle.randn((4, 32))
>>> prev_c = paddle.randn((4, 32))

>>> cell = paddle.nn.LSTMCell(16, 32)
>>> y, (h, c) = cell(x, (prev_h, prev_c))

>>> print(y.shape)
[4, 32]
>>> print(h.shape)
[4, 32]
>>> print(c.shape)
[4, 32]

forward(inputs: Tensor, states: Sequence[Tensor] | None = None)

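Defines the computation performed at every call: one LSTM step that computes the outputs and the new states from the inputs and the previous states.

Parameters

inputs (Tensor) – shape [batch_size, input_size], the input, corresponding to \(x_t\) in the formula.
states (Sequence[Tensor]|None, optional) – the previous states, corresponding to \(h_{t-1}, c_{t-1}\) in the formula. When states is None, zero state is used. Default: None.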

property state_shape: tuple[tuple[int], tuple[int]]. The state_shape of LSTMCell is a tuple of two shapes: ((hidden_size,), (hidden_size,)). A -1 for the batch size is inserted into each shape automatically. The two shapes correspond to \(h_{t-1}\) and \(c_{t-1}\), respectively.
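The batch-dimension convention can be illustrated with a short sketch. The helper batched_state_shapes is hypothetical and only mirrors the description above; it is not how Paddle builds its initial states internally.

```python
def batched_state_shapes(state_shape, batch_size=-1):
    """Prepend a batch dimension to each per-sample state shape."""
    return tuple((batch_size,) + tuple(shape) for shape in state_shape)


hidden_size = 32
# Per-sample shapes as described for LSTMCell (assuming no projection).
state_shape = ((hidden_size,), (hidden_size,))

placeholder_shapes = batched_state_shapes(state_shape)
concrete_shapes = batched_state_shapes(state_shape, batch_size=4)
```

With the default of -1 the result is ((-1, 32), (-1, 32)); passing batch_size=4 gives ((4, 32), (4, 32)), which matches the shapes of prev_h and prev_c in the example.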

extra_repr() → str: Extra representation of this layer. You can provide a custom implementation for your own layer.
