GRUCell

class paddle.nn.GRUCell(input_size: int, hidden_size: int, weight_ih_attr: ParamAttrLike | None = None, weight_hh_attr: ParamAttrLike | None = None, bias_ih_attr: ParamAttrLike | None = None, bias_hh_attr: ParamAttrLike | None = None, name: str | None = None) [source]

Gated Recurrent Unit (GRU) RNN cell. Given the inputs and previous states, it computes the outputs and updates states.

The formula for GRU used is as follows:

r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})
z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})
\widetilde{h}_t = \tanh(W_{ic} x_t + b_{ic} + r_t * (W_{hc} h_{t-1} + b_{hc}))
h_t = z_t * h_{t-1} + (1 - z_t) * \widetilde{h}_t
y_t = h_t

where σ is the sigmoid function, and * is the elementwise multiplication operator.

Please refer to An Empirical Exploration of Recurrent Network Architectures for more details.
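
To make the formula concrete, here is a minimal sketch (not part of the official examples) that recomputes one GRU step by hand from the cell's parameters and compares it with the cell's output. It assumes only what is stated on this page: the formula above and the (r, z, c) concatenation order of weight_ih, weight_hh, bias_ih and bias_hh described under Variables.

>>> import paddle
>>> import paddle.nn.functional as F

>>> x = paddle.randn((4, 16))
>>> prev_h = paddle.randn((4, 32))
>>> cell = paddle.nn.GRUCell(16, 32)

>>> # project the input and the previous state with the concatenated weights
>>> x_gates = paddle.matmul(x, cell.weight_ih, transpose_y=True) + cell.bias_ih
>>> h_gates = paddle.matmul(prev_h, cell.weight_hh, transpose_y=True) + cell.bias_hh
>>> x_r, x_z, x_c = paddle.split(x_gates, 3, axis=-1)
>>> h_r, h_z, h_c = paddle.split(h_gates, 3, axis=-1)

>>> r = F.sigmoid(x_r + h_r)        # reset gate r_t
>>> z = F.sigmoid(x_z + h_z)        # update gate z_t
>>> c = paddle.tanh(x_c + r * h_c)  # candidate state \widetilde{h}_t
>>> h = z * prev_h + (1.0 - z) * c  # new hidden state h_t

>>> y_ref, h_ref = cell(x, prev_h)
>>> print(paddle.allclose(h, h_ref, atol=1e-5).item())
True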

Parameters
  • input_size (int) – The input size.

  • hidden_size (int) – The hidden size.

  • weight_ih_attr (ParamAttr|None, optional) – The parameter attribute for weight_ih. Default: None.

  • weight_hh_attr (ParamAttr|None, optional) – The parameter attribute for weight_hh. Default: None.

  • bias_ih_attr (ParamAttr|None, optional) – The parameter attribute for the bias_ih. Default: None.

  • bias_hh_attr (ParamAttr|None, optional) – The parameter attribute for the bias_hh. Default: None.

  • name (str|None, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.

Variables:
  • weight_ih (Parameter): shape (3 * hidden_size, input_size), input to hidden weight, which corresponds to the concatenation of W_{ir}, W_{iz}, W_{ic} in the formula.

  • weight_hh (Parameter): shape (3 * hidden_size, hidden_size), hidden to hidden weight, which corresponds to the concatenation of W_{hr}, W_{hz}, W_{hc} in the formula.

  • bias_ih (Parameter): shape (3 * hidden_size, ), input to hidden bias, which corresponds to the concatenation of b_{ir}, b_{iz}, b_{ic} in the formula.

  • bias_hh (Parameter): shape (3 * hidden_size, ), hidden to hidden bias, which corresponds to the concatenation of b_{hr}, b_{hz}, b_{hc} in the formula. The shapes above are checked in the short sketch that follows.
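
As a quick sanity check of the shapes listed above (a sketch, not an official example; the printed values follow Tensor.shape's list form):

>>> import paddle
>>> cell = paddle.nn.GRUCell(input_size=16, hidden_size=32)
>>> # 3 * hidden_size = 96 rows come from concatenating the r, z and c blocks
>>> print(cell.weight_ih.shape, cell.weight_hh.shape)
[96, 16] [96, 32]
>>> print(cell.bias_ih.shape, cell.bias_hh.shape)
[96] [96]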

Inputs:
  • inputs (Tensor): A tensor with shape [batch_size, input_size], corresponding to x_t in the formula.

  • states (Tensor): A tensor with shape [batch_size, hidden_size], corresponding to h_{t-1} in the formula.

Returns

  • outputs (Tensor): shape [batch_size, hidden_size], the output, corresponding to h_t in the formula.

  • states (Tensor): shape [batch_size, hidden_size], the new hidden state, corresponding to h_t in the formula.

Return type

tuple[Tensor, Tensor]

Notes

All the weights and biases are initialized with Uniform(-std, std) by default, where std = \frac{1}{\sqrt{hidden\_size}}. For more information about parameter initialization, please refer to ParamAttr.
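
For example, the default bound can be reproduced explicitly, or swapped for another initializer, by passing paddle.ParamAttr objects to the *_attr arguments listed above. This is a hedged sketch of that configuration, not a required usage pattern.

>>> import math
>>> import paddle

>>> hidden_size = 32
>>> std = 1.0 / math.sqrt(hidden_size)  # the default Uniform(-std, std) bound
>>> ih_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Uniform(-std, std))
>>> hh_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Uniform(-std, std))
>>> cell = paddle.nn.GRUCell(16, hidden_size,
...                          weight_ih_attr=ih_attr,
...                          weight_hh_attr=hh_attr)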

Examples

>>> import paddle

>>> x = paddle.randn((4, 16))
>>> prev_h = paddle.randn((4, 32))

>>> cell = paddle.nn.GRUCell(16, 32)
>>> y, h = cell(x, prev_h)

>>> print(y.shape)
[4, 32]
>>> print(h.shape)
[4, 32]
forward(inputs: Tensor, states: Tensor | None = None) → tuple[Tensor, Tensor]


Defines the computation performed at every call. For GRUCell this is a single GRU step applied to the inputs and the previous state.

Parameters
  • inputs (Tensor) – A tensor with shape [batch_size, input_size], corresponding to x_t in the formula.

  • states (Tensor|None, optional) – A tensor with shape [batch_size, hidden_size], corresponding to h_{t-1} in the formula. Default: None. See the sketch below for calling forward without an explicit previous state.
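
A short sketch of calling the cell without an explicit previous state. It assumes the cell builds a default initial state when states is None, as the None default in the signature and the state_shape note below suggest; treat that behaviour as an assumption rather than a guarantee.

>>> import paddle
>>> cell = paddle.nn.GRUCell(16, 32)
>>> x = paddle.randn((4, 16))
>>> y, h = cell(x)  # states is None here
>>> print(y.shape, h.shape)
[4, 32] [4, 32]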

property state_shape : tuple[int]

The state_shape of GRUCell is a shape [hidden_size] (a -1 for the batch size would be automatically inserted into the shape). The shape corresponds to the shape of h_{t-1}.
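
A minimal sketch of inspecting the property (the exact printed form is an assumption and may vary across versions):

>>> import paddle
>>> cell = paddle.nn.GRUCell(16, 32)
>>> print(cell.state_shape)
(32,)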

extra_repr() → str


Extra representation of this layer; you can provide a custom implementation for your own layer.