GRUCell
- class paddle.nn.GRUCell(input_size: int, hidden_size: int, weight_ih_attr: ParamAttrLike | None = None, weight_hh_attr: ParamAttrLike | None = None, bias_ih_attr: ParamAttrLike | None = None, bias_hh_attr: ParamAttrLike | None = None, name: str | None = None)
-
Gated Recurrent Unit (GRU) RNN cell. Given the inputs and previous states, it computes the outputs and updates states.
The formula for GRU used is as follows:
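$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
\widetilde{h}_t &= \tanh(W_{ic} x_t + b_{ic} + r_t * (W_{hc} h_{t-1} + b_{hc})) \\
h_t &= z_t * h_{t-1} + (1 - z_t) * \widetilde{h}_t \\
y_t &= h_t
\end{aligned}
$$
where $\sigma$ is the sigmoid function and $*$ is the elementwise multiplication operator.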
Please refer to An Empirical Exploration of Recurrent Network Architectures for more details.
- Parameters
-
input_size (int) – The input size.
hidden_size (int) – The hidden size.
weight_ih_attr (ParamAttr|None, optional) – The parameter attribute for weight_ih. Default: None.
weight_hh_attr (ParamAttr|None, optional) – The parameter attribute for weight_hh. Default: None.
bias_ih_attr (ParamAttr|None, optional) – The parameter attribute for the bias_ih. Default: None.
bias_hh_attr (ParamAttr|None, optional) – The parameter attribute for the bias_hh. Default: None.
name (str|None, optional) – Name for the operation (optional, default is None). For more information, please refer to Name.
- Variables:
-
weight_ih (Parameter): shape (3 * hidden_size, input_size), the input-to-hidden weight, which corresponds to the concatenation of $W_{ir}, W_{iz}, W_{ic}$ in the formula.
weight_hh (Parameter): shape (3 * hidden_size, hidden_size), the hidden-to-hidden weight, which corresponds to the concatenation of $W_{hr}, W_{hz}, W_{hc}$ in the formula.
bias_ih (Parameter): shape (3 * hidden_size,), the input-to-hidden bias, which corresponds to the concatenation of $b_{ir}, b_{iz}, b_{ic}$ in the formula.
bias_hh (Parameter): shape (3 * hidden_size,), the hidden-to-hidden bias, which corresponds to the concatenation of $b_{hr}, b_{hz}, b_{hc}$ in the formula.
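The parameter layout above, combined with the formula, can be sketched in plain NumPy. This is an illustrative re-implementation and not Paddle's own code: the helper name `gru_cell_step` is hypothetical, the random weights are placeholders, and the gate order (r, z, c) along the first axis is taken from the concatenation order described above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell_step(x, h_prev, weight_ih, weight_hh, bias_ih, bias_hh):
    """One GRU step (hypothetical helper mirroring the documented formula)."""
    # Project the input and the previous state, then split each projection
    # into the three gate chunks, assumed to be ordered (r, z, c).
    x_r, x_z, x_c = np.split(x @ weight_ih.T + bias_ih, 3, axis=-1)
    h_r, h_z, h_c = np.split(h_prev @ weight_hh.T + bias_hh, 3, axis=-1)

    r = sigmoid(x_r + h_r)            # reset gate r_t
    z = sigmoid(x_z + h_z)            # update gate z_t
    c = np.tanh(x_c + r * h_c)        # candidate state
    h = z * h_prev + (1.0 - z) * c    # new hidden state h_t
    return h, h                       # (outputs, new states), since y_t = h_t


rng = np.random.default_rng(0)
batch_size, input_size, hidden_size = 4, 16, 32

# Placeholder parameters with the documented shapes.
weight_ih = 0.1 * rng.standard_normal((3 * hidden_size, input_size))
weight_hh = 0.1 * rng.standard_normal((3 * hidden_size, hidden_size))
bias_ih = 0.1 * rng.standard_normal((3 * hidden_size,))
bias_hh = 0.1 * rng.standard_normal((3 * hidden_size,))

x = rng.standard_normal((batch_size, input_size))
h_prev = rng.standard_normal((batch_size, hidden_size))
y, h = gru_cell_step(x, h_prev, weight_ih, weight_hh, bias_ih, bias_hh)
print(y.shape)  # (4, 32)
```

Computing the two full projections first and splitting afterwards keeps the sketch to two matrix multiplications per step, which is one reason for storing the three gate matrices concatenated.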
- Inputs:
-
inputs (Tensor): A tensor with shape [batch_size, input_size], corresponding to $x_t$ in the formula.
states (Tensor): A tensor with shape [batch_size, hidden_size], corresponding to $h_{t-1}$ in the formula.
- Returns
-
outputs (Tensor): shape [batch_size, hidden_size], the output, corresponding to $h_t$ in the formula.
states (Tensor): shape [batch_size, hidden_size], the new hidden state, corresponding to $h_t$ in the formula.
- Return type
-
tuple[Tensor, Tensor]
Notes
All weights and biases are initialized with Uniform(-std, std) by default, where std = 1 / √hidden_size. For more information about parameter initialization, please refer to ParamAttr.
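As a small worked example of the note above, the bound of the default initializer can be computed directly. The helper names `default_init_bound` and `sample_uniform` are hypothetical, and the sampling uses only the Python standard library to show the value range, not how Paddle draws its parameters.

```python
import math
import random


def default_init_bound(hidden_size):
    """Bound of the default Uniform(-std, std) initializer: std = 1 / sqrt(hidden_size)."""
    return 1.0 / math.sqrt(hidden_size)


def sample_uniform(shape, hidden_size, rng):
    """Draw a nested list with the given 2-D shape from Uniform(-std, std)."""
    std = default_init_bound(hidden_size)
    rows, cols = shape
    return [[rng.uniform(-std, std) for _ in range(cols)] for _ in range(rows)]


hidden_size, input_size = 32, 16
std = default_init_bound(hidden_size)
# Same shape as weight_ih: (3 * hidden_size, input_size).
weight_ih = sample_uniform((3 * hidden_size, input_size), hidden_size, random.Random(0))
print(round(std, 4))  # 0.1768
```

The bound depends only on hidden_size, so the same range would apply to all four parameters regardless of input_size.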
Examples
>>> import paddle
>>> x = paddle.randn((4, 16))
>>> prev_h = paddle.randn((4, 32))
>>> cell = paddle.nn.GRUCell(16, 32)
>>> y, h = cell(x, prev_h)
>>> print(y.shape)
[4, 32]
>>> print(h.shape)
[4, 32]
- forward(inputs: Tensor, states: Tensor | None = None) → tuple[Tensor, Tensor]
-
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters
-
inputs (Tensor) – A tensor with shape [batch_size, input_size].
states (Tensor|None, optional) – The previous hidden state, with shape [batch_size, hidden_size]. Default: None.
- property state_shape : tuple[int]
-
The state_shape of GRUCell is [hidden_size]; a -1 for the batch size is inserted into the shape automatically. It corresponds to the shape of $h_{t-1}$.
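The way a per-sample state shape expands into a batched state can be sketched as follows. This is a NumPy illustration only: the helper `initial_state` is hypothetical, and filling the state with zeros when no `states` argument is supplied is an assumption about the default behavior rather than something stated on this page.

```python
import numpy as np


def initial_state(inputs, state_shape, init_value=0.0):
    """Build a batched state from a per-sample state_shape (hypothetical helper).

    The batch size is read from the first axis of `inputs` and prepended to
    `state_shape`, which plays the role of the automatically inserted -1.
    """
    batch_size = inputs.shape[0]
    return np.full((batch_size, *state_shape), init_value, dtype=inputs.dtype)


hidden_size = 32
x = np.zeros((4, 16), dtype=np.float32)   # [batch_size, input_size]
h0 = initial_state(x, (hidden_size,))     # state_shape is (hidden_size,)
print(h0.shape)  # (4, 32)
```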
- extra_repr() → str
-
Extra representation of this layer. You can provide a custom implementation for your own layer.