basic_gru

paddle.fluid.contrib.layers.rnn_impl.basic_gru(input, init_hidden, hidden_size, num_layers=1, sequence_length=None, dropout_prob=0.0, bidirectional=False, batch_first=True, param_attr=None, bias_attr=None, gate_activation=None, activation=None, dtype='float32', name='basic_gru')

GRU implementation using basic operators; supports multiple layers and bidirectional GRU.

\[
\begin{aligned}
u_t &= actGate(W_{ux} x_t + W_{uh} h_{t-1} + b_u)\\
r_t &= actGate(W_{rx} x_t + W_{rh} h_{t-1} + b_r)\\
m_t &= actNode(W_{cx} x_t + W_{ch} (r_t \odot h_{t-1}) + b_m)\\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot m_t
\end{aligned}
\]

where \(\odot\) denotes elementwise multiplication.
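As a point of reference, here is a minimal NumPy sketch of a single GRU step following the equations above. It is illustrative only, not the operator's actual implementation; all names are hypothetical, and weights are stored input-major so that x_t @ W here corresponds to W x_t above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_ux, W_uh, b_u, W_rx, W_rh, b_r, W_cx, W_ch, b_m):
    u_t = sigmoid(x_t @ W_ux + h_prev @ W_uh + b_u)          # update gate
    r_t = sigmoid(x_t @ W_rx + h_prev @ W_rh + b_r)          # reset gate
    m_t = np.tanh(x_t @ W_cx + (r_t * h_prev) @ W_ch + b_m)  # candidate state
    return u_t * h_prev + (1.0 - u_t) * m_t                  # elementwise mix

# tiny demo with random weights (hypothetical sizes)
b, d_in, d_h = 4, 8, 16
ws = [np.random.randn(*s) * 0.1 for s in [(d_in, d_h), (d_h, d_h), (d_h,)] * 3]
h = gru_step(np.random.randn(b, d_in), np.zeros((b, d_h)), *ws)  # shape (4, 16)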
Parameters
  • input (Variable) – GRU input tensor. If batch_first = False, the shape should be (seq_len x batch_size x input_size); if batch_first = True, the shape should be (batch_size x seq_len x input_size).

  • init_hidden (Variable|None) – The initial hidden state of the GRU. This is a tensor with shape (num_layers x batch_size x hidden_size); if bidirectional = True, the shape should be (num_layers*2 x batch_size x hidden_size), which can be reshaped to a tensor of shape (num_layers x 2 x batch_size x hidden_size) for use. If it is None, the initial state is set to all zeros.

  • hidden_size (int) – Hidden size of the GRU

  • num_layers (int) – The total number of layers of the GRU

  • sequence_length (Variable|None) – A tensor with shape [batch_size] that stores the real length of each instance. It will be converted to a mask that masks the padding ids. If it is None, there are assumed to be no padding ids.

  • dropout_prob (float|0.0) – Dropout probability. Dropout ONLY works on the RNN output of each layer, NOT between time steps.

  • bidirectional (bool|False) – Whether the GRU is bidirectional.

  • batch_first (bool|True) – The shape format of the input and output tensors. If True, the shape format should be [batch_size, seq_len, hidden_size]; if False, [seq_len, batch_size, hidden_size]. By default this function accepts input and emits output in batch-major form, to be consistent with most data formats, though this is a bit less efficient because of the extra transposes.

  • param_attr (ParamAttr|None) – The parameter attribute for the learnable weight matrices. Note: If it is set to None or one attribute of ParamAttr, basic_gru will create a ParamAttr as param_attr. If the Initializer of the param_attr is not set, the parameters are initialized with Xavier. Default: None. (See the sketch after this parameter list.)

  • bias_attr (ParamAttr|None) – The parameter attribute for the biases of the GRU unit. If it is set to None or one attribute of ParamAttr, basic_gru will create a ParamAttr as bias_attr. If the Initializer of the bias_attr is not set, the biases are initialized to zero. Default: None.

  • gate_activation (function|None) – The activation function for the gates (actGate). Default: fluid.layers.sigmoid.

  • activation (function|None) – The activation function for the cell (actNode). Default: fluid.layers.tanh.

  • dtype (string) – Data type used in this unit. Default: 'float32'.

  • name (string) – Name used to identify parameters and biases.
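As mentioned above, param_attr, bias_attr and the activations can be customized. A minimal sketch; the initializer values and data shapes here are arbitrary assumptions, not recommended settings:

import paddle.fluid as fluid
import paddle.fluid.layers as layers
from paddle.fluid.contrib.layers import basic_gru

hidden_size = 256
input = layers.data(name="input", shape=[-1, 20, 128], dtype='float32')

param_attr = fluid.ParamAttr(
    initializer=fluid.initializer.UniformInitializer(low=-0.1, high=0.1))
bias_attr = fluid.ParamAttr(
    initializer=fluid.initializer.ConstantInitializer(value=0.0))

rnn_out, last_hidden = basic_gru(
    input, None, hidden_size,               # init_hidden=None -> all zeros
    param_attr=param_attr, bias_attr=bias_attr,
    gate_activation=fluid.layers.sigmoid,   # the default gate activation
    activation=fluid.layers.tanh)           # the default cell activation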

Returns

rnn_out (Tensor), last_hidden (Tensor)
  • rnn_out is the GRU hidden state output, with shape (seq_len x batch_size x hidden_size); if bidirectional is set to True, the shape will be (seq_len x batch_size x hidden_size*2).

  • last_hidden is the hidden state of the last step of the GRU, with shape (num_layers x batch_size x hidden_size); if bidirectional is set to True, the shape will be (num_layers*2 x batch_size x hidden_size), which can be reshaped to a tensor of shape (num_layers x 2 x batch_size x hidden_size), as in the sketch below.
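For example, a minimal sketch of splitting last_hidden into its two directions for a bidirectional GRU (the data shapes are illustrative):

import paddle.fluid.layers as layers
from paddle.fluid.contrib.layers import basic_gru

num_layers, hidden_size = 2, 256
input = layers.data(name="input", shape=[-1, 20, 128], dtype='float32')
rnn_out, last_hidden = basic_gru(
    input, None, hidden_size, num_layers=num_layers, bidirectional=True)

# last_hidden: (num_layers*2, batch_size, hidden_size); regroup so that
# index 0 of the second axis is the forward direction, index 1 the backward.
fw_bw = layers.reshape(last_hidden, shape=[num_layers, 2, -1, hidden_size])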

Examples

import paddle.fluid.layers as layers
from paddle.fluid.contrib.layers import basic_gru

batch_size = 20
input_size = 128
hidden_size = 256
num_layers = 2
dropout = 0.5
bidirectional = True
batch_first = False

input = layers.data(name="input", shape=[-1, batch_size, input_size], dtype='float32')
pre_hidden = layers.data(name="pre_hidden", shape=[-1, hidden_size], dtype='float32')
sequence_length = layers.data(name="sequence_length", shape=[-1], dtype='int32')

rnn_out, last_hidden = basic_gru(
    input, pre_hidden, hidden_size, num_layers=num_layers,
    sequence_length=sequence_length, dropout_prob=dropout,
    bidirectional=bidirectional, batch_first=batch_first)
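To actually run the program built above, feed NumPy arrays through an executor. A minimal sketch: the random input, the fixed seq_len and the zero initial state are illustrative, and pre_hidden is fed flattened to match its 2-D declaration above.

import numpy as np
import paddle.fluid as fluid

seq_len = 10
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

out, last = exe.run(
    fluid.default_main_program(),
    feed={
        # batch_first is False above, so input is time-major
        "input": np.random.rand(seq_len, batch_size, input_size).astype('float32'),
        # zero initial state, flattened from (num_layers*2, batch_size, hidden_size)
        "pre_hidden": np.zeros((num_layers * 2 * batch_size, hidden_size), dtype='float32'),
        "sequence_length": np.full((batch_size,), seq_len, dtype='int32'),
    },
    fetch_list=[rnn_out, last_hidden])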