noam_decay
- paddle.fluid.layers.learning_rate_scheduler.noam_decay(d_model, warmup_steps, learning_rate=1.0) [source]
Noam decay method. A numpy implementation of Noam decay is as follows:

    import numpy as np

    # set hyperparameters
    base_lr = 0.01
    d_model = 2
    current_steps = 20
    warmup_steps = 200

    # compute the decayed learning rate
    lr_value = base_lr * np.power(d_model, -0.5) * np.min([
        np.power(current_steps, -0.5),
        np.power(warmup_steps, -1.5) * current_steps])
Please refer to the paper Attention Is All You Need.
- Parameters
d_model (Variable) – The dimensionality of the model's input and output.
warmup_steps (Variable) – The number of warmup steps, a hyperparameter.
learning_rate (Variable|float|int) – The initial learning rate. If the type is Variable, it is a tensor with shape [1] and a data type of float32 or float64. It can also be set to a Python int or float. Default: 1.0.
- Returns
The decayed learning rate.
Examples
    import paddle.fluid as fluid

    warmup_steps = 100
    learning_rate = 0.01
    lr = fluid.layers.learning_rate_scheduler.noam_decay(
        1 / (warmup_steps * (learning_rate ** 2)),
        warmup_steps,
        learning_rate)
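To see how the schedule behaves over training, the formula above can be sketched in pure numpy (this is an illustration of the Noam formula only, not the fluid API; `noam_lr` and the chosen `d_model`/`warmup_steps` values are assumptions for the demonstration):

```python
import numpy as np

def noam_lr(step, d_model, warmup_steps, base_lr=1.0):
    # Sketch of the Noam schedule from the formula above:
    # lr = base_lr * d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    return base_lr * d_model ** -0.5 * min(step ** -0.5,
                                           warmup_steps ** -1.5 * step)

d_model, warmup_steps = 512, 4000
lrs = [noam_lr(s, d_model, warmup_steps) for s in range(1, 20001)]

# the rate rises linearly during warmup, then decays as step^-0.5;
# the peak therefore falls exactly at step == warmup_steps
peak_step = max(range(len(lrs)), key=lambda i: lrs[i]) + 1
```

During warmup the linear term `step * warmup_steps^-1.5` is the smaller of the two, so the rate grows; afterwards `step^-0.5` dominates and the rate decays, which is why the peak sits at the warmup boundary.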