noam_decay

paddle.fluid.layers.learning_rate_scheduler.noam_decay(d_model, warmup_steps, learning_rate=1.0)

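Noam decay method. The learning rate rises linearly for the first warmup_steps steps and then decays in proportion to the inverse square root of the step number. A NumPy implementation of Noam decay is as follows.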

import numpy as np
# set hyperparameters
base_lr = 0.01
d_model = 2
current_steps = 20
warmup_steps = 200
# compute the learning rate at step current_steps
lr_value = base_lr * np.power(d_model, -0.5) * np.min([
    np.power(current_steps, -0.5),
    np.power(warmup_steps, -1.5) * current_steps])

For details, see the paper "Attention Is All You Need" (Vaswani et al., 2017).
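The NumPy snippet above evaluates a single step. As a minimal sketch of the whole schedule, the helper below applies the same formula to a range of steps. The name `noam_lr` is hypothetical and not part of Paddle, and the sketch assumes steps are counted from 1. It illustrates that the two terms inside the minimum are equal at `step == warmup_steps`, which is where the learning rate peaks.

```python
import numpy as np

def noam_lr(step, d_model, warmup_steps, base_lr=1.0):
    """Hypothetical helper mirroring the NumPy formula above (not a Paddle API)."""
    step = np.asarray(step, dtype=np.float64)
    # Linear warm-up term: grows in proportion to the step number.
    warmup_term = step * np.power(float(warmup_steps), -1.5)
    # Decay term: shrinks as the inverse square root of the step number.
    decay_term = np.power(step, -0.5)
    return base_lr * np.power(float(d_model), -0.5) * np.minimum(warmup_term, decay_term)

# Same hyperparameters as the snippet above, evaluated for steps 1..1000.
steps = np.arange(1, 1001)
lrs = noam_lr(steps, d_model=2, warmup_steps=200, base_lr=0.01)
peak_step = int(steps[np.argmax(lrs)])
print(peak_step)  # 200
```

With these values, `noam_lr(20, 2, 200, 0.01)` reproduces `lr_value` from the snippet above.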

Parameters

d_model (Variable) – The dimensionality of the model's input and output.
warmup_steps (Variable) – The number of warm-up steps; a hyperparameter.
learning_rate (Variable|float|int) – The initial learning rate. If it is a Variable, it must be a tensor with shape [1] and data type float32 or float64. It can also be a Python float or int. Default: 1.0.

Returns

The decayed learning rate.

Examples

import paddle.fluid as fluid
warmup_steps = 100
learning_rate = 0.01
lr = fluid.layers.learning_rate_scheduler.noam_decay(
               1/(warmup_steps *(learning_rate ** 2)),
               warmup_steps,
               learning_rate)
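To see what these arguments imply, the sketch below plugs them into the NumPy formula given earlier on this page. It does not call Paddle, and `lr_at` is an illustrative name only. Under that formula, setting `d_model = 1/(warmup_steps * learning_rate ** 2)` makes the factor `d_model ** -0.5` equal to `learning_rate * sqrt(warmup_steps)`, so the peak at `step == warmup_steps` works out to `learning_rate ** 2`. Whether the Paddle operator follows the formula exactly at every step is an assumption here.

```python
import numpy as np

warmup_steps = 100
learning_rate = 0.01
# Same expression as the first argument in the example above.
d_model = 1 / (warmup_steps * (learning_rate ** 2))

def lr_at(step):
    # NumPy formula from this page, evaluated at a single step (illustrative helper).
    return learning_rate * np.power(d_model, -0.5) * min(
        np.power(float(step), -0.5),
        np.power(float(warmup_steps), -1.5) * step)

peak = lr_at(warmup_steps)  # approximately learning_rate ** 2
```

If the intended peak is `learning_rate` itself rather than its square, one option under the same formula is to keep this `d_model` expression and leave the third argument at its default of 1.0.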