LRScheduler
- class paddle.optimizer.lr.LRScheduler(learning_rate=0.1, last_epoch=-1, verbose=False)

LRScheduler base class. It defines the common interface of a learning rate scheduler.
There are currently 17 strategies implemented in Paddle based on this base class, which are:

NoamDecay: Related algorithms are derived from *Attention Is All You Need*. Please refer to NoamDecay.
ExponentialDecay: The next learning rate is obtained by multiplying the current learning rate by a given decay rate. Please refer to ExponentialDecay.
NaturalExpDecay: Each time, the current learning rate is multiplied by the natural exponent of the given decay rate to obtain the next learning rate. Please refer to NaturalExpDecay.
InverseTimeDecay: The resulting learning rate is inversely proportional to the current number of decays. Please refer to InverseTimeDecay.
PolynomialDecay: The resulting learning rate is an interpolation between the initial learning rate and the given final learning rate, determined by polynomial computation weights. Please refer to PolynomialDecay.
PiecewiseDecay: The learning rate decays in a step-like fashion at given step boundaries, and each segment has a constant learning rate. Please refer to PiecewiseDecay.
CosineAnnealingDecay: The learning rate varies periodically with the number of steps as a cosine function. Please refer to CosineAnnealingDecay.
LinearWarmup: The learning rate increases linearly with the number of steps to the specified learning rate. Please refer to LinearWarmup.
StepDecay: The learning rate decays every fixed interval of steps, and the interval size needs to be specified. Please refer to StepDecay.
MultiStepDecay: The learning rate decays at specific step numbers, and the locations at which the decay occurs need to be specified. Please refer to MultiStepDecay.
LambdaDecay: The learning rate decays according to a custom lambda function. Please refer to LambdaDecay.
ReduceOnPlateau: The learning rate is adaptively adjusted according to the current metric (typically the loss), and is decayed when the metric stops improving. Please refer to ReduceOnPlateau.
MultiplicativeDecay: The resulting learning rate is obtained by multiplying the current learning rate each time by a lambda function. Please refer to MultiplicativeDecay.
OneCycleLR: The learning rate goes up to the maximum and then down to the minimum. Please refer to OneCycleLR.
CyclicLR: The learning rate change is treated as a cycle, with the learning rate varying between the minimum and maximum learning rates at a fixed frequency. Please refer to CyclicLR.
LinearLR: The learning rate increases linearly with the number of steps to the specified learning rate. Please refer to LinearLR.
CosineAnnealingWarmRestarts: The learning rate varies periodically with the number of steps as a cosine function. Please refer to CosineAnnealingWarmRestarts.
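All of these schedulers share the base-class interface described below. As a brief, hedged sketch of that common interface, the following drives one of the built-in schedulers on its own; ExponentialDecay and its arguments are arbitrary choices for illustration.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.ExponentialDecay(learning_rate=0.1, gamma=0.9)
>>> for epoch in range(3):
...     # ... train one epoch ...
...     scheduler.step()            # advance the schedule by one epoch
...     print(scheduler.get_lr())   # the rate shrinks by the factor gamma each epoch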
Users can import it with

from paddle.optimizer.lr import LRScheduler

then subclass it and provide a custom implementation of get_lr(). Otherwise, a NotImplementedError exception will be thrown.

- Parameters
learning_rate (float) – The initial learning rate. It is a Python float number.

last_epoch (int, optional) – The index of the last epoch. It can be set to restart training. Default: -1, which means the initial learning rate.

verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.

- Returns

An LRScheduler instance used to schedule the learning rate.
Examples
Here is an example of a simple StepDecay implementation.

>>> import paddle
>>> from paddle.optimizer.lr import LRScheduler

>>> class StepDecay(LRScheduler):
...     def __init__(self,
...                  learning_rate,
...                  step_size,
...                  gamma=0.1,
...                  last_epoch=-1,
...                  verbose=False):
...         if not isinstance(step_size, int):
...             raise TypeError(
...                 "The type of 'step_size' must be 'int', but received %s." %
...                 type(step_size))
...         if gamma >= 1.0:
...             raise ValueError('gamma should be < 1.0.')
...
...         self.step_size = step_size
...         self.gamma = gamma
...         super().__init__(learning_rate, last_epoch, verbose)
...
...     def get_lr(self):
...         i = self.last_epoch // self.step_size
...         return self.base_lr * (self.gamma**i)
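As a usage sketch (an illustration, not part of the original example), the StepDecay defined above can be passed to an optimizer like any built-in scheduler; the SGD optimizer, layer sizes, and hyperparameters below are arbitrary assumptions.

>>> import paddle
>>> scheduler = StepDecay(learning_rate=0.5, step_size=3)
>>> linear = paddle.nn.Linear(10, 1)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(6):
...     x = paddle.rand([4, 10])
...     loss = paddle.mean(linear(x))
...     loss.backward()
...     sgd.step()
...     sgd.clear_grad()
...     scheduler.step()   # the optimizer uses the updated rate on its next step()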
- step(epoch=None)

step should be called after optimizer.step. It will update the learning rate in the optimizer according to the current epoch. The new learning rate will take effect on the next optimizer.step.

- Parameters
epoch (int, optional) – Specify the current epoch. Default: None; the epoch auto-increments from last_epoch=-1.

- Returns

None
Examples
>>> import paddle
>>> value = paddle.arange(26, dtype='float32')
>>> a = paddle.reshape(value, [2, 13])
>>> linear = paddle.nn.Linear(13, 5)
>>> adadelta = paddle.optimizer.Adadelta(learning_rate=0.0003, epsilon=1e-06, rho=0.95,
...                                      parameters=linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adadelta.step()
>>> adadelta.clear_grad()
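The example above only exercises the optimizer; as a hedged sketch of the scheduler side of the contract, the following shows both the auto-increment behavior of step() and the explicit epoch argument. The StepDecay configuration is an arbitrary choice for illustration.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=2, gamma=0.5)
>>> scheduler.step()           # no argument: last_epoch auto-increments to 0
>>> scheduler.step()           # ... then to 1
>>> scheduler.step(epoch=10)   # explicit epoch, e.g. when resuming training
>>> print(scheduler.last_lr)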
- state_dict()

Returns the state of the scheduler as a dict. It is a subset of self.__dict__.
- state_keys()

For subclasses that overload LRScheduler (the base class). By default, last_epoch and last_lr are saved via self.keys = ['last_epoch', 'last_lr'], where last_epoch is the current epoch number and last_lr is the current learning rate. If you want to change the default behavior, you should provide a custom implementation of _state_keys() to redefine self.keys.
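As a hedged sketch of redefining the saved keys, a subclass might override _state_keys() as below; the extra num_bad_steps attribute and the MyDecay class are made-up examples for illustration, not part of the Paddle API.

>>> from paddle.optimizer.lr import LRScheduler
>>> class MyDecay(LRScheduler):
...     def __init__(self, learning_rate=0.1, last_epoch=-1, verbose=False):
...         self.num_bad_steps = 0   # extra state this scheduler tracks
...         super().__init__(learning_rate, last_epoch, verbose)
...
...     def _state_keys(self):
...         # save the extra attribute alongside the defaults
...         self.keys = ['last_epoch', 'last_lr', 'num_bad_steps']
...
...     def get_lr(self):
...         return self.base_lr
>>> scheduler = MyDecay()
>>> print(scheduler.state_dict())   # should now include 'num_bad_steps' as well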
- set_state_dict(state_dict)

Loads the scheduler's state.
- set_dict(state_dict)

Loads the scheduler's state.
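As a hedged checkpointing sketch, the state returned by state_dict() might be persisted with paddle.save and restored with set_state_dict; the file name and the StepDecay configuration below are illustrative assumptions.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=5)
>>> for _ in range(7):
...     scheduler.step()
>>> paddle.save(scheduler.state_dict(), "scheduler.pdstate")

>>> # later, e.g. when resuming training
>>> resumed = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=5)
>>> resumed.set_state_dict(paddle.load("scheduler.pdstate"))
>>> print(resumed.last_epoch, resumed.last_lr)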
- get_lr()

For subclasses that overload LRScheduler (the base class), users should provide a custom implementation of get_lr(). Otherwise, a NotImplementedError exception will be thrown.