LRScheduler
- class paddle.optimizer.lr.LRScheduler(learning_rate=0.1, last_epoch=-1, verbose=False)

LRScheduler base class. It defines the common interface of a learning rate scheduler.
There are currently 17 strategies implemented in Paddle based on this base class, which are:

NoamDecay: Related algorithms are derived from *Attention Is All You Need*. Please refer to NoamDecay.
ExponentialDecay: The next learning rate is obtained by multiplying the current learning rate by a given decay rate. Please refer to ExponentialDecay.
NaturalExpDecay: Each time, the current learning rate is multiplied by the natural exponent of the given decay rate to obtain the next learning rate. Please refer to NaturalExpDecay.
InverseTimeDecay: The resulting learning rate is inversely proportional to the current number of decays. Please refer to InverseTimeDecay.
PolynomialDecay: The resulting learning rate is an interpolation between the initial learning rate and the given final learning rate, determined by polynomial computation weights. Please refer to PolynomialDecay.
PiecewiseDecay: The learning rate decays in a step-like fashion at given step boundaries, and each segment has a constant learning rate. Please refer to PiecewiseDecay.
CosineAnnealingDecay: The learning rate varies periodically with the number of steps as a cosine function. Please refer to CosineAnnealingDecay.
LinearWarmup: The learning rate increases linearly with the number of steps to the specified learning rate. Please refer to LinearWarmup.
StepDecay: The learning rate decays every fixed interval of steps, and the interval size needs to be specified. Please refer to StepDecay.
MultiStepDecay: The learning rate decays at specific step numbers, and the locations at which the decay occurs need to be specified. Please refer to MultiStepDecay.
LambdaDecay: The learning rate decays according to a custom lambda function. Please refer to LambdaDecay.
ReduceOnPlateau: The learning rate is adaptively adjusted according to the current metric (typically the loss), and is decayed when the metric stops improving. Please refer to ReduceOnPlateau.
MultiplicativeDecay: The resulting learning rate is obtained by multiplying the current learning rate each time by a lambda function. Please refer to MultiplicativeDecay.
OneCycleLR: The learning rate goes up to the maximum and then down to the minimum. Please refer to OneCycleLR.
CyclicLR: The learning rate change is treated as a cycle, with the learning rate varying between the minimum and maximum learning rates at a fixed frequency. Please refer to CyclicLR.
LinearLR: The learning rate increases linearly with the number of steps to the specified learning rate. Please refer to LinearLR.
CosineAnnealingWarmRestarts: The learning rate varies periodically with the number of steps as a cosine function. Please refer to CosineAnnealingWarmRestarts.
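All of these schedulers share the base-class interface described below. As a brief, hedged sketch of that common interface, the following drives one of the built-in schedulers on its own; ExponentialDecay and its arguments are arbitrary choices for illustration.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.ExponentialDecay(learning_rate=0.1, gamma=0.9)
>>> for epoch in range(3):
...     # ... train one epoch ...
...     scheduler.step()            # advance the schedule by one epoch
...     print(scheduler.get_lr())   # the rate shrinks by the factor gamma each epoch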
Users can import it with

from paddle.optimizer.lr import LRScheduler

then subclass it and provide a custom implementation of get_lr(). Otherwise, a NotImplementedError exception will be thrown.

- Parameters
learning_rate (float) – The initial learning rate. It is a Python float number.

last_epoch (int, optional) – The index of the last epoch. It can be set to restart training. Default: -1, which means the initial learning rate.

verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.

- Returns

An LRScheduler instance used to schedule the learning rate.
Examples
Here is an example of a simple StepDecay implementation.

>>> import paddle
>>> from paddle.optimizer.lr import LRScheduler

>>> class StepDecay(LRScheduler):
...     def __init__(self,
...                  learning_rate,
...                  step_size,
...                  gamma=0.1,
...                  last_epoch=-1,
...                  verbose=False):
...         if not isinstance(step_size, int):
...             raise TypeError(
...                 "The type of 'step_size' must be 'int', but received %s." %
...                 type(step_size))
...         if gamma >= 1.0:
...             raise ValueError('gamma should be < 1.0.')
...
...         self.step_size = step_size
...         self.gamma = gamma
...         super().__init__(learning_rate, last_epoch, verbose)
...
...     def get_lr(self):
...         i = self.last_epoch // self.step_size
...         return self.base_lr * (self.gamma**i)
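As a usage sketch (an illustration, not part of the original example), the StepDecay defined above can be passed to an optimizer like any built-in scheduler; the SGD optimizer, layer sizes, and hyperparameters below are arbitrary assumptions.

>>> import paddle
>>> scheduler = StepDecay(learning_rate=0.5, step_size=3)
>>> linear = paddle.nn.Linear(10, 1)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(6):
...     x = paddle.rand([4, 10])
...     loss = paddle.mean(linear(x))
...     loss.backward()
...     sgd.step()
...     sgd.clear_grad()
...     scheduler.step()   # the optimizer uses the updated rate on its next step()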
- step(epoch=None)

step should be called after optimizer.step. It will update the learning rate in the optimizer according to the current epoch. The new learning rate will take effect on the next optimizer.step.

- Parameters
epoch (int, optional) – Specify the current epoch. Default: None; the epoch auto-increments from last_epoch=-1.

- Returns

None
Examples
>>> import paddle
>>> value = paddle.arange(26, dtype='float32')
>>> a = paddle.reshape(value, [2, 13])
>>> linear = paddle.nn.Linear(13, 5)
>>> adadelta = paddle.optimizer.Adadelta(learning_rate=0.0003, epsilon=1e-06, rho=0.95,
...                                      parameters=linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adadelta.step()
>>> adadelta.clear_grad()
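The example above only exercises the optimizer; as a hedged sketch of the scheduler side of the contract, the following shows both the auto-increment behavior of step() and the explicit epoch argument. The StepDecay configuration is an arbitrary choice for illustration.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=2, gamma=0.5)
>>> scheduler.step()           # no argument: last_epoch auto-increments to 0
>>> scheduler.step()           # ... then to 1
>>> scheduler.step(epoch=10)   # explicit epoch, e.g. when resuming training
>>> print(scheduler.last_lr)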
- state_dict()

Returns the state of the scheduler as a dict. It is a subset of self.__dict__.
- state_keys()

For subclasses that overload LRScheduler (the base class). By default, last_epoch and last_lr are saved via self.keys = ['last_epoch', 'last_lr'], where last_epoch is the current epoch number and last_lr is the current learning rate. If you want to change the default behavior, you should provide a custom implementation of _state_keys() to redefine self.keys.
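As a hedged sketch of redefining the saved keys, a subclass might override _state_keys() as below; the extra num_bad_steps attribute and the MyDecay class are made-up examples for illustration, not part of the Paddle API.

>>> from paddle.optimizer.lr import LRScheduler
>>> class MyDecay(LRScheduler):
...     def __init__(self, learning_rate=0.1, last_epoch=-1, verbose=False):
...         self.num_bad_steps = 0   # extra state this scheduler tracks
...         super().__init__(learning_rate, last_epoch, verbose)
...
...     def _state_keys(self):
...         # save the extra attribute alongside the defaults
...         self.keys = ['last_epoch', 'last_lr', 'num_bad_steps']
...
...     def get_lr(self):
...         return self.base_lr
>>> scheduler = MyDecay()
>>> print(scheduler.state_dict())   # should now include 'num_bad_steps' as well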
- set_state_dict(state_dict)

Loads the scheduler's state.
- set_dict(state_dict)

Loads the scheduler's state.
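As a hedged checkpointing sketch, the state returned by state_dict() might be persisted with paddle.save and restored with set_state_dict; the file name and the StepDecay configuration below are illustrative assumptions.

>>> import paddle
>>> scheduler = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=5)
>>> for _ in range(7):
...     scheduler.step()
>>> paddle.save(scheduler.state_dict(), "scheduler.pdstate")

>>> # later, e.g. when resuming training
>>> resumed = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=5)
>>> resumed.set_state_dict(paddle.load("scheduler.pdstate"))
>>> print(resumed.last_epoch, resumed.last_lr)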
- get_lr()

For subclasses that overload LRScheduler (the base class), users should provide a custom implementation of get_lr(). Otherwise, a NotImplementedError exception will be thrown.