CyclicLR

class paddle.optimizer.lr.CyclicLR(base_learning_rate: float, max_learning_rate: float, step_size_up: int, step_size_down: Optional[int] = None, mode: Literal['triangular', 'triangular2', 'exp_range'] = 'triangular', exp_gamma: float = 1.0, scale_fn: Optional[Callable[[float], float]] = None, scale_mode: Literal['cycle', 'iterations'] = 'cycle', last_epoch: int = -1, verbose: bool = False)

Set the learning rate according to the cyclic learning rate (CLR) scheduler. The scheduler regards the process of learning rate adjustment as one cycle after another. It cycles the learning rate between two boundaries with a constant frequency. The distance between the two boundaries can be scaled on a per-iteration or per-cycle basis.

It was proposed in the paper Cyclic Learning Rates for Training Neural Networks.

According to the paper, the cyclic learning rate schedule has three built-in scaling methods:

“triangular”: A basic triangular cycle without any amplitude scaling.
“triangular2”: A basic triangular cycle that halves the initial amplitude each cycle.
“exp_range”: A cycle that scales the initial amplitude by the scale function \(\gamma^{iterations}\), where \(\gamma\) is exp_gamma.

The initial amplitude is defined as max_learning_rate - base_learning_rate. Note that the learning rate should be updated after every training step, so call scheduler.step() once per step.
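The three built-in modes can be illustrated with a small pure-Python sketch. It is written from the description above and is not taken from Paddle's source, so the exact values produced by paddle.optimizer.lr.CyclicLR may differ in detail. The function name cyclic_lr and its argument names are hypothetical, and the sketch assumes that 'triangular2' scales per cycle while 'exp_range' scales per iteration.

```python
def cyclic_lr(iteration, base_lr, max_lr, step_size_up, step_size_down=None,
              mode="triangular", exp_gamma=1.0):
    """Return the learning rate at a 0-based training iteration (illustrative only)."""
    if step_size_down is None:
        step_size_down = step_size_up
    cycle_size = step_size_up + step_size_down
    step_up_pct = step_size_up / cycle_size

    cycle = 1 + iteration // cycle_size          # 1-based index of the current cycle
    pct = (iteration % cycle_size) / cycle_size  # position inside the cycle, in [0, 1)

    # Triangular shape: rise linearly from 0 to 1, then fall linearly back to 0.
    if pct <= step_up_pct:
        shape = pct / step_up_pct
    else:
        shape = (1.0 - pct) / (1.0 - step_up_pct)

    # Amplitude scaling for the three built-in modes (assumed behaviour).
    if mode == "triangular":
        scale = 1.0
    elif mode == "triangular2":
        scale = 1.0 / (2.0 ** (cycle - 1))
    elif mode == "exp_range":
        scale = exp_gamma ** iteration
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    amplitude = max_lr - base_lr
    return base_lr + amplitude * shape * scale


# Same settings as the examples on this page: 15 steps up, 5 steps down.
for it in (0, 15, 20, 35):
    print(it, cyclic_lr(it, 0.5, 1.0, 15, 5, mode="triangular2"))
```

With these settings the rate starts at base_learning_rate, peaks at the end of the 15 rising steps, and returns to the base after 20 steps. In 'triangular2' mode the second peak reaches only half of the initial amplitude.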

Parameters

base_learning_rate (float) – Initial learning rate, which is the lower boundary of the cycle. The paper recommends setting base_learning_rate to 1/3 or 1/4 of max_learning_rate.
max_learning_rate (float) – Maximum learning rate in the cycle. It defines the cycle amplitude as described above. Because the amplitude may be scaled during learning rate adjustment, max_learning_rate may not actually be reached.
step_size_up (int) – Number of training steps used to increase the learning rate in a cycle. The length of one cycle is step_size_up + step_size_down. According to the paper, the step size should be at least 3 or 4 times the number of steps in one epoch.
step_size_down (int, optional) – Number of training steps used to decrease the learning rate in a cycle. If not specified, its value is set to step_size_up. Default: None
mode (str, optional) – One of ‘triangular’, ‘triangular2’ or ‘exp_range’. If scale_fn is specified, this argument is ignored. Default: ‘triangular’
exp_gamma (float, optional) – Constant in the ‘exp_range’ scaling function: exp_gamma**iterations. Used only when mode = ‘exp_range’. Default: 1.0
scale_fn (function, optional) – A custom scaling function that replaces the three built-in methods. It must take exactly one argument and satisfy 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, mode is ignored. Default: None
scale_mode (str, optional) – One of ‘cycle’ or ‘iterations’. Defines whether scale_fn is evaluated on the cycle number or on cycle iterations (total iterations since the start of training). Default: ‘cycle’
last_epoch (int, optional) – The index of the last epoch. Can be set to resume training. Default: -1, which means starting from the initial learning rate.
verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False
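As an illustration of the scale_fn contract, the sketch below builds a one-argument scaling function and checks that it stays within [0, 1] for non-negative inputs. The helper names make_inverse_decay and stays_in_unit_interval are hypothetical and are not part of Paddle. The commented-out constructor call only mirrors the parameters documented above and has not been run here.

```python
def make_inverse_decay(decay_rate):
    """Build a one-argument scale function x -> 1 / (1 + decay_rate * x)."""
    if decay_rate < 0:
        raise ValueError("decay_rate must be non-negative")

    def scale_fn(x):
        # For x >= 0 and decay_rate >= 0 the result lies in (0, 1].
        return 1.0 / (1.0 + decay_rate * x)

    return scale_fn


def stays_in_unit_interval(scale_fn, xs):
    """Check the documented requirement 0 <= scale_fn(x) <= 1 on sample inputs."""
    return all(0.0 <= scale_fn(x) <= 1.0 for x in xs)


scale_fn = make_inverse_decay(0.5)
print(stays_in_unit_interval(scale_fn, range(1000)))

# Assumed usage with the parameters described above:
# scheduler = paddle.optimizer.lr.CyclicLR(
#     base_learning_rate=0.5, max_learning_rate=1.0,
#     step_size_up=15, step_size_down=5,
#     scale_fn=scale_fn, scale_mode='cycle')
```

With scale_mode='cycle' the function receives the cycle number, so this particular choice shrinks the amplitude once per cycle rather than once per iteration.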

Returns

CyclicLR instance to schedule learning rate.

Examples

>>> # Example 1: train in the default dynamic graph mode
>>> import paddle

>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.CyclicLR(base_learning_rate=0.5, max_learning_rate=1.0, step_size_up=15, step_size_down=5, verbose=True)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(5):
...     for batch_id in range(20):
...         x = paddle.uniform([10, 10])
...         out = linear(x)
...         loss = paddle.mean(out)
...         loss.backward()
...         sgd.step()
...         sgd.clear_gradients()
...         scheduler.step()        # You should update learning rate each step

>>> # Example 2: train in static graph mode
>>> import paddle
>>> import numpy as np
>>> paddle.enable_static()
>>> main_prog = paddle.static.Program()
>>> start_prog = paddle.static.Program()
>>> with paddle.static.program_guard(main_prog, start_prog):
...     x = paddle.static.data(name='x', shape=[None, 4, 5])
...     y = paddle.static.data(name='y', shape=[None, 4, 5])
...     z = paddle.static.nn.fc(x, 100)
...     loss = paddle.mean(z)
...     scheduler = paddle.optimizer.lr.CyclicLR(base_learning_rate=0.5,
...         max_learning_rate=1.0, step_size_up=15, step_size_down=5, verbose=True)
...     sgd = paddle.optimizer.SGD(learning_rate=scheduler)
...     sgd.minimize(loss)
...
>>> exe = paddle.static.Executor()
>>> exe.run(start_prog)
>>> for epoch in range(5):
...     for batch_id in range(20):
...         out = exe.run(
...             main_prog,
...             feed={
...                 'x': np.random.randn(3, 4, 5).astype('float32'),
...                 'y': np.random.randn(3, 4, 5).astype('float32')
...             },
...             fetch_list=loss.name)
...         scheduler.step()    # You should update learning rate each step

set_dict(state_dict: _LRStateDict) → None

Loads the scheduler's state.

set_state_dict(state_dict: _LRStateDict) → None

Loads the scheduler's state.

state_dict() → _LRStateDict

Returns the state of the scheduler as a dict.

It is a subset of self.__dict__ .

state_keys() → None

Intended for subclasses that override LRScheduler (the base class). By default, last_epoch and last_lr are saved, via self.keys = ['last_epoch', 'last_lr'].

last_epoch is the current epoch number, and last_lr is the current learning rate.

To change the default behavior, provide a custom implementation of state_keys() that redefines self.keys.

step(epoch: Optional[int] = None) → None

step should be called after optimizer.step. It updates the learning rate in the optimizer according to the current epoch. The new learning rate takes effect on the next optimizer.step.

Parameters: epoch (int, optional) – The current epoch. Default: None, in which case the epoch auto-increments from last_epoch=-1.
Returns: None

Examples

>>> import paddle
>>> value = paddle.arange(26, dtype='float32')
>>> a = paddle.reshape(value, [2, 13])
>>> linear = paddle.nn.Linear(13, 5)
>>> adadelta = paddle.optimizer.Adadelta(learning_rate=0.0003, epsilon=1e-06, rho=0.95,
...                             parameters = linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adadelta.step()
>>> adadelta.clear_grad()

get_lr() → float

Subclasses that override LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError exception is thrown.
