ReduceOnPlateau

class paddle.optimizer.lr.ReduceOnPlateau(learning_rate, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, epsilon=1e-08, verbose=False)

Reduce the learning rate when metrics has stopped descending. Models often benefit from reducing the learning rate by a factor of 2 to 10 once model performance stops improving.

metrics is the value passed to step, and its shape must be [] or [1]. When metrics stops descending for patience epochs, the current learning rate is multiplied by factor. (Alternatively, mode can be set to 'max'. In that case the learning rate is reduced when metrics stops ascending for patience epochs.)

In addition, after each reduction, the scheduler waits cooldown epochs before resuming the monitoring described above.
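The patience and cooldown rule can be modelled in a few lines of plain Python. This is a simplified sketch, not Paddle's implementation. simulate_plateau is a hypothetical helper that assumes mode='min' and ignores threshold, min_lr and epsilon. It also assumes that a reduction fires once the streak of non-improving steps exceeds patience, and that the metric is not compared at all during cooldown.

```python
def simulate_plateau(losses, lr=1.0, factor=0.5, patience=2, cooldown=1):
    """Return the learning rate in effect after each step() call (mode='min').

    Hypothetical helper: threshold, min_lr and epsilon are ignored here.
    """
    best = None          # best (lowest) metric seen so far
    num_bad = 0          # consecutive steps without improvement
    cooldown_left = 0    # steps still to skip after a reduction
    history = []
    for loss in losses:
        if cooldown_left > 0:
            # Assumption: while cooling down, the metric is not compared.
            cooldown_left -= 1
        else:
            if best is None or loss < best:
                best = loss
                num_bad = 0
            else:
                num_bad += 1
            # Assumption: reduce once the bad streak exceeds `patience`.
            if num_bad > patience:
                lr *= factor
                num_bad = 0
                cooldown_left = cooldown
        history.append(lr)
    return history


print(simulate_plateau([1.0] + [0.9] * 8))
# [1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.25]
```

With patience=2, the first reduction happens on the third non-improving step. The single cooldown step that follows is not counted toward the next reduction.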

Parameters
  • learning_rate (float) – The initial learning rate, given as a Python float.

  • mode (str, optional) – 'min' or 'max'. Normally it is 'min', which means the learning rate is reduced when the loss stops descending. If it is set to 'max', the learning rate is reduced when the monitored value stops ascending. Default: 'min'.

  • factor (float, optional) – The ratio by which the learning rate is reduced: new_lr = origin_lr * factor. It should be less than 1.0. Default: 0.1.

  • patience (int, optional) – When the loss does not improve for this number of epochs, the learning rate is reduced. Default: 10.

  • threshold (float, optional) – Together with threshold_mode, determines the minimum change of loss that counts as an improvement, so that tiny changes of loss are ignored. Default: 1e-4.

  • threshold_mode (str, optional) – 'rel' or 'abs'. In 'rel' mode, the minimum change of loss is best_loss * threshold, where best_loss is the best loss recorded so far. In 'abs' mode, the minimum change of loss is threshold. Default: 'rel'.

  • cooldown (int, optional) – The number of epochs to wait after a learning rate reduction before resuming normal operation. Default: 0.

  • min_lr (float, optional) – The lower bound of the learning rate after reduction. Default: 0.

  • epsilon (float, optional) – Minimal decay applied to lr. If the difference between new and old lr is smaller than epsilon, the update is ignored. Default: 1e-8.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.
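The threshold test and the reduction step can be written as two small functions. is_better and reduced_lr are hypothetical helpers that follow the parameter descriptions above. They are not Paddle functions. The reference value best, which is the value the current metric is compared against, is an assumption of this sketch.

```python
def is_better(current, best, mode="min", threshold=1e-4, threshold_mode="rel"):
    """Return True if `current` improves on the reference value `best`."""
    if mode not in ("min", "max") or threshold_mode not in ("rel", "abs"):
        raise ValueError("mode must be 'min'/'max', threshold_mode 'rel'/'abs'")
    # 'rel': required change is best * threshold; 'abs': fixed amount threshold.
    margin = best * threshold if threshold_mode == "rel" else threshold
    if mode == "min":
        return current < best - margin
    return current > best + margin


def reduced_lr(last_lr, factor=0.1, min_lr=0.0, epsilon=1e-8):
    """Return the learning rate after one reduction attempt."""
    # Multiply by `factor`, but never go below the lower bound `min_lr`.
    new_lr = max(last_lr * factor, min_lr)
    # Ignore the update if it would change the rate by `epsilon` or less.
    return new_lr if last_lr - new_lr > epsilon else last_lr


print(is_better(0.99995, 1.0))                                         # False
print(is_better(0.99995, 1.0, threshold_mode="abs", threshold=1e-5))   # True
print(reduced_lr(2e-3, factor=0.1, min_lr=1e-3))                       # 0.001
```

In the first call, the drop of 5e-5 is smaller than the relative margin 1.0 * 1e-4, so it is ignored. In the last call, 2e-3 * 0.1 would fall below min_lr, so the result is clamped to 1e-3.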

Returns

A ReduceOnPlateau instance that schedules the learning rate.

Examples

>>> # Example 1: train in dynamic graph mode (the default)
>>> import paddle
>>> import numpy as np
>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(20):
...     for batch_id in range(5):
...         x = paddle.uniform([10, 10])
...         out = linear(x)
...         loss = paddle.mean(out)
...         loss.backward()
...         sgd.step()
...         sgd.clear_gradients()
...         scheduler.step(loss)    # update the learning rate every step
...     # scheduler.step(loss)        # or use this line to update once per epoch
>>> # Example 2: train in static graph mode
>>> import paddle
>>> import numpy as np
>>> paddle.enable_static()
>>> main_prog = paddle.static.Program()
>>> start_prog = paddle.static.Program()
>>> with paddle.static.program_guard(main_prog, start_prog):
...     x = paddle.static.data(name='x', shape=[None, 4, 5])
...     y = paddle.static.data(name='y', shape=[None, 4, 5])
...     z = paddle.static.nn.fc(x, 100)
...     loss = paddle.mean(z)
...     scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
...     sgd = paddle.optimizer.SGD(learning_rate=scheduler)
...     sgd.minimize(loss)
...
>>> exe = paddle.static.Executor()
>>> exe.run(start_prog)
>>> for epoch in range(20):
...     for batch_id in range(5):
...         out = exe.run(
...             main_prog,
...             feed={
...                 'x': np.random.randn(3, 4, 5).astype('float32'),
...                 'y': np.random.randn(3, 4, 5).astype('float32')
...             },
...             fetch_list=loss.name)
...         scheduler.step(out[0])    # update the learning rate every step
...     # scheduler.step(out[0])        # or use this line to update once per epoch
...
state_keys()

This hook is intended for subclasses of LRScheduler (the base class). By default, last_epoch and last_lr are saved, as set by self.keys = ['last_epoch', 'last_lr'].

last_epoch is the current epoch number, and last_lr is the current learning rate.

To change the default behavior, provide a custom implementation of state_keys() that redefines self.keys.

step(metrics, epoch=None)

step should be called after optimizer.step(). It updates the learning rate of the optimizer according to metrics. The new learning rate takes effect in the next epoch.

Parameters
  • metrics (Tensor|numpy.ndarray|float) – The value monitored to decide whether the learning rate should be reduced. If it stops descending for patience epochs, the learning rate is reduced. If it is a Tensor or numpy.ndarray, its numel must be 1.

  • epoch (int|None, optional) – The current epoch. Default: None, in which case the epoch counter auto-increments starting from last_epoch=-1.

Returns

None

Examples

Please refer to the examples of ReduceOnPlateau above.
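The input contract of step can be written as a small validation helper. to_scalar_metric and next_epoch are hypothetical names used only for this sketch. They are not Paddle functions. The sketch uses NumPy and does not handle a Paddle Tensor directly. Passing tensor.numpy() is assumed to be an acceptable stand-in here.

```python
import numpy as np


def to_scalar_metric(metrics):
    """Hypothetical check mirroring the documented `metrics` contract.

    Arrays must hold exactly one element; plain numbers are accepted as-is.
    (In this sketch a paddle Tensor would be passed as `tensor.numpy()`.)
    """
    if isinstance(metrics, np.ndarray):
        if metrics.size != 1:
            raise ValueError(f"metrics must have numel 1, got shape {metrics.shape}")
        return float(metrics.item())
    if isinstance(metrics, (int, float, np.floating)):
        return float(metrics)
    raise TypeError(f"unsupported metrics type: {type(metrics).__name__}")


def next_epoch(last_epoch, epoch=None):
    """Epoch bookkeeping: auto-increment unless an explicit epoch is given."""
    return last_epoch + 1 if epoch is None else epoch


print(to_scalar_metric(np.array([0.25], dtype=np.float32)))  # 0.25
print(next_epoch(-1))                                        # 0
```

Arrays of shape [] and [1] both pass the check. An array such as np.zeros(2) is rejected because its numel is 2.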

get_lr()

Subclasses of LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError is raised.
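The get_lr() contract can be shown with a toy class hierarchy. SchedulerSketch and HalvingSketch are hypothetical classes written for this illustration. They mimic the documented behavior: a subclass supplies get_lr(), and the base version raises NotImplementedError. They are not Paddle's LRScheduler, and calling step() from the constructor is an assumption of this sketch.

```python
class SchedulerSketch:
    """Hypothetical stand-in for the base-class contract (not Paddle code)."""

    def __init__(self, learning_rate=0.1, last_epoch=-1):
        self.base_lr = learning_rate
        self.last_epoch = last_epoch
        self.step()  # sketch assumption: compute last_lr for the first epoch

    def get_lr(self):
        # The base class has no schedule of its own.
        raise NotImplementedError

    def step(self, epoch=None):
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch
        self.last_lr = self.get_lr()


class HalvingSketch(SchedulerSketch):
    """Hypothetical subclass: halve the learning rate every epoch."""

    def get_lr(self):
        return self.base_lr * 0.5 ** self.last_epoch


sched = HalvingSketch(learning_rate=1.0)
sched.step()
print(sched.last_epoch, sched.last_lr)  # 1 0.5
```

Instantiating SchedulerSketch directly fails with NotImplementedError because it does not define a schedule.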

set_dict(state_dict)

Loads the scheduler's state.

set_state_dict(state_dict)

Loads the scheduler's state.

state_dict()

Returns the state of the scheduler as a dict.

It is a subset of self.__dict__.
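The interplay of state_keys(), state_dict() and set_state_dict() can be sketched in plain Python. StateSketch and PlateauStateSketch are hypothetical classes, not Paddle code. The extra best key in the subclass is an illustrative assumption showing how overriding state_keys() changes what gets saved.

```python
class StateSketch:
    """Hypothetical scheduler showing how self.keys selects the saved state."""

    def __init__(self):
        self.last_epoch = 3
        self.last_lr = 0.05
        self.scratch = "not saved"  # attribute not listed in self.keys

    def state_keys(self):
        # Default set of saved attributes; subclasses may redefine it.
        self.keys = ["last_epoch", "last_lr"]

    def state_dict(self):
        self.state_keys()
        # Copy only the listed attributes: a subset of self.__dict__.
        return {key: self.__dict__[key] for key in self.keys}

    def set_state_dict(self, state_dict):
        self.state_keys()
        for key in self.keys:
            if key in state_dict:
                self.__dict__[key] = state_dict[key]


class PlateauStateSketch(StateSketch):
    """Hypothetical subclass that also persists its best metric."""

    def __init__(self):
        super().__init__()
        self.best = 0.42

    def state_keys(self):
        self.keys = ["best", "last_epoch", "last_lr"]


saved = PlateauStateSketch().state_dict()
restored = PlateauStateSketch()
restored.best, restored.last_epoch = None, 0
restored.set_state_dict(saved)
print(saved)  # {'best': 0.42, 'last_epoch': 3, 'last_lr': 0.05}
```

The scratch attribute never appears in the saved dict because it is not listed in self.keys.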