LinearWarmup¶
class paddle.optimizer.lr.LinearWarmup(learning_rate, warmup_steps, start_lr, end_lr, last_epoch=-1, verbose=False) [source]
Linear learning rate warm-up strategy. It warms up the learning rate before the normal learning rate scheduler takes effect. For more information, please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.
When epoch < warmup_steps, learning rate is updated as:
\[lr = start\_lr + (end\_lr - start\_lr) * \frac{epoch}{warmup\_steps}\]

where start_lr is the initial learning rate, and end_lr is the final learning rate of the warm-up phase;
When epoch >= warmup_steps, learning rate is updated as:
\[lr = learning\_rate\]

where learning_rate is a float or any subclass of LRScheduler.
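As an illustration of the two cases above, here is a minimal plain-Python sketch of the schedule (a hypothetical helper, not part of the Paddle API; it assumes learning_rate is a float):

>>> def warmup_lr(epoch, learning_rate=0.5, warmup_steps=20, start_lr=0.0, end_lr=0.5):
...     # Before warmup_steps: linear interpolation from start_lr to end_lr
...     if epoch < warmup_steps:
...         return start_lr + (end_lr - start_lr) * epoch / warmup_steps
...     # From warmup_steps onwards: the wrapped learning_rate
...     return learning_rate
...
>>> [warmup_lr(e) for e in (0, 10, 20)]
[0.0, 0.25, 0.5]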
Parameters

- learning_rate (float|LRScheduler) – The learning rate after warm-up. It is a Python float or any subclass of LRScheduler.
- warmup_steps (int) – Total steps of warm-up. It must be a positive integer.
- start_lr (float) – Initial learning rate of the warm-up.
- end_lr (float) – Final learning rate of the warm-up.
- last_epoch (int, optional) – The index of the last epoch. Can be set to resume training. Default: -1, which means the initial learning rate.
- verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.
Returns

A LinearWarmup instance to schedule the learning rate.
Examples
>>> # Example 1: train on default dynamic graph mode
>>> import paddle
>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(20):
...     for batch_id in range(5):
...         x = paddle.uniform([10, 10])
...         out = linear(x)
...         loss = paddle.mean(out)
...         loss.backward()
...         sgd.step()
...         sgd.clear_gradients()
...         scheduler.step()    # If you update learning rate each step
...     # scheduler.step()      # If you update learning rate each epoch
>>> # Example 2: train on static graph mode
>>> import paddle
>>> import numpy as np
>>> paddle.enable_static()
>>> main_prog = paddle.static.Program()
>>> start_prog = paddle.static.Program()
>>> with paddle.static.program_guard(main_prog, start_prog):
...     x = paddle.static.data(name='x', shape=[None, 4, 5])
...     y = paddle.static.data(name='y', shape=[None, 4, 5])
...     z = paddle.static.nn.fc(x, 100)
...     loss = paddle.mean(z)
...     scheduler = paddle.optimizer.lr.LinearWarmup(
...         learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
...     sgd = paddle.optimizer.SGD(learning_rate=scheduler)
...     sgd.minimize(loss)
...
>>> exe = paddle.static.Executor()
>>> exe.run(start_prog)
>>> for epoch in range(20):
...     for batch_id in range(5):
...         out = exe.run(
...             main_prog,
...             feed={
...                 'x': np.random.randn(3, 4, 5).astype('float32'),
...                 'y': np.random.randn(3, 4, 5).astype('float32')
...             },
...             fetch_list=loss.name)
...         scheduler.step()    # If you update learning rate each step
...     # scheduler.step()      # If you update learning rate each epoch
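Both examples above pass a plain float as learning_rate. Since learning_rate may also be any LRScheduler subclass, a warm-up phase can be chained in front of another schedule; a minimal sketch (the choice of CosineAnnealingDecay is illustrative, not part of the original examples):

>>> import paddle
>>> # Decay schedule that takes over once the warm-up finishes
>>> decay = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=0.5, T_max=50)
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=decay, warmup_steps=20, start_lr=0.0, end_lr=0.5)
>>> linear = paddle.nn.Linear(10, 10)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())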
state_dict()¶

Returns the state of the LinearWarmup scheduler as a dict. It is a subset of self.__dict__.
set_state_dict(state_dict)¶

Loads the state_dict for the LinearWarmup scheduler.
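A minimal sketch of checkpointing a LinearWarmup scheduler with these two methods (the file name is an illustrative assumption):

>>> import paddle
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=20, start_lr=0.0, end_lr=0.5)
>>> scheduler.step()
>>> # Save the scheduler state together with the rest of the checkpoint
>>> paddle.save(scheduler.state_dict(), 'warmup_scheduler.pdparams')
>>> # Later: rebuild the scheduler with the same arguments and restore its progress
>>> restored = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=20, start_lr=0.0, end_lr=0.5)
>>> restored.set_state_dict(paddle.load('warmup_scheduler.pdparams'))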
get_lr()¶

Subclasses that override LRScheduler (the base class) should provide a custom implementation of get_lr(). Otherwise, a NotImplementedError exception will be raised.
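A hedged sketch of such a subclass (the name StepHalvingLR and the halving rule are illustrative; it assumes the base class exposes the wrapped rate as self.base_lr and the epoch counter as self.last_epoch):

>>> import paddle
>>> class StepHalvingLR(paddle.optimizer.lr.LRScheduler):
...     """Illustrative scheduler: halve the learning rate every `halve_every` epochs."""
...     def __init__(self, learning_rate=0.1, halve_every=10, last_epoch=-1, verbose=False):
...         self.halve_every = halve_every   # set before super().__init__ so get_lr() can use it
...         super().__init__(learning_rate, last_epoch, verbose)
...     def get_lr(self):
...         # Required override; the base class would raise NotImplementedError
...         return self.base_lr * 0.5 ** (self.last_epoch // self.halve_every)
...
>>> scheduler = StepHalvingLR(learning_rate=0.1, halve_every=10)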
set_dict(state_dict)¶

Loads the scheduler's state.
state_keys()¶

For subclasses that override LRScheduler (the base class). By default, last_epoch and last_lr are saved through self.keys = ['last_epoch', 'last_lr']. last_epoch is the current epoch number, and last_lr is the current learning rate. To change the default behavior, provide a custom implementation of _state_keys() to redefine self.keys.
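Continuing in the same spirit, a sketch of overriding _state_keys() so that an extra attribute is saved with the state (the attribute name is illustrative):

>>> import paddle
>>> class CustomLR(paddle.optimizer.lr.LRScheduler):
...     def __init__(self, learning_rate=0.1, scale=1.0, last_epoch=-1, verbose=False):
...         self.scale = scale
...         super().__init__(learning_rate, last_epoch, verbose)
...     def get_lr(self):
...         return self.base_lr * self.scale
...     def _state_keys(self):
...         # Persist `scale` alongside the defaults
...         self.keys = ['last_epoch', 'last_lr', 'scale']
...
>>> state = CustomLR(0.1).state_dict()   # expected keys: last_epoch, last_lr, scale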
step(epoch=None)¶

step should be called after optimizer.step. It updates the learning rate in the optimizer according to the current epoch. The new learning rate takes effect on the next call to optimizer.step.

Parameters

- epoch (int, optional) – Specify the current epoch. Default: None; the epoch auto-increments from last_epoch=-1.

Returns

None
Examples
>>> import paddle
>>> value = paddle.arange(26, dtype='float32')
>>> a = paddle.reshape(value, [2, 13])
>>> linear = paddle.nn.Linear(13, 5)
>>> adadelta = paddle.optimizer.Adadelta(learning_rate=0.0003, epsilon=1e-06, rho=0.95,
...                                      parameters=linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adadelta.step()
>>> adadelta.clear_grad()
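A short sketch contrasting the two ways of calling step (the LinearWarmup arguments are illustrative; the expected value follows the warm-up formula above):

>>> import paddle
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=4, start_lr=0.0, end_lr=0.5)
>>> scheduler.step()      # epoch=None: last_epoch auto-increments by one
>>> scheduler.step(3)     # explicit epoch: the scheduler jumps straight to epoch 3
>>> scheduler.last_lr     # 0.0 + (0.5 - 0.0) * 3 / 4
0.375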