LookaheadOptimizer

class paddle.fluid.optimizer.LookaheadOptimizer(inner_optimizer, alpha=0.5, k=5)
api_attr: Static Graph

This class implements the Lookahead optimizer described in the paper https://arxiv.org/abs/1907.08610.

Lookahead keeps two sets of parameters: fast_params and slow_params. The inner_optimizer updates fast_params at every training step. Every k training steps, Lookahead updates slow_params and then resets fast_params to the new slow_params, as follows:

\[ \begin{aligned} slow\_param_t &= slow\_param_{t-1} + \alpha \cdot (fast\_param_{t-1} - slow\_param_{t-1}) \\ fast\_param_t &= slow\_param_t \end{aligned} \]
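
The update rule above can be sketched in plain Python, independent of Paddle. This is a minimal illustration only, not the library's implementation: the function name lookahead_sgd, the scalar parameter, and the use of plain SGD as the inner optimizer are all assumptions made for the example.

```python
def lookahead_sgd(grad_fn, init_param, lr=0.1, alpha=0.5, k=5, steps=20):
    """Toy scalar Lookahead wrapped around plain SGD (hypothetical helper)."""
    slow = float(init_param)
    fast = float(init_param)
    for step in range(1, steps + 1):
        # Inner optimizer: one SGD step on the fast parameter.
        fast -= lr * grad_fn(fast)
        if step % k == 0:
            # Every k steps: move the slow parameter toward the fast one
            # by a fraction alpha, then reset the fast parameter to it.
            slow += alpha * (fast - slow)
            fast = slow
    return slow, fast


# Minimize f(p) = p**2, whose gradient is 2 * p.
slow, fast = lookahead_sgd(lambda p: 2.0 * p, init_param=1.0)
```

With steps a multiple of k, the two returned values coincide, because the fast parameter has just been reset to the slow one. Between synchronizations only the fast parameter changes.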
Parameters
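  • inner_optimizer (Optimizer) – The optimizer that updates the fast params at every training step.

  • alpha (float) – The learning rate of Lookahead, i.e. the coefficient applied to (fast_param - slow_param) when the slow params are updated. Default is 0.5.

  • k (int) – The slow params are updated every k steps. Default is 5.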

Examples

import paddle
import paddle.fluid as fluid
import numpy as np
import numpy.random as random

paddle.enable_static()

x = fluid.layers.data(name='x', shape=[2], dtype='float32')
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
y = fluid.layers.fc(input=[x], size=2, act="softmax")
loss = fluid.layers.cross_entropy(input=y, label=label)
loss = fluid.layers.mean(x=loss)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
optimizer = fluid.optimizer.LookaheadOptimizer(sgd,
                                    alpha=0.5,
                                    k=5)
optimizer.minimize(loss)
main_program = fluid.default_main_program()
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

def train_reader(limit=5):
    for i in range(limit):
        yield random.random([2]).astype('float32'), random.randint(0, 2, [1]).astype('int64')

feeder = fluid.DataFeeder(feed_list=[x, label], place=place)
reader = paddle.batch(paddle.reader.shuffle(train_reader, buf_size=50000), batch_size=1)

for batch_data in reader():
    exe.run(fluid.default_main_program(),
            feed=feeder.feed(batch_data))