Optimizer¶
Training a neural network is in essence an optimization problem. Through forward computation and back-propagation, an Optimizer uses the back-propagated gradients to optimize the parameters of the network.
1.SGD/SGDOptimizer¶
SGD is a subclass of Optimizer implementing Stochastic Gradient Descent, a variant of Gradient Descent. When a large number of samples need to be trained, SGD is usually chosen to make the loss function converge more quickly.
API Reference: api_fluid_optimizer_SGDOptimizer
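A minimal sketch of how SGD would typically be wired into a Fluid program; the toy regression network below (a single fc layer on a 13-dimensional input) and the learning rate are purely illustrative.

```python
import paddle.fluid as fluid

# Illustrative toy regression network; shapes and layer choice are arbitrary.
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

# SGD only needs a learning rate; minimize() appends the backward pass and
# the parameter-update ops to the default program.
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
```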
2.Momentum/MomentumOptimizer¶
The Momentum optimizer adds momentum on top of SGD, reducing the noise encountered during stochastic gradient descent. You can set use_nesterov to False or True, corresponding respectively to the traditional Momentum algorithm (Section 4.1 in the paper) and the Nesterov accelerated gradient algorithm (Section 4.2 in the paper).
API Reference: api_fluid_optimizer_MomentumOptimizer
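A minimal construction sketch (the hyperparameter values are illustrative); the use_nesterov flag switches between the two variants described above.

```python
import paddle.fluid as fluid

# Classic heavy-ball momentum; set use_nesterov=True for the Nesterov variant.
momentum = fluid.optimizer.Momentum(learning_rate=0.01,
                                    momentum=0.9,
                                    use_nesterov=False)
# momentum.minimize(loss) is then called on the network's mean loss,
# as in the SGD example above.
```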
3. Adagrad/AdagradOptimizer¶
The Adagrad optimizer adaptively assigns different learning rates to different parameters, addressing the problem that the number of samples contributing to each parameter can be very uneven.
API Reference: api_fluid_optimizer_AdagradOptimizer
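A minimal construction sketch (hyperparameter values are illustrative), showing how the per-parameter adaptation comes about.

```python
import paddle.fluid as fluid

# Adagrad accumulates the squared gradients of each parameter and divides the
# learning rate by the square root of that accumulator, so frequently updated
# parameters receive smaller steps. epsilon guards against division by zero.
adagrad = fluid.optimizer.Adagrad(learning_rate=0.2, epsilon=1e-6)
# adagrad.minimize(loss) as in the SGD example above.
```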
4.RMSPropOptimizer¶
The RMSProp optimizer adaptively adjusts the learning rate. It mainly addresses the sharp decay of the learning rate in the middle and late stages of training that occurs when Adagrad is used.
API Reference: api_fluid_optimizer_RMSPropOptimizer
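A minimal construction sketch (hyperparameter values are illustrative) of how RMSProp avoids Adagrad's late-stage learning-rate collapse.

```python
import paddle.fluid as fluid

# rho is the decay rate of the moving average of squared gradients; because
# the average is exponentially decayed rather than summed, the effective
# learning rate does not collapse late in training as plain Adagrad's does.
rmsprop = fluid.optimizer.RMSProp(learning_rate=0.01, rho=0.95, epsilon=1e-6)
```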
5.Adam/AdamOptimizer¶
The Adam optimizer adaptively adjusts the learning rate and suits most non-convex optimization problems, large data sets, and high-dimensional scenarios. Adam is the most commonly used optimization algorithm.
API Reference: api_fluid_optimizer_AdamOptimizer
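A minimal construction sketch; the values below are the commonly used defaults and are given only for illustration.

```python
import paddle.fluid as fluid

# beta1/beta2 control the exponential decay of the first- and second-moment
# estimates of the gradient, which drive the per-parameter learning rates.
adam = fluid.optimizer.Adam(learning_rate=0.001,
                            beta1=0.9,
                            beta2=0.999,
                            epsilon=1e-8)
```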
6.Adamax/AdamaxOptimizer¶
Adamax is a variant of the Adam algorithm that imposes a simpler bound on the learning rate, in particular its upper bound.
API Reference: api_fluid_optimizer_AdamaxOptimizer
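A minimal construction sketch (hyperparameter values are illustrative).

```python
import paddle.fluid as fluid

# Adamax replaces Adam's second-moment estimate with an infinity-norm
# estimate, which yields a simpler bound on the per-parameter step size.
adamax = fluid.optimizer.Adamax(learning_rate=0.002, beta1=0.9, beta2=0.999)
```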
7.DecayedAdagrad/DecayedAdagradOptimizer¶
The DecayedAdagrad optimizer can be regarded as the Adagrad algorithm combined with a decay rate, which addresses the sharp decay of the learning rate in the middle and late stages of training.
API Reference: api_fluid_optimizer_DecayedAdagrad
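A minimal construction sketch (hyperparameter values are illustrative) of how the decay rate changes Adagrad's behavior.

```python
import paddle.fluid as fluid

# decay exponentially discounts the accumulated squared gradients, so the
# denominator stops growing without bound and the learning rate no longer
# shrinks toward zero as it can with plain Adagrad.
decayed_adagrad = fluid.optimizer.DecayedAdagrad(learning_rate=0.2,
                                                 decay=0.95,
                                                 epsilon=1e-6)
```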
8. Ftrl/FtrlOptimizer¶
The FtrlOptimizer combines the high accuracy of the FOBOS algorithm with the sparsity of the RDA algorithm, and is an online learning algorithm that works remarkably well in practice.
API Reference: api_fluid_optimizer_FtrlOptimizer
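A minimal construction sketch (the regularization strengths are illustrative).

```python
import paddle.fluid as fluid

# l1 controls the sparsity of the learned weights, l2 the usual shrinkage;
# lr_power sets how the per-coordinate learning rate decays with the number
# of updates.
ftrl = fluid.optimizer.Ftrl(learning_rate=0.1, l1=0.01, l2=0.01, lr_power=-0.5)
```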
9.ModelAverage¶
The ModelAverage optimizer accumulates historical parameters over a sliding window during training. The averaged parameters are used at inference time to improve the overall accuracy of inference.
API Reference: api_fluid_optimizer_ModelAverage
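A minimal usage sketch, assuming the window settings below (purely illustrative) and an evaluation Executor named `exe`.

```python
import paddle.fluid as fluid

# Built alongside the regular optimizer during training; the window arguments
# bound how much parameter history is averaged.
model_average = fluid.optimizer.ModelAverage(average_window_rate=0.15,
                                             min_average_window=10000,
                                             max_average_window=20000)

# At inference time, apply() temporarily swaps in the averaged parameters and
# restores the originals on exit (exe is the Executor used for evaluation):
# with model_average.apply(exe):
#     ...  # run inference with the averaged parameters
```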
10.Rprop/RpropOptimizer¶
The Rprop optimizer starts from the observation that the gradient magnitudes of different weight parameters can differ greatly, which makes it hard to choose a single global step size. It therefore accelerates optimization by maintaining a per-parameter step size that is adjusted dynamically using only the sign of each parameter's gradient.
API Reference: api_fluid_optimizer_Rprop
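The sign-based rule can be sketched independently of the Paddle API; the NumPy fragment below is only an algorithmic illustration of a simplified Rprop step (without weight backtracking), with typical textbook values for the growth/shrink factors and step-size bounds.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_minus=0.5, eta_plus=1.2,
               step_min=1e-6, step_max=50.0):
    """One simplified Rprop update: only the sign of the gradient is used."""
    sign_change = np.sign(grad) * np.sign(prev_grad)
    # Grow the per-parameter step where the gradient kept its sign,
    # shrink it where the sign flipped (i.e. a minimum was overshot).
    step = np.where(sign_change > 0, step * eta_plus, step)
    step = np.where(sign_change < 0, step * eta_minus, step)
    step = np.clip(step, step_min, step_max)
    w = w - np.sign(grad) * step
    return w, step
```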
11.ASGD/ASGDOptimizer¶
The ASGD optimizer is a space-for-time variant of SGD that performs stochastic optimization with trajectory averaging. On top of SGD, ASGD additionally maintains the average of the historical parameters, so that the variance of the noise in the descent direction keeps decreasing and the algorithm eventually converges to the optimum at a linear rate.
API Reference: api_fluid_optimizer_ASGD
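A NumPy sketch of the averaging idea, given only as an algorithmic illustration rather than the Paddle API: an ordinary SGD step is taken, and a running mean of the parameter trajectory is kept on the side.

```python
import numpy as np

def asgd_step(w, w_avg, grad, lr, t):
    """One averaged-SGD update at iteration t (0-based)."""
    w = w - lr * grad                      # plain SGD step
    w_avg = w_avg + (w - w_avg) / (t + 1)  # running mean of all iterates
    return w, w_avg

# At the end of training, w_avg (not w) is taken as the final parameters.
```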