AmpScaler

class paddle.fluid.dygraph.amp.loss_scaler.AmpScaler ( enable=True, init_loss_scaling=32768.0, incr_ratio=2.0, decr_ratio=0.5, incr_every_n_steps=1000, decr_every_n_nan_or_inf=1, use_dynamic_loss_scaling=True ) [source]
Api_attr

imperative

AmpScaler is used for Auto-Mixed-Precision training/inferring in imperative mode. It controls the scaling of the loss to help avoid numerical overflow. The object of this class provides the methods scale(), unscale_() and minimize(), together with get/set APIs for its configuration parameters.

scale() multiplies the loss by the scale ratio. unscale_() unscales the gradients of parameters, i.e. multiplies them by 1/(scale ratio). minimize() is similar to optimizer.minimize(): it performs the parameter update and also updates the loss scaling ratio.

Commonly, it is used together with amp_guard to achieve Auto-Mixed-Precision in imperative mode.

Parameters
  • enable (bool, optional) – Enable loss scaling or not. Default is True.

  • init_loss_scaling (float, optional) – The initial loss scaling factor. Default is 2**15.

  • incr_ratio (float, optional) – The multiplier to use when increasing the loss scaling. Default is 2.0.

  • decr_ratio (float, optional) – The less-than-one multiplier to use when decreasing the loss scaling. Default is 0.5.

  • incr_every_n_steps (int, optional) – Increases loss scaling every n consecutive steps with finite gradients. Default is 1000.

  • decr_every_n_nan_or_inf (int, optional) – Decreases loss scaling every n accumulated steps with nan or inf gradients. Default is 1.

  • use_dynamic_loss_scaling (bool, optional) – Whether to use dynamic loss scaling. If False, fixed loss scaling is used. If True, the loss scaling is updated dynamically. Default is True.

Returns

An AmpScaler object.

Examples

import numpy as np
import paddle.fluid as fluid

data = np.random.uniform(-1, 1, [10, 3, 32, 32]).astype('float32')
with fluid.dygraph.guard():
    model = fluid.dygraph.Conv2D(3, 2, 3)
    optimizer = fluid.optimizer.SGDOptimizer(
            learning_rate=0.01, parameter_list=model.parameters())
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
    data = fluid.dygraph.to_variable(data)
    with fluid.dygraph.amp_guard():
        conv = model(data)
        loss = fluid.layers.reduce_mean(conv)
        scaled = scaler.scale(loss)
        scaled.backward()
        scaler.minimize(optimizer, scaled)
scale ( var )

scale

Multiplies a variable (Tensor) by the scale factor and returns the scaled output. If this instance of AmpScaler is not enabled, the output is returned unmodified.

Parameters

var (Variable) – The variable to scale.

Returns

The scaled variable, or the original variable if scaling is not enabled.

Examples

import numpy as np
import paddle.fluid as fluid

data = np.random.uniform(-1, 1, [10, 3, 32, 32]).astype('float32')
with fluid.dygraph.guard():
    model = fluid.dygraph.Conv2D(3, 2, 3)
    optimizer = fluid.optimizer.SGDOptimizer(
            learning_rate=0.01, parameter_list=model.parameters())
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
    data = fluid.dygraph.to_variable(data)
    with fluid.dygraph.amp_guard():
        conv = model(data)
        loss = fluid.layers.reduce_mean(conv)
        scaled = scaler.scale(loss)
        scaled.backward()
        scaler.minimize(optimizer, scaled)
minimize ( optimizer, *args, **kwargs )

minimize

This function is similar to Optimizer.minimize(); it performs the parameter update.

If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, it first unscales the scaled gradients of the parameters and then updates the parameters.

Finally, the loss scaling ratio is updated.

Parameters
  • optimizer (Optimizer) – The optimizer used to update parameters.

  • args – Arguments, which will be forwarded to Optimizer.minimize().

  • kwargs – Keyword arguments, which will be forwarded to Optimizer.minimize().

Examples

import numpy as np
import paddle.fluid as fluid

data = np.random.uniform(-1, 1, [10, 3, 32, 32]).astype('float32')
with fluid.dygraph.guard():
    model = fluid.dygraph.Conv2D(3, 2, 3)
    optimizer = fluid.optimizer.SGDOptimizer(
            learning_rate=0.01, parameter_list=model.parameters())
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
    data = fluid.dygraph.to_variable(data)
    with fluid.dygraph.amp_guard():
        conv = model(data)
        loss = fluid.layers.reduce_mean(conv)
        scaled = scaler.scale(loss)
        scaled.backward()
        scaler.minimize(optimizer, scaled)
is_enable ( )

is_enable

Return whether loss scaling is enabled.

Returns

True if loss scaling is enabled, otherwise False.

Return type

bool

is_use_dynamic_loss_scaling ( )

is_use_dynamic_loss_scaling

Return whether dynamic loss scaling is used.

Returns

True if the loss scaling is updated dynamically, False if fixed loss scaling is used.

Return type

bool
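
For illustration, a minimal sketch (assuming the same fluid dygraph environment as the examples above) that queries both flags:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
    # Query whether loss scaling is enabled and whether it is updated dynamically.
    print(scaler.is_enable())
    print(scaler.is_use_dynamic_loss_scaling())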

get_init_loss_scaling ( )

get_init_loss_scaling

Return the initial loss scaling factor.

Returns

float: the initial loss scaling factor.

set_init_loss_scaling ( new_init_loss_scaling )

set_init_loss_scaling

Set the initial loss scaling factor by new_init_loss_scaling.

Parameters

new_init_loss_scaling (float) – The new value used to update the initial loss scaling factor.
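
For illustration, a minimal sketch (assuming the same fluid dygraph environment as the examples above) of reading and updating the initial loss scaling factor:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
    print(scaler.get_init_loss_scaling())   # current initial loss scaling factor
    scaler.set_init_loss_scaling(2048)      # update the initial loss scaling factor
    print(scaler.get_init_loss_scaling())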

get_incr_ratio ( )

get_incr_ratio

Return the multiplier to use when increasing the loss scaling.

Returns

float: the multiplier to use when increasing the loss scaling.

set_incr_ratio ( new_incr_ratio )

set_incr_ratio

Set the multiplier to use when increasing the loss scaling to new_incr_ratio; new_incr_ratio should be > 1.0.

Parameters

new_incr_ratio (float) – The new value of the multiplier to use when increasing the loss scaling.
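
For illustration, a minimal sketch (same assumed fluid dygraph setup as above) of reading and updating the increase multiplier:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(incr_ratio=2.0)
    scaler.set_incr_ratio(3.0)      # must be > 1.0
    print(scaler.get_incr_ratio())  # current multiplier used when increasing the loss scaling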

get_decr_ratio ( )

get_decr_ratio

Return the less-than-one multiplier to use when decreasing the loss scaling.

Returns

float: the less-than-one multiplier to use when decreasing the loss scaling.

set_decr_ratio ( new_decr_ratio )

set_decr_ratio

Set the less-than-one multiplier to use when decreasing the loss scaling to new_decr_ratio; new_decr_ratio should be < 1.0.

Parameters

new_decr_ratio (float) – The new value of the less-than-one multiplier to use when decreasing the loss scaling.
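
For illustration, a minimal sketch (same assumed fluid dygraph setup as above) of reading and updating the decrease multiplier:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(decr_ratio=0.5)
    scaler.set_decr_ratio(0.25)     # must be < 1.0
    print(scaler.get_decr_ratio())  # current multiplier used when decreasing the loss scaling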

get_incr_every_n_steps ( )

get_incr_every_n_steps

Return the number n; loss scaling increases every n consecutive steps with finite gradients.

Returns

int: the number n; loss scaling increases every n consecutive steps with finite gradients.

set_incr_every_n_steps ( new_incr_every_n_steps )

set_incr_every_n_steps

Set the number n to new_incr_every_n_steps; loss scaling increases every n consecutive steps with finite gradients.

Parameters

new_incr_every_n_steps (int) – The new value of n; loss scaling increases every n consecutive steps with finite gradients.
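
For illustration, a minimal sketch (same assumed fluid dygraph setup as above) of reading and updating the increase interval n:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(incr_every_n_steps=1000)
    scaler.set_incr_every_n_steps(2000)     # increase the loss scaling every 2000 finite-gradient steps
    print(scaler.get_incr_every_n_steps())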

get_decr_every_n_nan_or_inf ( )

get_decr_every_n_nan_or_inf

Return the number n; loss scaling decreases every n accumulated steps with nan or inf gradients.

Returns

int: the number n; loss scaling decreases every n accumulated steps with nan or inf gradients.

set_decr_every_n_nan_or_inf ( new_decr_every_n_nan_or_inf )

set_decr_every_n_nan_or_inf

Set the number n to new_decr_every_n_nan_or_inf; loss scaling decreases every n accumulated steps with nan or inf gradients.

Parameters

new_decr_every_n_nan_or_inf (int) – The new value of n; loss scaling decreases every n accumulated steps with nan or inf gradients.
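
For illustration, a minimal sketch (same assumed fluid dygraph setup as above) of reading and updating the decrease interval n:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(decr_every_n_nan_or_inf=1)
    scaler.set_decr_every_n_nan_or_inf(2)      # decrease the loss scaling after 2 accumulated nan/inf steps
    print(scaler.get_decr_every_n_nan_or_inf())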

state_dict ( )

state_dict

Returns the state of the scaler as a dict. If this instance is not enabled, returns an empty dict.

Returns

A dict of the scaler state, including:
  • scale (Tensor): The loss scaling factor.
  • incr_ratio (float): The multiplier to use when increasing the loss scaling.
  • decr_ratio (float): The less-than-one multiplier to use when decreasing the loss scaling.
  • incr_every_n_steps (int): Increases loss scaling every n consecutive steps with finite gradients.
  • decr_every_n_nan_or_inf (int): Decreases loss scaling every n accumulated steps with nan or inf gradients.
  • incr_count (int): The number of recent consecutive unskipped steps.
  • decr_count (int): The number of recent consecutive skipped steps.
  • use_dynamic_loss_scaling (bool): Whether to use dynamic loss scaling. If False, fixed loss scaling is used; if True, the loss scaling is updated dynamically. Default is True.

load_state_dict ( state_dict )

load_state_dict

Loads the scaler state.

Parameters

state_dict (dict) – scaler state. Should be an object returned from a call to AmpScaler.state_dict().
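
For illustration, a minimal sketch (same assumed fluid dygraph setup as above) of saving the scaler state and restoring it into another AmpScaler:

import paddle.fluid as fluid

with fluid.dygraph.guard():
    scaler = fluid.dygraph.AmpScaler(init_loss_scaling=1024)
    scaler_state = scaler.state_dict()        # empty dict if the scaler is not enabled

    new_scaler = fluid.dygraph.AmpScaler()
    new_scaler.load_state_dict(scaler_state)  # restore the saved scaler state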