GradScaler

class paddle.amp.GradScaler ( enable=True, init_loss_scaling=65536.0, incr_ratio=2.0, decr_ratio=0.5, incr_every_n_steps=2000, decr_every_n_nan_or_inf=1, use_dynamic_loss_scaling=True ) [source]

GradScaler is used for Auto-Mixed-Precision training in dynamic graph mode. It controls the scaling of the loss and helps avoid numerical overflow. An object of this class has nineteen methods: scale(), unscale_(), minimize(), step(), update(), and the get/set APIs of its parameters.

scale() is used to multiply the loss by the scale ratio. unscale_() is used to unscale the gradients of the parameters, i.e. it multiplies the gradients of the parameters by 1/(scale ratio). minimize() is similar to optimizer.minimize(): it performs the parameter update and also updates the loss_scaling; it is equivalent to step() + update(). step() is similar to optimizer.step(), which performs the parameter update. update() is used to update the loss_scaling.

Commonly, it is used together with paddle.amp.auto_cast to achieve Auto-Mixed-Precision in dynamic graph mode.

Parameters
  • enable (bool, optional) – Enable loss scaling or not. Default is True.

  • init_loss_scaling (float, optional) – The initial loss scaling factor. Default is 65536.0.

  • incr_ratio (float, optional) – The multiplier to use when increasing the loss scaling. Default is 2.0.

  • decr_ratio (float, optional) – The less-than-one-multiplier to use when decreasing the loss scaling. Default is 0.5.

  • incr_every_n_steps (int, optional) – Increases loss scaling every n consecutive steps with finite gradients. Default is 2000.

  • decr_every_n_nan_or_inf (int, optional) – Decreases loss scaling every n accumulated steps with nan or inf gradients. Default is 1.

  • use_dynamic_loss_scaling (bool, optional) – Whether to use dynamic loss scaling. If False, fixed loss_scaling is used. If True, the loss scaling is updated dynamically (the update policy is sketched after the Returns entry below). Default is True.

Returns

A GradScaler object.
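
The sketch below is not Paddle API; it is an illustrative, plain-Python simulation of the dynamic loss-scaling policy that the parameters above describe. The counter names and the small value chosen for incr_every_n_steps are assumptions made for demonstration only.

>>> # Illustrative only: simulate the dynamic loss-scaling policy.
>>> scale = 1024.0                                        # init_loss_scaling
>>> incr_ratio, decr_ratio = 2.0, 0.5
>>> incr_every_n_steps, decr_every_n_nan_or_inf = 3, 1    # small values for demonstration
>>> good_steps, bad_steps = 0, 0
>>> for found_nan_or_inf in [False, False, False, True]:  # hypothetical per-step overflow flags
...     if found_nan_or_inf:
...         good_steps, bad_steps = 0, bad_steps + 1
...         if bad_steps == decr_every_n_nan_or_inf:      # shrink after n nan/inf steps
...             scale, bad_steps = scale * decr_ratio, 0
...     else:
...         good_steps += 1
...         if good_steps == incr_every_n_steps:          # grow after n consecutive finite steps
...             scale, good_steps = scale * incr_ratio, 0
>>> print(scale)                                          # 1024 * 2.0 * 0.5
1024.0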

Examples

>>> import paddle

>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])

>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)

>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()            # do backward
>>> scaler.minimize(optimizer, scaled)  # update parameters
>>> optimizer.clear_grad()
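
For reference, the snippet below is a minimal multi-iteration training-loop sketch that arranges the same calls shown above inside a loop. It is only illustrative: the iteration count and the random input data are placeholders, and step() + update() is used as the equivalent of minimize().

>>> import paddle

>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

>>> for _ in range(4):                        # placeholder number of iterations
...     data = paddle.rand([10, 3, 32, 32])   # placeholder batch
...     with paddle.amp.auto_cast():
...         loss = paddle.mean(model(data))
...     scaled = scaler.scale(loss)           # scale the loss
...     scaled.backward()                     # do backward
...     scaler.step(optimizer)                # update parameters
...     scaler.update()                       # update the loss scaling ratio
...     optimizer.clear_grad()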
scale ( var )

scale

Multiplies a Tensor by the scale factor and returns the scaled output. If this instance of GradScaler is not enabled, the output is returned unmodified.

Parameters

var (Tensor) – The tensor to scale.

Returns

The scaled tensor or original tensor.

Examples

>>> import paddle

>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])

>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)

>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()            # do backward
>>> scaler.minimize(optimizer, scaled)  # update parameters
>>> optimizer.clear_grad()
minimize ( optimizer, *args, **kwargs )

minimize

This function is similar to optimizer.minimize(): it performs the parameter update.

If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, it first unscales the scaled gradients of the parameters, then updates the parameters.

Finally, the loss scaling ratio is updated.

Parameters
  • optimizer (Optimizer) – The optimizer used to update parameters.

  • args – Arguments, which will be forwarded to optimizer.minimize().

  • kwargs – Keyword arguments, which will be forwarded to optimizer.minimize().

Examples

>>> import paddle

>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])

>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)

>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()            # do backward
>>> scaler.minimize(optimizer, scaled)  # update parameters
>>> optimizer.clear_grad()
step ( optimizer )

step

This function is similar to optimizer.step(): it performs the parameter update.

If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, it first unscales the scaled gradients of the parameters, then updates the parameters.

Parameters

optimizer (Optimizer) – The optimizer used to update parameters.

Examples

>>> import paddle
>>> paddle.device.set_device('gpu')

>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()            # do backward
>>> scaler.step(optimizer)       # update parameters
>>> scaler.update()              # update the loss scaling ratio
>>> optimizer.clear_grad()
update ( )

update

Updates the loss_scaling.

Examples

>>> import paddle

>>> paddle.device.set_device('gpu')
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)     # scale the loss
>>> scaled.backward()               # do backward
>>> scaler.step(optimizer)          # update parameters
>>> scaler.update()                 # update the loss scaling ratio
>>> optimizer.clear_grad()
unscale_ ( optimizer )

unscale_

Unscales the gradients of the parameters, i.e. multiplies the gradients of the parameters by 1/(loss scaling ratio). If this instance of GradScaler is not enabled, the output is returned unmodified.

Parameters

optimizer (Optimizer) – The optimizer used to update parameters.

Returns

The unscaled parameters or original parameters.

Examples

>>> import paddle

>>> paddle.device.set_device('gpu')
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()            # do backward
>>> scaler.unscale_(optimizer)   # unscale the gradients of the parameters
>>> scaler.step(optimizer)
>>> scaler.update()
>>> optimizer.clear_grad()
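
A common reason to call unscale_() explicitly is to work with the true, unscaled gradients before step(), for example to compute gradient norms for logging. The snippet below is only a sketch of that pattern; the gradient-norm computation is illustrative and not part of the GradScaler API.

>>> import paddle

>>> paddle.device.set_device('gpu')
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)
>>> scaled.backward()
>>> scaler.unscale_(optimizer)   # gradients are now in the unscaled range
>>> grad_norms = [float(paddle.linalg.norm(p.grad)) for p in model.parameters()]
>>> scaler.step(optimizer)       # step() does not unscale again
>>> scaler.update()
>>> optimizer.clear_grad()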
is_enable ( )

is_enable

Whether loss scaling is enabled.

Returns

Returns True if loss scaling is enabled, otherwise returns False.

Return type

bool

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> enable = scaler.is_enable()
>>> print(enable)
True
is_use_dynamic_loss_scaling ( )

is_use_dynamic_loss_scaling

Whether to use dynamic loss scaling.

Returns

Returns False if fixed loss scaling is used; returns True if the loss scaling is updated dynamically.

Return type

bool

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> use_dynamic_loss_scaling = scaler.is_use_dynamic_loss_scaling()
>>> print(use_dynamic_loss_scaling)
True
get_init_loss_scaling ( )

get_init_loss_scaling

Return the initial loss scaling factor.

Returns

the initial loss scaling factor.

Return type

float

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> init_loss_scaling = scaler.get_init_loss_scaling()
>>> print(init_loss_scaling)
1024
set_init_loss_scaling ( new_init_loss_scaling )

set_init_loss_scaling

Set the initial loss scaling factor by new_init_loss_scaling.

Parameters

new_init_loss_scaling (float) – The new_init_loss_scaling used to update initial loss scaling factor.

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_init_loss_scaling())
1024
>>> new_init_loss_scaling = 1000
>>> scaler.set_init_loss_scaling(new_init_loss_scaling)
>>> print(scaler.get_init_loss_scaling())
1000
get_incr_ratio ( )

get_incr_ratio

Return the multiplier to use when increasing the loss scaling.

Returns

the multiplier to use when increasing the loss scaling.

Return type

float

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> incr_ratio = scaler.get_incr_ratio()
>>> print(incr_ratio)
2.0
set_incr_ratio ( new_incr_ratio )

set_incr_ratio

Set the multiplier to use when increasing the loss scaling to new_incr_ratio; new_incr_ratio should be > 1.0.

Parameters

new_incr_ratio (float) – The new_incr_ratio used to update the multiplier to use when increasing the loss scaling.

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_incr_ratio())
2.0
>>> new_incr_ratio = 3.0
>>> scaler.set_incr_ratio(new_incr_ratio)
>>> print(scaler.get_incr_ratio())
3.0
get_decr_ratio ( )

get_decr_ratio

Get the less-than-one-multiplier to use when decreasing the loss scaling.

Returns

the less-than-one-multiplier to use when decreasing the loss scaling.

Return type

float

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> decr_ratio = scaler.get_decr_ratio()
>>> print(decr_ratio)
0.5
set_decr_ratio ( new_decr_ratio )

set_decr_ratio

Set the less-than-one multiplier to use when decreasing the loss scaling to new_decr_ratio; new_decr_ratio should be < 1.0.

Parameters

new_decr_ratio (float) – The new_decr_ratio used to update the less-than-one-multiplier to use when decreasing the loss scaling.

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_decr_ratio())
0.5
>>> new_decr_ratio = 0.1
>>> scaler.set_decr_ratio(new_decr_ratio)
>>> print(scaler.get_decr_ratio())
0.1
get_incr_every_n_steps ( )

get_incr_every_n_steps

Return the number n: the loss scaling is increased every n consecutive steps with finite gradients.

Returns

the number n: the loss scaling is increased every n consecutive steps with finite gradients.

Return type

int

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> incr_every_n_steps = scaler.get_incr_every_n_steps()
>>> print(incr_every_n_steps)
1000
set_incr_every_n_steps ( new_incr_every_n_steps )

set_incr_every_n_steps

Set the number n to new_incr_every_n_steps; the loss scaling is increased every n consecutive steps with finite gradients.

Parameters

new_incr_every_n_steps (int) – The new value of n; the loss scaling is increased every n consecutive steps with finite gradients.

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_incr_every_n_steps())
1000
>>> new_incr_every_n_steps = 2000
>>> scaler.set_incr_every_n_steps(new_incr_every_n_steps)
>>> print(scaler.get_incr_every_n_steps())
2000
get_decr_every_n_nan_or_inf ( )

get_decr_every_n_nan_or_inf

Return the number n: the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.

Returns

the number n: the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.

Return type

int

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> decr_every_n_nan_or_inf = scaler.get_decr_every_n_nan_or_inf()
>>> print(decr_every_n_nan_or_inf)
2
set_decr_every_n_nan_or_inf ( new_decr_every_n_nan_or_inf )

set_decr_every_n_nan_or_inf

Set the number n to new_decr_every_n_nan_or_inf; the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.

Parameters

new_decr_every_n_nan_or_inf (int) – The new value of n; the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.

Examples

>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_decr_every_n_nan_or_inf())
2
>>> new_decr_every_n_nan_or_inf = 3
>>> scaler.set_decr_every_n_nan_or_inf(new_decr_every_n_nan_or_inf)
>>> print(scaler.get_decr_every_n_nan_or_inf())
3
state_dict ( )

state_dict

Returns the state of the scaler as a dict. If this instance is not enabled, returns an empty dict.

Returns

A dict containing the scaler state:

  • scale (Tensor) – The loss scaling factor.

  • incr_ratio (float) – The multiplier to use when increasing the loss scaling.

  • decr_ratio (float) – The less-than-one multiplier to use when decreasing the loss scaling.

  • incr_every_n_steps (int) – Increases loss scaling every n consecutive steps with finite gradients.

  • decr_every_n_nan_or_inf (int) – Decreases loss scaling every n accumulated steps with NaN or Inf gradients.

  • incr_count (int) – The number of recent consecutive unskipped steps.

  • decr_count (int) – The number of recent consecutive skipped steps.

  • use_dynamic_loss_scaling (bool) – Whether dynamic loss scaling is used. If False, fixed loss scaling is used; if True, the loss scaling is updated dynamically.

Return type

dict

Examples

>>> import paddle

>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> scaler_state = scaler.state_dict()
load_state_dict ( state_dict )

load_state_dict

Loads the scaler state.

Parameters

state_dict (dict) – scaler state. Should be an object returned from a call to GradScaler.state_dict().

Examples

>>> import paddle

>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> scaler_state = scaler.state_dict()
>>> scaler.load_state_dict(scaler_state)
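
For checkpointing, the scaler state can be saved and restored together with the model and optimizer state. The following is only a sketch of that pattern, assuming paddle.save/paddle.load for serialization; the file name 'checkpoint.pdparams' is a placeholder.

>>> import paddle

>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

>>> # save model, optimizer and scaler state together
>>> paddle.save({
...     'model': model.state_dict(),
...     'optimizer': optimizer.state_dict(),
...     'scaler': scaler.state_dict(),
... }, 'checkpoint.pdparams')

>>> # ... later, restore all three before resuming training
>>> checkpoint = paddle.load('checkpoint.pdparams')
>>> model.set_state_dict(checkpoint['model'])
>>> optimizer.set_state_dict(checkpoint['optimizer'])
>>> scaler.load_state_dict(checkpoint['scaler'])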
