GradScaler
class paddle.amp.GradScaler(enable=True, init_loss_scaling=65536.0, incr_ratio=2.0, decr_ratio=0.5, incr_every_n_steps=2000, decr_every_n_nan_or_inf=1, use_dynamic_loss_scaling=True)
GradScaler is used for Auto-Mixed-Precision training in dynamic graph mode. It controls the scaling of the loss, which helps keep small gradients from underflowing in low-precision training and lets steps with overflowing (NaN or Inf) gradients be detected and skipped. Objects of this class provide nineteen methods: scale(), unscale_(), minimize(), step(), update(), state_dict(), load_state_dict(), and is/get/set APIs for the scaler's parameters.
scale() multiplies the loss by the scale ratio. unscale_() unscales the gradients of the parameters, i.e., multiplies them by 1/(scale ratio). minimize() is similar to optimizer.minimize(): it performs the parameter update and then updates the loss scaling, so it is equivalent to step() followed by update(). step() is similar to optimizer.step() and performs the parameter update. update() updates the loss scaling.
Commonly, it is used together with paddle.amp.auto_cast to achieve Auto-Mixed-Precision in dynamic graph mode.
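The need for loss scaling can be illustrated without Paddle. The following is a plain-Python sketch, not Paddle code: it emulates IEEE 754 half precision with the standard struct module's 'e' format, uses a hypothetical gradient value of 1e-8, and multiplies that gradient directly by the scale factor as a stand-in for backpropagating a scaled loss.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


loss_scaling = 65536.0  # the default init_loss_scaling
true_grad = 1e-8        # hypothetical gradient, too small for float16

# Without scaling, the gradient underflows to zero in float16.
unscaled_fp16 = to_float16(true_grad)

# With scaling, the scaled gradient is representable in float16, and
# dividing by the scale factor afterwards approximately recovers it.
scaled_fp16 = to_float16(true_grad * loss_scaling)
recovered = scaled_fp16 / loss_scaling

print(unscaled_fp16)  # 0.0
print(recovered)
```

The unscaled value is smaller than half of the smallest positive float16 value (about 6e-8), so it rounds to zero; with the default factor of 65536 it survives, and dividing by the factor (as unscale_() does) recovers it to within float16 precision.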
- Parameters
-
enable (bool, optional) – Enable loss scaling or not. Default is True.
init_loss_scaling (float, optional) – The initial loss scaling factor. Default is 65536.0.
incr_ratio (float, optional) – The multiplier to use when increasing the loss scaling. Default is 2.0.
decr_ratio (float, optional) – The less-than-one-multiplier to use when decreasing the loss scaling. Default is 0.5.
incr_every_n_steps (int, optional) – Increases loss scaling every n consecutive steps with finite gradients. Default is 2000.
decr_every_n_nan_or_inf (int, optional) – Decreases loss scaling every n accumulated steps with nan or inf gradients. Default is 1.
use_dynamic_loss_scaling (bool, optional) – Whether to use dynamic loss scaling. If False, fixed loss_scaling is used. If True, the loss scaling is updated dynamically. Default is True.
- Returns
-
A GradScaler object.
Examples
>>> import paddle
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()  # do backward
>>> scaler.minimize(optimizer, scaled)  # update parameters
>>> optimizer.clear_grad()
scale(var)
Multiplies a Tensor by the scale factor and returns the scaled output. If this instance of GradScaler is not enabled, the output is returned unmodified.
- Parameters
-
var (Tensor) – The tensor to scale.
- Returns
-
The scaled tensor or original tensor.
Examples
>>> import paddle
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()  # do backward
>>> scaler.minimize(optimizer, scaled)  # update parameters
>>> optimizer.clear_grad()
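Scaling only the loss is enough because differentiation is linear: the gradient of s·L is s times the gradient of L, so multiplying by 1/s (what unscale_() does) recovers the true gradient. The plain-Python sketch below checks this numerically on a toy one-parameter loss; loss_fn and numerical_grad are hypothetical helpers for illustration, not Paddle APIs.

```python
def loss_fn(w: float) -> float:
    """Toy loss L(w) = (3w - 1)^2 standing in for a model's loss."""
    return (3.0 * w - 1.0) ** 2


def numerical_grad(f, w: float, eps: float = 1e-6) -> float:
    """Central finite-difference approximation of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


loss_scaling = 1024.0
w = 0.5
analytic = 6.0 * (3.0 * w - 1.0)  # dL/dw = 6(3w - 1), equal to 3.0 at w = 0.5

grad = numerical_grad(loss_fn, w)
# Gradient of the scaled loss s * L(w), the quantity backward() sees
# when it is called on the output of scale(loss).
scaled_grad = numerical_grad(lambda v: loss_scaling * loss_fn(v), w)

# The scaled gradient is s times the true one; dividing by s recovers it.
print(grad, scaled_grad / loss_scaling)
```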
minimize(optimizer, *args, **kwargs)
This function is similar to optimizer.minimize() and performs the parameter update.
If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, it first unscales the scaled gradients and then updates the parameters.
Finally, the loss scaling ratio is updated.
- Parameters
-
optimizer (Optimizer) – The optimizer used to update parameters.
args – Positional arguments, which are forwarded to optimizer.minimize().
kwargs – Keyword arguments, which are forwarded to optimizer.minimize().
Examples
>>> import paddle
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()  # do backward
>>> scaler.minimize(optimizer, scaled)  # update parameters
>>> optimizer.clear_grad()
step(optimizer)
This function is similar to optimizer.step() and performs the parameter update.
If the scaled gradients of the parameters contain NaN or Inf, the parameter update is skipped. Otherwise, if unscale_() has not been called, it first unscales the scaled gradients and then updates the parameters.
- Parameters
-
optimizer (Optimizer) – The optimizer used to update parameters.
Examples
>>> import paddle
>>> paddle.device.set_device('gpu')
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()  # do backward
>>> scaler.step(optimizer)  # update parameters
>>> scaler.update()  # update the loss scaling ratio
>>> optimizer.clear_grad()
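The skip-or-update decision described above can be modeled on plain floats. This is a simplified sketch under stated assumptions, not Paddle's implementation: toy_step is a hypothetical helper, gradients are Python lists, and the optimizer is plain SGD with learning rate lr.

```python
import math


def toy_step(params, grads, loss_scaling, lr, already_unscaled=False):
    """Hypothetical model of GradScaler.step() using plain floats and SGD.

    Returns (new_params, found_inf). If any gradient is NaN or Inf, the
    update is skipped and the parameters are returned unchanged.
    """
    found_inf = not all(math.isfinite(g) for g in grads)
    if found_inf:
        return list(params), True
    # Unscale only if unscale_() has not already done so.
    if not already_unscaled:
        grads = [g / loss_scaling for g in grads]
    return [p - lr * g for p, g in zip(params, grads)], False


params = [1.0, -2.0]
loss_scaling = 1024.0

# Finite scaled gradients are unscaled to [0.5, -1.0] and applied.
ok_params, ok_inf = toy_step(params, [512.0, -1024.0], loss_scaling, lr=0.1)

# A single overflowed gradient causes the whole update to be skipped.
skip_params, skip_inf = toy_step(params, [512.0, float("inf")], loss_scaling, lr=0.1)

print(ok_params, ok_inf)
print(skip_params, skip_inf)
```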
update()
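Updates the loss scaling ratio according to whether NaN or Inf gradients were found in the last step, as controlled by incr_ratio, decr_ratio, incr_every_n_steps and decr_every_n_nan_or_inf. If use_dynamic_loss_scaling is False, the loss scaling stays fixed.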
Examples
>>> import paddle
>>> paddle.device.set_device('gpu')
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()  # do backward
>>> scaler.step(optimizer)  # update parameters
>>> scaler.update()  # update the loss scaling ratio
>>> optimizer.clear_grad()
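The rule that update() follows can be read off the constructor parameters and the incr_count/decr_count entries described under state_dict(). The sketch below is a simplified pure-Python model of that rule, not Paddle's implementation (which operates on tensors): update_loss_scaling is a hypothetical helper, the state is a plain dict whose keys mirror the documented state_dict() entries, and each counter is reset when a step of the other kind occurs, following the "recent consecutive" wording used there.

```python
def update_loss_scaling(state, found_inf, use_dynamic_loss_scaling=True):
    """Hypothetical model of one update() call; returns a new state dict."""
    s = dict(state)
    if not use_dynamic_loss_scaling:
        return s  # fixed loss scaling: nothing changes
    if found_inf:
        s["incr_count"] = 0
        s["decr_count"] += 1
        if s["decr_count"] == s["decr_every_n_nan_or_inf"]:
            s["scale"] *= s["decr_ratio"]
            s["decr_count"] = 0
    else:
        s["decr_count"] = 0
        s["incr_count"] += 1
        if s["incr_count"] == s["incr_every_n_steps"]:
            s["scale"] *= s["incr_ratio"]
            s["incr_count"] = 0
    return s


state = {
    "scale": 1024.0, "incr_ratio": 2.0, "decr_ratio": 0.5,
    "incr_every_n_steps": 3, "decr_every_n_nan_or_inf": 2,
    "incr_count": 0, "decr_count": 0,
}
history = []
for found_inf in [False, False, False, True, False, True, True]:
    state = update_loss_scaling(state, found_inf)
    history.append(state["scale"])

print(history)  # [1024.0, 1024.0, 2048.0, 2048.0, 2048.0, 2048.0, 1024.0]
```

With decr_every_n_nan_or_inf=2, the isolated overflow on the fourth step does not shrink the scale, because the following clean step resets decr_count; only the two consecutive overflows at the end halve it.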
unscale_(optimizer)
Unscales the gradients of the parameters, i.e., multiplies them by 1/(loss scaling ratio). If this instance of GradScaler is not enabled, the gradients are returned unmodified.
- Parameters
-
optimizer (Optimizer) – The optimizer used to update parameters.
- Returns
-
The unscaled parameters or original parameters.
Examples
>>> import paddle
>>> paddle.device.set_device('gpu')
>>> model = paddle.nn.Conv2D(3, 2, 3, bias_attr=True)
>>> optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
>>> scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
>>> data = paddle.rand([10, 3, 32, 32])
>>> with paddle.amp.auto_cast():
...     conv = model(data)
...     loss = paddle.mean(conv)
>>> scaled = scaler.scale(loss)  # scale the loss
>>> scaled.backward()  # do backward
>>> scaler.unscale_(optimizer)  # unscale the gradients
>>> scaler.step(optimizer)
>>> scaler.update()
>>> optimizer.clear_grad()
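A common reason to call unscale_() explicitly (assumed here, not described above) is to work on the true gradients, for example to clip them, before step() applies them. The sketch models only the bookkeeping that keeps step() from unscaling a second time; ToyScaler and clip_by_global_norm are hypothetical pure-Python stand-ins, and the toy silently ignores a repeated unscale_() call, which may not match how Paddle treats that case.

```python
import math


class ToyScaler:
    """Hypothetical model of the unscale_()/step() bookkeeping; not Paddle code."""

    def __init__(self, loss_scaling: float):
        self.loss_scaling = loss_scaling
        self.unscaled = False  # has unscale_() already run in this iteration?

    def unscale_(self, grads: list) -> None:
        """Divide the gradients by the scale factor in place, once per iteration."""
        if not self.unscaled:
            for i, g in enumerate(grads):
                grads[i] = g / self.loss_scaling
            self.unscaled = True

    def step(self, params: list, grads: list, lr: float) -> list:
        """SGD step that unscales first only if unscale_() was not called."""
        self.unscale_(grads)   # no-op if the gradients are already unscaled
        self.unscaled = False  # reset the flag for the next iteration
        if not all(math.isfinite(g) for g in grads):
            return list(params)  # skip the update on NaN/Inf
        return [p - lr * g for p, g in zip(params, grads)]


def clip_by_global_norm(grads: list, max_norm: float) -> None:
    """Rescale the gradients in place so that their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        for i, g in enumerate(grads):
            grads[i] = g * (max_norm / norm)


scaler = ToyScaler(loss_scaling=1024.0)
params = [0.0, 0.0]
grads = [3072.0, 4096.0]  # scaled gradients; the true gradients are [3.0, 4.0]

scaler.unscale_(grads)           # grads now hold the true gradients
clip_by_global_norm(grads, 1.0)  # clip the true gradients (norm 5.0) to unit norm
params = scaler.step(params, grads, lr=1.0)  # step() does not unscale again
print(params)
```

Clipping before unscaling would compare the scaled norm (5120 here) against the threshold, which is why the explicit unscale_() call comes first in this pattern.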
is_enable()
Returns whether loss scaling is enabled.
- Returns
-
True if loss scaling is enabled, otherwise False.
- Return type
-
bool
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> enable = scaler.is_enable()
>>> print(enable)
True
is_use_dynamic_loss_scaling()
Returns whether dynamic loss scaling is used.
- Returns
-
False if a fixed loss scaling is used; True if the loss scaling is updated dynamically.
- Return type
-
bool
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> use_dynamic_loss_scaling = scaler.is_use_dynamic_loss_scaling()
>>> print(use_dynamic_loss_scaling)
True
get_init_loss_scaling()
Return the initial loss scaling factor.
- Returns
-
float: the initial loss scaling factor.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> init_loss_scaling = scaler.get_init_loss_scaling()
>>> print(init_loss_scaling)
1024
set_init_loss_scaling(new_init_loss_scaling)
Sets the initial loss scaling factor to new_init_loss_scaling.
- Parameters
-
new_init_loss_scaling (float) – The new initial loss scaling factor.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_init_loss_scaling())
1024
>>> new_init_loss_scaling = 1000
>>> scaler.set_init_loss_scaling(new_init_loss_scaling)
>>> print(scaler.get_init_loss_scaling())
1000
get_incr_ratio()
Return the multiplier to use when increasing the loss scaling.
- Returns
-
float: the multiplier to use when increasing the loss scaling.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> incr_ratio = scaler.get_incr_ratio()
>>> print(incr_ratio)
2.0
set_incr_ratio(new_incr_ratio)
Sets the multiplier to use when increasing the loss scaling to new_incr_ratio, which should be greater than 1.0.
- Parameters
-
new_incr_ratio (float) – The new multiplier to use when increasing the loss scaling.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_incr_ratio())
2.0
>>> new_incr_ratio = 3.0
>>> scaler.set_incr_ratio(new_incr_ratio)
>>> print(scaler.get_incr_ratio())
3.0
get_decr_ratio()
Returns the less-than-one-multiplier to use when decreasing the loss scaling.
- Returns
-
float: the less-than-one-multiplier to use when decreasing the loss scaling.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> decr_ratio = scaler.get_decr_ratio()
>>> print(decr_ratio)
0.5
set_decr_ratio(new_decr_ratio)
Sets the less-than-one-multiplier to use when decreasing the loss scaling to new_decr_ratio, which should be less than 1.0.
- Parameters
-
new_decr_ratio (float) – The new less-than-one-multiplier to use when decreasing the loss scaling.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_decr_ratio())
0.5
>>> new_decr_ratio = 0.1
>>> scaler.set_decr_ratio(new_decr_ratio)
>>> print(scaler.get_decr_ratio())
0.1
get_incr_every_n_steps()
Returns n, where the loss scaling is increased every n consecutive steps with finite gradients.
- Returns
-
int: n, where the loss scaling is increased every n consecutive steps with finite gradients.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> incr_every_n_steps = scaler.get_incr_every_n_steps()
>>> print(incr_every_n_steps)
1000
set_incr_every_n_steps(new_incr_every_n_steps)
Sets n to new_incr_every_n_steps, where the loss scaling is increased every n consecutive steps with finite gradients.
- Parameters
-
new_incr_every_n_steps (int) – The new value of n, the number of consecutive steps with finite gradients after which the loss scaling is increased.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_incr_every_n_steps())
1000
>>> new_incr_every_n_steps = 2000
>>> scaler.set_incr_every_n_steps(new_incr_every_n_steps)
>>> print(scaler.get_incr_every_n_steps())
2000
get_decr_every_n_nan_or_inf()
Returns n, where the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.
- Returns
-
int: n, where the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> decr_every_n_nan_or_inf = scaler.get_decr_every_n_nan_or_inf()
>>> print(decr_every_n_nan_or_inf)
2
set_decr_every_n_nan_or_inf(new_decr_every_n_nan_or_inf)
Sets n to new_decr_every_n_nan_or_inf, where the loss scaling is decreased every n accumulated steps with NaN or Inf gradients.
- Parameters
-
new_decr_every_n_nan_or_inf (int) – The new value of n, the number of accumulated steps with NaN or Inf gradients after which the loss scaling is decreased.
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> print(scaler.get_decr_every_n_nan_or_inf())
2
>>> new_decr_every_n_nan_or_inf = 3
>>> scaler.set_decr_every_n_nan_or_inf(new_decr_every_n_nan_or_inf)
>>> print(scaler.get_decr_every_n_nan_or_inf())
3
state_dict()
Returns the state of the scaler as a dict. If this instance is not enabled, returns an empty dict.
- Returns
-
A dict with the following entries:
scale (Tensor): The loss scaling factor.
incr_ratio (float): The multiplier to use when increasing the loss scaling.
decr_ratio (float): The less-than-one-multiplier to use when decreasing the loss scaling.
incr_every_n_steps (int): Increases loss scaling every n consecutive steps with finite gradients.
decr_every_n_nan_or_inf (int): Decreases loss scaling every n accumulated steps with NaN or Inf gradients.
incr_count (int): The number of recent consecutive unskipped steps.
decr_count (int): The number of recent consecutive skipped steps.
use_dynamic_loss_scaling (bool): Whether dynamic loss scaling is used. If False, a fixed loss scaling is used; if True, the loss scaling is updated dynamically.
- Return type
-
dict
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> scaler_state = scaler.state_dict()
load_state_dict(state_dict)
Loads the scaler state.
- Parameters
-
state_dict (dict) – scaler state. Should be an object returned from a call to GradScaler.state_dict().
Examples
>>> import paddle
>>> scaler = paddle.amp.GradScaler(
...     enable=True,
...     init_loss_scaling=1024,
...     incr_ratio=2.0,
...     decr_ratio=0.5,
...     incr_every_n_steps=1000,
...     decr_every_n_nan_or_inf=2,
...     use_dynamic_loss_scaling=True
... )
>>> scaler_state = scaler.state_dict()
>>> scaler.load_state_dict(scaler_state)
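Storing the scaler state alongside model and optimizer checkpoints lets a resumed run continue with the same loss scaling and counters. The sketch below is plain Python: the state is a hypothetical dict that mirrors the documented state_dict() keys (with scale as a float rather than a Tensor), json stands in for a real checkpoint format, and steps_until_increase is an illustrative helper. With Paddle itself, the real dict would typically be saved with paddle.save and restored through load_state_dict().

```python
import json

# Hypothetical scaler state partway through training, mirroring the keys
# documented for state_dict() ("scale" is a plain float here).
state = {
    "scale": 2048.0,
    "incr_ratio": 2.0,
    "decr_ratio": 0.5,
    "incr_every_n_steps": 1000,
    "decr_every_n_nan_or_inf": 2,
    "incr_count": 750,
    "decr_count": 0,
    "use_dynamic_loss_scaling": True,
}

# Write the state into a checkpoint and read it back.
checkpoint = json.dumps({"scaler": state})
restored = json.loads(checkpoint)["scaler"]


def steps_until_increase(s: dict) -> int:
    """Clean steps still needed before the next loss-scaling increase."""
    return s["incr_every_n_steps"] - s["incr_count"]


print(restored == state)               # True
print(steps_until_increase(restored))  # 250
```

Restoring only the scale and not incr_count would make the resumed run wait a full incr_every_n_steps before the next increase instead of the remaining 250 steps.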