BuildStrategy
- class paddle.static.BuildStrategy
-
BuildStrategy allows the user to control more precisely how the SSA Graph is built in ParallelExecutor by setting its properties.
- Returns
-
A BuildStrategy object.
- Return type
-
BuildStrategy
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> data = static.data(name="x", shape=[None, 1], dtype="float32")
>>> hidden = static.nn.fc(data, size=10)
>>> loss = paddle.mean(hidden)
>>> paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.enable_inplace = True
>>> build_strategy.memory_optimize = True
>>> build_strategy.reduce_strategy = static.BuildStrategy.ReduceStrategy.Reduce
>>> program = static.CompiledProgram(static.default_main_program(), build_strategy=build_strategy)
- class GradientScaleStrategy
-
Members:
CoeffNumDevice
One
Customized
- property name
- class ReduceStrategy
-
Members:
Reduce
AllReduce
_NoReduce
- property name
- property build_cinn_pass
-
build_cinn_pass indicates whether to lower some operators in the graph into CINN ops for execution, which speeds up execution. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.build_cinn_pass = True
- Type
-
(bool, optional)
- property debug_graphviz_path
-
debug_graphviz_path indicates the path to which the SSA Graph is written as a Graphviz file. It is useful for debugging. Default is the empty string, that is, "".
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.debug_graphviz_path = "./graph"
- Type
-
(str, optional)
- property enable_auto_fusion
-
Whether to enable fusing subgraphs into a fusion_group. Currently, only subgraphs composed of elementwise-like operators are supported, such as elementwise_add/mul without broadcast, and activations.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.enable_auto_fusion = True
- Type
-
(bool, optional)
- property enable_sequential_execution
-
If set to True, ops are executed in the same order as they appear in the program. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.enable_sequential_execution = True
- Type
-
(bool, optional)
- property fuse_adamw
-
fuse_adamw indicates whether to fuse all adamw optimizers with multi_tensor_adam, which may make execution faster. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_adamw = True
- Type
-
(bool, optional)
- property fuse_bn_act_ops
-
fuse_bn_act_ops indicates whether to fuse batch_norm and activation_op, which may make execution faster. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_bn_act_ops = True
- Type
-
(bool, optional)
- property fuse_bn_add_act_ops
-
fuse_bn_add_act_ops indicates whether to fuse batch_norm, elementwise_add and activation_op, which may make execution faster. Default is True.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_bn_add_act_ops = True
- Type
-
(bool, optional)
- property fuse_broadcast_ops
-
fuse_broadcast_ops indicates whether to fuse the broadcast ops. Note that, in Reduce mode, fusing broadcast ops may make the program faster, because fusing broadcast ops is equivalent to delaying the execution of all broadcast ops. In that case, all NCCL streams are used only for NCCLReduce operations for a period of time. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_broadcast_ops = True
- Type
-
(bool, optional)
- property fuse_elewise_add_act_ops
-
fuse_elewise_add_act_ops indicates whether to fuse elementwise_add_op and activation_op, which may make execution faster. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_elewise_add_act_ops = True
- Type
-
(bool, optional)
- property fuse_gemm_epilogue
-
fuse_gemm_epilogue indicates whether to fuse matmul_op, elementwise_add_op and activation_op, which may make execution faster. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_gemm_epilogue = True
- Type
-
(bool, optional)
- property fuse_relu_depthwise_conv
-
fuse_relu_depthwise_conv indicates whether to fuse relu and depthwise_conv2d, which saves GPU memory and may make execution faster. This option is only available on GPU devices. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fuse_relu_depthwise_conv = True
- Type
-
(bool, optional)
- property fused_attention
-
fused_attention indicates whether to fuse the whole multi-head attention part into one op, which may make execution faster. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fused_attention = True
- Type
-
(bool, optional)
- property fused_feedforward
-
fused_feedforward indicates whether to fuse the whole feed_forward part into one op, which may make execution faster. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.fused_feedforward = True
- Type
-
(bool, optional)
- property gradient_scale_strategy
-
There are three ways of defining \(loss@grad\) in ParallelExecutor: CoeffNumDevice, One and Customized. By default, ParallelExecutor sets \(loss@grad\) according to the number of devices. If you want to customize \(loss@grad\), choose Customized. Default is 'CoeffNumDevice'.
Examples
>>> import numpy
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> # Call the function; the bare function object would always be truthy.
>>> use_cuda = paddle.device.is_compiled_with_cuda()
>>> place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()
>>> exe = static.Executor(place)
>>> data = static.data(name='X', shape=[None, 1], dtype='float32')
>>> hidden = static.nn.fc(data, size=10)
>>> loss = paddle.mean(hidden)
>>> paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)
>>> exe.run(static.default_startup_program())
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.gradient_scale_strategy = \
...     static.BuildStrategy.GradientScaleStrategy.Customized
>>> compiled_prog = static.CompiledProgram(
...     static.default_main_program(),
...     build_strategy=build_strategy,
... )
>>> x = numpy.random.random(size=(10, 1)).astype('float32')
>>> # With Customized, the caller feeds the loss@grad value directly.
>>> loss_grad = numpy.ones((1)).astype("float32") * 0.01
>>> loss_grad_name = loss.name+"@GRAD"
>>> loss_data = exe.run(compiled_prog,
...                     feed={"X": x, loss_grad_name : loss_grad},
...                     fetch_list=[loss.name, loss_grad_name])
- Type
-
(paddle.static.BuildStrategy.GradientScaleStrategy, optional)
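The three strategies differ only in the value that seeds \(loss@grad\) on each device. The following is a minimal pure-Python sketch of that choice, not Paddle's implementation. It assumes CoeffNumDevice uses 1/N for N devices (the text above only says the value depends on the number of devices), and `initial_loss_grad` is a hypothetical helper name.

```python
from typing import Optional


def initial_loss_grad(strategy: str, num_devices: int,
                      custom_value: Optional[float] = None) -> float:
    """Return the seed value of loss@grad on each device (illustrative only)."""
    if strategy == "CoeffNumDevice":
        # Assumption: scale by the device count so that per-device
        # gradients, once summed, correspond to an average.
        return 1.0 / num_devices
    if strategy == "One":
        # Every device starts back-propagation from 1.
        return 1.0
    if strategy == "Customized":
        # The caller must supply the value, as in the feed dict above.
        if custom_value is None:
            raise ValueError("Customized requires a user-provided loss@grad")
        return custom_value
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("CoeffNumDevice", "One"):
    print(name, initial_loss_grad(name, num_devices=4))
print("Customized", initial_loss_grad("Customized", 4, custom_value=0.01))
```

With Customized, the value comes entirely from the caller, which is why the example above feeds `loss_grad` under the name `loss.name + "@GRAD"`.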
- property memory_optimize
-
memory_optimize aims to reduce total memory consumption; set it to True to enable it.
Default is None, which means the framework decides automatically whether to use this strategy. Currently, None means that it is enabled when GC is disabled, and disabled when GC is enabled. True enables it and False disables it.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.memory_optimize = True
- Type
-
(bool, optional)
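The three-state behaviour of memory_optimize described above can be written as a small decision function. This is only a sketch of the stated rule. `memory_optimize_enabled` and its `gc_enabled` argument are hypothetical names, not part of the Paddle API.

```python
from typing import Optional


def memory_optimize_enabled(setting: Optional[bool], gc_enabled: bool) -> bool:
    """Resolve the effective memory_optimize state from the documented rule."""
    if setting is None:
        # None: enabled when GC is disabled, disabled when GC is enabled.
        return not gc_enabled
    # True and False are taken as explicit user choices.
    return setting


for setting in (None, True, False):
    for gc_enabled in (True, False):
        print(setting, gc_enabled, memory_optimize_enabled(setting, gc_enabled))
```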
- property reduce_strategy
-
There are two reduce strategies in ParallelExecutor: AllReduce and Reduce. If you want the optimization of all parameters to be done on every device independently, choose AllReduce. If you choose Reduce, the optimization of the parameters is distributed evenly across the devices, and each optimized parameter is then broadcast to the other devices. Default is 'AllReduce'.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.reduce_strategy = static.BuildStrategy.ReduceStrategy.Reduce
- Type
-
(paddle.static.BuildStrategy.ReduceStrategy, optional)
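The difference between the two strategies can be illustrated with a toy simulation in plain Python. This is a sketch under stated assumptions, not how Paddle implements either mode: devices are modelled as list entries, the optimizer is plain SGD, parameters are assigned to owner devices round-robin, and `all_reduce_step` and `reduce_step` are hypothetical names.

```python
def all_reduce_step(params, per_device_grads, lr):
    """AllReduce: every device gets the summed gradients and updates all parameters."""
    num_devices = len(per_device_grads)
    summed = {name: sum(g[name] for g in per_device_grads) for name in params}
    replicas, updates_per_device = [], []
    for _ in range(num_devices):
        replicas.append({name: params[name] - lr * summed[name] for name in params})
        updates_per_device.append(len(params))  # each device updates everything
    return replicas, updates_per_device


def reduce_step(params, per_device_grads, lr):
    """Reduce: each parameter is updated on one owner device, then broadcast."""
    num_devices = len(per_device_grads)
    updated = {}
    updates_per_device = [0] * num_devices
    for index, name in enumerate(sorted(params)):
        owner = index % num_devices  # round-robin ownership (an assumption)
        total = sum(g[name] for g in per_device_grads)  # gradient reduced onto owner
        updated[name] = params[name] - lr * total       # update done only on owner
        updates_per_device[owner] += 1
    # Broadcast: every device receives a copy of each updated parameter.
    replicas = [dict(updated) for _ in range(num_devices)]
    return replicas, updates_per_device


params = {"w": 1.0, "b": 0.5}
grads = [{"w": 0.2, "b": 0.1}, {"w": 0.4, "b": 0.3}]  # one dict per device
print(all_reduce_step(params, grads, lr=0.1))
print(reduce_step(params, grads, lr=0.1))
```

In this toy model both strategies leave every device with the same parameters. What changes is where the optimizer work happens: each device performs every update under AllReduce, while under Reduce the updates are split across devices at the cost of an extra broadcast.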
- property remove_unnecessary_lock
-
If set to True, some locks in GPU ops are released and ParallelExecutor runs faster. Default is True.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.remove_unnecessary_lock = True
- Type
-
(bool, optional)
- property sequential_run
-
sequential_run makes the StandaloneExecutor run ops in the order given by the ProgramDesc. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.sequential_run = True
- Type
-
(bool, optional)
- property sync_batch_norm
-
sync_batch_norm indicates whether to use synchronous batch normalization, which synchronizes the mean and variance across multiple devices during training. The current implementation does not support FP16 training or CPU, and it synchronizes only within one machine, not across machines. Default is False.
Examples
>>> import paddle
>>> import paddle.static as static
>>> paddle.enable_static()
>>> build_strategy = static.BuildStrategy()
>>> build_strategy.sync_batch_norm = True
- Type
-
(bool, optional)