ExecutionStrategy

class paddle.static.ExecutionStrategy

ExecutionStrategy allows the user to control more precisely how the program is run in ParallelExecutor by setting its properties.

Returns

An ExecutionStrategy object.

Return type

ExecutionStrategy

Examples

import paddle
import paddle.static as static
import paddle.nn.functional as F

paddle.enable_static()

x = static.data(name='x', shape=[None, 13], dtype='float32')
y = static.data(name='y', shape=[None, 1], dtype='float32')
y_predict = static.nn.fc(x=x, size=1)

cost = F.square_error_cost(input=y_predict, label=y)
avg_loss = paddle.mean(cost)

sgd_optimizer = paddle.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_threads = 4

train_exe = static.ParallelExecutor(use_cuda=False,
                                    loss_name=avg_loss.name,
                                    exec_strategy=exec_strategy)
property allow_op_delay

The type is BOOL. allow_op_delay indicates whether to delay running the communication operators, which may make execution faster. Note that this option is currently a no-op and will be removed in the next version. Default: False.

property num_iteration_per_drop_scope

The type is INT. num_iteration_per_drop_scope indicates after how many iterations to clean up the temporary variables generated during execution. It may make execution faster, because the temporary variables' shapes may be the same between two iterations. Default: 100.

Note

1. If you fetch data when calling run, the ParallelExecutor will clean up the temporary variables at the end of the current iteration.

2. In some NLP models, this setting may cause GPU memory to run out; in that case, you should reduce num_iteration_per_drop_scope.

Examples

import paddle
import paddle.static as static

paddle.enable_static()

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 10
property num_iteration_per_run

The type is INT. num_iteration_per_run specifies how many iterations the executor will run each time the user calls exe.run() in Python. Default: 1.

Examples

import paddle
import paddle.static as static

paddle.enable_static()

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_iteration_per_run = 10
property num_threads

The type is INT. num_threads represents the size of the thread pool used to run the operators of the current program in ParallelExecutor. If num_threads=1, all the operators will execute one by one, but the order may differ between iterations. If it is not set, it will be set in ParallelExecutor according to the device type and device count: for GPU, num_threads = device_count * 4; for CPU, num_threads = CPU_NUM * 4. The explanation of CPU_NUM is in ParallelExecutor; if CPU_NUM is not set, ParallelExecutor will get the CPU count by calling multiprocessing.cpu_count(). Default: 0.

Examples

import paddle
import paddle.static as static

paddle.enable_static()

exec_strategy = static.ExecutionStrategy()
exec_strategy.num_threads = 4
property use_thread_barrier

The type is BOOL. This config indicates whether to use a thread barrier when the program is run as distributed training with a parameter server. Default: False.