others¶
FLAGS_benchmark¶
(since 0.12.0)
Enables benchmarking mode. When set, scope deletion becomes synchronous, additional memory usage logs are emitted, and all CUDA kernels are synchronized after each kernel launch.
Values accepted¶
Bool. The default value is False.
Example¶
FLAGS_benchmark=True enables these synchronizations and memory usage logs for benchmarking.
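A minimal sketch of turning the flag on from Python. It assumes the framework reads `FLAGS_*` environment variables once, when it is first imported, so the variable has to be set before that import. The helper name `set_flag` is hypothetical and not part of any library.

```python
import os


def set_flag(name, value):
    """Export a flag as an environment variable (hypothetical helper)."""
    key = "FLAGS_" + name
    # Environment values are always strings, so True becomes "True".
    os.environ[key] = str(value)
    return key


# Assumption: this runs before the framework is imported, because the
# flags are assumed to be read from the environment at import time.
set_flag("benchmark", True)
```

The shell equivalent is to run `export FLAGS_benchmark=True` before launching the program.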
FLAGS_inner_op_parallelism¶
(since 1.3.0)
Most operators run in single-threaded mode, but some operators benefit from multiple threads. For example, an optimization op that updates sparse gradients runs much faster with multiple threads. This flag sets the number of threads used inside an operator.
Values accepted¶
Int32. The default value is 0, which means operators do not run in multi-threaded mode.
Example¶
FLAGS_inner_op_parallelism=5 will set the thread number inside an operator to 5.
Note¶
Currently, only the sparse adam op supports inner_op_parallelism.
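The flag can also be scoped to a single run rather than the whole shell session. The sketch below passes it to a child process only. The child here is a stand-in one-liner that echoes the value it sees; in practice it would be a training script, and the assumption is that the script picks the flag up from its environment.

```python
import os
import subprocess
import sys

# Copy the current environment and add the flag for the child only.
child_env = dict(os.environ, FLAGS_inner_op_parallelism="5")

# Stand-in for a real training script: the child prints the flag it sees.
child_code = "import os; print(os.environ['FLAGS_inner_op_parallelism'])"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()
```

Because only a copy of the environment is modified, other programs started from the same parent process are unaffected.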
FLAGS_max_body_size¶
(since 1.0.0)
Controls the maximum message size in BRPC.
Values accepted¶
Int32. The default value is 2147483647.
Example¶
FLAGS_max_body_size=2147483647 will set the maximum BRPC message size to 2147483647.
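The default, 2147483647, is the largest value a signed 32-bit integer can hold, so any larger setting cannot be represented by this flag. A small sketch of checking a candidate value before exporting it; the helper name `max_body_size_setting` is hypothetical, and rejecting non-positive values is an assumption of this sketch rather than documented behavior.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the flag's default value


def max_body_size_setting(size):
    """Return a 'FLAGS_max_body_size=<size>' string (hypothetical helper).

    Raises ValueError if the value does not fit in a signed 32-bit integer
    or is not positive (the latter is an assumption of this sketch).
    """
    if isinstance(size, bool) or not isinstance(size, int):
        raise ValueError("size must be an integer")
    if not 0 < size <= INT32_MAX:
        raise ValueError("size must be in the range 1..{}".format(INT32_MAX))
    return "FLAGS_max_body_size={}".format(size)


setting = max_body_size_setting(INT32_MAX)
```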
FLAGS_sync_nccl_allreduce¶
(since 1.3)
If FLAGS_sync_nccl_allreduce is true, cudaStreamSynchronize(nccl_stream) is called in allreduce_op_handle. This mode can give better performance in some scenarios.
Values accepted¶
Bool. The default value is True.
Example¶
FLAGS_sync_nccl_allreduce=True will call cudaStreamSynchronize(nccl_stream) in allreduce_op_handle.
FLAGS_tracer_profile_fname¶
(since 1.4.0)
FLAGS_tracer_profile_fname specifies the filename of the profile that gperftools generates for the imperative tracer. It is only valid when compiled with WITH_PROFILER=ON. Empty if disabled.
Values accepted¶
String. The default value is "gperf".
Example¶
FLAGS_tracer_profile_fname="gperf_profile_file" will set the profiler filename for the imperative tracer to "gperf_profile_file".