Struct AnalysisConfig¶
Defined in File paddle_analysis_config.h
Struct Documentation¶
-
struct paddle::AnalysisConfig¶
Configuration manager for AnalysisPredictor.
AnalysisConfig manages the configuration of AnalysisPredictor. During the inference procedure, there are many parameters (model/params path, place of inference, etc.) to be specified, and various optimizations (subgraph fusion, memory optimization, TensorRT engine, etc.) to be applied. Users can manage these settings by creating and modifying an AnalysisConfig, and loading it into AnalysisPredictor.
- Since
1.7.0
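For orientation, here is a minimal usage sketch. It assumes the public inference header paddle_inference_api.h and the CreatePaddlePredictor factory from the same API surface; the model path is a placeholder.

    #include "paddle_inference_api.h"  // assumed umbrella header exposing AnalysisConfig and CreatePaddlePredictor

    int main() {
      paddle::AnalysisConfig config;
      config.SetModel("./mobilenet_v1");   // placeholder: directory of a non-combined model
      config.EnableUseGpu(100 /*MB*/, 0);  // optional: GPU 0 with a 100 MB initial memory pool
      config.SwitchIrOptim(true);          // keep IR graph optimization on

      // Load the finished config into an AnalysisPredictor.
      auto predictor = paddle::CreatePaddlePredictor(config);
      return 0;
    }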
Public Types
Public Functions
-
AnalysisConfig() = default¶
-
explicit AnalysisConfig(const AnalysisConfig &other)¶
Construct a new AnalysisConfig from another AnalysisConfig.
- Parameters
other – [in] another AnalysisConfig
-
explicit AnalysisConfig(const std::string &model_dir)¶
Construct a new AnalysisConfig from a non-combined model.
- Parameters
model_dir – [in] model directory of the non-combined model.
-
explicit AnalysisConfig(const std::string &prog_file, const std::string &params_file)¶
Construct a new AnalysisConfig from a combined model.
- Parameters
prog_file – [in] model file path of the combined model.
params_file – [in] params file path of the combined model.
-
inline void SetModel(const std::string &model_dir)¶
Set the non-combined model directory path.
- Parameters
model_dir – model directory path.
-
void SetModel(const std::string &prog_file_path, const std::string &params_file_path)¶
Set the combined model with two specific paths for the program and the parameters.
- Parameters
prog_file_path – model file path of the combined model.
params_file_path – params file path of the combined model.
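A brief sketch of the two path styles; the file names below are placeholders:

    paddle::AnalysisConfig cfg;
    // Non-combined model: a directory containing the program file and one file per parameter.
    cfg.SetModel("./model_dir");
    // Combined model: one program file plus one merged parameters file.
    cfg.SetModel("./model/__model__", "./model/params");
    // Equivalent to the combined form above:
    cfg.SetProgFile("./model/__model__");
    cfg.SetParamsFile("./model/params");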
-
inline void SetProgFile(const std::string &x)¶
Set the model file path of a combined model.
- Parameters
x – model file path.
-
inline void SetParamsFile(const std::string &x)¶
Set the params file path of a combined model.
- Parameters
x – params file path.
-
inline void SetOptimCacheDir(const std::string &opt_cache_dir)¶
Set the path of optimization cache directory.
- Parameters
opt_cache_dir – the path of optimization cache directory.
-
inline const std::string &model_dir() const¶
Get the model directory path.
- Returns
const std::string& The model directory path.
-
inline const std::string &prog_file() const¶
Get the program file path.
- Returns
const std::string& The program file path.
-
inline const std::string &params_file() const¶
Get the combined parameters file.
- Returns
const std::string& The combined parameters file.
-
void DisableFCPadding()¶
Turn off FC Padding.
-
inline bool use_fc_padding() const¶
A boolean state telling whether fc padding is used.
- Returns
bool Whether fc padding is used.
-
void EnableUseGpu(uint64_t memory_pool_init_size_mb, int device_id = 0)¶
Turn on GPU.
- Parameters
memory_pool_init_size_mb – initial size of the GPU memory pool in MB.
device_id – the GPU card to use (default is 0).
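A sketch of a typical GPU setup, combining EnableUseGpu with the related query methods documented below (the pool size and device id are illustrative):

    paddle::AnalysisConfig config("./model_dir");             // placeholder model path
    config.EnableUseGpu(100, 0);                              // 100 MB initial pool on GPU card 0
    if (config.use_gpu()) {
      int device = config.gpu_device_id();                    // 0
      int pool_mb = config.memory_pool_init_size_mb();        // 100
      float frac = config.fraction_of_gpu_memory_for_pool();  // pool size relative to device memory
    }
    // config.DisableGpu();                                   // fall back to CPU if needed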
-
void DisableGpu()¶
Turn off GPU.
-
inline bool use_gpu() const¶
A boolean state telling whether the GPU is turned on.
- Returns
bool Whether the GPU is turned on.
-
inline int gpu_device_id() const¶
Get the GPU device id.
- Returns
int The GPU device id.
-
inline int memory_pool_init_size_mb() const¶
Get the initial size in MB of the GPU memory pool.
- Returns
int The initial size in MB of the GPU memory pool.
-
float fraction_of_gpu_memory_for_pool() const¶
Get the proportion of the initial memory pool size relative to the total device memory.
- Returns
float The proportion of the initial memory pool size.
-
void EnableCUDNN()¶
Turn on CUDNN.
-
inline bool cudnn_enabled() const¶
A boolean state telling whether to use CUDNN.
- Returns
bool Whether to use CUDNN.
-
inline void SwitchIrOptim(int x = true)¶
Control whether to perform IR graph optimization. If turned off, the AnalysisConfig will act just like a NativeConfig.
- Parameters
x – Whether the IR graph optimization is activated.
-
inline bool ir_optim() const¶
A boolean state telling whether the IR graph optimization is activated.
- Returns
bool Whether to use ir graph optimization.
-
inline void SwitchUseFeedFetchOps(int x = true)¶
INTERNAL: Determine whether to use the feed and fetch operators. For internal development only; not stable yet. When ZeroCopyTensor is used, this should be turned off.
- Parameters
x – Whether to use the feed and fetch operators.
-
inline bool use_feed_fetch_ops_enabled() const¶
A boolean state telling whether to use the feed and fetch operators.
- Returns
bool Whether to use the feed and fetch operators.
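When ZeroCopyTensor is used, the feed and fetch operators are typically switched off. The sketch below additionally assumes the predictor-side ZeroCopy calls (GetInputNames, GetInputTensor, ZeroCopyRun), which belong to AnalysisPredictor rather than to this struct; the model path is a placeholder.

    paddle::AnalysisConfig config("./model_dir");  // placeholder path
    config.SwitchUseFeedFetchOps(false);           // required when feeding via ZeroCopyTensor
    config.SwitchSpecifyInputNames(true);          // inputs are matched by name, not by order

    auto predictor = paddle::CreatePaddlePredictor(config);
    auto input_names = predictor->GetInputNames();
    auto input = predictor->GetInputTensor(input_names[0]);
    // ... reshape and fill `input`, then run without feed/fetch ops:
    predictor->ZeroCopyRun();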
-
inline void SwitchSpecifyInputNames(bool x = true)¶
Control whether to specify the inputs’ names. The ZeroCopyTensor type has a name member; assign it the corresponding variable name. This is used only when the input ZeroCopyTensors passed to AnalysisPredictor.ZeroCopyRun() cannot follow the order used in the training phase.
- Parameters
x – Whether to specify the inputs’ names.
-
inline bool specify_input_name() const¶
A boolean state telling whether the specified input ZeroCopyTensor names should be used to reorder the inputs in AnalysisPredictor.ZeroCopyRun().
- Returns
bool Whether to specify the inputs’ names.
-
void EnableTensorRtEngine(int workspace_size = 1 << 20, int max_batch_size = 1, int min_subgraph_size = 3, Precision precision = Precision::kFloat32, bool use_static = false, bool use_calib_mode = true)¶
Turn on the TensorRT engine. The TensorRT engine will accelerate some subgraphs in the original Fluid computation graph. In some models, such as resnet50 and GoogleNet, it gains significant performance acceleration.
- Parameters
workspace_size – The memory size (in bytes) used for the TensorRT workspace.
max_batch_size – The maximum batch size of this prediction task; it is better to set it as small as possible to reduce performance loss.
min_subgraph_size – The minimum TensorRT subgraph size required; if a subgraph is smaller than this, it will not be transferred to the TensorRT engine.
precision – The precision used in TensorRT.
use_static – Serialize optimization information to disk for reusing.
use_calib_mode – Use TRT int8 calibration(post training quantization).
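A sketch of enabling the TensorRT engine; it assumes the GPU has already been turned on with EnableUseGpu, and the workspace, batch, and subgraph values are illustrative:

    paddle::AnalysisConfig config("./model/__model__", "./model/params");  // placeholder paths
    config.EnableUseGpu(100, 0);                     // TensorRT subgraphs run on the GPU
    config.EnableTensorRtEngine(1 << 20,             // workspace size in bytes
                                1,                   // max_batch_size
                                3,                   // min_subgraph_size
                                paddle::AnalysisConfig::Precision::kFloat32,
                                false,               // use_static: do not serialize optimization info to disk
                                false);              // use_calib_mode: no int8 calibration needed at fp32
    bool trt_on = config.tensorrt_engine_enabled();  // true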
-
inline bool tensorrt_engine_enabled() const¶
A boolean state telling whether the TensorRT engine is used.
- Returns
bool Whether the TensorRT engine is used.
-
void SetTRTDynamicShapeInfo(std::map<std::string, std::vector<int>> min_input_shape, std::map<std::string, std::vector<int>> max_input_shape, std::map<std::string, std::vector<int>> optim_input_shape, bool disable_trt_plugin_fp16 = false)¶
Set min, max, opt shape for TensorRT Dynamic shape mode.
- Parameters
min_input_shape – The min input shape of the subgraph input.
max_input_shape – The max input shape of the subgraph input.
optim_input_shape – The optimal input shape of the subgraph input.
disable_trt_plugin_fp16 – Setting this parameter to true means that the TRT plugin will not run in fp16.
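A sketch of providing the shape ranges; the tensor name "image" and the dimensions are placeholders that must match the actual inputs of the TensorRT subgraph:

    // `config` is an AnalysisConfig on which EnableTensorRtEngine has already been called.
    std::map<std::string, std::vector<int>> min_shape{{"image", {1, 3, 112, 112}}};
    std::map<std::string, std::vector<int>> max_shape{{"image", {1, 3, 448, 448}}};
    std::map<std::string, std::vector<int>> opt_shape{{"image", {1, 3, 224, 224}}};
    config.SetTRTDynamicShapeInfo(min_shape, max_shape, opt_shape,
                                  false /*disable_trt_plugin_fp16*/);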
-
void EnableLiteEngine(AnalysisConfig::Precision precision_mode = Precision::kFloat32, const std::vector<std::string> &passes_filter = {}, const std::vector<std::string> &ops_filter = {})¶
Turn on the usage of Lite sub-graph engine.
- Parameters
precision_mode – Precision used in the Lite sub-graph engine.
passes_filter – Set the passes used in Lite sub-graph engine.
ops_filter – Operators not supported by Lite.
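A minimal sketch of turning on the Lite sub-graph engine; the empty filter lists simply accept the defaults, and the model path is a placeholder:

    paddle::AnalysisConfig config("./model_dir");  // placeholder path
    config.EnableLiteEngine(paddle::AnalysisConfig::Precision::kFloat32,
                            {},                    // passes_filter: use the default passes
                            {});                   // ops_filter: exclude no operators
    bool lite_on = config.lite_engine_enabled();   // true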
-
inline bool lite_engine_enabled() const¶
A boolean state indicating whether the Lite sub-graph engine is used.
- Returns
bool whether the Lite sub-graph engine is used.
-
void SwitchIrDebug(int x = true)¶
Control whether to debug the IR graph analysis phase. This will generate DOT files for visualizing the computation graph after each analysis pass is applied.
- Parameters
x – whether to debug IR graph analysis phase.
-
void EnableMKLDNN()¶
Turn on MKLDNN.
-
void SetMkldnnCacheCapacity(int capacity)¶
Set the cache capacity of different input shapes for MKLDNN. Default value 0 means not caching any shape.
- Parameters
capacity – The cache capacity.
-
inline bool mkldnn_enabled() const¶
A boolean state telling whether MKLDNN is used.
- Returns
bool Whether MKLDNN is used.
-
void SetCpuMathLibraryNumThreads(int cpu_math_library_num_threads)¶
Set the number of cpu math library threads.
- Parameters
cpu_math_library_num_threads – The number of cpu math library threads.
-
inline int cpu_math_library_num_threads() const¶
An int state telling how many threads are used in the CPU math library.
- Returns
int The number of threads used in the CPU math library.
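A sketch of a CPU-oriented configuration combining MKLDNN, the shape cache, and the math-library thread count (the values are illustrative):

    paddle::AnalysisConfig config("./model_dir");          // placeholder path
    config.DisableGpu();
    config.EnableMKLDNN();
    config.SetMkldnnCacheCapacity(10);                     // cache up to 10 different input shapes
    config.SetCpuMathLibraryNumThreads(4);                 // threads for the CPU math library
    bool mkldnn_on = config.mkldnn_enabled();              // true
    int threads = config.cpu_math_library_num_threads();   // 4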
-
NativeConfig ToNativeConfig() const¶
Transform the AnalysisConfig to NativeConfig.
- Returns
NativeConfig The NativeConfig transformed.
-
inline void SetMKLDNNOp(std::unordered_set<std::string> op_list)¶
Specify the operator type list to use MKLDNN acceleration.
- Parameters
op_list – The operator type list.
-
void EnableMkldnnQuantizer()¶
Turn on MKLDNN quantization.
-
inline bool mkldnn_quantizer_enabled() const¶
A boolean state telling whether the MKLDNN quantization is enabled.
- Returns
bool Whether the MKLDNN quantization is enabled.
-
MkldnnQuantizerConfig *mkldnn_quantizer_config() const¶
Get MKLDNN quantizer config.
- Returns
MkldnnQuantizerConfig* MKLDNN quantizer config.
-
void SetModelBuffer(const char *prog_buffer, size_t prog_buffer_size, const char *params_buffer, size_t params_buffer_size)¶
Specify the memory buffers of the program and the parameters. Used when the model and params are loaded directly from memory.
- Parameters
prog_buffer – The memory buffer of program.
prog_buffer_size – The size of the model data.
params_buffer – The memory buffer of the combined parameters file.
params_buffer_size – The size of the combined parameters data.
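A sketch of loading a combined model directly from memory; read_file below is a hypothetical helper that reads an entire file into a std::string, and the paths are placeholders:

    #include <fstream>
    #include <sstream>
    #include <string>
    #include "paddle_inference_api.h"  // assumed header exposing AnalysisConfig

    // Hypothetical helper: read a whole file into a string buffer.
    static std::string read_file(const std::string &path) {
      std::ifstream in(path, std::ios::binary);
      std::ostringstream buffer;
      buffer << in.rdbuf();
      return buffer.str();
    }

    int main() {
      std::string prog = read_file("./model/__model__");   // placeholder paths
      std::string params = read_file("./model/params");

      paddle::AnalysisConfig config;
      config.SetModelBuffer(prog.data(), prog.size(), params.data(), params.size());
      bool from_memory = config.model_from_memory();        // true
      return 0;
    }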
-
inline bool model_from_memory() const¶
A boolean state telling whether the model is set from the CPU memory.
- Returns
bool Whether model and params are loaded directly from memory.
-
void EnableMemoryOptim()¶
Turn on memory optimization. NOTE: still in development.
-
bool enable_memory_optim() const¶
A boolean state telling whether the memory optimization is activated.
- Returns
bool Whether the memory optimization is activated.
-
void EnableProfile()¶
Turn on profiling report. If not turned on, no profiling report will be generated.
-
inline bool profile_enabled() const¶
A boolean state telling whether the profiler is activated.
- Returns
bool Whether the profiler is activated.
-
void DisableGlogInfo()¶
Mute all logs in Paddle inference.
-
inline bool glog_info_disabled() const¶
A boolean state telling whether logs in Paddle inference are muted.
- Returns
bool Whether logs in Paddle inference are muted.
-
inline void SetInValid() const¶
Set the AnalysisConfig to be invalid. This is to ensure that an AnalysisConfig can only be used in one AnalysisPredictor.
-
inline bool is_valid() const¶
A boolean state telling whether the AnalysisConfig is valid.
- Returns
bool Whether the AnalysisConfig is valid.
-
PassStrategy *pass_builder() const¶
Get a pass builder to customize the passes in the IR analysis phase. NOTE: Just for developers; not an official API and likely to change.
-
void PartiallyRelease()¶
Protected Attributes
-
std::string model_dir_¶
-
mutable std::string prog_file_¶
-
mutable std::string params_file_¶
-
bool use_gpu_ = {false}¶
-
int device_id_ = {0}¶
-
uint64_t memory_pool_init_size_mb_ = {100}¶
-
bool use_cudnn_ = {false}¶
-
bool use_fc_padding_ = {true}¶
-
bool use_tensorrt_ = {false}¶
-
int tensorrt_workspace_size_ = {1 << 30}¶
-
int tensorrt_max_batchsize_ = {1}¶
-
int tensorrt_min_subgraph_size_ = {3}¶
-
bool trt_use_static_engine_ = {false}¶
-
bool trt_use_calib_mode_ = {true}¶
-
std::map<std::string, std::vector<int>> min_input_shape_ = {}¶
-
std::map<std::string, std::vector<int>> max_input_shape_ = {}¶
-
std::map<std::string, std::vector<int>> optim_input_shape_ = {}¶
-
bool disable_trt_plugin_fp16_ = {false}¶
-
bool enable_memory_optim_ = {false}¶
-
bool use_mkldnn_ = {false}¶
-
std::unordered_set<std::string> mkldnn_enabled_op_types_¶
-
bool model_from_memory_ = {false}¶
-
bool enable_ir_optim_ = {true}¶
-
bool use_feed_fetch_ops_ = {true}¶
-
bool ir_debug_ = {false}¶
-
bool specify_input_name_ = {false}¶
-
int cpu_math_library_num_threads_ = {1}¶
-
bool with_profile_ = {false}¶
-
bool with_glog_info_ = {true}¶
-
std::string serialized_info_cache_¶
-
mutable std::unique_ptr<PassStrategy> pass_builder_¶
-
bool use_lite_ = {false}¶
-
std::vector<std::string> lite_passes_filter_¶
-
std::vector<std::string> lite_ops_filter_¶
-
int mkldnn_cache_capacity_ = {0}¶
-
bool use_mkldnn_quantizer_ = {false}¶
-
std::shared_ptr<MkldnnQuantizerConfig> mkldnn_quantizer_config_¶
-
mutable bool is_valid_ = {true}¶
-
std::string opt_cache_dir_¶
Friends
- friend class ::paddle::AnalysisPredictor