inference
- paddle.incubate.jit.inference(function: None = None, cache_static_model: bool = False, save_model_dir: str | None = None, memory_pool_init_size_mb: int = 1000, precision_mode: str = 'float32', switch_ir_optim: bool = True, switch_ir_debug: bool = False, enable_cinn: bool = False, with_trt: bool = False, trt_precision_mode: str = 'float32', trt_use_static: bool = False, collect_shape: bool = False, enable_new_ir: bool = False, exp_enable_use_cutlass: bool = False, delete_pass_lists: list[str] | None = None, skip_prune_program: bool = False) → paddle.incubate.jit.inference_decorator._InferenceDecorator
- paddle.incubate.jit.inference(function: paddle.incubate.jit.inference_decorator._LayerT, cache_static_model: bool = False, save_model_dir: str | None = None, memory_pool_init_size_mb: int = 1000, precision_mode: str = 'float32', switch_ir_optim: bool = True, switch_ir_debug: bool = False, enable_cinn: bool = False, with_trt: bool = False, trt_precision_mode: str = 'float32', trt_use_static: bool = False, collect_shape: bool = False, enable_new_ir: bool = False, exp_enable_use_cutlass: bool = False, delete_pass_lists: list[str] | None = None, skip_prune_program: bool = False) → paddle.incubate.jit.inference_decorator._LayerT
- paddle.incubate.jit.inference(function: Callable[[paddle.incubate.jit.inference_decorator._InputT], paddle.incubate.jit.inference_decorator._RetT], cache_static_model: bool = False, save_model_dir: str | None = None, memory_pool_init_size_mb: int = 1000, precision_mode: str = 'float32', switch_ir_optim: bool = True, switch_ir_debug: bool = False, enable_cinn: bool = False, with_trt: bool = False, trt_precision_mode: str = 'float32', trt_use_static: bool = False, collect_shape: bool = False, enable_new_ir: bool = False, exp_enable_use_cutlass: bool = False, delete_pass_lists: list[str] | None = None, skip_prune_program: bool = False) → Callable[[paddle.incubate.jit.inference_decorator._InputT], paddle.incubate.jit.inference_decorator._RetT]
Converts a dynamic graph function into a static graph saved on disk, then uses Paddle Inference to run inference against that static model. This function returns a callable, which can be used for inference just like the original dynamic function.
- Parameters
function (callable) – Callable dynamic graph function. It must be a member function of paddle.nn.Layer. If used as a decorator, the decorated function is parsed as this parameter.
cache_static_model (bool, optional) – Whether to reuse the cached static model on disk. Default is False. When True, the static model is saved to disk on the first call, and subsequent calls load the cached model instead of converting again; see the sketch after this parameter list.
save_model_dir (str, optional) – The directory in which to save the static model. Default is None, which means ~/.cache/paddle/inference_models/.
memory_pool_init_size_mb (int, optional) – The initial size of the memory pool in MB. Default is 1000.
precision_mode (str, optional) – The precision mode. Default is “float32”.
switch_ir_optim (bool, optional) – Whether to enable IR optimization. Default is True.
switch_ir_debug (bool, optional) – Whether to enable IR debug. Default is False.
enable_cinn (bool, optional) – Whether to enable CINN. Default is False.
with_trt (bool, optional) – Whether to enable TensorRT. Default is False.
trt_precision_mode (str, optional) – The precision mode of TensorRT. Default is “float32”.
trt_use_static (bool, optional) – Whether to use static shape in TensorRT. Default is False.
collect_shape (bool, optional) – Whether to collect shape. Default is False.
enable_new_ir (bool, optional) – Whether to enable the new IR. Default is False.
exp_enable_use_cutlass (bool, optional) – Whether to enable CUTLASS. Default is False.
delete_pass_lists (list[str], optional) – The list of pass names to delete. Default is None.
skip_prune_program (bool, optional) – Whether to skip pruning program when converting dynamic graph APIs into static graph. Default is False.
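A minimal configuration sketch for the caching parameters above (the layer and directory below are illustrative, not part of this API):

>>> import paddle
>>> layer = paddle.nn.Linear(8, 8)  # any paddle.nn.Layer instance
>>> layer = paddle.incubate.jit.inference(
...     layer,
...     cache_static_model=True,  # save the converted static model and reuse it on later runs
...     save_model_dir="/tmp/paddle_inference_demo",  # hypothetical directory; None means ~/.cache/paddle/inference_models/
... )
>>> out = layer(paddle.rand([4, 8]))

The same keyword arguments drive the TensorRT path: with_trt=True selects the TensorRT backend, and trt_precision_mode sets its precision, as described above.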
- Returns
The decorated function, which can be used for inference.
- Return type
function (callable)
Examples
>>> import paddle
>>> class ExampleLayer(paddle.nn.Layer):
...     def __init__(self, hidd):
...         super().__init__()
...         self.fn = paddle.nn.Linear(hidd, hidd, bias_attr=False)
...     def forward(self, x):
...         for i in range(10):
...             x = paddle.nn.functional.softmax(x, -1)
...         x = x.cast("float32")
...         x = self.func(x)
...         return x
...     def func(self, x):
...         x = x + x
...         return self.fn(x)
>>> batch = 4096
>>> hidd = 1024
>>> dtype = "bfloat16"
>>> x = paddle.rand([batch, hidd], dtype=dtype)  # type: ignore[arg-type]
>>> mylayer = ExampleLayer(hidd)
>>> dynamic_result = mylayer(x)
>>> mylayer = paddle.incubate.jit.inference(mylayer)
>>> decorator_result = mylayer(x)
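Per the first overload above, calling inference with keyword arguments only returns a decorator, so the conversion can also be requested at class definition time. A minimal sketch, assuming the decorated method is a member function of a paddle.nn.Layer as required:

>>> import paddle
>>> class DecoratedLayer(paddle.nn.Layer):
...     def __init__(self, hidd):
...         super().__init__()
...         self.fn = paddle.nn.Linear(hidd, hidd, bias_attr=False)
...     @paddle.incubate.jit.inference(memory_pool_init_size_mb=1000)  # keyword-only call returns a decorator
...     def forward(self, x):
...         return self.fn(x)
>>> layer = DecoratedLayer(16)
>>> out = layer(paddle.rand([4, 16]))  # runs via Paddle Inference on the converted static model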