inference

paddle.incubate.jit.inference(function: None = None, cache_static_model: bool = False, save_model_dir: str | None = None, memory_pool_init_size_mb: int = 1000, precision_mode: str = 'float32', switch_ir_optim: bool = True, switch_ir_debug: bool = False, enable_cinn: bool = False, with_trt: bool = False, trt_precision_mode: str = 'float32', trt_use_static: bool = False, collect_shape: bool = False, enable_new_ir: bool = False, exp_enable_use_cutlass: bool = False, delete_pass_lists: list[str] | None = None, skip_prune_program: bool = False) → paddle.incubate.jit.inference_decorator._InferenceDecorator [source]
paddle.incubate.jit.inference(function: paddle.incubate.jit.inference_decorator._LayerT, cache_static_model: bool = False, save_model_dir: str | None = None, memory_pool_init_size_mb: int = 1000, precision_mode: str = 'float32', switch_ir_optim: bool = True, switch_ir_debug: bool = False, enable_cinn: bool = False, with_trt: bool = False, trt_precision_mode: str = 'float32', trt_use_static: bool = False, collect_shape: bool = False, enable_new_ir: bool = False, exp_enable_use_cutlass: bool = False, delete_pass_lists: list[str] | None = None, skip_prune_program: bool = False) → paddle.incubate.jit.inference_decorator._LayerT
paddle.incubate.jit.inference(function: Callable[[paddle.incubate.jit.inference_decorator._InputT], paddle.incubate.jit.inference_decorator._RetT], cache_static_model: bool = False, save_model_dir: str | None = None, memory_pool_init_size_mb: int = 1000, precision_mode: str = 'float32', switch_ir_optim: bool = True, switch_ir_debug: bool = False, enable_cinn: bool = False, with_trt: bool = False, trt_precision_mode: str = 'float32', trt_use_static: bool = False, collect_shape: bool = False, enable_new_ir: bool = False, exp_enable_use_cutlass: bool = False, delete_pass_lists: list[str] | None = None, skip_prune_program: bool = False) → Callable[[paddle.incubate.jit.inference_decorator._InputT], paddle.incubate.jit.inference_decorator._RetT]

Converts a dynamic graph function into a static graph that is saved to disk, then uses Paddle Inference to run inference based on that static model. This function returns a callable, which users can call for inference just like the original dynamic graph function.

Parameters
  • function (callable) – The dynamic graph function to convert. It must be a member function of paddle.nn.Layer. If used as a decorator, the decorated function is parsed as this parameter.

  • cache_static_model (bool, optional) – Whether to use the cached static model on disk. Default is False. When cache_static_model is True, the static model is saved to disk on the first call and reused the next time this function is called, skipping the dynamic-to-static conversion. A decorator usage sketch is shown after the main example below.

  • save_model_dir (str, optional) – The directory in which to save the static model. Default is None, which means ~/.cache/paddle/inference_models/.

  • memory_pool_init_size_mb (int, optional) – The memory pool init size in MB. Default is 1000.

  • precision_mode (str, optional) – The precision mode. Default is “float32”.

  • switch_ir_optim (bool, optional) – Whether to enable IR optimization. Default is True.

  • switch_ir_debug (bool, optional) – Whether to enable IR debug. Default is False.

  • enable_cinn (bool, optional) – Whether to enable CINN. Default is False.

  • with_trt (bool, optional) – Whether to enable TensorRT. Default is False.

  • trt_precision_mode (str, optional) – The precision mode of TensorRT. Default is “float32”.

  • trt_use_static (bool, optional) – Whether to use static shape in TensorRT. Default is False.

  • collect_shape (bool, optional) – Whether to collect shape. Default is False.

  • enable_new_ir (bool, optional) – Whether to enable the new IR. Default is False.

  • exp_enable_use_cutlass (bool, optional) – Whether to enable CUTLASS kernels (experimental). Default is False.

  • delete_pass_lists (list[str], optional) – The list of pass names to delete. Default is None.

  • skip_prune_program (bool, optional) – Whether to skip pruning program when converting dynamic graph APIs into static graph. Default is False.

Returns

The decorated function, which can be used for inference.

Return type

function (callable)

Examples

>>> import paddle
>>> class ExampleLayer(paddle.nn.Layer):
...     def __init__(self, hidd):
...         super().__init__()
...         self.fn = paddle.nn.Linear(hidd, hidd, bias_attr=False)
...     def forward(self, x):
...         for i in range(10):
...             x = paddle.nn.functional.softmax(x, -1)
...         x = x.cast("float32")
...         x = self.func(x)
...         return x
...     def func(self, x):
...         x = x + x
...         return self.fn(x)

>>> batch = 4096
>>> hidd = 1024
>>> dtype = "bfloat16"
>>> x = paddle.rand([batch, hidd], dtype=dtype) # type: ignore[arg-type]
>>> mylayer = ExampleLayer(hidd)
>>> dynamic_result = mylayer(x)  # result from the dynamic graph (eager) mode
>>> mylayer = paddle.incubate.jit.inference(mylayer)  # convert; calls now run via Paddle Inference
>>> decorator_result = mylayer(x)
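The configuration parameters above can also be passed when inference is used as a decorator, per the first overload (calling it without a function returns a decorator). A minimal sketch, assuming that keyword-argument form; the directory ./my_infer_model is a hypothetical path used only for illustration:

>>> import paddle
>>> class CachedLayer(paddle.nn.Layer):
...     def __init__(self, hidd):
...         super().__init__()
...         self.fn = paddle.nn.Linear(hidd, hidd, bias_attr=False)
...     @paddle.incubate.jit.inference(
...         cache_static_model=True,          # reuse the saved static model on later calls
...         save_model_dir="./my_infer_model",  # hypothetical path; None would use the default cache dir
...     )
...     def forward(self, x):
...         return self.fn(x)

>>> layer = CachedLayer(1024)
>>> x = paddle.rand([4, 1024], dtype="float32")
>>> out = layer(x)  # first call converts to a static model and saves it to save_model_dir
>>> out = layer(x)  # subsequent calls reuse the cached static model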