Inference Engine and Configuration¶
PaddleOCR 3.5 introduces a unified inference-engine configuration mechanism: use engine to select the underlying inference engine, and use engine_config to pass engine-specific settings. This mechanism applies to both individual models and pipelines.
If engine is not explicitly specified, the default behavior remains the same as in earlier versions: except for a few scenarios such as high-performance inference and generative-AI client requests, PaddleOCR uses the PaddlePaddle framework for inference in most cases. If engine is explicitly specified, initialization uses the selected engine.
1. What Is an Inference Engine¶
In PaddleOCR, an inference engine refers to the underlying runtime used to execute a model. It determines which runtime loads and runs the model. You can think of it as "the engine actually used during model inference." When using an inference engine, users usually only need to care about two things:
- which type of inference engine to use;
- how to configure the inference engine.
2. Inference Engines Currently Supported by PaddleOCR¶
| Engine category | engine values | Description |
|---|---|---|
| PaddlePaddle framework | `paddle`, `paddle_static`, `paddle_dynamic` | Runs on the PaddlePaddle framework. |
| Transformers | `transformers` | Runs on Hugging Face Transformers. |
- `paddle`: the unified entry point for the PaddlePaddle framework. It selects `paddle_static` or `paddle_dynamic` according to the model type and the files in the model directory. If both are available, `paddle_static` is preferred.
- `paddle_static`: PaddlePaddle static-graph inference, suitable for scenarios that require better inference performance or finer-grained performance tuning.
- `paddle_dynamic`: PaddlePaddle dynamic-graph inference, which is more flexible and easier to debug than static-graph inference.
- `transformers`: Hugging Face Transformers inference, which makes it easy to integrate with the Hugging Face ecosystem.
3. Installation by Inference Engine¶
3.1 PaddlePaddle framework¶
When using the PaddlePaddle framework for inference, you need to install PaddlePaddle first. For installation instructions, see PaddlePaddle Framework Installation.
3.2 Transformers¶
When using Transformers as the inference engine, you need to install Hugging Face Transformers. Example command:
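The original command is not reproduced here; assuming a standard pip-based environment, installation would typically look like:

```shell
# Install Hugging Face Transformers into the current Python environment.
python -m pip install transformers
```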
In many cases, you also need to install the underlying inference framework. For details, see the Transformers official documentation.
4. Configuration and Supported Values of engine and engine_config¶
4.1 engine¶
engine is used to specify the inference engine. Supported values are:
| Value | Meaning | Description |
|---|---|---|
| `None` | No engine explicitly specified | Automatically determines the inference engine, keeping the behavior of PaddleOCR 3.4; in most cases the PaddlePaddle framework is used for inference. |
| `paddle` | Unified PaddlePaddle framework entry | Automatically selects `paddle_static` or `paddle_dynamic`. |
| `paddle_static` | Static-graph inference | Uses Paddle static-graph inference. |
| `paddle_dynamic` | Dynamic-graph inference | Uses Paddle dynamic-graph inference. |
| `transformers` | Transformers inference | Uses Hugging Face Transformers inference. |
4.2 engine_config¶
engine_config is used to configure the inference engine; it is recommended to specify it together with engine. Common engine_config fields for each engine are listed below:
paddle_static¶
Common fields include:
- `run_mode`: execution mode, such as `paddle`, `trt_fp32`, `trt_fp16`, and `mkldnn`;
- `device_type` / `device_id`: device type and device index;
- `cpu_threads`: number of CPU inference threads;
- `delete_pass`: list of graph optimization passes to disable manually;
- `enable_new_ir`: whether to enable the new IR;
- `enable_cinn`: whether to enable CINN;
- `trt_cfg_setting`: low-level TensorRT configuration;
- `trt_use_dynamic_shapes`: whether to enable TensorRT dynamic shapes;
- `trt_collect_shape_range_info`: whether to collect shape range information;
- `trt_discard_cached_shape_range_info`: whether to discard existing shape range information and re-collect it;
- `trt_dynamic_shapes`: dynamic shape configuration;
- `trt_dynamic_shape_input_data`: data used to fill input tensors when collecting dynamic shapes;
- `trt_shape_range_info_path`: path to the shape range information file;
- `trt_allow_rebuild_at_runtime`: whether rebuilding the TensorRT engine at runtime is allowed;
- `mkldnn_cache_capacity`: oneDNN (MKLDNN) cache capacity.
paddle_dynamic¶
Common fields include:
- `device_type` / `device_id`: device type and device index used during dynamic-graph execution.
transformers¶
Common fields include:
- `dtype`: data type used for model weights / inference, such as `float16`;
- `device_type` / `device_id`: inference device type and device index;
- `trust_remote_code`: whether to trust and execute custom code in model repositories;
- `attn_implementation`: attention implementation, such as `flash_attention_2`;
- `generation_config`: generation parameters, such as `max_new_tokens` and `temperature`;
- `model_kwargs`: extra arguments passed to the model loading API;
- `processor_kwargs`: extra arguments passed to the processor / image processor loading API;
- `tokenizer_kwargs`: a compatibility-preserved field that is merged with `processor_kwargs`.
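As a sketch, a `transformers` engine configuration built from the fields above might look like the following (all values are illustrative assumptions, not recommendations):

```python
# Illustrative engine_config dict for the transformers engine.
# Field names come from the list above; the concrete values are assumptions.
transformers_engine_config = {
    "dtype": "float16",                          # half-precision weights
    "device_type": "gpu",
    "device_id": 0,
    "attn_implementation": "flash_attention_2",  # requires a compatible install
    "generation_config": {"max_new_tokens": 512, "temperature": 0.2},
}
```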
4.2.1 Flat vs. bucketed engine_config¶
At the same level, engine_config may be:

- Flat: a dict whose keys are only those required by the resolved engine (for example, when using static graph only, top-level keys such as `run_mode` and `cpu_threads`).
- Bucketed: top-level keys are only registered engine names (e.g. `paddle_static`, `paddle_dynamic`, `transformers`), each mapping to a nested dict.

You must not mix bucket keys with flat keys at the same level (e.g. `{"paddle_static": {...}, "run_mode": "paddle"}` is invalid).
When an engine is resolved, only the corresponding config is used: flat configs are validated as a whole; bucketed configs take the entry for that engine.
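The two shapes described above can be sketched as plain Python dicts (the field values are illustrative):

```python
# Flat form: keys belong directly to the resolved engine (here: paddle_static).
flat_config = {"run_mode": "mkldnn", "cpu_threads": 4}

# Bucketed form: top-level keys are engine names, each mapping to that
# engine's own config. Only the entry for the resolved engine is used.
bucketed_config = {
    "paddle_static": {"run_mode": "mkldnn", "cpu_threads": 4},
    "paddle_dynamic": {"device_type": "cpu"},
}

# Invalid: mixing a bucket key with a flat key at the same level, e.g.
# {"paddle_static": {...}, "run_mode": "paddle"}
```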
4.3 Priority and Override Rules¶
- For pipelines, `engine` and `engine_config` passed through CLI arguments or Python API initialization arguments take precedence over fields with the same names in the pipeline configuration file.
- In pipeline configuration files, top-level `engine` and `engine_config` act as global settings; `engine` and `engine_config` in submodules or sub-pipelines can override upper-level settings.
- For more complete rules about priority, overriding, and pipeline configuration behavior, refer to the PaddleX documentation: PaddleX Pipeline Python API Usage.
4.4 Compatibility Rules¶
- When `engine` is explicitly set, `enable_hpi` no longer takes effect.
- When `engine_config` is explicitly provided, compatibility arguments for the selected engine are ignored. For example, in `paddle` / `paddle_static` scenarios, compatibility arguments such as `use_tensorrt`, `precision`, `enable_mkldnn`, `mkldnn_cache_capacity`, `cpu_threads`, and `enable_cinn` no longer take effect.
5. Usage Examples¶
5.1 Individual model (CLI): select the engine with --engine¶
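The original command is not shown here; as a hedged sketch, selecting the engine for a single model from the CLI might look like the following (the `text_detection` subcommand, model name, and input path are assumptions; only the `--engine` flag is confirmed by this section):

```shell
# Hypothetical sketch: run a text detection model with an explicit engine.
paddleocr text_detection \
    --model_name PP-OCRv5_server_det \
    --engine transformers \
    -i general_ocr_001.png
```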
5.2 Individual model (Python): explicitly specify transformers¶
```python
from paddleocr import TextDetection

model = TextDetection(
    model_name="PP-OCRv5_server_det",
    engine="transformers",
)
result = model.predict("general_ocr_001.png")
```
5.3 Individual model (Python): specify paddle_static and engine_config¶
```python
from paddleocr import TextDetection

model = TextDetection(
    model_name="PP-OCRv5_server_det",
    engine="paddle_static",
    engine_config={
        "device_type": "cpu",
        "cpu_threads": 4,
        "run_mode": "mkldnn",
    },
)
result = model.predict("general_ocr_001.png")
```
5.4 Pipeline (CLI): select the engine with --engine¶
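The original command is not shown here; as a hedged sketch, selecting the engine for a pipeline from the CLI might look like the following (the `ocr` subcommand and input path are assumptions; only the `--engine` flag is confirmed by this section):

```shell
# Hypothetical sketch: run the OCR pipeline with an explicit engine.
paddleocr ocr -i general_ocr_001.png --engine paddle_static
```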
5.5 Pipeline (Python API): configure the inference engine for a specific module¶
If you want to specify engine and engine_config for a specific module inside a pipeline, you can first export the configuration file, modify the corresponding module configuration, and then load it. For how to export, edit, and load the configuration file, see Using PaddleX Pipeline Configuration Files. Example:
First, export the pipeline configuration file:
```python
from paddleocr import PaddleOCR

pipeline = PaddleOCR()
pipeline.export_paddlex_config_to_yaml("ocr_config.yaml")
```
Then, set engine and engine_config specifically for the TextDetection module in ocr_config.yaml:
```yaml
pipeline_name: OCR
SubModules:
  TextDetection:
    engine: paddle_static
    engine_config:
      device_type: cpu
      cpu_threads: 4
      run_mode: mkldnn
```
Use the updated configuration file for inference:
```python
from paddleocr import PaddleOCR

pipeline = PaddleOCR(paddlex_config="ocr_config.yaml")
result = pipeline.predict("general_ocr_001.png")
```