PaddleOCR-VL Huawei Ascend NPU Environment Configuration Tutorial
This tutorial walks through setting up the environment required to run PaddleOCR-VL on Huawei Ascend NPUs. Once the environment is configured, please refer to the PaddleOCR-VL Usage Tutorial to use PaddleOCR-VL.
PaddleOCR-VL has been verified for accuracy and speed on the Huawei Ascend 910B. However, due to hardware diversity, compatibility with other Huawei Ascend NPUs has not yet been confirmed. We welcome community testing on different hardware setups and encourage you to share your results.
1. Environment Preparation
This step mainly introduces how to set up the runtime environment for PaddleOCR-VL. Two methods are available; choose either one:

- Method 1: Use the official Docker image.
- Method 2: Manually install PaddlePaddle and PaddleOCR.

We strongly recommend using the Docker image to minimize potential environment-related issues.
1.1 Method 1: Using Docker Image
We recommend using the official Docker image (requires Docker version >= 19.03):
```bash
docker run -it \
  --user root \
  --privileged \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/dcmi:/usr/local/dcmi \
  --shm-size 64g \
  --network host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-huawei-npu \
  /bin/bash
# Call the PaddleOCR CLI or Python API inside the container
```
If you wish to use the image in an environment without internet access, replace `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-huawei-npu` (image size approximately 28 GB) in the above command with the offline image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-huawei-npu-offline` (image size approximately 30 GB).
Tip

Images with the `latest-xxx` tag correspond to the latest version of PaddleOCR. To use a specific version of the PaddleOCR image, replace `latest` in the tag with the desired version number, in the form `paddleocr<major>.<minor>`. For example:

`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:paddleocr3.4-huawei-npu-offline`
1.2 Method 2: Manually Install PaddlePaddle and PaddleOCR
If you cannot use Docker, you can also manually install PaddlePaddle and PaddleOCR. Python version 3.8–3.12 is required.
We strongly recommend installing PaddleOCR-VL in a virtual environment to avoid dependency conflicts. For example, use the Python venv standard library to create a virtual environment:
```bash
# Create a virtual environment
python -m venv .venv_paddleocr
# Activate the environment
source .venv_paddleocr/bin/activate
```
Execute the following commands to complete the installation:
```bash
python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
python -m pip install paddle-custom-npu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/npu/
python -m pip install -U "paddleocr[doc-parser]"
```
Note that PaddlePaddle 3.2.0 or above is required; keep the `paddlepaddle` and `paddle-custom-npu` versions matched, as in the commands above.
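After installation, you can sanity-check the setup with PaddlePaddle's built-in self-check. A minimal sketch; whether the NPU itself is exercised depends on the installed plugin:

```bash
# Verify that PaddlePaddle imports and can run a simple program
python -c "import paddle; paddle.utils.run_check()"
# List the custom device types PaddlePaddle can see ('npu' should appear
# if paddle-custom-npu is installed correctly)
python -c "import paddle; print(paddle.device.get_all_custom_device_type())"
```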
2. Quick Start
NPU devices currently do not support running inference with the native PaddlePaddle inference backend. Please refer to the next section and use the vLLM inference acceleration framework instead.
3. Improving VLM Inference Performance Using an Inference Acceleration Framework
The inference performance under default configurations is not fully optimized and may not meet actual production requirements. This step mainly introduces how to use the vLLM inference acceleration framework to improve the inference performance of PaddleOCR-VL.
3.1 Starting the VLM Inference Service
PaddleOCR provides a Docker image for quickly starting the vLLM inference service. Use the following command to start the service (requires Docker version >= 19.03):
```bash
docker run -it \
  --user root \
  --privileged \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/dcmi:/usr/local/dcmi \
  --shm-size 64g \
  --network host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-huawei-npu \
  paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm
```
If you wish to start the service in an environment without internet access, replace ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-huawei-npu (image size approximately 18 GB) in the above command with the offline version image ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-huawei-npu-offline (image size approximately 20 GB).
When launching the vLLM inference service, a set of default parameter settings is used. If you need to adjust parameters such as device memory usage, you can configure them yourself. Refer to 3.3.1 Server-side Parameter Adjustment to create a configuration file, mount the file into the container, and specify it with `--backend_config` when starting the service, for example:
```bash
docker run -it \
  --user root \
  --privileged \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v "$(pwd)/vllm_config.yml":/tmp/vllm_config.yml \
  --shm-size 64g \
  --network host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-huawei-npu \
  paddleocr genai_server --model_name PaddleOCR-VL-0.9B --host 0.0.0.0 --port 8118 --backend vllm --backend_config /tmp/vllm_config.yml
```

Note that Docker bind mounts require an absolute host path, hence the `$(pwd)/` prefix in the mount above; a bare relative name would be interpreted as a named volume.
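For reference, the backend configuration file is a small YAML document. The snippet below is purely illustrative; the keys shown mirror common vLLM engine arguments, and the authoritative list of supported keys is in 3.3.1 Server-side Parameter Adjustment:

```yaml
# vllm_config.yml -- illustrative values only; consult section 3.3.1
# for the keys the server actually supports.
gpu_memory_utilization: 0.7   # fraction of device memory the engine may claim
max_num_seqs: 128             # upper bound on concurrently processed sequences
```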
Tip

Images with the `latest-xxx` tag correspond to the latest version of PaddleOCR. To use a specific version of the PaddleOCR image, replace `latest` in the tag with the desired version number, in the form `paddleocr<major>.<minor>`. For example:

`ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:paddleocr3.4-huawei-npu-offline`
3.2 Client Usage Method
Please refer to the corresponding section in the PaddleOCR-VL Usage Tutorial.
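For orientation, a minimal client sketch is shown below; it connects the PaddleOCR-VL pipeline to the vLLM service started in section 3.1. The class and parameter names follow the PaddleOCR-VL Usage Tutorial and should be verified against it:

```python
from paddleocr import PaddleOCRVL

# Point the pipeline at the vLLM inference service from section 3.1.
pipeline = PaddleOCRVL(
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://127.0.0.1:8118/v1",
)

# "document.png" is a hypothetical input image path.
output = pipeline.predict("document.png")
for res in output:
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
```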
3.3 Performance Tuning
Please refer to the corresponding section in the PaddleOCR-VL Usage Tutorial.
4. Service Deployment
Please note that the PaddleOCR-VL service introduced in this section is different from the VLM inference service in the previous section: the latter is only responsible for one part of the complete process (i.e., VLM inference) and is called as an underlying service by the former.
This step mainly introduces how to use Docker Compose to deploy PaddleOCR-VL as a service and call it. The specific process is as follows:
- Download the Compose file and the environment variable configuration file from here and here, respectively, to your local machine.
- In the directory containing the `compose.yaml` and `.env` files, execute the start command (see the sketch below) to start the server, which listens on port 8080 by default.
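Starting the server uses the standard Docker Compose invocation:

```bash
# Run in the directory containing compose.yaml and .env
docker compose up
# Or, to run detached in the background:
# docker compose up -d
```

After startup, the logs of both containers are streamed to the terminal (unless you start in detached mode).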
This method accelerates VLM inference using the vLLM framework and is well suited to production deployment.
Additionally, once the server has been started in this manner, no internet connection is required except for pulling images. For deployment in a fully offline environment, you can pull the images referenced in the Compose file on a machine with internet access, export them, transfer them to the offline machine, and import them there before starting the service; a sketch follows.
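A minimal sketch with standard Docker commands, using the PaddleOCR-VL image from section 1.1 as a stand-in; substitute the image names that actually appear in your compose.yaml:

```bash
# On a machine with internet access: pull and export the image
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-huawei-npu
docker save -o paddleocr-vl.tar ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-huawei-npu

# On the offline machine: import the image, then start the service
docker load -i paddleocr-vl.tar
docker compose up -d
```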
Docker Compose starts two containers sequentially by reading configurations from the .env and compose.yaml files, running the underlying VLM inference service and the PaddleOCR-VL service (pipeline service) respectively.
The meanings of each environment variable contained in the .env file are as follows:
- `API_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the pipeline service.
- `VLM_BACKEND`: The VLM inference backend.
- `VLM_IMAGE_TAG_SUFFIX`: The tag suffix of the image used to launch the VLM inference service.
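For illustration, a `.env` file along these lines would select the Huawei NPU images and the vLLM backend; the exact values here are assumptions, so keep the values shipped in the downloaded file unless you have a reason to change them:

```bash
# .env -- illustrative values only
API_IMAGE_TAG_SUFFIX=latest-huawei-npu
VLM_BACKEND=vllm
VLM_IMAGE_TAG_SUFFIX=latest-huawei-npu
```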
You can modify compose.yaml to meet custom requirements, for example:
1. Change the port of the PaddleOCR-VL service
Edit `paddleocr-vl-api.ports` in the compose.yaml file to change the port. For example, to change the service port to 8111, modify it as sketched below:
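A sketch, assuming the container itself listens on port 8080 (the default noted above):

```yaml
services:
  paddleocr-vl-api:
    ports:
      - "8111:8080"   # host port 8111 -> container port 8080
```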
2. Specify the NPU used by the PaddleOCR-VL service
Edit `environment` in the compose.yaml file to change the NPU used. For example, to deploy on card 1, modify it as sketched below:
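A sketch using `ASCEND_RT_VISIBLE_DEVICES`, the usual Ascend device-visibility variable; check compose.yaml for the variable the image actually reads:

```yaml
services:
  paddleocr-vl-api:
    environment:
      - ASCEND_RT_VISIBLE_DEVICES=1   # expose only NPU card 1 to the container
```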
3. Adjust VLM server-side configuration
If you want to adjust the VLM server configuration, refer to 3.3.1 Server-side Parameter Adjustment to generate a configuration file. Then add `paddleocr-vlm-server.volumes` and `paddleocr-vlm-server.command` fields to your compose.yaml as sketched below, replacing `/path/to/your_config.yaml` with your actual configuration file path.
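A sketch; the `command` mirrors the invocation from section 3.1, and the port should match whatever your compose.yaml already uses:

```yaml
services:
  paddleocr-vlm-server:
    volumes:
      - /path/to/your_config.yaml:/tmp/vllm_config.yml
    command: >
      paddleocr genai_server --model_name PaddleOCR-VL-0.9B
      --host 0.0.0.0 --port 8118 --backend vllm
      --backend_config /tmp/vllm_config.yml
```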
4. Adjust pipeline-related configurations (such as model path, batch size, deployment device, etc.)
Refer to the 4.4 Pipeline Configuration Adjustment Instructions section.

4.3 Client Invocation Method
Please refer to the corresponding section in the PaddleOCR-VL Usage Tutorial.
4.4 Pipeline Configuration Adjustment Instructions
Please refer to the corresponding section in the PaddleOCR-VL Usage Tutorial.
5. Model Fine-Tuning
Please refer to the corresponding section in the PaddleOCR-VL Usage Tutorial.