Chart Parsing Module Tutorial
1. Overview
Multimodal chart parsing is an OCR technique that automatically converts visual charts (bar charts, line charts, pie charts, and so on) into structured data tables with formatted output. Traditional methods rely on complex pipelines built around chart keypoint detection models; these involve many prior assumptions and tend to lack robustness. The models in this module instead use VLM (Vision-Language Model) techniques and are data-driven, learning robust features from large real-world datasets. Application scenarios include financial analysis, academic research, and business reporting: for instance, quickly extracting growth trend data from financial reports, experimental comparison figures from research papers, or user distribution statistics from market surveys, so that users can move from “viewing charts” to “using data”.
2. Supported Model List
Model | Download Link | Model Size (B) | Storage Size (GB) | Score | Description |
---|---|---|---|---|---|
PP-Chart2Table | Inference Model | 0.58 | 1.4 | 80.60 | PP-Chart2Table is a multimodal chart parsing model developed by the PaddlePaddle team. It demonstrates exceptional performance on both Chinese and English chart parsing tasks. The team designed a specialized “Shuffled Chart Data Retrieval” training task and adopted a carefully designed token masking strategy, significantly improving performance on chart-to-table conversion. Additionally, the team enhanced the model with a high-quality data synthesis process using seed data, RAG, and LLM persona-driven generation to diversify training data. To handle large amounts of out-of-distribution (OOD) unlabeled data, a two-stage large model distillation process was used to ensure excellent adaptability and generalization to diverse real-world data. In internal Chinese-English use case evaluations, PP-Chart2Table achieved state-of-the-art performance among models of similar size and reached accuracy comparable to 7B-parameter VLMs in key scenarios. |
Note: The scores above come from an internal evaluation on a test set of 1801 samples covering various chart types (bar, line, pie, etc.) across scenarios such as financial reports, regulations, and contracts. There is currently no plan to release this test set publicly.
❗ Note: The PP-Chart2Table model was upgraded on June 27, 2025. To use the previous version, please download it here
3. Quick Start
❗ Before getting started, please install the PaddleOCR wheel package. Refer to the Installation Guide for details.
Run the following command to get started instantly:
```bash
paddleocr chart_parsing -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png'}"
```
**Note:** By default, PaddleOCR retrieves models from HuggingFace. If HuggingFace access is restricted in your environment, you can switch the model source to BOS by setting the environment variable: `PADDLE_PDX_MODEL_SOURCE="BOS"`. Support for more mainstream sources is planned.
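When working from Python rather than the shell, the same variable can be set in-process. The sketch below assumes the variable is read when the library is first imported, so it sets the value before any PaddleOCR import; that ordering is a precaution, not something this tutorial states.

```python
import os

# Select BOS as the model source. This is done before importing PaddleOCR,
# on the assumption that the variable is read at import time.
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "BOS"

# Import PaddleOCR only after the variable is set, for example:
# from paddleocr import ChartParsing
```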
You can also integrate the inference of the vision-language model into your own project. Please download the [example image](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png) locally before running the following code:
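```python
from paddleocr import ChartParsing

# Load the chart parsing model by name
model = ChartParsing(model_name="PP-Chart2Table")

# Run inference on the downloaded example image
results = model.predict(
    input={"image": "chart_parsing_02.png"},
    batch_size=1,
)

for res in results:
    res.print()  # print the result to the terminal
    res.save_to_json("./output/res.json")  # save the result as JSON
```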
The output result will be:
{'res': {'image': 'chart_parsing_02.png', 'result': 'Year | Avg Revenue per 5-star Hotel (Million CNY) | Avg Profit per 5-star Hotel (Million CNY)\n2018 | 104.22 | 9.87\n2019 | 99.11 | 7.47\n2020 | 57.87 | -3.87\n2021 | 68.99 | -2.9\n2022 | 56.29 | -9.48\n2023 | 87.99 | 5.96'}}
Explanation of output parameters:
- `image`: The path to the input image
- `result`: The model's prediction output
The visualized result is:
Year | Avg Revenue per 5-star Hotel (Million CNY) | Avg Profit per 5-star Hotel (Million CNY) |
---|---|---|
2018 | 104.22 | 9.87 |
2019 | 99.11 | 7.47 |
2020 | 57.87 | -3.87 |
2021 | 68.99 | -2.9 |
2022 | 56.29 | -9.48 |
2023 | 87.99 | 5.96 |
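Since the `result` field is plain text with rows separated by newlines and cells separated by `|`, it can be split into records with the standard library alone. The following is a minimal sketch; `parse_chart_table` is a hypothetical helper (not part of PaddleOCR), and it assumes the first line is always the header and that no cell contains a literal `|`.

```python
def parse_chart_table(result: str) -> list[dict[str, str]]:
    """Split pipe-delimited table text into one dict per data row.

    Hypothetical helper: assumes the first non-empty line is the header
    and that cell values never contain the '|' character.
    """
    lines = [line for line in result.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# First rows of the example output shown above
sample = (
    "Year | Avg Revenue per 5-star Hotel (Million CNY) | "
    "Avg Profit per 5-star Hotel (Million CNY)\n"
    "2018 | 104.22 | 9.87\n"
    "2019 | 99.11 | 7.47"
)
rows = parse_chart_table(sample)
print(rows[0]["Year"])
```

Values are kept as strings; convert them with `float()` where numeric processing is needed.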
Detailed explanation of related methods and parameters:
- Instantiate a vision-language model with `ChartParsing`. Parameters:
Parameter | Description | Type | Default |
---|---|---|---|
model_name | Model name. If set to `None`, defaults to `PP-Chart2Table`. | `str \| None` | `None` |
model_dir | Model storage path. | `str \| None` | `None` |
device | Inference device. Examples: `"cpu"`, `"gpu"`, `"npu"`, `"gpu:0"`. Defaults to GPU 0 if available; otherwise falls back to CPU. | `str \| None` | `None` |
- Use the model's `predict()` method for inference. This returns a list of results. The module also offers a `predict_iter()` method, which behaves identically in terms of inputs and outputs but returns a generator, making it a better fit for large datasets or memory-sensitive scenarios. Choose based on your needs. `predict()` method parameters:
Parameter | Description | Type | Default |
---|---|---|---|
input | Input data (required). Input formats vary by model. For PP-Chart2Table: `{'image': image_path}` | `dict` | N/A |
batch_size | Batch size. Any positive integer. | `int` | 1 |
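To illustrate the generator variant, here is a minimal sketch that processes several images one result at a time. `iter_chart_results` is a hypothetical helper, the image paths in the usage comment are placeholders, and the code assumes `predict_iter()` accepts the same keyword arguments as `predict()`, as described above.

```python
from typing import Any, Iterable, Iterator


def iter_chart_results(model: Any, image_paths: Iterable[str]) -> Iterator[Any]:
    """Yield results one at a time instead of building a full list.

    Hypothetical helper: `model` is any object exposing predict_iter()
    with the same inputs as predict().
    """
    for path in image_paths:
        # Each call uses the documented input format {'image': image_path}.
        yield from model.predict_iter(input={"image": path}, batch_size=1)


# Usage (requires PaddleOCR and local image files; paths are placeholders):
# from paddleocr import ChartParsing
# model = ChartParsing(model_name="PP-Chart2Table")
# for res in iter_chart_results(model, ["chart_a.png", "chart_b.png"]):
#     res.print()
```

Because nothing is computed until the generator is consumed, only the result currently being handled needs to stay in memory.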
- Prediction results are returned as `Result` objects for each sample, with support for printing and saving to JSON:
Method | Description | Parameter | Type | Explanation | Default |
---|---|---|---|---|---|
print() | Print results to terminal | format_json | `bool` | Format output using JSON indentation | `True` |
print() | Print results to terminal | indent | `int` | Indentation level for pretty-printed JSON. Only takes effect when `format_json=True` | 4 |
print() | Print results to terminal | ensure_ascii | `bool` | Whether to escape non-ASCII characters as Unicode escape sequences. If `False`, keeps characters as-is. | `False` |
save_to_json() | Save results to JSON file | save_path | `str` | Path to save the file. If a directory is given, the file is named after the input. | N/A |
save_to_json() | Save results to JSON file | indent | `int` | Same as in `print()` | 4 |
save_to_json() | Save results to JSON file | ensure_ascii | `bool` | Same as in `print()` | `False` |
- You can also access the result via properties:
Property | Description |
---|---|
json | Returns the result in JSON format |
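As one possible use of the `json` property, the sketch below converts the parsed table to CSV text. It assumes the property returns a dict laid out like the printed output shown earlier (`{'res': {'image': ..., 'result': ...}}`); that layout is inferred from the example output rather than stated by the documentation, and `result_json_to_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Any


def result_json_to_csv(result_json: dict[str, Any]) -> str:
    """Convert the pipe-delimited table inside a result dict to CSV text.

    Assumes the layout {'res': {'image': ..., 'result': ...}}, inferred
    from the printed example output.
    """
    table_text = result_json["res"]["result"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in table_text.splitlines():
        if line.strip():
            writer.writerow([cell.strip() for cell in line.split("|")])
    return buffer.getvalue()


# A dict shaped like the example output (first data row only)
example = {
    "res": {
        "image": "chart_parsing_02.png",
        "result": (
            "Year | Avg Revenue per 5-star Hotel (Million CNY) | "
            "Avg Profit per 5-star Hotel (Million CNY)\n"
            "2018 | 104.22 | 9.87"
        ),
    }
}
csv_text = result_json_to_csv(example)
print(csv_text)
```

With a real result object, the call would be `result_json_to_csv(res.json)` under the same layout assumption.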
4. Custom Development
Currently, this module supports inference only and does not yet support fine-tuning. Fine-tuning capabilities are planned for future releases.