PP-OCRv3 is a further upgrade of PP-OCRv2. This section introduces the training steps of the PP-OCRv3 detection model. For an introduction to the PP-OCRv3 strategies, refer to the PP-OCRv3 introduction documentation.
The PP-OCRv3 detection model upgrades the CML (Collaborative Mutual Learning) text detection distillation strategy of PP-OCRv2, further optimizing both the teacher model and the student model. For the teacher model, it introduces LK-PAN, a PAN structure with a large receptive field, together with the DML (Deep Mutual Learning) distillation strategy; for the student model, it introduces RSE-FPN, an FPN structure with a residual attention mechanism.
PP-OCRv3 detection training consists of two steps:
Step 1: Train the detection teacher model with the DML distillation method.
Step 2: Use the teacher model obtained in Step 1 to train a lightweight student model with the CML method.
The configuration file for teacher model training is PP-OCRv3_det_dml.yml. The Backbone, Neck, and Head of the teacher model are ResNet50, LKPAN, and DBHead respectively, and the model is trained with the DML distillation method. For a detailed introduction to the configuration file, refer to the configuration documentation.
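The `-o` overrides in the training commands below address models by the names defined under `Architecture.Models` in this file. As an orientation, that section of PP-OCRv3_det_dml.yml has roughly the following shape (abridged and illustrative only; consult the shipped config for the exact fields and values):

```yaml
Architecture:
  model_type: det
  algorithm: Distillation
  Models:
    Student:           # first of the two mutually-learning models
      pretrained:      # filled in via -o Architecture.Models.Student.pretrained=...
      Backbone:
        name: ResNet
        layers: 50
      Neck:
        name: LKPAN
      Head:
        name: DBHead
    Student2:          # same structure as Student
      ...
```

This is why the commands override `Architecture.Models.Student.pretrained` and `Architecture.Models.Student2.pretrained` separately.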
```bash
# Single card training
python3 tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Global.save_model_dir=./output/

# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Global.save_model_dir=./output/
```
The model saved during training is in the output directory, which contains the following files:
```
best_accuracy.states
best_accuracy.pdparams  # Model parameters with the best accuracy, saved by default
best_accuracy.pdopt     # Optimizer parameters with the best accuracy, saved by default
latest.states
latest.pdparams         # The latest model parameters, saved by default
latest.pdopt            # Optimizer parameters of the latest model, saved by default
```
Among them, best_accuracy holds the saved model parameters with the highest accuracy and can be used directly for evaluation.
The trained teacher model has a larger structure and higher accuracy, and is used to improve the accuracy of the lightweight student model.
Extract teacher model parameters
best_accuracy contains the parameters of two models, corresponding to Student and Student2 in the configuration file. The method to extract the parameters of Student is as follows:
```python
import paddle

# Load the pre-trained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameters
print(all_params.keys())
# Extract the Student model weights
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the extracted weights
print(s_params.keys())
# Save
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")
```
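The filtering logic in the dict comprehension above is easy to sanity-check on a toy state dict with made-up keys, without loading a real checkpoint:

```python
# Toy state dict mimicking the distillation checkpoint layout (keys are made up)
all_params = {
    "Student.backbone.conv.weight": 1,
    "Student.head.fc.bias": 2,
    "Student2.backbone.conv.weight": 3,  # Student2 entries must NOT be picked up
}

# Same pattern as above: keep "Student." entries and strip the prefix
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}

print(sorted(s_params))  # ['backbone.conv.weight', 'head.fc.bias']
```

Note that the trailing dot in `"Student."` is what keeps the `Student2.*` keys out of the result; the same pattern is reused for the other checkpoints in this tutorial with only the prefix and paths changed.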
The extracted model parameters can be used for further fine-tuning or distillation training of the model.
The configuration file for training the student model is PP-OCRv3_det_cml.yml.
The teacher model trained in the previous section is used as supervision, and the CML method is used to train a lightweight student model.
Download the ImageNet pre-trained model of the student model:
```bash
# Download the pre-trained MobileNetV3 model
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
```
```bash
# Single card training
python3 tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_cml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
       Global.save_model_dir=./output/

# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_cml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
       Global.save_model_dir=./output/
```
The model saved during the training process is in the output directory.
The saved best_accuracy checkpoint can be evaluated with PaddleOCR's tools/eval.py.
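A plausible evaluation invocation, with the config and checkpoint paths assumed from the steps above (adjust them to your setup), is:

```bash
# Evaluate the best checkpoint produced by CML training
python3 tools/eval.py -c configs/det/PP-OCRv3/PP-OCRv3_det_cml.yml \
    -o Global.checkpoints=./output/best_accuracy
```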
best_accuracy contains the parameters of three models, corresponding to Student, Student2, and Teacher in the configuration file. The method to extract Student parameters is as follows:
```python
import paddle

# Load the pre-trained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameters
print(all_params.keys())
# Extract the Student model weights
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the extracted weights
print(s_params.keys())
# Save
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
```
The extracted Student parameters can be used for model deployment or further fine-tuning training.
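For deployment, the extracted parameters are typically converted to an inference model first. A sketch using PaddleOCR's tools/export_model.py follows; the output directory name is illustrative, and the paths assume the extraction step above:

```bash
# Export the extracted Student parameters to an inference model
python3 tools/export_model.py -c configs/det/PP-OCRv3/PP-OCRv3_mobile_det.yml \
    -o Global.pretrained_model=./pretrain_models/cml_student \
       Global.save_inference_dir=./inference/PP-OCRv3_det/
```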
3. Fine-tune training based on PP-OCRv3 detection
This section describes how to use the PP-OCRv3 detection model for fine-tune training in other scenarios.
Fine-tune training is applicable to three scenarios:
Fine-tune training based on the CML distillation method applies when the teacher model is more accurate than the PP-OCRv3 detection model in your scenario and a lightweight detection model is desired.
Fine-tune training based on the PP-OCRv3 lightweight detection model does not require training a teacher model; it aims to improve accuracy in the target scenario starting from the PP-OCRv3 detection model.
Fine-tune training based on the DML distillation method applies when you want to use the DML method to further improve accuracy.
Fine-tune training based on the CML distillation method
```bash
# Single card training
python3 tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_cml.yml \
    -o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
       Global.save_model_dir=./output/

# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_cml.yml \
    -o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
       Global.save_model_dir=./output/
```
Fine-tune training based on the PP-OCRv3 lightweight detection model
Download the PP-OCRv3 training model and extract the parameters of the Student structure:
```bash
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
```
The method to extract Student parameters is as follows:
```python
import paddle

# Load the downloaded pre-trained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameters
print(all_params.keys())
# Extract the Student model weights
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the extracted weights
print(s_params.keys())
# Save
paddle.save(s_params, "./student.pdparams")
```
```bash
# Single card training
python3 tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_mobile_det.yml \
    -o Global.pretrained_model=./student \
       Global.save_model_dir=./output/

# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_mobile_det.yml \
    -o Global.pretrained_model=./student \
       Global.save_model_dir=./output/
```
Fine-tune training based on the DML distillation method
Take the Teacher model in ch_PP-OCRv3_det_distill_train as an example. First, extract the parameters of the Teacher structure as follows:
```python
import paddle

# Load the pre-trained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameters
print(all_params.keys())
# Extract the Teacher model weights
s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# View the keys of the extracted weights
print(s_params.keys())
# Save
paddle.save(s_params, "./teacher.pdparams")
```
```bash
# Single card training
python3 tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./teacher \
       Architecture.Models.Student2.pretrained=./teacher \
       Global.save_model_dir=./output/

# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/PP-OCRv3/PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./teacher \
       Architecture.Models.Student2.pretrained=./teacher \
       Global.save_model_dir=./output/
```