
C++ inference is several times slower than Python; how can I fix this? #13900

Open
RemyHaijie opened this issue Sep 24, 2024 · 18 comments
Labels
bug Something isn't working

@RemyHaijie
🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (problem description)

Same image, same model, same configuration: Python takes about 1 second, while C++ takes over 5 seconds. The slowdown reproduces with other images as well.

🏃‍♂️ Environment

The C++ build uses the latest Paddle Inference release, 2.8.1, compiled under VS2022.
The Python side is also 2.8.1:
Name: paddleocr
Version: 2.8.1
Summary: Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices)
Home-page: https://github.com/PaddlePaddle/PaddleOCR
Author:
Author-email: PaddlePaddle [email protected]
License: Apache License 2.0
Location: C:\Users\Remy\AppData\Local\Programs\Python\Python312\Lib\site-packages
Requires: beautifulsoup4, cython, fire, fonttools, imgaug, lmdb, numpy, opencv-contrib-python, opencv-python, Pillow, pyclipper, python-docx, pyyaml, rapidfuzz, requests, scikit-image, shapely, tqdm

Recognition model parameters:

C++ parameters (debugger dump):
this 0x000001ecde9646c0 {predictor_=empty use_gpu_=false gpu_id_=0 ...} PaddleOCR::CRNNRecognizer *
predictor_ empty std::shared_ptr<paddle_infer::Predictor>
use_gpu_ false bool
gpu_id_ 0 int
gpu_mem_ 4000 int
cpu_math_library_num_threads_ 10 int
use_mkldnn_ false bool
label_list_ { size=6625 } std::vector<std::string,std::allocator<std::string>>
mean_ { size=3 } std::vector<float,std::allocator<float>>
scale_ { size=3 } std::vector<float,std::allocator<float>>
is_scale_ true bool
use_tensorrt_ false bool
precision_ "fp16" std::string
rec_batch_num_ 6 int
rec_img_h_ 48 int
rec_img_w_ 320 int
rec_image_shape_ { size=3 } std::vector<int,std::allocator<int>>
resize_op_ {...} PaddleOCR::CrnnResizeImg
normalize_op_ {...} PaddleOCR::Normalize
permute_op_ {...} PaddleOCR::PermuteBatch
cpu_math_library_num_threads 10 const int &
gpu_id 0 const int &
gpu_mem 4000 const int &
label_path "D:/Code/gitClone/PaddleOCR-2.8.1/deploy/cpp_infer/build/Release/ppocr_keys_v1.txt" const std::string &
model_dir "D:\softwarePake\ch_PP-OCRv3_rec_infer" const std::string &
precision "fp16" const std::string &
rec_batch_num 6 const int &
rec_image_shape { size=3 } std::vector<int,std::allocator<int>>
rec_img_h 48 const int &
rec_img_w 320 const int &
use_gpu false const bool &
use_mkldnn false const bool &
use_tensorrt false const bool &

Python parameters:
Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, use_mlu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='C:\Users\44684/.paddleocr/whl\det\ch\ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='C:\Users\44684/.paddleocr/whl\rec\ch\ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='c:\Users\44684\AppData\Local\Programs\Python\Python312\Lib\site-packages\paddleocr\ppocr\utils\ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='C:\Users\44684/.paddleocr/whl\cls\ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, return_word_box=False, 
output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='ch', det=True, rec=True, type='ocr', savefile=False, ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
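Putting the two dumps above side by side already surfaces differences that matter for an apples-to-apples comparison: the C++ rec config runs at fp16 with the ch_PP-OCRv3_rec_infer model, while the Python config runs at fp32 with ch_PP-OCRv4_rec_infer. A small sketch to make that diff explicit (the values are copied from this issue; the `diff_cfg` helper itself is hypothetical, not part of PaddleOCR):

```python
# Hedged sketch: diff the speed-relevant settings quoted in the two dumps above.
cpp_cfg = {
    "precision": "fp16",                   # precision_ in the C++ dump
    "use_mkldnn": False,                   # use_mkldnn_
    "cpu_threads": 10,                     # cpu_math_library_num_threads_
    "rec_model": "ch_PP-OCRv3_rec_infer",  # model_dir basename
}
py_cfg = {
    "precision": "fp32",                   # precision in the Namespace
    "use_mkldnn": False,                   # enable_mkldnn
    "cpu_threads": 10,                     # cpu_threads
    "rec_model": "ch_PP-OCRv4_rec_infer",  # rec_model_dir basename
}

def diff_cfg(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {k: (a[k], b[k]) for k in sorted(a) if a[k] != b[k]}

print(diff_cfg(cpp_cfg, py_cfg))
# → {'precision': ('fp16', 'fp32'), 'rec_model': ('ch_PP-OCRv3_rec_infer', 'ch_PP-OCRv4_rec_infer')}
```

In particular, fp16 on a CPU build may fall back to slow emulated kernels, so aligning the precision flag before comparing timings seems worth trying.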

🌰 Minimal Reproducible Example

Debugging shows the rec model is the slowest stage.
I stepped into the source code: the Python and C++ paths are essentially the same, and the only place with a clear speed difference is the inference call itself. On the C++ side that is the line
this->predictor_->Run() in ocr_rec.cpp.

@kwdchol

kwdchol commented Sep 25, 2024

I reported this too, but there was no follow-up: #10880

@jingsongliujing jingsongliujing added the bug Something isn't working label Sep 26, 2024
@XiaoDongGuoGuo

Try running it twice in a row; the second run should not be slow.

@RemyHaijie
Author

> Try running it twice in a row; the second run should not be slow.

It's not a model-initialization problem; the slow spot is exactly the line where the inference engine runs inference.

@RemyHaijie
Author

@dyning could you take a look at this?

@XiaoDongGuoGuo

I suspect it may be a cold-start issue, so run inference twice in a row first and see.

@kwdchol

kwdchol commented Oct 8, 2024

I'm not sure whether it's cold start, but when I tested I used a batch of images, not a single image.

@XiaoDongGuoGuo

So is every image slow? Keep two identical images and run them to see what happens.

@kwdchol

kwdchol commented Oct 8, 2024

I no longer have the Mac M2 environment. As I remember, I tested many times back then and the per-run time was basically the same; I never compared a single image.
Do you mean testing with a batch in which all the images are identical?

@XiaoDongGuoGuo

Yes, one batch is enough: include two identical images, run once, and compare the timings of the two images. Also, the environment can run on CPU.

@RemyHaijie
Author

> I suspect it may be a cold-start issue, so run inference twice in a row first and see.

det time:0
cls time:0
rec time:12
ocr proccess succeed
return result:
dll ocr time: 13.167973041534424
det time:0
cls time:0
rec time:12
ocr proccess succeed
return result:
dll ocr time: 13.659173965454102

Same result.

@GreatV
Collaborator

GreatV commented Oct 10, 2024

Try a different Paddle Inference version, or run with ONNX Runtime or OpenVINO instead; you could also try PaddleX.

@RemyHaijie
Author

> Try a different Paddle Inference version, or run with ONNX Runtime or OpenVINO instead; you could also try PaddleX.

I had a look at PaddleX; it seems to be pure Python, including the inference part. Could it be packaged directly with Nuitka?

@GreatV
Collaborator

GreatV commented Oct 10, 2024

@RemyHaijie For PaddleX questions please ask in the PaddleX repository; see the PaddleX deployment pipelines for reference.

@RemyHaijie
Author

> Try a different Paddle Inference version, or run with ONNX Runtime or OpenVINO instead; you could also try PaddleX.

If I switch Paddle Inference, which build should I pick? I want to run on CPU; the current C++ build uses the MKL version.

@XiaoDongGuoGuo

They can all run on CPU, even with the MKL build.
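For reference, the CPU-side knobs shown in the C++ debugger dump above correspond to settings on `paddle_infer::Config`. A hedged fragment of how they might be set (API names as in the Paddle Inference 2.x C++ API; this assumes the Paddle headers and a converted model are available, so it is a sketch rather than a buildable unit, and the exact header path varies by install):

```cpp
#include <memory>
#include <string>
#include "paddle_inference_api.h"  // Paddle Inference C++ header; path varies by install

// Sketch: build a CPU predictor with oneDNN (MKLDNN) kernels enabled.
// The dump above shows use_mkldnn_ = false; enabling it is often worth
// trying for CPU inference.
std::shared_ptr<paddle_infer::Predictor> MakeCpuPredictor(const std::string& model_dir) {
  paddle_infer::Config config;
  config.SetModel(model_dir + "/inference.pdmodel",
                  model_dir + "/inference.pdiparams");
  config.DisableGpu();                     // force CPU execution
  config.EnableMKLDNN();                   // oneDNN kernels on CPU
  config.SetCpuMathLibraryNumThreads(10);  // matches cpu_math_library_num_threads_
  return paddle_infer::CreatePredictor(config);
}
```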

@XiaoDongGuoGuo

> > I suspect it may be a cold-start issue, so run inference twice in a row first and see.
>
> det time:0 cls time:0 rec time:12 ocr proccess succeed return result: dll ocr time: 13.167973041534424 det time:0 cls time:0 rec time:12 ocr proccess succeed return result: dll ocr time: 13.659173965454102
>
> Same result.

I don't understand what the goal is; on CPU there is no need to chase latency like this. For the question above: switch Paddle Inference to tag/v2.7.0.

@RemyHaijie
Author

@XiaoDongGuoGuo I couldn't find Paddle Inference 2.7.0; could you share a link? Also, the main question here is why the Python version is faster than the C++ version; the bottleneck is not the hardware.

6 participants