
C++ inference is several times slower than Python; how can I fix this? #13900

Open
RemyHaijie opened this issue Sep 24, 2024 · 18 comments
Labels
bug Something isn't working

@RemyHaijie
🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (problem description)

Same image, same model, same configuration: Python takes about 1 second, while C++ takes over 5 seconds. The slowdown reproduces with other images as well.

🏃‍♂️ Environment

The C++ build uses the latest Paddle Inference release, 2.8.1, compiled under VS2022.
The Python side is also 2.8.1:
Name: paddleocr
Version: 2.8.1
Summary: Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support training and deployment among server, mobile, embedded and IoT devices)
Home-page: https://github.com/PaddlePaddle/PaddleOCR
Author:
Author-email: PaddlePaddle [email protected]
License: Apache License 2.0
Location: C:\Users\Remy\AppData\Local\Programs\Python\Python312\Lib\site-packages
Requires: beautifulsoup4, cython, fire, fonttools, imgaug, lmdb, numpy, opencv-contrib-python, opencv-python, Pillow, pyclipper, python-docx, pyyaml, rapidfuzz, requests, scikit-image, shapely, tqdm

Recognition model parameters:

C++ parameters (debugger dump):
this 0x000001ecde9646c0 {predictor_=empty use_gpu_=false gpu_id_=0 ...} PaddleOCR::CRNNRecognizer *
predictor_ empty std::shared_ptr<paddle_infer::Predictor>
use_gpu_ false bool
gpu_id_ 0 int
gpu_mem_ 4000 int
cpu_math_library_num_threads_ 10 int
use_mkldnn_ false bool
label_list_ { size=6625 } std::vector<std::string,std::allocator<std::string>>
mean_ { size=3 } std::vector<float,std::allocator<float>>
scale_ { size=3 } std::vector<float,std::allocator<float>>
is_scale_ true bool
use_tensorrt_ false bool
precision_ "fp16" std::string
rec_batch_num_ 6 int
rec_img_h_ 48 int
rec_img_w_ 320 int
rec_image_shape_ { size=3 } std::vector<int,std::allocator<int>>
resize_op_ {...} PaddleOCR::CrnnResizeImg
normalize_op_ {...} PaddleOCR::Normalize
permute_op_ {...} PaddleOCR::PermuteBatch
cpu_math_library_num_threads 10 const int &
gpu_id 0 const int &
gpu_mem 4000 const int &
label_path "D:/Code/gitClone/PaddleOCR-2.8.1/deploy/cpp_infer/build/Release/ppocr_keys_v1.txt" const std::string &
model_dir "D:\softwarePake\ch_PP-OCRv3_rec_infer" const std::string &
precision "fp16" const std::string &
rec_batch_num 6 const int &
rec_image_shape { size=3 } std::vector<int,std::allocator<int>>
rec_img_h 48 const int &
rec_img_w 320 const int &
use_gpu false const bool &
use_mkldnn false const bool &
use_tensorrt false const bool &

Python parameters:
Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, use_mlu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='C:\Users\44684/.paddleocr/whl\det\ch\ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='C:\Users\44684/.paddleocr/whl\rec\ch\ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='c:\Users\44684\AppData\Local\Programs\Python\Python312\Lib\site-packages\paddleocr\ppocr\utils\ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='C:\Users\44684/.paddleocr/whl\cls\ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, return_word_box=False, 
output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='ch', det=True, rec=True, type='ocr', savefile=False, ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
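Putting the two dumps above side by side already surfaces differences that matter for an apples-to-apples comparison: the C++ rec config runs at fp16 with the ch_PP-OCRv3_rec_infer model, while the Python config runs at fp32 with ch_PP-OCRv4_rec_infer. A small sketch to make that diff explicit (the values are copied from this issue; the `diff_cfg` helper itself is hypothetical, not part of PaddleOCR):

```python
# Hedged sketch: diff the speed-relevant settings quoted in the two dumps above.
cpp_cfg = {
    "precision": "fp16",                   # precision_ in the C++ dump
    "use_mkldnn": False,                   # use_mkldnn_
    "cpu_threads": 10,                     # cpu_math_library_num_threads_
    "rec_model": "ch_PP-OCRv3_rec_infer",  # model_dir basename
}
py_cfg = {
    "precision": "fp32",                   # precision in the Namespace
    "use_mkldnn": False,                   # enable_mkldnn
    "cpu_threads": 10,                     # cpu_threads
    "rec_model": "ch_PP-OCRv4_rec_infer",  # rec_model_dir basename
}

def diff_cfg(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {k: (a[k], b[k]) for k in sorted(a) if a[k] != b[k]}

print(diff_cfg(cpp_cfg, py_cfg))
# → {'precision': ('fp16', 'fp32'), 'rec_model': ('ch_PP-OCRv3_rec_infer', 'ch_PP-OCRv4_rec_infer')}
```

In particular, fp16 on a CPU build may fall back to slow emulated kernels, so aligning the precision flag before comparing timings seems worth trying.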

🌰 Minimal Reproducible Example

Debugging shows the rec model is the slowest stage.
I stepped into the source code: the Python and C++ paths are essentially the same, and the only place with a clear speed difference is the inference call itself. On the C++ side that is the line
this->predictor_->Run() in ocr_rec.cpp.

@kwdchol

kwdchol commented Sep 25, 2024

I reported this too, but there was no follow-up: #10880

@jingsongliujing jingsongliujing added the bug Something isn't working label Sep 26, 2024
@XiaoDongGuoGuo

Try running it twice in a row; the second run should not be slow.

@RemyHaijie
Author

> Try running it twice in a row; the second run should not be slow.

It's not a model-initialization problem; the slow spot is exactly the line where the inference engine runs inference.

@RemyHaijie
Author

@dyning could you take a look at this?

@XiaoDongGuoGuo

I suspect it may be a cold-start issue, so run inference twice in a row first and see.

@kwdchol

kwdchol commented Oct 8, 2024

I'm not sure whether it's cold start, but when I tested I used a batch of images, not a single image.

@XiaoDongGuoGuo

So is every image slow? Keep two identical images and run them to see what happens.

@kwdchol

kwdchol commented Oct 8, 2024

I no longer have the Mac M2 environment. As I remember, I tested many times back then and the per-run time was basically the same; I never compared a single image.
Do you mean testing with a batch in which all the images are identical?

@XiaoDongGuoGuo

Yes, one batch is enough: include two identical images, run once, and compare the timings of the two images. Also, the environment can run on CPU.

@RemyHaijie
Author

> I suspect it may be a cold-start issue, so run inference twice in a row first and see.

det time:0
cls time:0
rec time:12
ocr proccess succeed
return result:
dll ocr time: 13.167973041534424
det time:0
cls time:0
rec time:12
ocr proccess succeed
return result:
dll ocr time: 13.659173965454102

Same result.

@GreatV
Collaborator

GreatV commented Oct 10, 2024

Try a different Paddle Inference version, or run with ONNX Runtime or OpenVINO instead; you could also try PaddleX.

@RemyHaijie
Author

> Try a different Paddle Inference version, or run with ONNX Runtime or OpenVINO instead; you could also try PaddleX.

I had a look at PaddleX; it seems to be pure Python, including the inference part. Could it be packaged directly with Nuitka?

@GreatV
Collaborator

GreatV commented Oct 10, 2024

@RemyHaijie For PaddleX questions please ask in the PaddleX repository; see the PaddleX deployment pipelines for reference.

@RemyHaijie
Author

> Try a different Paddle Inference version, or run with ONNX Runtime or OpenVINO instead; you could also try PaddleX.

If I switch Paddle Inference, which build should I pick? I want to run on CPU; the current C++ build uses the MKL version.

@XiaoDongGuoGuo

They can all run on CPU, even with the MKL build.
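For reference, the CPU-side knobs shown in the C++ debugger dump above correspond to settings on `paddle_infer::Config`. A hedged fragment of how they might be set (API names as in the Paddle Inference 2.x C++ API; this assumes the Paddle headers and a converted model are available, so it is a sketch rather than a buildable unit, and the exact header path varies by install):

```cpp
#include <memory>
#include <string>
#include "paddle_inference_api.h"  // Paddle Inference C++ header; path varies by install

// Sketch: build a CPU predictor with oneDNN (MKLDNN) kernels enabled.
// The dump above shows use_mkldnn_ = false; enabling it is often worth
// trying for CPU inference.
std::shared_ptr<paddle_infer::Predictor> MakeCpuPredictor(const std::string& model_dir) {
  paddle_infer::Config config;
  config.SetModel(model_dir + "/inference.pdmodel",
                  model_dir + "/inference.pdiparams");
  config.DisableGpu();                     // force CPU execution
  config.EnableMKLDNN();                   // oneDNN kernels on CPU
  config.SetCpuMathLibraryNumThreads(10);  // matches cpu_math_library_num_threads_
  return paddle_infer::CreatePredictor(config);
}
```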

@XiaoDongGuoGuo

> > I suspect it may be a cold-start issue, so run inference twice in a row first and see.
>
> det time:0 cls time:0 rec time:12 ocr proccess succeed return result: dll ocr time: 13.167973041534424 det time:0 cls time:0 rec time:12 ocr proccess succeed return result: dll ocr time: 13.659173965454102
>
> Same result.

I don't understand what the goal is; on CPU there is no need to chase latency like this. For the question above: switch Paddle Inference to tag/v2.7.0.

@RemyHaijie
Author

@XiaoDongGuoGuo I couldn't find Paddle Inference 2.7.0; could you share a link? Also, the main question here is why the Python version is faster than the C++ version; the bottleneck is not the hardware.

6 participants