English | 简体中文

Introduction

Converting PaddleOCR to PyTorch.

This repository aims to

  • learn PaddleOCR
  • run Paddle-trained models in PyTorch
  • provide a guide for converting Paddle models to PyTorch (Paddle2PyTorch)
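As a rough illustration of what a Paddle2PyTorch conversion involves, the sketch below plans how parameter names map between the two frameworks. It relies on two general framework differences: Paddle's BatchNorm stores its running statistics as `_mean` and `_variance` where PyTorch uses `running_mean` and `running_var`, and Paddle's `Linear` weight is laid out as `[in_features, out_features]` while PyTorch expects `[out_features, in_features]`. The helper `plan_conversion`, the `linear_prefixes` argument, and the example layer names are hypothetical and are not taken from this repository's converter scripts.

```python
# Hypothetical helper (not this repository's converter): plan the key renames
# and transposes needed to load a Paddle state dict into a PyTorch model.

# Paddle BatchNorm buffer suffixes -> PyTorch BatchNorm buffer suffixes.
BN_SUFFIXES = {
    "._mean": ".running_mean",
    "._variance": ".running_var",
}


def plan_conversion(paddle_shapes, linear_prefixes=()):
    """Return a list of (paddle_key, torch_key, needs_transpose) tuples.

    paddle_shapes:   dict mapping Paddle parameter name -> shape tuple.
    linear_prefixes: names of fully connected layers, supplied by the caller
                     (an assumption; shape alone cannot identify them).
    """
    plan = []
    for key, shape in paddle_shapes.items():
        torch_key = key
        for old, new in BN_SUFFIXES.items():
            if key.endswith(old):
                torch_key = key[: -len(old)] + new
                break
        # Only 2-D weights of the listed linear layers are transposed.
        needs_transpose = (
            len(shape) == 2
            and key.endswith(".weight")
            and any(key.startswith(p) for p in linear_prefixes)
        )
        plan.append((key, torch_key, needs_transpose))
    return plan


# Made-up layer names and shapes, for illustration only.
example = {
    "backbone.conv1.weight": (16, 3, 3, 3),
    "backbone.bn1._mean": (16,),
    "backbone.bn1._variance": (16,),
    "head.fc.weight": (96, 6625),
}
plan = plan_conversion(example, linear_prefixes=("head.fc",))
for row in plan:
    print(row)
```

With such a plan in hand, the actual conversion would copy each array under its new key and transpose the flagged ones before calling `load_state_dict` on the PyTorch model.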

TODO

  • 3 text recognition algorithms (NRTR, SEED, SAR), 1 key information extraction algorithm (SDMGR) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM)
  • PP-Structure, a new toolkit for structured document analysis that supports layout analysis and table recognition (one-click export of table images to Excel files)

Notice

PytorchOCR models are converted from PaddleOCRv2.0.

Recent updates

  • 2022.03.20 update 1 text detection algorithm (PSENet)
  • 2021.09.11 update PP-OCRv2. On CPU, the inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
  • 2021.06.01 update SRN
  • 2021.04.25 update AAAI 2021 end-to-end algorithm PGNet
  • 2021.04.24 update RARE
  • 2021.04.12 update STARNET
  • 2021.04.08 update DB, SAST, EAST, ROSETTA, CRNN
  • 2021.04.03 update 25+ multilingual recognition models (see the models list), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, Arabic and so on. Models for more languages will continue to be added (see the Develop Plan).
  • 2021.01.10 upload Chinese and English general OCR models.

Features

  • PTOCR series of high-quality pre-trained models, comparable to commercial products
    • Ultra lightweight PP-OCRv2 series models: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
    • Ultra lightweight ptocr_mobile series models
    • General ptocr_server series models
    • Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
    • Support multi-language recognition: Korean, Japanese, German, French, etc.

Model List (updating)

PyTorch models in BaiduPan: https://pan.baidu.com/s/1r1DELT8BlgxeOP2RqREJEg code: 6clx

PaddleOCR models in BaiduPan: https://pan.baidu.com/s/1getAprT2l_JqwhjwML0g9g code: lmv7

If you want more models, including multilingual models, please refer to the PTOCR series.

Tutorials

PP-OCR Pipeline

[1] PP-OCR is a practical ultra-lightweight OCR system. It consists of three main parts: DB text detection, detection box correction and CRNN text recognition. To optimize and slim down the model in each module, the system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate schedule, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization (as shown in the green box above). The result is an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M, and a 2.8M OCR model for English and digits. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
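The three-stage structure described in [1] can be sketched as a simple chain of callables. This is only an outline of the data flow, not the repository's inference code: `run_ocr` and the `fake_*` stand-ins are hypothetical names, and the toy "image" is a list of strings so that the example runs without any model weights.

```python
def run_ocr(image, detect, correct, recognize):
    """Chain detection, box correction and recognition.

    Every argument after `image` is a caller-supplied callable (assumed here).
    """
    results = []
    for box in detect(image):            # stage 1: text detection (DB in PP-OCR)
        crop = correct(image, box)       # stage 2: crop and correct the detected box
        text, score = recognize(crop)    # stage 3: text recognition (CRNN in PP-OCR)
        results.append({"box": box, "text": text, "score": score})
    return results


# Toy stand-ins for the three models, for illustration only.
def fake_detect(image):
    return [(0, 0, 4, 1), (0, 2, 4, 3)]  # boxes as (x1, y1, x2, y2)


def fake_correct(image, box):
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def fake_recognize(crop):
    return "".join(crop), 1.0


image = ["abcd", "....", "efgh"]
ocr_results = run_ocr(image, fake_detect, fake_correct, fake_recognize)
for item in ocr_results:
    print(item["box"], item["text"], item["score"])
```

Keeping each stage behind its own callable mirrors the modular design in [1], where the model of each module is optimized and slimmed down separately.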

[2] Building on PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy. The recognition model adopts the lightweight LCNet backbone, the U-DML knowledge distillation strategy and an enhanced CTC loss function (as shown in the red box above). Together these further improve inference speed and prediction accuracy. For more details, please refer to the technical report of PP-OCRv2.
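Recognition models trained with a CTC loss emit one score vector per time step, and these are usually turned into text by greedy CTC decoding: take the best class at each step, merge consecutive repeats, then drop the blank symbol. The sketch below illustrates only that generic decoding step. It does not show the enhanced CTC loss of PP-OCRv2, which is a training-time change. The function name `ctc_greedy_decode`, the tiny vocabulary, and the choice of index 0 for the blank are assumptions made for this example.

```python
def ctc_greedy_decode(step_scores, vocab, blank=0):
    """Greedy CTC decoding over per-step class scores.

    step_scores: list of score lists, one per time step.
    vocab:       list of symbols; vocab[blank] is a placeholder for the blank.
    """
    # Best class index at every time step.
    best = [max(range(len(s)), key=s.__getitem__) for s in step_scores]
    chars = []
    prev = None
    for idx in best:
        # Keep a symbol only if it is not blank and not a repeat of the previous step.
        if idx != blank and idx != prev:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


vocab = ["<blank>", "a", "b"]  # made-up vocabulary, blank at index 0
scores = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.60, 0.30],  # a (new, because a blank separates it)
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, merged)
    [0.80, 0.10, 0.10],  # blank
]
decoded = ctc_greedy_decode(scores, vocab)
print(decoded)
```

The blank between the second and fourth steps is what lets the decoder emit a doubled character instead of merging it away.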

Visualization

  • Chinese OCR model
  • English OCR model
  • Multilingual OCR model

References