FOTS-Scene-Text-Parsing

This repository contains an implementation of the FOTS research paper in TensorFlow 2.

Here we show model training, model inference, post-training quantization of the model, and deployment via Streamlit. Note :- All notebooks in this repository are from Google Colaboratory.
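
For the Streamlit deployment, a minimal app sketch is shown below. It only illustrates the serving side; the predict_image helper is a placeholder for the exported FOTS inference pipeline and is our assumption, not the code in this repository:

```python
# Minimal Streamlit sketch (run with: streamlit run app.py).
# `predict_image` is a placeholder for the FOTS inference pipeline.
import numpy as np
import streamlit as st
from PIL import Image

def predict_image(image_array):
    # Placeholder: the real pipeline returns detected boxes and recognized words.
    return []

st.title("FOTS Scene Text Parsing")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Input image")
    for box, word in predict_image(np.array(image)):
        st.write(word)
```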

Datasets used:

  1. ICDAR 2015 Dataset :- This dataset includes 1000 training images and 500 test images, captured with Google Glass without careful positioning. The dataset is organized into the following directories:
    a. ch4_training_images :- This directory contains all 1000 images used to train our text detection branch.
    b. ch4_training_localization_transcription_gt :- This directory contains 1000 files named gt_img_<img_no>, each giving the coordinates and the words present in the corresponding training image. Some words are marked as ### to denote don't-care words in that image (see the parsing sketch after this list).
    c. ch4_training_word_images_gt :- This directory contains 4468 word images and 2 files named coords.txt and gt.txt. coords.txt gives the coordinates of the exact text in each image in this directory, and gt.txt gives the English word for the corresponding image.
    d. ch4_test_images :- This directory contains the 500 test images.

  2. SynthText Dataset :- To avoid overfitting in the Text Recognition branch due to the small set of 4468 word images in the ICDAR 2015 dataset, we also make use of 130000 images from the SynthText dataset.
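
Below is a minimal sketch of how the gt_img_<img_no>.txt files from ch4_training_localization_transcription_gt can be parsed. The helper name load_icdar_gt is our own, and the line format and BOM handling are assumptions based on the standard ICDAR 2015 release:

```python
# Minimal parsing sketch. Assumption: each line of a gt_img_<no>.txt file is
# "x1,y1,x2,y2,x3,y3,x4,y4,transcription", and words marked "###" are don't-care.
import numpy as np

def load_icdar_gt(gt_path):
    """Return (boxes, words): boxes is an (N, 4, 2) float array, words a list of N strings."""
    boxes, words = [], []
    # ICDAR 2015 ground-truth files are UTF-8 with a byte-order mark, hence "utf-8-sig".
    with open(gt_path, encoding="utf-8-sig") as f:
        for line in f:
            parts = line.strip().split(",")
            if len(parts) < 9:
                continue
            coords = list(map(float, parts[:8]))
            word = ",".join(parts[8:])  # the transcription itself may contain commas
            boxes.append(np.array(coords, dtype=np.float32).reshape(4, 2))
            words.append(word)
    return np.array(boxes, dtype=np.float32), words

# Example: drop don't-care regions before building recognition labels.
# boxes, words = load_icdar_gt("ch4_training_localization_transcription_gt/gt_img_1.txt")
# keep = [i for i, w in enumerate(words) if w != "###"]
```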

Here we have one notebook:

FOTS_Scene_Text_Parsing.ipynb
This notebook contains the entire implementation of the FOTS model.

The FOTS model consists of 3 branches:

  1. Text Detection Branch :- This branch detects the text regions present in our images. In this branch we mainly predict 2 things (see the decoding sketch after this list):

    a. Score Map :- This denotes whether each pixel in the input image belongs to a text region or not.

    b. Geo Map :- This is 5-channel data. The first 4 channels contain the distances from each text-region pixel (given by the score map) to the top, bottom, left, and right sides of its bounding box, and the last channel contains the orientation of the bounding box.

  2. ROI Rotate :- This branch converts the Score Map and Geo Map given by the Text Detection branch into the coordinates of the boxes where text is present.

  3. Text Recognition Branch :- This branch converts all the text regions given by the Detection and ROI Rotate branches into English words.
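
As a concrete illustration of how the Score Map and Geo Map fit together, here is a small NumPy sketch that turns one text pixel's 5-channel geo-map entry (distances to the four box sides plus the box angle) back into a rotated box. This is illustrative only; the function names and score threshold are our assumptions, and a full decoder would also merge the per-pixel candidates (e.g. with locality-aware NMS):

```python
import numpy as np

def decode_pixel(x, y, d_top, d_bottom, d_left, d_right, angle):
    """Recover the 4 corners of the rotated box predicted at text pixel (x, y)."""
    # Corners of the axis-aligned box expressed relative to the pixel
    # (the four distances come straight from the geo map channels).
    corners = np.array([
        [-d_left,  -d_top],     # top-left
        [ d_right, -d_top],     # top-right
        [ d_right,  d_bottom],  # bottom-right
        [-d_left,   d_bottom],  # bottom-left
    ], dtype=np.float32)
    # Rotate the corners around the pixel by the predicted box orientation.
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]], dtype=np.float32)
    return corners @ rot.T + np.array([x, y], dtype=np.float32)

def decode_maps(score_map, geo_map, score_thresh=0.8):
    """Turn every pixel above the score threshold into a candidate rotated box."""
    ys, xs = np.where(score_map > score_thresh)
    return [decode_pixel(x, y, *geo_map[y, x]) for y, x in zip(ys, xs)]
```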

Here we have trained the Text Detection and Text Recognition branches separately, and after training them we have created the final inference pipeline, which includes the Text Detection branch, the Text Recognition branch, and the ROI Rotate branch.
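
To make the order of the pipeline concrete, the sketch below shows the glue between the branches at inference time: each detected quadrilateral is warped into a fixed-height, axis-aligned crop (the ROI Rotate idea, done here in image space with OpenCV purely for illustration; in the actual model the operation is applied to shared feature maps) and then fed to the recognizer. detect_boxes and recognize are placeholders for the trained branches:

```python
import cv2
import numpy as np

def roi_rotate_crop(image, quad, target_height=32):
    """Warp one detected quadrilateral (4x2, clockwise from top-left) into an
    axis-aligned crop of fixed height, preserving the aspect ratio."""
    quad = np.asarray(quad, dtype=np.float32)
    w = max(1, int(round(np.linalg.norm(quad[1] - quad[0]))))  # top edge length
    h = max(1, int(round(np.linalg.norm(quad[3] - quad[0]))))  # left edge length
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(quad, dst)
    crop = cv2.warpPerspective(image, M, (w, h))
    new_w = max(1, int(round(w * target_height / h)))
    return cv2.resize(crop, (new_w, target_height))

def run_pipeline(image, detect_boxes, recognize):
    """detect_boxes(image) -> list of 4x2 quads; recognize(crop) -> string."""
    results = []
    for quad in detect_boxes(image):
        crop = roi_rotate_crop(image, quad)
        results.append((quad, recognize(crop)))
    return results
```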

In this notebook we have also discussed the several post-training quantization techniques applied to our model and how these quantization techniques impact model size, latency, etc.
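
As a reference for this step, here is a minimal sketch of TensorFlow Lite post-training quantization (dynamic-range and float16). It assumes the trained branch has been exported as a SavedModel; the paths and function name are placeholders, not the exact code in the notebook:

```python
import tensorflow as tf

def quantize_saved_model(saved_model_dir, output_path, use_float16=False):
    """Convert a SavedModel to a TFLite flatbuffer with post-training quantization."""
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]      # dynamic-range quantization
    if use_float16:
        converter.target_spec.supported_types = [tf.float16]  # float16 weight quantization
    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)  # size in bytes, handy for comparing variants

# Example (paths are placeholders):
# size_dr   = quantize_saved_model("saved_model/detector", "detector_dr.tflite")
# size_fp16 = quantize_saved_model("saved_model/detector", "detector_fp16.tflite", use_float16=True)
```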

References For Code:-
[1]. https://github.com/Pay20Y/FOTS_TF
[2]. https://github.com/yu20103983/FOTS
[3]. https://github.com/Masao-Taketani/FOTS_OCR
