Document Scanner and OCR

This is a desktop-based GUI document scanner.
It can scan multiple images at once.
It automatically detects the 'document' part of each image.
It corrects the document's orientation and perspective.
The GUI will have a feature to manually select the 'document' part of each image, in case automatic detection fails.
It can save the image in multiple modes and resolutions.
It can also save all the images as a single PDF.
It will also support optical character recognition (OCR).

To detect the document, it first resizes a copy of the image for manipulation.
It then converts the copy to grayscale.
Then it applies a bilateral filter, which blurs the image while preserving edges.
It produces a Canny edge image (binary, edges only), with thresholds derived from the median intensity of the image pixels.
It then detects the contours (curves) in the edge image.
Then it takes the largest contour by perimeter, as the document part will have the largest perimeter.
Note: The document will also have the largest area, but most of the time the detected contour is not a closed curve, so its area becomes very small while its perimeter remains large.
Next it takes the convex hull of the selected contour (the smallest convex polygon enclosing the contour).
Then it uses approxPolyDP to approximate the convex hull as a four-cornered polygon.
Thus it obtains the 4 corners of the document.
Then it scales the coordinates back to the original image size.
From those 4 coordinates, it identifies which is the top-left (tl), top-right (tr), bottom-right (br), and bottom-left (bl) corner.
Then it calculates the height and width of the document portion.
Finally, it applies a perspective transform to the original image using these corner coordinates and produces the transformed image.
