This is a Python script for Optical Character Recognition (OCR) of images that are captured by a camera. The script first imports the required libraries, installs the necessary packages, and mounts the Google Drive. It then defines the functions for segmenting, cropping, classifying, object detection, and OCR tasks.
The script starts by importing the following libraries:
tesseract-ocr
andpytesseract
for OCR.PIL
,matplotlib
,cv2
, andnumpy
for image processing.os
andgoogle.colab
to mount the Google Drive.roboflow
for image segmentation.
The segment
function is used for image segmentation, which is done using Roboflow. The function first imports the required libraries, including base64
, cv2
, matplotlib
, and numpy
. It then reads the input image, resizes it, and passes it to the Roboflow model for segmentation. The function then extracts the segmentation mask from the resulting JSON and converts it to a NumPy array. Finally, it applies the mask to the original image and returns the segmented image.
The cropping
function is used to crop the segmented image. The function takes as input the segmented image and the points for the region of interest (ROI). It first imports cv2
and matplotlib
, and then performs perspective transformation to warp the image and extract the ROI. The function returns the cropped image.
The classify
function is used for image classification. The function first imports tensorflow
, numpy
, and keras.utils
. It then loads the pre-trained model for image classification from the Google Drive, preprocesses the input image, and passes it to the model for prediction. The function returns the class label.
The inference_obj
function is used for object detection. The function first imports the required libraries, including glob
, matplotlib
, cv2
, requests
, and numpy
. It then performs object detection on the input images using the pre-trained YOLOv5 model. The function returns the directory where the inference results are stored.
The ocr_task
function is used for OCR. The function takes as input an array of cropped images, reads each image, applies Gaussian blur and sharpening filters, and passes it to the OCR engine for text recognition. The function then returns the OCR result for each image.
The script mounts the Google Drive at the beginning using the drive.mount()
function to store the pre-trained models and other necessary files.