In this project, optical character recognition (OCR) algorithms are applied to images in a distributed fashion on AWS instances. Each instance processes part of the total batch of images.
By the end of the process, each instance saves its processed images to a shared HTML file.
The project is composed of three main parts: the local application (Local.java), the manager (MrManager.java) and the workers (Worker.java).
The local application runs on your local machine (not in the AWS cloud) and works as follows:
- Upload the links.txt file, which contains links to images, to S3 storage.
- Send a message stating the location of links.txt in the S3 bucket to an SQS queue.
- Initialize the manager.
- Wait for the manager and the workers to finish the OCR process by checking a specific SQS queue.
- Download the MrManagerOutput.html file from S3; the file contains the images with their OCR output.
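The "wait by checking a specific SQS queue" step can be sketched as a bounded polling loop. This is only an illustration: an in-memory `BlockingQueue` stands in for the SQS queue, and the names `LocalWaitSketch`, `waitForDone`, `pollMillis` and `maxPolls` are hypothetical, not taken from Local.java.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an in-memory queue stands in for the SQS "done" queue.
final class LocalWaitSketch {

    // Polls the queue up to maxPolls times, waiting pollMillis on each attempt.
    // Returns the message body if one arrived, or empty on timeout/interruption.
    static Optional<String> waitForDone(BlockingQueue<String> doneQueue,
                                        long pollMillis, int maxPolls) {
        try {
            for (int i = 0; i < maxPolls; i++) {
                String body = doneQueue.poll(pollMillis, TimeUnit.MILLISECONDS);
                if (body != null) {
                    return Optional.of(body);
                }
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        BlockingQueue<String> doneQueue = new LinkedBlockingQueue<>();
        // Assumption: the manager's final message names the output file.
        doneQueue.add("MrManagerOutput.html");
        System.out.println(waitForDone(doneQueue, 100, 3).orElse("timed out"));
    }
}
```

Bounding the number of polls keeps the local application from hanging forever if the manager never reports back; a real run would likely use a much longer interval.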
Unlike the local application, the manager resides on an EC2 node. Its main purpose is to initialize the workers and control their work via SQS messages and S3 buckets:
- Receive a message from the local application via SQS.
- Download links.txt from the S3 bucket.
- Send each link from the links.txt file to an SQS queue.
- Create a worker for every n messages.
- Wait for the workers to finish their job, and terminate the workers when they're done.
- Read all messages from the results queue, then create MrManagerOutput.html and upload it to the S3 bucket.
- Send a message to the local application via an SQS queue that indicates the end of the OCR process.
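Two of the manager's steps are easy to illustrate without AWS: deciding how many workers to start, and assembling the output page. The sketch below assumes that "a worker for every n messages" rounds up (so 25 messages with n = 10 give 3 workers) and that the HTML is a simple list of images followed by their text. Both are assumptions, and `ManagerSketch`, `workersNeeded`, `escapeHtml` and `buildHtml` are hypothetical names, not the ones used in MrManager.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of two manager helpers; not the code in MrManager.java.
final class ManagerSketch {

    // Number of workers for messageCount links, one worker per n messages.
    // Assumption: a partial group of fewer than n messages still gets a worker.
    static int workersNeeded(int messageCount, int n) {
        if (n <= 0 || messageCount < 0) {
            throw new IllegalArgumentException("need n > 0 and messageCount >= 0");
        }
        // Ceiling division, done in long arithmetic to avoid int overflow.
        return (int) ((messageCount + (long) n - 1) / n);
    }

    // Escapes the characters that would otherwise break the generated markup.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds a page with one paragraph per image: the image, then its OCR text.
    static String buildHtml(Map<String, String> textByImageUrl) {
        StringBuilder html = new StringBuilder("<html><body>\n");
        for (Map.Entry<String, String> e : textByImageUrl.entrySet()) {
            html.append("<p><img src=\"").append(escapeHtml(e.getKey())).append("\"><br>")
                .append(escapeHtml(e.getValue())).append("</p>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> results = new LinkedHashMap<>();
        results.put("http://example.com/a.png", "sample text"); // placeholder data
        System.out.println(workersNeeded(25, 10));
        System.out.print(buildHtml(results));
    }
}
```

Escaping the OCR text matters here because recognized text can contain characters such as `<` or `&` that would otherwise corrupt the page.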
Like the manager, each worker resides on an EC2 node and operates as follows:
- Obtain a link to an image via the SQS queue and download the image.
- Run the OCR algorithm on the image.
- Notify the manager of the text extracted from the image.
- Remove the image's message from the SQS queue.
- Repeat the process.
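The worker loop above can be sketched with plain in-memory queues standing in for the two SQS queues. The OCR step is passed in as a function because the actual OCR library is not named here; `WorkerSketch`, `drain` and the tab-separated result format are hypothetical choices for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch of the worker loop; in-memory queues stand in for SQS.
final class WorkerSketch {

    // Processes links until the queue is empty and returns how many were handled.
    // The link is removed only after its result has been reported, so a failure
    // in the OCR step leaves the link in the queue for a later retry.
    static int drain(Queue<String> linkQueue, Queue<String> resultQueue,
                     Function<String, String> ocr) {
        int processed = 0;
        String link;
        while ((link = linkQueue.peek()) != null) {   // obtain the next link
            String text = ocr.apply(link);            // download + OCR (stubbed)
            resultQueue.add(link + "\t" + text);      // notify the manager
            linkQueue.remove();                       // remove the handled message
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<String> links = new ArrayDeque<>();
        Queue<String> results = new ArrayDeque<>();
        links.add("http://example.com/a.png"); // placeholder link
        // Stub OCR: a real worker would download the image and run OCR on it.
        int handled = drain(links, results, url -> "text of " + url);
        System.out.println(handled + " processed, first result: " + results.peek());
    }
}
```

Removing the message last mirrors how SQS is typically used: a received message stays in the queue until it is explicitly deleted, so another worker can pick it up if this one fails midway.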
- Register for an AWS account in order to use the AWS cloud.
- Download and install the Java SDK.
- Create a project with the provided .java files from this repository.
- Export each .java file as a .jar.
- Create an S3 bucket named "shleem" and upload MrManager.jar and Worker.jar to this bucket.
Run the application with:
java -jar yourjar.jar inputFileName n
For example: java -jar Local.jar links.txt 10
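A minimal sketch of how the two command-line arguments might be validated before any AWS call is made. It assumes `n` must be a positive integer (presumably the "n messages per worker" value the manager uses); `LocalArgs` and `parse` are hypothetical names, not part of Local.java.

```java
// Hypothetical sketch of argument handling for: java -jar Local.jar inputFileName n
final class LocalArgs {
    final String inputFileName;
    final int n;

    private LocalArgs(String inputFileName, int n) {
        this.inputFileName = inputFileName;
        this.n = n;
    }

    // Parses and validates the arguments; throws IllegalArgumentException on bad input.
    static LocalArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: java -jar Local.jar inputFileName n");
        }
        int n;
        try {
            n = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("n must be an integer: " + args[1]);
        }
        if (n <= 0) {
            // Assumption: a zero or negative group size makes no sense.
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        return new LocalArgs(args[0], n);
    }

    public static void main(String[] args) {
        LocalArgs parsed = parse(new String[] {"links.txt", "10"});
        System.out.println(parsed.inputFileName + " " + parsed.n);
    }
}
```

Failing fast on malformed arguments avoids uploading files or starting EC2 instances for a run that cannot succeed.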