Text-Summarizer

A project that implements extractive text summarization using OCR and attention networks. We build a model that performs extractive summarization of a news article with the aid of Optical Character Recognition (OCR) and attention networks. The model is built from Recurrent Neural Networks and Bi-directional Long Short-Term Memory (BiLSTM) layers, and uses Bahdanau attention within the neural architecture to form the attention network. Its main objective is to summarize the text from an image of an article and display the result.
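The attention step described above can be sketched in a few lines. This is a minimal NumPy illustration of Bahdanau (additive) attention for a single decoder step, not the project's TensorFlow code; the function name, the weight names `W1`, `W2`, `v` and the shapes are assumptions chosen for illustration.

```python
import numpy as np


def bahdanau_attention(query, keys, W1, W2, v):
    """Additive (Bahdanau) attention for one decoder step.

    query: decoder state s, shape (d_q,)
    keys:  encoder outputs h_1..h_T (e.g. BiLSTM outputs), shape (T, d_k)
    W1: shape (d_a, d_q), W2: shape (d_a, d_k), v: shape (d_a,)
    Returns (context vector of shape (d_k,), attention weights of shape (T,)).
    """
    # score_j = v^T tanh(W1 s + W2 h_j), computed for all T positions at once
    scores = np.tanh(query @ W1.T + keys @ W2.T) @ v  # shape (T,)
    # numerically stable softmax over the encoder positions
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # context vector: attention-weighted sum of the encoder outputs
    context = weights @ keys  # shape (d_k,)
    return context, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_k, d_q, d_a = 5, 8, 6, 4
    keys = rng.normal(size=(T, d_k))
    query = rng.normal(size=d_q)
    context, weights = bahdanau_attention(
        query, keys,
        rng.normal(size=(d_a, d_q)), rng.normal(size=(d_a, d_k)), rng.normal(size=d_a),
    )
    print(context.shape, weights.shape)  # (8,) (5,)
```

In a trained model `W1`, `W2` and `v` are learned parameters; here they are random only to show the data flow.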

Getting Started

Basic software requirements:

Python 3
Anaconda for Python 3
TensorFlow
Create a TensorFlow environment:

conda create -n tensorflow_env tensorflow
conda activate tensorflow_env

Install pytesseract (it also requires the Tesseract OCR engine to be installed on your system).
Install the following:

Git
Node.js
npm
Bower
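Before cloning, it can help to confirm that everything listed above is available. The script below is a hypothetical helper, not part of the repository; it assumes the packages are importable as `tensorflow` and `pytesseract` and that the tools are on `PATH` as `git`, `node`, `npm`, `bower` and `tesseract`. Run it inside the activated `tensorflow_env`.

```python
import importlib.util
import shutil

# Import names and executable names are assumptions based on the list above.
PYTHON_PACKAGES = ["tensorflow", "pytesseract"]
CLI_TOOLS = ["git", "node", "npm", "bower", "tesseract"]


def check_requirements(packages=PYTHON_PACKAGES, tools=CLI_TOOLS):
    """Return {name: True/False}: importable for packages, on PATH for tools."""
    status = {}
    for pkg in packages:
        # find_spec returns None when a top-level package cannot be found
        status[pkg] = importlib.util.find_spec(pkg) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```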

Only for first-time installation:

git clone https://github.com/mitali3112/Text-Summarizer.git

Enter the server folder and execute the notebook titled "Text Summarization.ipynb", compiling and running all of its cells:

- Download the dataset from Kaggle using the link given in the Dataset section below.
- Replace the paths in the notebook with the relevant paths on your system.
- Set the paths on your system where the trained model, the embeddings and the TensorBoard logs will be saved.
- Train the model and finish executing all the cells.
- Save the word vectors in pickle format in the server folder.
- Change the path to the trained model in the file testprocess.py.
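Saving and reloading the word vectors with pickle might look like the sketch below. It assumes the vectors are held in a plain `{word: vector}` dictionary; the helper names and the file name `word_vectors.pkl` are hypothetical, so match whatever the notebook and testprocess.py actually use.

```python
import pickle
import tempfile
from pathlib import Path


def save_word_vectors(word_vectors, path):
    """Pickle a {word: vector} mapping, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(word_vectors, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_word_vectors(path):
    """Load the mapping back. Only unpickle files you created yourself:
    pickle.load can execute arbitrary code from a malicious file."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    vectors = {"news": [0.1, 0.2, 0.3], "article": [0.0, -0.5, 0.4]}  # toy vectors
    with tempfile.TemporaryDirectory() as tmp:
        # In the project this would be a file inside server/; the name is hypothetical.
        target = Path(tmp) / "server" / "word_vectors.pkl"
        save_word_vectors(vectors, target)
        print(load_word_vectors(target) == vectors)  # True
```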
Enter the app/ folder and run:

bower install
npm install
node run_app.js

Setup is now complete.
Keep the images or text you want to summarize ready before running the project.
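Images are turned into text with pytesseract before they are summarized. A minimal sketch of that step, assuming Pillow is installed alongside pytesseract; `image_to_text` and `clean_ocr_text` are hypothetical helpers, not functions from the repository, and the cleanup rules are one reasonable choice rather than what server.py does.

```python
import re


def clean_ocr_text(raw):
    """Tidy raw OCR output before it is passed to the summarizer.

    - rejoins words hyphenated across a line break ("summa-" / "rization")
    - turns line breaks inside a paragraph into spaces
    - keeps blank lines as paragraph breaks
    - collapses runs of whitespace
    """
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


def image_to_text(image_path):
    """OCR an article image (needs Pillow, pytesseract and the Tesseract binary)."""
    # Imported lazily so clean_ocr_text works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img)
    return clean_ocr_text(raw)
```

One trade-off: a genuinely hyphenated compound split at a line end ("well-" / "known") is also joined, so drop the first substitution if your articles contain many of them.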

Running the project

In separate terminal tabs (each activated in the TensorFlow environment created above):

1. Go to app/ and run:

node run_app.js

2. Go to server/ and run:

python3 server.py

3. Open http://localhost:8000/ in a browser to use the frontend.
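Both processes need to be running before the page at http://localhost:8000/ works. The standard-library helper below is a hypothetical convenience, not part of the repository: it polls a TCP port until something accepts a connection or a timeout expires.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as something is listening, False once `timeout`
    seconds have passed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # connection refused or timed out: retry until the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8000)` returns True once the frontend is being served on port 8000. The README does not state the backend's port, so check server.py before polling it the same way.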

To launch TensorBoard:

tensorboard --logdir='full path to tensorboard savepath'
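TensorBoard shows each subdirectory of `--logdir` that contains event files as a separate run. A hypothetical helper, not taken from the notebook, that gives every training run its own timestamped folder under the TensorBoard save path, so `--logdir` can stay pointed at the base folder.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_run_logdir(base_dir, run_name="run"):
    """Create (if needed) and return <base_dir>/<run_name>-YYYYmmdd-HHMMSS.

    Keep `tensorboard --logdir` pointed at base_dir: every subfolder that
    receives event files then appears as its own run.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    logdir = Path(base_dir).expanduser().resolve() / f"{run_name}-{stamp}"
    logdir.mkdir(parents=True, exist_ok=True)
    return logdir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(make_run_logdir(tmp, run_name="bilstm-attention"))
```

The returned path would then be passed to whatever summary writer the notebook uses; if it trains with Keras, that could be the `log_dir` argument of `tf.keras.callbacks.TensorBoard`.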

Dataset

Download the All the News Dataset from Kaggle.

The Times of India dataset is included in the repository as toi.csv.
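The README does not list the columns of toi.csv or of the Kaggle files, so the loader below reads the header first and takes the column names as parameters. It is a standard-library sketch with hypothetical function names; the default UTF-8 encoding is also an assumption and can be overridden.

```python
import csv
import tempfile
from pathlib import Path


def csv_columns(csv_path, encoding="utf-8"):
    """Return the header row so you can see which columns hold the text."""
    with Path(csv_path).open(newline="", encoding=encoding) as f:
        return next(csv.reader(f), [])


def load_articles(csv_path, text_column, summary_column=None, limit=None, encoding="utf-8"):
    """Read (text, summary) pairs from a news CSV such as toi.csv.

    Rows whose text field is empty are skipped; summary is None when no
    summary_column is given. Raises KeyError if a named column is missing.
    """
    pairs = []
    with Path(csv_path).open(newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in (text_column, summary_column) if c and c not in header]
        if missing:
            raise KeyError(f"columns not found in {csv_path}: {missing}")
        for row in reader:
            text = (row[text_column] or "").strip()
            if not text:
                continue
            summary = (row[summary_column] or "").strip() if summary_column else None
            pairs.append((text, summary))
            if limit is not None and len(pairs) >= limit:
                break
    return pairs


if __name__ == "__main__":
    # Demo with made-up columns; for the real data, start with
    # print(csv_columns("toi.csv")) and pick the columns from its output.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        sample.write_text("headline,text\nH1,Body one\nH2,\n", encoding="utf-8")
        print(load_articles(sample, "text", "headline"))  # [('Body one', 'H1')]
```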

Built With

Recurrent Neural Networks
Long Short Term Memory
Bahdanau Attention
Adam Optimizer
Tesseract OCR
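The Adam optimizer listed above keeps running averages of the gradient and of its square and scales each step by their ratio. A scalar, pure-Python sketch of that update rule with the usual default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8); it is not the project's training code, and the function name and the toy learning rate of 0.1 are illustrative choices.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Minimize a 1-D function with Adam, given its gradient grad(x)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # second moment: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias corrections for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    print(adam_minimize(lambda x: 2 * (x - 3), x0=0.0, steps=500))
```

Because the step is normalised by the square root of the second moment, the first update moves by roughly `lr` in the descent direction regardless of the gradient's scale.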

Authors

Mitali Sheth
