In similarity_sentence_document.ipynb we first split the document into sentences and then convert it into a matrix of token counts: each row represents a sentence, each column represents a word, and each value is the count of that word in that sentence. The total number of columns is the total number of distinct words in the document after cleaning. We then use cosine similarity to build a similarity matrix between sentences, computed from the normalized dot products of their rows in the count matrix.
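A minimal sketch of this pipeline, assuming scikit-learn and a naive punctuation-based sentence split (the notebook's exact code and cleaning steps may differ):

import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

document = ("Text summarization is useful. Summarization shortens long text. "
            "Cats sleep most of the day.")

# Step 1: split the document into sentences (a naive split on
# sentence-ending punctuation; a proper tokenizer could be swapped in).
sentences = [s for s in re.split(r"(?<=[.!?])\s+", document) if s]

# Step 2: build the sentence-by-word count matrix.
# Each row is a sentence, each column a word, each value a word count.
vectorizer = CountVectorizer(stop_words="english")
count_matrix = vectorizer.fit_transform(sentences)

# Step 3: cosine similarity between every pair of rows gives the
# sentence-similarity matrix (normalized dot products of the rows).
similarity_matrix = cosine_similarity(count_matrix)
print(similarity_matrix.round(2))

The first two sentences share vocabulary, so their off-diagonal similarity is high, while the third scores zero against both.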
In the IMDB model we first find each word's frequency and assign it an integer rank by how frequent it is, so rank 1 is the most frequent word. Each review is then converted into a matrix of shape 1 x n, where n is the number of words in the review and each value is that word's frequency rank. Later, for the embedding step, we build a one-hot (multi-hot) matrix: the total number of columns equals the total number of words overall, and the value is 1 at the column positions equal to the frequency ranks of the words present in the review (a minimal encoding sketch follows the links below).
Related post : https://wordpress.com/post/datasciencebasicsblog.wordpress.com/1034
Text summarization approaches : https://datasciencebasicsblog.wordpress.com/2018/06/02/text-summarization-approaches/
Reference for implementation : https://github.com/xiaoxu193/PyTeaser
Code implementation : LDA_Topic_Modeling.ipynb
Topic modeling with Python : https://datasciencebasicsblog.wordpress.com/topic-modeling-with-python/
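A minimal sketch of the frequency-rank encoding and the multi-hot ("one-hot") vectorization described above, assuming the Keras IMDB dataset and NumPy (the original model's code may differ):

import numpy as np
from tensorflow.keras.datasets import imdb

NUM_WORDS = 10000  # keep only the 10,000 most frequent words

# Each review arrives as a list of integer ranks: word 1 is the most
# frequent word in the corpus, word 2 the second most frequent, and so on.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=NUM_WORDS)
print(x_train[0][:10])  # e.g. [1, 14, 22, 16, ...]

def multi_hot(sequences, dimension=NUM_WORDS):
    # One row per review, one column per word rank; put a 1 at every
    # column index that appears in the review's rank sequence.
    results = np.zeros((len(sequences), dimension))
    for i, seq in enumerate(sequences):
        results[i, seq] = 1.0
    return results

x_train_vec = multi_hot(x_train)
print(x_train_vec.shape)  # (25000, 10000)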
To learn about language models, RNNs, and their implementation steps, go to:
https://datasciencebasicsblog.wordpress.com/2018/03/03/nlp-recurrent-neural-networks-and-language-models/
https://datasciencebasicsblog.wordpress.com/2018/08/20/making-a-language-model-using-python/
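As a rough illustration of what those posts cover, here is a toy word-level RNN language model in Keras that learns to predict the next word from a prefix; the corpus, layer sizes, and training settings are assumptions for demonstration, not the posts' implementation:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["the cat sat on the mat", "the dog sat on the rug"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding index 0

# Build (prefix -> next word) training pairs from every sentence.
sequences = []
for line in corpus:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(ids)):
        sequences.append(ids[:i + 1])
max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len)
X, y = padded[:, :-1], padded[:, -1]

# Embed each word id, run the prefix through an RNN, and predict a
# probability distribution over the vocabulary for the next word.
model = Sequential([
    Embedding(vocab_size, 16),
    SimpleRNN(32),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=200, verbose=0)

# Predict the most likely word after a prefix.
prefix = pad_sequences(tokenizer.texts_to_sequences(["the cat sat"]), maxlen=max_len - 1)
next_id = int(np.argmax(model.predict(prefix, verbose=0)))
print(tokenizer.index_word[next_id])  # likely "on"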