Topic Analysis of Review Data

Objectives

To help a major mobile brand understand the voice of the customer, and the subjects customers are discussing, by examining product reviews on Amazon
To use machine learning and Python to understand the customer voice and predict the review rating of products on Amazon
To perform topic modeling on specific parts of speech
To interpret the emerging topics using customer voice

Prerequisites

Data exploration: It is used to find trends and patterns or to check assumptions by analyzing data with visual tools.
POS tagging: It labels each word with its part of speech (noun, verb, adjective, and so on) and is used in text analysis tools and algorithms as well as corpus searches.
Model creation: It uses Python’s backend to create a sequential model.
Python: It is widely used to implement data analysis and machine learning.
Topic modeling: It is used to discover the themes that run through a corpus by analyzing the words of the original texts.
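LDA: Latent Dirichlet Allocation is an unsupervised probabilistic model that treats each document as a mixture of topics and each topic as a distribution over words. It is used here for topic modeling, and should not be confused with Linear Discriminant Analysis, which is a classification and dimensionality-reduction technique.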
NLTK: It is a platform used for building Python programs that work with human language data for applications in statistical natural language processing (NLP).

Dataset Description

Variable - Description
Sentiment - The sentiment of the review (4- and 5-star reviews are positive, and 1-, 2-, and 3-star reviews are negative)
Reviews - The main text of the review
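
The labeling rule in the Sentiment row can be written as a small function. This is only a sketch of the rule as described above: the function name and the "positive"/"negative" strings are assumptions, and the dataset itself may encode the labels differently (for example as 1 and 0).

```python
def rating_to_sentiment(stars):
    """Map a 1-5 star rating to a sentiment label (label strings are assumed)."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a star rating from 1 to 5, got {stars!r}")
    # 4- and 5-star reviews are positive; 1-, 2-, and 3-star reviews are negative.
    return "positive" if stars >= 4 else "negative"


labels = [rating_to_sentiment(s) for s in (1, 2, 3, 4, 5)]
```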

Steps:

Discover the topics in the reviews, and present them to the business in a consumable format by using syntactic processing and topic modeling techniques. Clean up the text, perform POS tagging, and restrict the data to the relevant POS tags. Then, perform topic modeling using LDA. Finally, give business-friendly names to the topics, and make a table for the business.

Read the .csv file using Pandas, and look at the first few records
Normalize the casing of the review text, and extract the text into a list for easier manipulation.
Tokenize the reviews using NLTK's word_tokenize function
Perform parts-of-speech tagging on each sentence using the NLTK POS tagger
For the topic model, include nouns only

Find all POS tags that correspond to nouns
Limit the data to terms with these tags
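
The tokenizing, tagging, and noun-filtering steps above can be sketched as follows. The helper names tag_review and keep_nouns are hypothetical. NLTK's default tagger emits Penn Treebank tags, where the noun tags are NN, NNS, NNP, and NNPS. tag_review needs NLTK's tokenizer and tagger data to be downloaded first (the resource names differ between NLTK versions), so the example at the bottom uses a hand-tagged sentence instead.

```python
# Penn Treebank noun tags: NN (singular), NNS (plural),
# NNP (proper singular), NNPS (proper plural).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def tag_review(text):
    """Lowercase, tokenize, and POS-tag one review with NLTK."""
    # Imported here so keep_nouns() can be tried without NLTK data installed.
    import nltk

    tokens = nltk.word_tokenize(text.lower())
    return nltk.pos_tag(tokens)


def keep_nouns(tagged_tokens):
    """Keep only the tokens whose POS tag is one of the noun tags."""
    return [token for token, tag in tagged_tokens if tag in NOUN_TAGS]


# Hand-tagged example, so this part runs without any NLTK downloads.
example = [("the", "DT"), ("battery", "NN"), ("drains", "VBZ"),
           ("fast", "RB"), ("and", "CC"), ("chargers", "NNS"), ("fail", "VBP")]
nouns = keep_nouns(example)
```

On the real data, the same filter would be applied as keep_nouns(tag_review(review)) for each review in the list.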

Lemmatize

The different forms of the terms need to be treated as one
For the time being, there is no need to provide a POS tag to the lemmatizer
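
A minimal sketch of the lemmatization step, assuming NLTK's WordNetLemmatizer is the lemmatizer in use (the text above does not name one). Its lemmatize method treats a word as a noun when no POS tag is passed, which fits data that has already been limited to nouns. The helper names and the tiny lookup table are hypothetical; the table only stands in for WordNet so that the example runs without the WordNet data.

```python
def lemmatize_tokens(tokens, lemmatize):
    """Replace every token with its lemma, using the supplied callable."""
    return [lemmatize(token) for token in tokens]


def wordnet_lemmatize():
    """Return NLTK's WordNet lemmatize function (requires the 'wordnet' data)."""
    from nltk.stem import WordNetLemmatizer

    # Called with a single argument, lemmatize() assumes the word is a noun.
    return WordNetLemmatizer().lemmatize


# Stand-in lookup table for illustration only; it is not part of NLTK.
demo_lemmas = {"phones": "phone", "batteries": "battery"}
lemmas = lemmatize_tokens(
    ["phones", "phone", "batteries"],
    lambda token: demo_lemmas.get(token, token),
)
```

With the WordNet data installed, the call becomes lemmatize_tokens(tokens, wordnet_lemmatize()).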

Remove stopwords and punctuation (if any)
Create a topic model using LDA on the cleaned-up data with 12 topics

Print the top terms for each topic
Find the coherence of the model with the c_v metric
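
The cleanup, model fitting, and coherence steps above could look roughly like this. It assumes gensim as the LDA library, which the steps do not name (c_v is the name gensim uses for this coherence measure). The function names, random_state, and passes values are illustrative choices, not settings taken from the notebook, and the stopword set is passed in by the caller (for example NLTK's English stopword list).

```python
import string


def clean_tokens(tokens, stopwords):
    """Drop stopwords and tokens that consist only of punctuation."""
    return [
        token for token in tokens
        if token not in stopwords
        and not all(ch in string.punctuation for ch in token)
    ]


def fit_lda(docs, num_topics=12, random_state=42):
    """Fit LDA on tokenized documents; return the model and its c_v coherence."""
    # gensim is an assumed dependency, imported only when a model is fitted.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics,
                   random_state=random_state, passes=10)
    coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                               coherence="c_v").get_coherence()
    return lda, coherence


def top_terms(lda, num_topics, topn=10):
    """Return the top terms of each topic as {topic_id: [term, ...]}."""
    return {
        topic_id: [word for word, _ in lda.show_topic(topic_id, topn=topn)]
        for topic_id in range(num_topics)
    }


# Tiny example with a made-up stopword set.
cleaned = clean_tokens(["the", "phone", ",", "battery", "..."], {"the"})
```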

Analyze the topics through the business lens

Determine which of the topics can be combined

Create a topic model using LDA with what you think is the optimal number of topics

Find the coherence of the model
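
One way to support the choice of topic count is to compare c_v coherence across a few candidate values. The sketch below again assumes gensim, and its function names are hypothetical. Coherence is only one signal: the steps above also rely on judging, through the business lens, which topics can be combined.

```python
def coherence_by_topic_count(docs, candidate_counts, random_state=42):
    """Fit one LDA model per candidate topic count; return {count: c_v score}."""
    # gensim is an assumed dependency, imported only when models are fitted.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    scores = {}
    for num_topics in candidate_counts:
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=num_topics, random_state=random_state)
        scores[num_topics] = CoherenceModel(
            model=lda, texts=docs, dictionary=dictionary, coherence="c_v"
        ).get_coherence()
    return scores


def best_topic_count(scores):
    """Return the topic count with the highest coherence score."""
    return max(scores, key=scores.get)
```

A typical call would be best_topic_count(coherence_by_topic_count(docs, range(4, 13, 2))), where docs is the list of cleaned token lists.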

Businesses should be able to interpret the topics.

Name each of the identified topics
Create a table with the topic names and the top ten terms in each to present to the business
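
The final table can be assembled from the topic names and the top terms of each topic. The sketch below uses plain Python; the function name and column keys are hypothetical, and the names and terms in the example are placeholders for illustration only, not results from the model.

```python
def build_topic_table(topic_names, topic_terms, topn=10):
    """Return one row per topic with its business-friendly name and top terms."""
    rows = []
    for topic_id, terms in sorted(topic_terms.items()):
        rows.append({
            "topic": topic_id,
            # Fall back to a generic label for topics that are not yet named.
            "name": topic_names.get(topic_id, f"Topic {topic_id}"),
            "top_terms": ", ".join(terms[:topn]),
        })
    return rows


# Placeholder input; real names and terms come from the fitted model.
rows = build_topic_table(
    {0: "Battery life"},
    {0: ["battery", "charge", "hour"], 1: ["screen", "display"]},
)
for row in rows:
    print(f'{row["topic"]:>5}  {row["name"]:<15}  {row["top_terms"]}')
```

In a notebook, passing the rows to pandas.DataFrame(rows) renders the same information as a table.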

Setup and Installation:

pip install --upgrade pip
pip install -r requirements.txt
pip list
