Clustering Analysis of Hobbes's Leviathan

This project applies clustering analysis to the text of Thomas Hobbes's Leviathan, which is available on Project Gutenberg. The goal is to find sections of the text that use similar vocabulary and group them together.

Data

The data used in this project is the full text of Hobbes's Leviathan, which was downloaded from Project Gutenberg in plain text format.
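
The README does not say how the text was divided into sections. A minimal sketch, assuming Python and assuming each chapter is one section, could strip the Project Gutenberg header and license and then split on chapter headings. The marker patterns and the `CHAPTER <Roman numeral>` heading format are assumptions to check against the downloaded file, and the function names are hypothetical. If the file contains a table of contents that repeats the chapter headings, the resulting very short pieces would need to be filtered out.

```python
import re

# Gutenberg files wrap the book in START/END marker lines; the exact wording
# varies between files, so these patterns are assumptions to verify.
START_RE = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)
END_RE = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)
# Hypothetical section boundary: a line starting with "CHAPTER" and a Roman numeral.
CHAPTER_RE = re.compile(r"^CHAPTER [IVXLC]+\b.*$", re.MULTILINE)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return only the text between the START and END markers, if present."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish]


def split_into_chapters(body: str) -> list[str]:
    """Split the book body into chapter texts, dropping anything before the first chapter."""
    starts = [m.start() for m in CHAPTER_RE.finditer(body)]
    if not starts:
        return [body.strip()]
    bounds = starts + [len(body)]
    return [body[a:b].strip() for a, b in zip(bounds, bounds[1:])]
```

Note that this drops any front matter before the first chapter heading, such as Hobbes's Introduction; it would need its own handling if it should count as a section.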

Methods

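The text was preprocessed by removing stop words and stemming the remaining words. A TF-IDF vectorizer then converted the preprocessed text into a document-term matrix, with one row per section of the text and one column per term, where each entry is the term's TF-IDF weight in that section.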
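
As an illustration, the preprocessing and TF-IDF step could look like the sketch below. It is written in plain Python without external libraries so that it runs on its own; the stop-word list is a small illustrative subset, and `crude_stem` is a hypothetical suffix-stripping stand-in for a real stemmer such as Porter's. The weighting (raw counts, smoothed IDF, L2-normalized rows) follows the defaults of scikit-learn's `TfidfVectorizer`; the project itself may have used different libraries or settings.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list; a real run would use a full English list.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it",
              "by", "be", "as", "which", "are", "or"}

# Crude suffix stripping, standing in for a proper stemmer such as Porter's.
SUFFIXES = ("ations", "ation", "ings", "ing", "ness", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_matrix(docs: list[str]) -> tuple[list[list[float]], list[str]]:
    """Build a dense TF-IDF matrix: one row per document, one column per term.

    Weighting: raw term count * (ln((1 + n) / (1 + df)) + 1), then each row
    is scaled to unit L2 norm.
    """
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {term: j for j, term in enumerate(vocab)}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = [math.log((1 + n) / (1 + df[term])) + 1 for term in vocab]

    matrix = []
    for tokens in token_lists:
        row = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[index[term]] = count * idf[index[term]]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        matrix.append([v / norm for v in row])
    return matrix, vocab
```

For example, `preprocess("The governing of nations")` returns `["govern", "nation"]`.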

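The number of clusters was chosen with the elbow method: a measure of cluster compactness, such as the within-cluster sum of squares, is plotted against the number of clusters k, and k is taken at the point where the curve bends and adding further clusters no longer reduces it substantially. K-means clustering was then applied to group the text into that many clusters. The top words for each cluster are the terms with the highest TF-IDF weights in the cluster's centroid.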
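
The sketch below, again in plain Python with hypothetical function names, shows the three parts of this step: Lloyd's k-means algorithm, the inertia (within-cluster sum of squares) values that an elbow plot is drawn from, and the top terms read off each centroid. It uses a single random initialization; scikit-learn's `KMeans` defaults to k-means++ initialization and exposes the same quantity as `inertia_`, so results would differ in detail. The elbow itself is still chosen by inspecting where the inertia curve bends.

```python
import random


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: list[list[float]], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's algorithm with randomly chosen initial centroids.

    Returns (labels, centroids, inertia), where inertia is the within-cluster
    sum of squared distances used for the elbow plot.
    """
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(X, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in X
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(row, centroids[lab]) for row, lab in zip(X, labels))
    return labels, centroids, inertia


def elbow_curve(X: list[list[float]], max_k: int) -> list[tuple[int, float]]:
    """Inertia for k = 1..max_k; plot it and pick k where the curve bends."""
    return [(k, kmeans(X, k)[2]) for k in range(1, max_k + 1)]


def top_terms(centroid: list[float], vocab: list[str], n: int = 10) -> list[str]:
    """Terms with the largest TF-IDF weight in a cluster centroid."""
    ranked = sorted(range(len(vocab)), key=lambda j: centroid[j], reverse=True)
    return [vocab[j] for j in ranked[:n]]
```

Because the rows of a TF-IDF matrix are L2-normalized, Euclidean k-means on them behaves similarly to clustering by cosine similarity, which is why it is a common pairing for text.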

A dendrogram was also generated to visualize the hierarchical structure of the clusters and the distances at which they merge.
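
The README does not state which linkage method or distance measure was used. Assuming average linkage with Euclidean distance, a naive agglomerative clustering that produces the merge information a dendrogram draws could be sketched as follows (function names are hypothetical). In practice, `scipy.cluster.hierarchy.linkage(X, method="average")` computes these merges efficiently and `scipy.cluster.hierarchy.dendrogram` plots them.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(X: list[list[float]]) -> list[list[float]]:
    """Naive agglomerative clustering with average linkage.

    Returns merge records [id_a, id_b, distance, size] in the same layout as
    SciPy's linkage matrix: original points have ids 0..n-1 and the cluster
    created by the i-th merge gets id n + i.
    """
    n = len(X)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member point indices
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for ai, a in enumerate(ids):
            for b in ids[ai + 1:]:
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(euclidean(X[p], X[q]) for p in clusters[a] for q in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append([a, b, d, len(members)])
        next_id += 1
    return merges
```

Each pass recomputes all pairwise cluster distances, which is slow for large inputs but adequate for a few dozen chapters. The height of each join in the dendrogram is the `distance` value of the corresponding merge record.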
