Clustering Analysis of Hobbes's Leviathan

This project applies clustering analysis to the text of Hobbes's Leviathan, which is available on Project Gutenberg. The goal is to identify patterns in the text and group similar sections together.

Data

The data is the full text of Hobbes's Leviathan, downloaded from Project Gutenberg in plain text format.

Methods

The text data was preprocessed by removing stop words and stemming the remaining words. A TF-IDF vectorizer was then applied to convert the text data into a matrix of features.

The number of clusters was chosen with the elbow method, and k-means clustering was then applied to group the text into that many clusters. The top words for each cluster were identified from the cluster centroid, by taking the terms with the highest TF-IDF weights in it.

A dendrogram was also generated to visualize the hierarchical distances between clusters.
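One way to build such a dendrogram is agglomerative linkage over the cluster centroids, sketched below with SciPy. The Ward linkage method and the made-up `centroids` array are assumptions; the README does not say which linkage or which inputs were used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical centroids: 4 clusters in a 3-term feature space.
centroids = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.1, 0.9],
])

# Each row of Z records one merge: the two merged nodes, their distance,
# and the number of original observations in the new node.
Z = linkage(centroids, method="ward")

# no_plot=True returns the tree layout without requiring matplotlib.
labels = [f"cluster {i}" for i in range(len(centroids))]
tree = dendrogram(Z, no_plot=True, labels=labels)

print(Z)
print(tree["ivl"])  # leaf labels in left-to-right plotting order
```

With matplotlib installed, dropping `no_plot=True` draws the figure on the current axes.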