gsvic/Cosine-Similarity-with-MapReduce
CSMR

Cosine Similarity with MapReduce

Description

This repository contains an implementation of the CSMR algorithm. The paper describing CSMR was published under the title "CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and MapReduce" at the Artificial Intelligence Applications and Innovations 2014 (AIAI 2014) conference.
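
For readers new to the measure, here is a minimal single-machine sketch of pairwise cosine similarity over sparse term-weight vectors. It is not the repository's code and does not use Hadoop or Mahout. The function names (`cosine_similarity`, `pairwise_similarities`), the dict-based vector representation, and the sample weights are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    # Iterate over the smaller vector when computing the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    # An empty (all-zero) vector has no direction; treat similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(docs):
    """Yield ((id1, id2), similarity) for every unordered pair of documents."""
    for (i, u), (j, v) in combinations(sorted(docs.items()), 2):
        yield (i, j), cosine_similarity(u, v)


# Hypothetical term weights, standing in for vectors built from real documents.
docs = {
    "d1": {"hadoop": 1.0, "mahout": 1.0},
    "d2": {"hadoop": 1.0, "mahout": 1.0},
    "d3": {"spark": 2.0},
}

for pair, sim in pairwise_similarities(docs):
    print(pair, round(sim, 3))
```

Documents with identical term weights score close to 1, and documents that share no terms score 0. In a MapReduce setting the pair enumeration would typically be distributed across mappers and reducers rather than looped over in memory as it is here.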

Paper

Link: http://link.springer.com/chapter/10.1007%2F978-3-662-44722-2_23

Instructions

  • Install Mahout 0.9 and Hadoop 1.2.1 (stable release)
  • Go to the CSMR directory: cd Cosine-Similarity-with-MapReduce
  • Build CSMR: mvn install
  • Place your input folder (it must be named 'input'), containing the documents in raw format, in Cosine-Similarity-with-MapReduce/bin
  • Run CSMR: ./run-csmr.sh
  • See the results: cat OUTPUT_FOLDER/Results/part-r-00000
