Skip to content

Analyzing the dataset using MapReduce programming

Notifications You must be signed in to change notification settings

abhi9254/MovieLens

Repository files navigation

MovieLens

MapReduce programs in Java to analyze Movie Ratings data provided by MovieLens.

GroupLens Research has collected and made available rating data sets from the MovieLens web site (http://movielens.org). The data sets were collected over various periods of time, depending on the size of the set.

I have used http://files.grouplens.org/datasets/movielens/ml-latest-small.zip for my purpose.

I performed my analysis on the following files- (a) movies.csv which has movieId,title,genres (b) ratings.csv which has userId,movieId,rating,timestamp

1.To get count of movies in all Genres - hadoop jar /home/cloudera/Desktop/ML_Driver.jar movielens.ML_Driver /user/cloudera/input/ml-latest-small/movies /user/cloudera/output/movielens/genre_wise_counts

2.To get Average rating for each Genre - hadoop jar /home/cloudera/Desktop/ML_Driver.jar movielens.ML_Join_Driver /user/cloudera/input/ml-latest-small/movies /user/cloudera/input/ml-latest-small/ratings /user/cloudera/output/movielens/genre_wise_ratings

Output:

Action 3.0908625 Adventure 3.2298608 Animation 3.418974 Children 3.1386745 Comedy 3.190355 Crime 3.3016186 Documentary 3.6867437 Drama 3.4474185 Fantasy 3.1838272 Film-Noir 3.669072 Horror 2.991933 IMAX 3.0590277 Musical 3.334026 Mystery 3.383181 Romance 3.3445897 Sci-Fi 3.1670532 Thriller 3.1830714 War 3.5340207 Western 3.4210017

About

Analyzing the dataset using MapReduce programming

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages