Machine Learning in Spark

Objective

Predict the release year of songs in the Million Song Dataset using machine learning with PySpark (MLlib package).

Data

The "YearPredictionMSD" dataset from the UCI Machine Learning Repository is used for this project: https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd

Data pre-processing

  • Load the dataset and apply min-max scaling so that every feature lies between 0 and 1
  • Normalize the labels by subtracting the minimum year, so the earliest year becomes 0
  • Split the dataset into train (70%), test (20%), and validation (10%) sets
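
The three steps above can be sketched in plain Python to show the arithmetic involved. This is only an illustration under stated assumptions, not the project's Spark code: the function names, the toy values, and the seed are all made up here. In PySpark the same steps would typically run on an RDD or DataFrame, for example with `randomSplit` for the 70/20/10 split.

```python
import random


def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def shift_labels(years):
    """Subtract the minimum year so that labels start at 0."""
    base = min(years)
    return [year - base for year in years]


def split_dataset(records, fractions=(0.7, 0.2, 0.1), seed=42):
    """Shuffle, then cut into train / test / validation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Toy values for illustration only (not taken from the real dataset).
features = [[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]]
years = [2001, 1995, 2010]

scaled = min_max_scale(features)
labels = shift_labels(years)
train, test, validation = split_dataset(range(10))
print(scaled[1][0], labels, len(train), len(test), len(validation))
```

Note that Spark's `randomSplit` assigns rows to splits at random, so its split sizes are approximate, whereas this sketch cuts a shuffled list at fixed positions.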

Model

Linear regression is used to predict the normalized year from the scaled features.
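
To make the model concrete, here is a minimal plain-Python sketch of linear regression fitted by batch gradient descent and scored with RMSE. It is not the project's Spark code: the README does not state which MLlib class, optimizer, or evaluation metric is used, so the gradient-descent fit, the RMSE metric, the hyperparameters, and all names below are assumptions chosen for illustration.

```python
import math


def predict(weights, bias, x):
    """Linear model: bias + dot(weights, x)."""
    return bias + sum(w * v for w, v in zip(weights, x))


def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Minimise mean squared error with batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals are computed once per epoch, before any update.
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        for j in range(d):
            grad = 2.0 / n * sum(e * x[j] for e, x in zip(errors, X))
            weights[j] -= lr * grad
        bias -= lr * 2.0 / n * sum(errors)
    return weights, bias


def rmse(weights, bias, X, y):
    """Root mean squared error of the model on (X, y)."""
    squared = [(predict(weights, bias, x) - t) ** 2 for x, t in zip(X, y)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data following y = 1 + 2x (hypothetical, already scaled to [0, 1]).
X_train = [[0.0], [0.5], [1.0]]
y_train = [1.0, 2.0, 3.0]

weights, bias = fit_linear_regression(X_train, y_train)
print(weights, bias, rmse(weights, bias, X_train, y_train))
```

Because the labels were shifted by the minimum year during pre-processing, a prediction is converted back to a calendar year by adding that minimum year again.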