This project explores the relationship between housing prices and various features of houses. It uses the `housing.csv` dataset, which contains information about houses in a particular city.
The project consists of two Jupyter notebooks:
- `Without Preprocessing.ipynb`: fits a linear regression model to the raw data without any preprocessing
- `With Preprocessing.ipynb`: preprocesses the data and then fits a linear regression model
The main objective of this project is to show the importance of preprocessing data before fitting a model. The second notebook demonstrates how to effectively preprocess data to improve the accuracy of the model.
To install the project, clone the repository and then install its dependencies:
1. Clone the repository: `git clone https://github.com/thesahibnanda/How-To-Preprocess-Dataset-And-Its-Importance`
2. Install dependencies: `pip install -r requirements.txt`
To run the notebooks, simply open them in Jupyter or Google Colab and run each cell in order. The notebooks include detailed explanations of each step, as well as visualizations of the data.
Note that the `housing.csv` file should be located in the same directory as the notebooks.
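As a quick check that the data file is where the notebooks expect it, the following is a minimal sketch using only the Python standard library. The helper name `load_rows` is hypothetical and is not taken from the notebooks, which may well load the data differently (for example with pandas).

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts, failing early if it is missing.

    `load_rows` is an illustrative helper, not part of the project.
    """
    path = Path(csv_path)
    if not path.is_file():
        # A relative path is resolved against the current working directory,
        # which for a notebook is normally the notebook's own folder.
        raise FileNotFoundError(
            f"{path} not found; place it in the same directory as the notebooks"
        )
    with path.open(newline="") as handle:
        # DictReader uses the header row as keys; all values are strings.
        return list(csv.DictReader(handle))
```

In a notebook cell, `rows = load_rows("housing.csv")` would either return the rows or raise a clear error if the file is not in the working directory.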
The evaluation metrics used in this project are mean squared error (MSE), root mean squared error (RMSE), R-squared (R2), adjusted R-squared (Adj. R2), and the sum of squared residuals (SSR). MSE, RMSE, and SSR measure the size of the prediction errors, so lower values are better. R2 and Adj. R2 measure goodness of fit, so values closer to 1 are better. Adj. R2 additionally penalizes the number of features used.
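To make the definitions concrete, here is a minimal sketch of how these five metrics can be computed from true and predicted values in plain Python. The function name `regression_metrics` and the parameter `n_features` are hypothetical, and the notebooks may compute the same quantities differently (for example with scikit-learn). The adjusted R2 uses the usual formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return SSR, MSE, RMSE, R2 and adjusted R2 for a set of predictions.

    Illustrative helper only; names are assumptions, not the project's API.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1 for adjusted R2")

    mean_y = sum(y_true) / n
    # Sum of squared residuals: squared gaps between actual and predicted.
    ssr = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between actual values and their mean.
    sst = sum((t - mean_y) ** 2 for t in y_true)
    if sst == 0:
        raise ValueError("R2 is undefined when all target values are equal")

    mse = ssr / n
    rmse = math.sqrt(mse)
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"SSR": ssr, "MSE": mse, "RMSE": rmse, "R2": r2, "Adj. R2": adj_r2}


# Toy values for illustration only; these are not results from housing.csv.
example = regression_metrics([3, 5, 7, 9], [2.5, 5.0, 7.5, 9.0], n_features=1)
print(example)
```

Running the same function on the predictions from each notebook would give a side-by-side comparison of the model with and without preprocessing.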
This project demonstrates the importance of preprocessing data before fitting a model, and shows how to effectively preprocess data to improve the accuracy of the model. The evaluation metrics used in this project provide a quantitative measure of the model's performance, and can be used to compare different models or preprocessing techniques.