How To Preprocess Dataset And Its Importance

This project explores the relationship between housing prices and various features of houses. The project uses the housing.csv dataset, which contains information about houses in a particular city.

The project consists of two Jupyter notebooks:

- `Without Preprocessing.ipynb`: fits a linear regression model to the raw data without any preprocessing
- `With Preprocessing.ipynb`: preprocesses the data and then fits a linear regression model

The main objective of this project is to show the importance of preprocessing data before fitting a model. The second notebook demonstrates how to effectively preprocess data to improve the accuracy of the model.
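The idea can be sketched in a few lines. The example below is not taken from the notebooks: it assumes, purely for illustration, that preprocessing means filling missing values with the column median and standardizing each feature, and it uses synthetic data rather than housing.csv. The helper names (`impute_median`, `standardize`, `fit_linear`, `predict`) are hypothetical.

```python
import numpy as np

def impute_median(X):
    """Replace NaN entries with the median of their column (assumed strategy)."""
    X = np.array(X, dtype=float)          # copy so the caller's data is untouched
    medians = np.nanmedian(X, axis=0)     # per-column median, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                   # leave constant columns unscaled
    return (X - mean) / std

def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

# Synthetic stand-in for housing data: two features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) * np.array([1000.0, 1.0])
y = 3.0 + 0.002 * X[:, 0] + 5.0 * X[:, 1]

# Knock out some values to mimic missing entries in a raw dataset.
X_missing = X.copy()
X_missing[::10, 1] = np.nan

X_clean = standardize(impute_median(X_missing))
coef = fit_linear(X_clean, y)
preds = predict(coef, X_clean)
```

Standardizing does not change what a plain least-squares fit can predict, but it puts coefficients on a comparable scale, while imputation is what makes rows with missing values usable at all.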

Installation

To set up the project, clone the repository and install its dependencies:

1. Clone the repository: `git clone https://github.com/thesahibnanda/How-To-Preprocess-Dataset-And-Its-Importance`
2. Install dependencies: `pip install -r requirements.txt`

Usage

To run the notebooks, simply open them in Jupyter or Google Colab and run each cell in order. The notebooks include detailed explanations of each step, as well as visualizations of the data.

Note that the housing.csv file should be located in the same directory as the notebooks.
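A quick way to confirm the file is where the notebooks expect it is sketched below. It uses only the standard library, and the function name `load_housing` is hypothetical; the notebooks themselves may load the data differently (for example with pandas).

```python
import csv
from pathlib import Path

def load_housing(path="housing.csv"):
    """Read the CSV into a list of dicts, failing early with a clear message."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found; place it in the same directory as the notebooks"
        )
    with csv_path.open(newline="") as f:
        # Each row becomes a dict keyed by the header line; values stay as strings.
        return list(csv.DictReader(f))
```

Calling `load_housing()` with no arguments looks for housing.csv in the current working directory.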

Evaluation Metrics

The evaluation metrics used in this project are mean squared error (MSE), root mean squared error (RMSE), R-squared (R2), adjusted R-squared (Adj. R2) and the sum of squared residuals (SSR). MSE, RMSE and SSR measure prediction error, so lower values are better. R2 and Adj. R2 measure goodness of fit as the proportion of variance in the target that the model explains; Adj. R2 additionally penalizes the number of features used.
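As a reference, the five metrics can be computed from a vector of true values and a vector of predictions as sketched below. This is a minimal NumPy version using the standard textbook definitions, not code copied from the notebooks; the function name `regression_metrics` and its `n_features` parameter are illustrative.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    """Return SSR, MSE, RMSE, R2 and adjusted R2 for a regression fit.

    n_features is the number of predictors (excluding the intercept) and
    must satisfy n_samples > n_features + 1 for adjusted R2 to be defined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size

    residuals = y_true - y_pred
    ssr = float(np.sum(residuals ** 2))                  # sum of squared residuals
    mse = ssr / n                                        # mean squared error
    rmse = mse ** 0.5                                    # same units as the target
    sst = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"SSR": ssr, "MSE": mse, "RMSE": rmse, "R2": r2, "Adj. R2": adj_r2}

metrics = regression_metrics([1, 2, 3, 4, 5], [1, 2, 3, 4, 6], n_features=1)
```

Running the same function on the predictions from each notebook gives a like-for-like comparison of the two models.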

Conclusion

This project demonstrates the importance of preprocessing data before fitting a model, and shows how to effectively preprocess data to improve the accuracy of the model. The evaluation metrics used in this project provide a quantitative measure of the model's performance, and can be used to compare different models or preprocessing techniques.

