The primary aim of this project is to emphasize the significance of data preprocessing prior to model fitting. The secondary notebook illustrates efficient methods for data preprocessing, which can enhance the model's accuracy.

thesahibnanda/How-To-Preprocess-Dataset-And-Its-Importance

How To Preprocess Dataset And Its Importance

This project explores the relationship between housing prices and various features of houses. The project uses the housing.csv dataset, which contains information about houses in a particular city.

The project consists of two Jupyter notebooks:

  • Without Preprocessing.ipynb: fits a linear regression model to the raw data without any preprocessing
  • With Preprocessing.ipynb: preprocesses the data and then fits a linear regression model

The main objective of this project is to show the importance of preprocessing data before fitting a model. The second notebook demonstrates how to effectively preprocess data to improve the accuracy of the model.
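This README does not list the preprocessing steps used in the second notebook. As a rough illustration only, the sketch below shows two common steps, median imputation of missing values and standardization of a numeric column, using just the Python standard library. The function names and the sample column are hypothetical and are not taken from the notebooks; whether such steps improve a given model depends on the data.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical numeric column with one missing entry (not from housing.csv).
raw_column = [2.0, None, 4.0, 6.0]
clean_column = standardize(impute_median(raw_column))
```

Imputing before standardizing keeps every row available for model fitting, instead of dropping rows that have a missing value.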

Table of Contents

  • Installation
  • Usage
  • Evaluation Metrics
  • Conclusion

Installation

To install the project and its dependencies:

1. Clone the repository: `git clone https://github.com/thesahibnanda/How-To-Preprocess-Dataset-And-Its-Importance`
2. Install dependencies: `pip install -r requirements.txt`

Usage

To run the notebooks, simply open them in Jupyter or Google Colab and run each cell in order. The notebooks include detailed explanations of each step, as well as visualizations of the data.

Note that the housing.csv file should be located in the same directory as the notebooks.
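A quick way to confirm the file is where the notebooks expect it is to load it by its bare filename, which resolves against the current working directory. The helper below is a minimal sketch using only the standard library; `load_housing_rows` is a hypothetical name, and the notebooks themselves may load the file differently (for example with pandas).

```python
import csv
from pathlib import Path


def load_housing_rows(csv_path="housing.csv"):
    """Read a CSV file into a list of dicts keyed by column name."""
    path = Path(csv_path)
    if not path.is_file():
        # A relative path is resolved against the current working directory.
        raise FileNotFoundError(
            f"{path} not found (current directory: {Path.cwd()}); "
            "place it next to the notebooks"
        )
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_housing_rows()` from a notebook cell either returns the rows or raises an error naming the directory that was searched. All values come back as strings and need converting before use.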

Evaluation Metrics

The evaluation metrics used in this project are mean squared error (MSE), root mean squared error (RMSE), R-squared (R2), adjusted R-squared (Adj. R2), and the sum of squared residuals (SSR). MSE, RMSE, and SSR measure prediction error, so lower values are better. R2 and Adj. R2 measure goodness of fit, so values closer to 1 are better; Adj. R2 additionally penalizes the number of features in the model.
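The five metrics can be computed from the true and predicted values with their standard definitions. The following is a minimal sketch in plain Python; the function name `regression_metrics` and its parameter `n_features` (the number of predictors, needed for Adj. R2) are assumptions for illustration, and the notebooks may compute these values with a library instead.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return SSR, MSE, RMSE, R2 and adjusted R2 for one set of predictions."""
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("adjusted R2 needs more than n_features + 1 samples")

    mean_y = sum(y_true) / n
    # Sum of squared residuals: squared gaps between targets and predictions.
    ssr = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between targets and their mean.
    sst = sum((t - mean_y) ** 2 for t in y_true)
    if sst == 0:
        raise ValueError("R2 is undefined when all targets are equal")

    mse = ssr / n
    rmse = math.sqrt(mse)
    r2 = 1 - ssr / sst
    # Adjusted R2 penalizes R2 for the number of features used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"ssr": ssr, "mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Made-up values for illustration only (not results from the notebooks).
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0], n_features=1)
print(metrics)
```

Running the same function on the predictions from both notebooks gives a like-for-like comparison of the model with and without preprocessing.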

Conclusion

This project demonstrates the importance of preprocessing data before fitting a model, and shows how to effectively preprocess data to improve the accuracy of the model. The evaluation metrics used in this project provide a quantitative measure of the model's performance, and can be used to compare different models or preprocessing techniques.
