jonathanatuscpsu/DM-python

Data Mining in Python

This set of online learning materials for undergraduate and graduate data mining classes is currently maintained by Zhaohu (Jonathan) Fan. Some of the materials are from Dr. Yan Yu's class notes. Thanks to previous Ph.D. students at the Lindner College of Business for their contributions.

Contributors:

  • Zhaohu (Jonathan) Fan, PhD in Business Analytics
  • Harsh Singal, M.S. in Business Analytics (current position: Data Scientist - Product Analytics at Asurion)
  • Saidat Sanni, PhD Candidate in Business Analytics

Lecture and Lab Notes

Introduction to Data Mining and Python

Description
1.A Introduction to Data Mining
1.B Introduction to Python
1.C Advanced techniques: functions and loops
1.D Introduction to Markdown (optional)
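As a small illustration of the kind of code covered in 1.B and 1.C, the sketch below defines a function that loops over a list of numbers to compute a count, a mean, and a sample standard deviation. The function name `summarize` and the sample data are hypothetical and are not taken from the course notes.

```python
def summarize(values):
    """Return the count, mean, and sample standard deviation of a list of numbers."""
    n = 0
    total = 0.0
    for v in values:  # first pass: count the values and accumulate their sum
        n += 1
        total += v
    mean = total / n
    squared_dev = 0.0
    for v in values:  # second pass: accumulate squared deviations from the mean
        squared_dev += (v - mean) ** 2
    sd = (squared_dev / (n - 1)) ** 0.5  # sample SD divides by n - 1
    return {"n": n, "mean": mean, "sd": sd}


print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

In day-to-day work the same numbers come from `statistics.mean` and `statistics.stdev` or from pandas, but writing the loops by hand shows what those functions compute.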

Exploratory Data Analysis

Description
2.A Explore and describe dataset
2.B Exploratory data analysis by visualization

Linear Regression, Prediction, and Variable Selection

Description
3.A Linear regression and prediction
3.B Subset variable selection
3.C LASSO variable selection
3.D Monte Carlo simulation
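Topics 3.A and 3.D fit together in a small Monte Carlo experiment: simulate data from a known linear model, fit it by ordinary least squares (OLS), and check that the estimates average out near the true coefficients. This standard-library sketch assumes a single predictor, so the closed-form slope and intercept apply. The true coefficients, noise level, sample sizes, and function names are illustrative choices, and the course labs may use libraries such as statsmodels or scikit-learn instead.

```python
import random


def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x with one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def monte_carlo(n_sims=500, n=100, beta0=5.0, beta1=2.0, sigma=1.0, seed=1):
    """Simulate from a known model n_sims times and average the OLS estimates."""
    rng = random.Random(seed)
    b0_sum = b1_sum = 0.0
    for _ in range(n_sims):
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]  # normal errors
        b0, b1 = ols_fit(x, y)
        b0_sum += b0
        b1_sum += b1
    return b0_sum / n_sims, b1_sum / n_sims


mean_b0, mean_b1 = monte_carlo()
print(f"average intercept: {mean_b0:.3f}, average slope: {mean_b1:.3f}")
```

Because OLS is unbiased under this model, the averages should land close to the true values 5 and 2; increasing `n_sims` shrinks the remaining Monte Carlo error.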

Logistic Regression

Description
4.A Logistic regression and prediction
4.B Logistic regression and variable selection
4.C Logistic regression for binary classification
4.D Logistic regression and ROC
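Once a logistic regression model has produced predicted probabilities, classification and ROC analysis (4.C, 4.D) need only those scores and the true 0/1 labels. The sketch below applies a probability cutoff, computes the misclassification rate, and computes the AUC through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The scores and labels are made-up values and the 0.5 cutoff is an assumption. In practice the cutoff is often tuned, and scikit-learn's `roc_auc_score` gives the same AUC.

```python
def classify(probs, cutoff=0.5):
    """Convert predicted probabilities into 0/1 labels (1 if p > cutoff)."""
    return [1 if p > cutoff else 0 for p in probs]


def misclassification_rate(y_true, y_pred):
    """Fraction of cases whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def auc(y_true, scores):
    """AUC = P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos) * len(neg))


y = [0, 0, 1, 1, 0, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]
print(misclassification_rate(y, classify(p)))
print(auc(y, p))
```

The pairwise loop is O(positives × negatives), which is fine for teaching; sorting-based implementations scale better on large data.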

Cross Validation

Description
5.A Cross validation
5.B Cross validation (Logit model)
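The core of K-fold cross validation is to split the observation indices into K roughly equal folds, fit on K - 1 of them, and score on the held-out fold, repeating until every fold has been held out once. The sketch below shows that loop using only the standard library. The "model" is a deliberately trivial mean predictor and the function names are hypothetical; any fitting routine, such as a linear or logistic regression, could be substituted. scikit-learn provides ready-made splitters such as `KFold`, which the course labs may use instead.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(y, k=5, seed=0):
    """K-fold cross-validated MSE of a mean-only model (placeholder for a real model)."""
    folds = k_fold_indices(len(y), k, seed)
    fold_errors = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_y = [y[i] for i in range(len(y)) if i not in test_set]
        prediction = sum(train_y) / len(train_y)  # "fit" on the training folds only
        errors = [(y[i] - prediction) ** 2 for i in test_idx]  # score the held-out fold
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k  # average of the per-fold MSEs


y = [3.1, 2.9, 4.2, 5.0, 3.8, 4.4, 2.5, 3.3, 4.9, 3.6]
print(cv_mse(y, k=5))
```

For the logit case (5.B), the squared error would typically be replaced by a classification loss such as the misclassification rate or a cost-weighted version of it.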

Tree Models

Description
6.A Regression Trees
6.B Classification Trees
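A regression tree is grown by repeatedly choosing the split that most reduces the sum of squared errors (SSE) within the resulting nodes. The sketch below finds the best single split on one numeric predictor, which is the first step of growing a tree; a full tree repeats the same search over every predictor and recursively within each child node. The function names and data are illustrative assumptions. Classification trees follow the same idea with an impurity measure such as the Gini index in place of SSE, and scikit-learn's `DecisionTreeRegressor` and `DecisionTreeClassifier` provide full implementations.

```python
def sse(values):
    """Sum of squared errors around the mean of `values`."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total SSE) of the best split x <= threshold on one predictor."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # baseline: no split, SSE of the root node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


x = [1, 2, 3, 4, 5, 6]
y = [5.0, 5.2, 4.8, 9.9, 10.1, 10.0]
print(best_split(x, y))
```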

Advanced Tree Models: Bagging, Random Forests, and Boosting Trees

Description
7.A Bagging trees
7.B Random forests
7.C Boosting trees
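Bagging (bootstrap aggregating) fits the same base learner to many bootstrap resamples of the data and averages their predictions, which reduces variance. The sketch below bags one-split regression trees (stumps) on a single predictor to keep it short. Stumps, the number of bags, and all function names are illustrative assumptions, since bagging is usually applied to deeper trees. Random forests add one more ingredient, a random subset of predictors considered at each split, while boosting instead fits trees sequentially, each one to the residuals of the current ensemble.

```python
import random


def fit_stump(x, y):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    overall = sum(y) / len(y)
    best = (None, overall, overall, sum((v - overall) ** 2 for v in y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best[3]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm, sse)
    return best[:3]


def stump_predict(stump, x_new):
    """Predict with a fitted stump."""
    threshold, left_mean, right_mean = stump
    if threshold is None:  # no split was possible: predict the overall mean
        return left_mean
    return left_mean if x_new <= threshold else right_mean


def bagged_predict(x, y, x_new, n_bags=50, seed=0):
    """Average the predictions of stumps fit to bootstrap resamples of (x, y)."""
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        stump = fit_stump([x[i] for i in idx], [y[i] for i in idx])
        total += stump_predict(stump, x_new)
    return total / n_bags


x = [1, 2, 3, 4, 5, 6]
y = [5.0, 5.2, 4.8, 9.9, 10.1, 10.0]
print(bagged_predict(x, y, 1.5), bagged_predict(x, y, 5.5))
```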

Nonlinearity, Generalized Additive Models (GAM), and Nonparametric Smoothing

Description
8.A Univariate nonparametric smoothing
8.B Generalized additive model (GAM)
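Univariate nonparametric smoothing estimates E(y | x) without assuming a functional form. One common version is the Nadaraya-Watson kernel smoother, which predicts at a point x0 with a weighted average of the observed y values, giving more weight to observations whose x is close to x0. The sketch below assumes a Gaussian kernel and a user-chosen bandwidth `h`. It illustrates the general idea and is not necessarily the smoother used in the course notes, which may instead cover methods such as LOESS or smoothing splines. A GAM builds on the same idea by combining one smooth function per predictor additively.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel weight function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def nw_smooth(x, y, x0, h):
    """Nadaraya-Watson estimate of E(y | x = x0) with bandwidth h."""
    weights = [gaussian_kernel((xi - x0) / h) for xi in x]
    # The normalizing constant of the kernel cancels in this ratio.
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


x = [i / 10 for i in range(61)]
y = [math.sin(xi) for xi in x]
for x0 in (1.0, 3.0, 5.0):
    print(x0, round(nw_smooth(x, y, x0, h=0.3), 3), round(math.sin(x0), 3))
```

A smaller `h` follows the data more closely (less bias, more variance), while a larger `h` gives a smoother but more biased curve; in practice the bandwidth is often chosen by cross validation.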

Neural Networks, LDA, and SVM

Description
9.A Neural network models
9.B Neural network models (Handwritten Digits Case)
9.C Discriminant analysis (Optional)
9.D Support vector machine (SVM) (Optional)

Unsupervised Learning: Clustering

Description
10.A Clustering
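K-means clustering alternates two steps until the cluster assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below implements this (Lloyd's algorithm) for 2-D points with the standard library. Initializing from the first k points and the sample data, ordered so that the first two points come from different groups, are simplifying assumptions. Practical implementations such as scikit-learn's `KMeans` use smarter initialization and multiple restarts.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of (x, y) tuples."""
    centroids = [points[i] for i in range(k)]  # naive initialization: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


pts = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.0, 0.5), (9.0, 8.5), (8.5, 9.5)]
labels, centers = kmeans(pts, k=2)
print(labels, centers)
```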

Unsupervised Learning: Association Rules

Description
11.A Association Rules
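Association rules are usually screened with three measures. For a rule A -> B over a set of transactions, support is the fraction of transactions containing both A and B, confidence is support(A and B) / support(A), and lift is confidence / support(B), so a lift above 1 means A and B appear together more often than they would if they were independent. The sketch below computes these measures from scratch on a small hypothetical basket dataset. Finding all frequent itemsets in the first place (for example with the Apriori algorithm) is a separate step that is not shown here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    conf = both / support(transactions, antecedent)
    lift = conf / support(transactions, consequent)
    return {"support": both, "confidence": conf, "lift": lift}


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(rule_metrics(baskets, {"diapers"}, {"beer"}))
```

Here diapers appear in 4 of 5 baskets and beer in 3, always together with diapers, so the rule {diapers} -> {beer} has support 0.6, confidence 0.75, and lift 1.25 (up to floating-point rounding).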

Other Topics 1: Basic Text Mining

Description
12.A Basic Text Mining
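A typical first step in basic text mining is to tokenize each document and count term frequencies, which yields a document-term matrix for later analysis. The sketch below uses only `re` and `collections.Counter`. The lowercase-and-match-letters tokenizer and the tiny stop-word list are simplifying assumptions; real pipelines often rely on libraries such as NLTK or scikit-learn's `CountVectorizer`.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "and", "of", "is", "to", "in"}  # tiny illustrative list


def tokenize(text):
    """Lowercase the text, keep runs of ASCII letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def document_term_counts(docs):
    """Return the sorted vocabulary and one row of term counts per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]  # Counter gives 0 for missing terms
    return vocab, rows


docs = [
    "Data mining is the process of finding patterns in data.",
    "Text mining applies data mining to text.",
]
vocab, rows = document_term_counts(docs)
print(vocab)
print(rows)
```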

Acknowledgments: I have drawn ideas or readings from the following texts: