<<<CP1352>>> INTRODUCTION TO MACHINE LEARNING

{{{credits}}}

L T P C
2 0 2 3

Course Objectives

  • To have a basic knowledge of the concepts and tools of machine learning.
  • To learn data pre-processing methods and apply them to datasets.
  • To understand the working of supervised and unsupervised algorithms.
  • To learn evaluation methods and apply them for model validation.

{{{unit}}}

Unit I Introduction to Machine Learning 9

Why Machine Learning: Problems Machine Learning Can Solve – Task and Data; scikit-learn; Essential Libraries and Tools: Jupyter Notebook – NumPy – SciPy – matplotlib – pandas – mglearn; Classifying Iris Species: Data – Measuring Success – Training and Testing Data – Building a Model: k-Nearest Neighbors – Making Predictions – Evaluating the Model.
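As an illustration of the Unit I workflow (splitting the Iris data, building a k-nearest neighbors model, making predictions, and evaluating it), a minimal scikit-learn sketch is given below; the choices of n_neighbors=1 and random_state=0 are illustrative only.

#+begin_src python
# Minimal sketch of the Unit I workflow: classify Iris species with k-NN.
# Hyperparameter choices here are illustrative, not prescribed by the syllabus.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Split the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Build the model: k-nearest neighbors with a single neighbor.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Make predictions and evaluate the model on the test set.
print("Test set predictions:", knn.predict(X_test[:5]))
print("Test set accuracy:", knn.score(X_test, y_test))
#+end_src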

{{{unit}}}

Unit II Data Pre-processing 9

Preprocessing and Scaling: Different Kinds of Preprocessing – Applying Data Transformations – Scaling Training and Test Data; Categorical Variables: One-Hot-Encoding – Numbers Can Encode Categoricals; Binning, Discretization, Linear Models, and Trees; Interactions and Polynomials; Univariate Nonlinear Transformations; Automatic Feature Selection: Univariate Statistics – Model-Based Feature Selection – Iterative Feature Selection.
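A minimal sketch of the Unit II ideas (scaling, one-hot encoding of a categorical variable, and univariate feature selection) is shown below; the toy DataFrame and its column names are hypothetical and used only for illustration.

#+begin_src python
# Minimal sketch of Unit II: scaling, one-hot encoding, and univariate
# feature selection. The toy data and column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.DataFrame({
    "age":    [25, 32, 47, 51],
    "city":   ["Chennai", "Madurai", "Chennai", "Salem"],
    "bought": [0, 1, 1, 0],
})

# One-hot encode the categorical column with pandas.
X = pd.get_dummies(df[["age", "city"]], columns=["city"])
y = df["bought"]

# Scale all features to the [0, 1] range (fit on training data only in practice).
X_scaled = MinMaxScaler().fit_transform(X)

# Automatic feature selection with univariate statistics: keep the two
# features most strongly related to the target.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X_scaled, y)
print(X_selected.shape)
#+end_src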

{{{unit}}}

Unit III Supervised Learning 9

Classification and Regression; Generalization, Overfitting, and Underfitting; Relation of Model Complexity to Dataset Size; Supervised Machine Learning Algorithms: Sample Datasets – k-Nearest Neighbors – Linear Models – Naive Bayes Classifiers – Decision Trees – Ensembles of Decision Trees – Bagging – Random Forests – Boosting – Neural Networks (Deep Learning).
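The Unit III discussion of model complexity, overfitting, and ensembles can be demonstrated by comparing a single decision tree with a random forest on one of scikit-learn's sample datasets; the sketch below uses the breast cancer dataset with illustrative, untuned hyperparameters.

#+begin_src python
# Minimal sketch of Unit III: compare two supervised learners on a sample
# dataset. Hyperparameters are illustrative, not tuned.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single tree can overfit; an ensemble of trees usually generalizes better.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Decision tree test accuracy:", tree.score(X_test, y_test))
print("Random forest test accuracy:", forest.score(X_test, y_test))
#+end_src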

{{{unit}}}

Unit IV Unsupervised Learning 9

Types of Unsupervised Learning; Challenges; Dimensionality Reduction: Principal Component Analysis (PCA); Clustering: k-Means Clustering – Agglomerative Clustering – Comparing and Evaluating Clustering Algorithms.
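A minimal sketch tying the Unit IV topics together (PCA for dimensionality reduction, k-Means for clustering, and one way of comparing a clustering against known labels) is given below; the number of components and clusters are illustrative choices for the Iris data.

#+begin_src python
# Minimal sketch of Unit IV: PCA followed by k-Means clustering, with the
# adjusted Rand index as one way to evaluate the clustering.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project onto the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X)

# Clustering: group the reduced data into three clusters.
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X_pca)

# Evaluate the clustering against the known species labels.
print("Adjusted Rand index:", adjusted_rand_score(y, labels))
#+end_src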

{{{unit}}}

Unit V Model Evaluation and Improvement 9

Cross-Validation: Cross-Validation in scikit-learn – Benefits of Cross-Validation – Stratified k-Fold Cross-Validation and Other Strategies; Grid Search: Simple Grid Search – The Danger of Overfitting the Parameters and the Validation Set – Grid Search with Cross-Validation; Evaluation Metrics and Scoring: Keep the End Goal in Mind – Metrics for Binary Classification – Metrics for Multiclass Classification – Regression Metrics – Using Evaluation Metrics in Model Selection.
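Cross-validation and grid search with cross-validation from Unit V can be sketched as below; the SVC parameter grid and the use of the Iris data are illustrative assumptions, not prescribed settings.

#+begin_src python
# Minimal sketch of Unit V: cross-validation, then grid search with
# cross-validation. The parameter grid is illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Stratified k-fold cross-validation of a default SVC (5 folds).
scores = cross_val_score(SVC(), X, y, cv=5)
print("Cross-validation scores:", scores)

# Grid search with cross-validation: tune on the training split only,
# then report the score on a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test set score:", grid.score(X_test, y_test))
#+end_src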

Suggested Experiments (Python: NumPy, scikit-learn, Matplotlib)

  1. Perceptron and Linear Regression
  2. Multi-layer Perceptron
  3. Support Vector Machine
  4. Decision Tree algorithm
  5. k-Nearest Neighbor algorithm
  6. K-means clustering

\hfill Total: 60
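As a possible starting point for Experiment 3 above (Support Vector Machine), a minimal sketch on scikit-learn's digits dataset is shown below; the dataset, kernel, and parameter values are illustrative assumptions.

#+begin_src python
# A possible starting point for Experiment 3 (Support Vector Machine).
# Dataset, kernel, and parameter values are illustrative only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=10, gamma=0.001).fit(X_train, y_train)
print("SVM test accuracy:", svm.score(X_test, y_test))
#+end_src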

Course Outcomes

After the completion of this course, students will be able to:

  • Understand the basic concepts of machine learning (K2).
  • Apply data pre-processing and feature selection methods to datasets (K4).
  • Apply supervised and unsupervised learning techniques to different applications (K4).
  • Evaluate and analyse the models using appropriate metrics (K4).

References

  1. Andreas C. Müller and Sarah Guido, “Introduction to Machine Learning with Python”, O’Reilly Media, 2016.
  2. Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn and TensorFlow”, O’Reilly Media, 2016.
  3. Sebastian Raschka and Vahid Mirjalili, “Python Machine Learning”, Second Edition, Packt Publishing, 2017.
  4. Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, Second Edition, Chapman and Hall/CRC Machine Learning and Pattern Recognition Series, 2014.