This repository contains all resources and code used in the Coding Club ML sessions. This will be updated periodically. The resources for each meet can be found in their respective folders.
1. Intro to ML - Roadmap and Scope (19/09/2021)
2. Building an ML pipeline from scratch (03/10/2021)
3. Freshers Meet + Recap of first 2 meets (24/12/2021)
4. Building a Sudoku Solver (1/2): Simple Image Classifier (12/01/2022)
5. Building a Sudoku Solver (2/2): Deep Neural Networks (24/01/2022)
6. Deploying ML Models (27/03/2022)
The following roadmap outlines how to approach the coding side of ML systematically, focussing on core concepts from scratch first before moving on to using libraries. The resources for the steps below and courses can be found here.
Finishing this in a few months should give a strong fundamental understanding of ML before moving on to deep learning and other complex problems such as recommender systems, image tasks and NLP. The roadmap focuses on supervised regression and classification.
- Linear algebra, statistics, probability and coordinate geometry (links in resources section) - a grasp of the Gaussian distribution, matrix transformations and basic geometric curves helps a lot in the long run. 12th grade calculus should do for differentiation.
- Set up Jupyter Notebook
- Using NumPy to do basic matrix and vector operations - good starting point
- Understand the basic components of an ML pipeline (analysing data, data cleaning, splitting, scaling and encoding, model training, analysing performance metrics) - each component is dealt with in depth later (a good infographic)
- Generate simple straight line datapoints from an equation with NumPy
- Learn what train and validation data mean and how the dataset is split when building models in ML
- Understand how linear regression can model this type of data
- Understand what loss functions are and why they are used in ML
- Implement gradient descent from scratch with NumPy
- Code a simple training loop to train the model with gradient descent to optimise the loss
- Modify linear regression to work as logistic regression for classification
- Detecting overfitting, the bias-variance tradeoff and adding regularisation
- Learn Naive-Bayes (if possible, implement it with binary training data)
- Understanding the Pandas library and how to move around and operate on large datasets - A Quick Introduction
- Load a real-world dataset (e.g. Graduate Admissions, Iris) with Pandas
- Understand the difference between continuous and categorical data
- Learn the reasoning behind data normalisation and different categorical encoding methods
- Start off with the scikit-learn library (docs are a good place to start) to split and scale the data and train models on this dataset
- Cleaning datasets, analysing correlations, heatmaps (using matplotlib) etc. - very extensive subject (top-rated Kaggle notebooks explain these well)
- Understand how features should be selected and engineered in ML
- Analysing models: Understanding metrics to use for classification and regression, tuning hyperparameters, learning curves
- Set up a simple pipeline that integrates all of the above steps into a single notebook
- Explore Kaggle for datasets and notebooks. Try approaching a more challenging dataset, read through top-rated notebooks and user discussions, replicate their ideas for your ML problem.
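
As a rough illustration of the from-scratch steps in the roadmap (generating straight-line data, splitting it into train and validation sets, defining a loss and training with gradient descent), here is a minimal NumPy sketch. It is not the code used in the sessions. The function names, slope, intercept, noise level, learning rate and epoch count are all assumptions picked for illustration.

```python
import numpy as np


def make_line_data(n=200, slope=3.0, intercept=2.0, noise=0.1, seed=0):
    """Generate noisy points around y = slope * x + intercept (values are illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = slope * x + intercept + rng.normal(0.0, noise, size=n)
    return x, y


def train_val_split(x, y, val_fraction=0.2, seed=0):
    """Shuffle the indices, then hold out a fraction of the points for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]


def mse(y_true, y_pred):
    """Mean squared error, a common loss for regression."""
    return float(np.mean((y_true - y_pred) ** 2))


def fit_linear_gd(x, y, lr=0.1, epochs=500):
    """Fit y ~ w * x + b by full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        error = (w * x + b) - y
        # Gradients of mean((w * x + b - y) ** 2) with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


x, y = make_line_data()
x_train, y_train, x_val, y_val = train_val_split(x, y)
w, b = fit_linear_gd(x_train, y_train)
val_loss = mse(y_val, w * x_val + b)
print(f"w={w:.2f}, b={b:.2f}, validation MSE={val_loss:.4f}")
```

The learned `w` and `b` should land close to the slope and intercept used to generate the data. The validation loss is computed on points the training loop never saw, which is the reason for splitting the data in the first place.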
An extensive roadmap with resources can be found here.
For starting out in ML, a basic background in linear algebra and statistics (probability distributions, Bayes' rule etc.) should suffice.
- Linear algebra playlist by 3Blue1Brown
- A linear algebra course by Prof. Gilbert Strang - More extensive
- StatQuest with Josh Starmer - A great channel for statistics fundamentals (and ML)
- Machine Learning (Stanford University) - The most popular ML course available online.
- awesome-machine-learning - A repo of the best ML courses in different domains.
- Neural networks and deep learning - Best resource for understanding deep learning fundamentals from scratch, with code.
- Neural networks playlist by 3Blue1Brown - Visualising how neural networks work and learn from data.
Tentatively, the upcoming sessions will focus on teaching core ML concepts, building simple ML models, understanding how to analyse and improve their performance, why deep learning is needed, the current state of ML research and how ML models are deployed.