Machine Learning and AI @ SSN Coding Club

This repository contains all resources and code used in the Coding Club ML sessions. This will be updated periodically. The resources for each meet can be found in their respective folders.

Meets:

1. Intro to ML - Roadmap and Scope (19/09/2021)
2. Building an ML pipeline from scratch (03/10/2021)
3. Freshers Meet + Recap of first 2 meets (24/12/2021)
4. Building a Sudoku Solver (1/2): Simple Image Classifier (12/01/2022)
5. Building a Sudoku Solver (2/2): Deep Neural Networks (24/01/2022)
6. Deploying ML Models (27/03/2022)

Roadmap:

The following roadmap outlines how to approach the coding side of ML systematically, focussing on core concepts from scratch first before moving on to using libraries. The resources for the steps below and courses can be found here.

Finishing this in a few months should give a strong fundamental understanding of ML, before moving on to deep learning and other complex problems like recommender systems, dealing with images, NLP etc. This focuses on supervised regression and classification.

  • Linear algebra, statistics, probability and coordinate geometry (links in resources section) - a grasp of the Gaussian distribution, matrix transformations and basic geometric curves helps a lot in the long run. 12th grade calculus should do for differentiation.
  • Set up Jupyter Notebook
  • Using NumPy to do basic matrix and vector operations - good starting point
  • Understand the basic components of an ML pipeline (analysing data, data cleaning, splitting, scaling and encoding, model training, analysing performance metrics) - each component is dealt with in depth later (a good infographic)
  • Generate simple straight line datapoints from an equation with NumPy
  • Learn what train and validation data mean and how the dataset is split when building models in ML
  • Understand how linear regression can model this type of data
  • Understand what loss functions are and why they are used in ML
  • Implement gradient descent from scratch with NumPy
  • Code a simple training loop to train the model with gradient descent to optimise the loss
  • Modify linear regression to work as logistic regression for classification
  • Detecting overfitting, the bias-variance tradeoff and adding regularisation
  • Learn Naive-Bayes (if possible, implement it with binary training data)
  • Understanding the Pandas library and how to move around and operate on large datasets - A Quick Introduction
  • Load a real-world dataset (e.g. Graduate Admissions, iris) with Pandas
  • Understand the difference between continuous and categorical data
  • Learn the reasoning behind data normalisation and different categorical encoding methods
  • Start off with the scikit-learn library (docs are a good place to start) to split, scale data and train models on this dataset
  • Cleaning datasets, analysing correlations, heatmaps (using matplotlib) etc. - very extensive subject (top-rated Kaggle notebooks explain these well)
  • Understand how features should be selected and engineered in ML
  • Analysing models: Understanding metrics to use for classification and regression, tuning hyperparameters, learning curves
  • Set up a simple pipeline that integrates all of the above steps into a single notebook
  • Explore Kaggle for datasets and notebooks. Try approaching a more challenging dataset, read through top-rated notebooks and user discussions, replicate their ideas for your ML problem.
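
As a rough illustration of the from-scratch steps above (generating straight-line data, splitting into train and validation sets, defining a loss and training with gradient descent), here is a minimal NumPy sketch. It is not taken from the session notebooks: the slope, intercept, noise level, learning rate, epoch count and the names `TRUE_W`, `TRUE_B`, `mse` and `train` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

# Generate noisy points along y = 3x + 2 (illustrative slope and intercept).
TRUE_W, TRUE_B = 3.0, 2.0
x = rng.uniform(-1.0, 1.0, size=200)
y = TRUE_W * x + TRUE_B + rng.normal(0.0, 0.1, size=200)

# Shuffle the indices and hold out 20% of the points for validation.
idx = rng.permutation(len(x))
split = int(0.8 * len(x))
train_idx, val_idx = idx[:split], idx[split:]
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]


def mse(w, b, x, y):
    """Mean squared error of the line w * x + b on the points (x, y)."""
    return np.mean((w * x + b - y) ** 2)


def train(x, y, lr=0.1, epochs=500):
    """Fit w and b with plain (full-batch) gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = w * x + b - y              # prediction error per point
        grad_w = 2.0 * np.mean(err * x)  # d(MSE)/dw
        grad_b = 2.0 * np.mean(err)      # d(MSE)/db
        w -= lr * grad_w                 # step against the gradient
        b -= lr * grad_b
    return w, b


w, b = train(x_train, y_train)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"train MSE={mse(w, b, x_train, y_train):.4f}")
print(f"val MSE={mse(w, b, x_val, y_val):.4f}")
```

Comparing the train and validation MSE is the simplest version of the overfitting check mentioned in the roadmap: a validation error far above the training error suggests the model is not generalising.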

Resources for learning ML:

An extensive roadmap with resources can be found here.

Math

For starting out in ML, a basic background in linear algebra and statistics (probability distributions, Bayes' rule etc.) should suffice.

Lectures

Courses

Miscellaneous

In Upcoming Sessions:

Tentatively, the upcoming sessions will focus on teaching core ML concepts, building simple ML models, understanding how to analyse and improve their performance, why deep learning is needed, the current state of ML research and how ML models are deployed.