Module 1: Python Programming for Data Science

Sub-module 1.1: Python Basics

  • Data types and variables
  • Control structures (if-else, for loops, while loops)
  • Functions and lambda expressions
  • Exception handling
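
A minimal sketch tying these basics together (all names here are illustrative, not from any particular library): a function with exception handling, a loop with an if-else branch, and a lambda expression.

```python
def safe_divide(a, b):
    """Divide a by b, returning None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

# Control structures: a for loop with an if-else branch.
results = []
for denominator in [2, 0, 4]:
    value = safe_divide(10, denominator)
    if value is None:
        results.append("undefined")
    else:
        results.append(value)

# Lambda expression: a small anonymous function.
double = lambda x: x * 2
```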

Sub-module 1.2: Advanced Python

  • List comprehensions
  • Generators and iterators
  • Decorators
  • Context managers
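
A short illustrative sketch of the four topics above in one place (the function names are invented for the example): a generator, a call-counting decorator, a timing context manager, and a list comprehension that consumes the generator.

```python
import time
from contextlib import contextmanager
from functools import wraps

# Generator: yields squares lazily instead of building a list.
def squares(n):
    for i in range(n):
        yield i * i

# Decorator: counts how many times the wrapped function is called.
def counted(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def greet(name):
    return f"hello, {name}"

# Context manager: times a block of code.
@contextmanager
def timer():
    start = time.perf_counter()
    yield
    print(f"elapsed: {time.perf_counter() - start:.6f}s")

# List comprehension consuming the generator.
first_squares = [s for s in squares(5)]

greet("ada")
greet("grace")
with timer():
    total = sum(squares(100))
```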

Sub-module 1.3: Python Libraries

  • NumPy (arrays, matrix operations)
  • pandas (dataframes, series, data manipulation)
  • Matplotlib (basic plotting, figures, and axes)
  • seaborn (statistical data visualization)
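
A quick taste of the first two libraries (plotting with Matplotlib/seaborn is omitted here since it needs a display): NumPy array math and a small pandas DataFrame manipulation, with invented example data.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized array and matrix operations.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
product = a @ b              # matrix multiplication
row_sums = a.sum(axis=1)     # sum along each row

# pandas: a small DataFrame and a derived column.
df = pd.DataFrame({"city": ["Oslo", "Rome"], "temp_c": [4, 18]})
df["temp_f"] = df["temp_c"] * 9 / 5 + 32
```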

Module 2: Data Management and Manipulation

Sub-module 2.1: Data Cleaning

  • Handling missing data
  • Data type conversion
  • Normalizing and scaling
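
A minimal pandas sketch of the three cleaning steps above, on made-up data: mean-imputing a missing value, converting a string column to integers, and min-max scaling to [0, 1].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "score": ["10", "20", "30"]})

# Handling missing data: fill NaN with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# Data type conversion: the scores arrived as strings.
df["score"] = df["score"].astype(int)

# Min-max normalization to the [0, 1] range.
df["score_scaled"] = (df["score"] - df["score"].min()) / (
    df["score"].max() - df["score"].min()
)
```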

Sub-module 2.2: Data Exploration

  • Descriptive statistics
  • Correlation analysis
  • Outlier detection
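
The three exploration steps can be sketched with pandas on toy data: descriptive statistics, Pearson correlation, and the common 1.5×IQR rule for flagging outliers.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 3, 3, 4, 100])  # 100 is a planted outlier

# Descriptive statistics.
mean, median = values.mean(), values.median()

# Outlier detection with the 1.5 * IQR rule.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

# Correlation between two perfectly linearly related columns.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})
corr = df["x"].corr(df["y"])
```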

Sub-module 2.3: Data Wrangling

  • Merging, joining, and concatenating data
  • Grouping and aggregation
  • Pivot tables and cross-tabulation
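
A compact sketch of all three wrangling operations on an invented sales table: a key-based merge, a groupby aggregation, and a pivot table.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Merging on a shared key.
merged = sales.merge(stores, on="store")

# Grouping and aggregation.
totals = merged.groupby("region")["revenue"].sum()

# Pivot table: stores as rows, quarters as columns.
pivot = merged.pivot_table(index="store", columns="quarter", values="revenue")
```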

Module 3: Machine Learning

Sub-module 3.1: Fundamentals of ML

  • Supervised vs. unsupervised learning
  • Overfitting and underfitting
  • Bias-variance tradeoff
  • Train-test split
  • Cross-validation
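
A scikit-learn sketch of the last two fundamentals, on synthetic linear data (the coefficients and seeds are arbitrary): a hold-out train-test split and 5-fold cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Train-test split: hold out 20% of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# 5-fold cross-validation on the training set (scores are R^2 per fold).
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5)
```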

Sub-module 3.2: Regression Algorithms

  • Linear regression (line fitting, residuals, gradient descent)
  • Polynomial regression (choosing the polynomial order)
  • Ridge, Lasso, and ElasticNet regression (what effect does each penalty term have on the model coefficients)
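A small scikit-learn experiment (synthetic data, arbitrary alpha values) illustrating the penalty-term question: the L2 penalty in Ridge shrinks all coefficients toward zero, while the L1 penalty in Lasso drives some coefficients exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the remaining three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty: zeroes some coefficients

n_zeroed = int(np.sum(np.abs(lasso.coef_) < 1e-8))
```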

Sub-module 3.3: Classification Algorithms

  • Logistic regression (multiclass strategies: one-vs-rest (OvR), one-vs-one (OvO))
  • Decision Trees (Gini, Entropy)
  • Random Forests
  • Support Vector Machines (SVM) (Max-margin Classifier, Kernel trick [RBF, Linear, Sigmoid])
  • k-Nearest Neighbors (k-NN)
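
Two of the classifiers above on the classic Iris dataset (a standard scikit-learn example, shown here as a sketch): multinomial logistic regression, and decision trees with each of the two impurity criteria.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Logistic regression (multinomial by default in recent scikit-learn).
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Decision trees with the two common split criteria.
tree_gini = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)
tree_entropy = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_train, y_train)

acc_logreg = logreg.score(X_test, y_test)
acc_gini = tree_gini.score(X_test, y_test)
```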

Sub-module 3.4: Ensemble Methods

  • Bagging
  • Boosting (AdaBoost, Gradient Boosting, XGBoost, CatBoost, LightGBM)
  • Stacking
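
Bagging and boosting side by side, as a scikit-learn sketch on synthetic data: bagging trains many trees on bootstrap samples and averages them, while boosting fits trees sequentially so each corrects the previous ensemble's errors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 50 trees on bootstrap samples, predictions averaged.
bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50, random_state=0
).fit(X_train, y_train)

# Boosting: 50 shallow trees fit sequentially on the residual errors.
boosting = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(
    X_train, y_train
)

acc_bag = bagging.score(X_test, y_test)
acc_boost = boosting.score(X_test, y_test)
```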

Sub-module 3.5: Unsupervised Algorithms

  • k-means clustering
  • Hierarchical clustering
  • Principal Component Analysis (PCA)
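
A sketch of k-means and PCA together on two synthetic, well-separated blobs: k-means recovers the two groups, and PCA projects the 5-D points down to 2 components.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in 5 dimensions.
blob1 = rng.normal(loc=0.0, size=(50, 5))
blob2 = rng.normal(loc=8.0, size=(50, 5))
X = np.vstack([blob1, blob2])

# k-means: partition the points into 2 clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# PCA: project 5-D data down to 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
```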

Sub-module 3.6: Model Evaluation

  • Confusion matrix
  • ROC-AUC
  • Precision-Recall
  • F1 Score
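
These metrics are easy to compute by hand from the confusion-matrix counts, which makes the definitions concrete (toy labels below):

```python
# Hand-computing classification metrics for a binary problem.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# Confusion-matrix entries.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```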

Sub-module 3.7: Model Fine-tuning

  • Grid Search
  • Random Search
  • Cross-validation
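
Grid search and cross-validation combine naturally in scikit-learn's `GridSearchCV`, shown here as a sketch tuning an SVM on Iris (the parameter grid is arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: exhaustively try every parameter combination with 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```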

Module 4: Deep Learning

Sub-module 4.1: Neural Networks Basics

  • Perceptrons
  • Activation functions (ReLU, sigmoid, tanh)
  • Feedforward neural networks
  • Backpropagation and gradient descent
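
All four basics fit in one small NumPy program: a feedforward network with sigmoid activations, trained by backpropagation and gradient descent on XOR. This is a from-scratch sketch (layer sizes, learning rate, and seed chosen arbitrarily), not production code.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units, one output unit.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 1.0
losses = []

for _ in range(2000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: gradients of the squared error via the chain rule.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

predictions = (out > 0.5).astype(int)
```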

Sub-module 4.2: Advanced Neural Networks

  • Convolutional Neural Networks (CNNs)
    • LeNet, AlexNet, VGGNet, InceptionNet, ResNet, EfficientNet
    • Transfer Learning
  • Recurrent Neural Networks (RNNs)
    • Vanishing and Exploding Gradient
  • Long Short-Term Memory networks (LSTMs)
  • Autoencoders (including Variational Autoencoders, VAEs)
  • Generative Adversarial Networks (GANs) (DCGAN, Pix2Pix, CycleGAN)
  • Transformers

Sub-module 4.3: Frameworks and Tools

  • TensorFlow
  • Keras
  • PyTorch

Sub-module 4.4: Model Optimization and Deployment

  • Regularization techniques
  • Hyperparameter tuning (Grid search, Random search)
  • Model deployment (Flask, Docker)

Module 5: Generative AI

Sub-module 5.1: Transformers

  • Sequence to Sequence Models
  • Attention Mechanism
  • Self Attention
  • Transformers, Positional Encoding
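
The core of the list above, scaled dot-product self-attention plus sinusoidal positional encoding, can be written in a few lines of NumPy. This is an illustrative single-head sketch (random weights, arbitrary dimensions), not a full Transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise position similarities
    weights = softmax(scores, axis=-1)       # each row: a distribution over positions
    return weights @ V, weights

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in the original Transformer paper."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, attn = self_attention(X, Wq, Wk, Wv)
```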

Sub-module 5.2: LLM Architectures

  • Encoder Only Architecture
    • BERT (Bidirectional Encoder Representations from Transformers)
    • DistilBERT (a distilled version of BERT that is lighter and faster)
    • ALBERT (A Lite BERT for self-supervised learning of language representations)
  • Decoder Only Architecture
    • GPT (Generative Pre-trained Transformer)
  • Encoder-Decoder Architecture
    • Transformer (original model comprising both encoder and decoder)
    • BART (Bidirectional and Auto-Regressive Transformers)
    • T5 (Text-to-Text Transfer Transformer)

Sub-module 5.3: Prompt Engineering

  • Zero-shot, one-shot, and few-shot prompting
  • Retrieval-Augmented Generation (RAG)
  • OpenAI APIs
  • Self-hosted LLMs and Hugging Face

Module 6: Special Topics

Sub-module 6.1: Natural Language Processing (NLP)

  • Text preprocessing (tokenization, stemming, lemmatization)
  • Word embeddings (Word2Vec, GloVe)
  • Sentiment analysis
  • Named Entity Recognition (NER)
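
A dependency-free sketch of the first two ideas (the tokenizer and similarity function are invented for illustration, far simpler than real NLP libraries): naive tokenization, then cosine similarity between bag-of-words vectors, the same measure used to compare word embeddings.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization with trailing punctuation stripped."""
    return [w.strip(".,!?").lower() for w in text.split()]

def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b)

doc1 = Counter(tokenize("The cat sat on the mat."))
doc2 = Counter(tokenize("The cat lay on the mat."))
sim = cosine_similarity(doc1, doc2)
```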

Sub-module 6.2: Computer Vision

  • Image processing basics
  • Object detection
  • Image classification

Sub-module 6.3: Time Series Analysis

  • ARIMA models
  • Seasonal decomposition
  • Forecasting

Courses

  1. https://www.coursera.org/professional-certificates/ibm-data-science
  2. https://www.coursera.org/specializations/machine-learning-introduction
  3. https://www.coursera.org/specializations/deep-learning
  4. https://www.coursera.org/specializations/natural-language-processing
  5. https://www.coursera.org/professional-certificates/tensorflow-in-practice
  6. https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops
  7. For LLMs, cover all the short courses by DeepLearning.AI - https://www.deeplearning.ai/short-courses/