my personal repo for learning machine learning

Satyam-10124/machine_learning

 
 

Machine Learning Study Guide

Welcome to the Machine Learning Study Guide! This document gives an overview of essential machine learning concepts, methods, and practices. It is meant as a foundation for both study and practical application.

Table of Contents

  1. Types of Learning
  2. Data Splitting
  3. Descriptive Statistics
  4. Outliers
  5. Feature Scaling
  6. Feature Selection
  7. Correlation vs. Causation
  8. Normalization and Transformation
  9. Regression and Correlation
  10. References

Types of Learning

Supervised Learning

Supervised learning utilizes labeled data to train models for tasks such as classification and regression.

  • Example: Predicting house prices using historical data with known prices.

Unsupervised Learning

Unsupervised learning uses unlabeled data to identify patterns or groupings, through tasks such as clustering and dimensionality reduction.

  • Example: Segmenting customers based on purchasing behavior.
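The customer-segmentation example can be sketched with a tiny one-dimensional k-means (Lloyd's algorithm). This is only an illustration: the `monthly_spend` values, the function name `kmeans_1d`, and the fixed starting centroids are assumptions made for the example, not part of the guide.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm in one dimension with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer; no labels are provided.
monthly_spend = [12, 15, 14, 11, 95, 102, 99, 110]
centroids, clusters = kmeans_1d(monthly_spend, centroids=[0, 50])
print(centroids)
print(clusters)
```

No labels are passed in: the two groups (low and high spenders) emerge from the data alone, which is what distinguishes this from the supervised case.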

Semi-Supervised Learning

Semi-supervised learning combines labeled and unlabeled data to enhance model performance.

  • Example: Training a model on a small set of labeled images and a large set of unlabeled images.

Reinforcement Learning

Reinforcement learning trains an agent to make decisions based on rewards or penalties in an environment.

  • Example: Teaching an agent to play a game by maximizing its score.

Data Splitting

Data is typically divided into:

  • Training Set: For model training.
  • Validation Set: For tuning hyperparameters and selecting the best model.
  • Test Set: For evaluating model performance on unseen data.
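A three-way split can be sketched with the standard library alone. The function name `train_val_test_split` and the 70/15/15 proportions are assumptions for illustration; the guide does not prescribe specific fractions.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

data = list(range(100))
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

Shuffling before splitting matters when the rows are ordered (for example by date or by class); for time series, a chronological split is usually the better choice.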

Descriptive Statistics

Descriptive statistics provide insights into the main features of a dataset:

  • Mean (Arithmetic Mean): \( \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \), where \( n \) is the number of data points and \( x_i \) are the values.

  • Median: The middle value in an ordered dataset; unlike the mean, it is robust to outliers.

  • Mode: The most frequently occurring value.

  • Min and Max: The smallest and largest values, respectively.

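  • Variance: \( \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 \), where \( \mu \) is the mean of the \( n \) values \( x_i \). (The sample variance divides by \( n - 1 \) instead of \( n \).)

  • Standard Deviation: \( \sigma = \sqrt{\sigma^2} \), the square root of the variance.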
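The statistics above can be computed with Python's built-in `statistics` module. The `values` list is a made-up example; the population forms (`pvariance`, `pstdev`) divide by n, while `variance` divides by n - 1.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data

mean = statistics.mean(values)
median = statistics.median(values)           # average of the two middle values here
mode = statistics.mode(values)
smallest, largest = min(values), max(values)
pop_variance = statistics.pvariance(values)  # divides by n
pop_std = statistics.pstdev(values)          # square root of pop_variance
sample_variance = statistics.variance(values)  # divides by n - 1

print(mean, median, mode, smallest, largest)
print(pop_variance, pop_std, sample_variance)
```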


Outliers

Outliers are data points significantly different from others. Their treatment depends on context:

  • Include Outliers: If they provide valuable information.
  • Exclude Outliers: If they are errors or distort results.

Methods to Identify Outliers

  • Visual Inspection: Use plots like box plots or scatter plots.

  • Statistical Methods:

    • Z-score: Indicates how many standard deviations a point is from the mean: \( z = \frac{x - \mu}{\sigma} \), where \( x \) is the data point, \( \mu \) is the mean, and \( \sigma \) is the standard deviation.

    • IQR (Interquartile Range): \( IQR = Q3 - Q1 \). Outliers are points outside the range \( [Q1 - 1.5 \times IQR, Q3 + 1.5 \times IQR] \), where \( Q1 \) and \( Q3 \) are the first and third quartiles.
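Both statistical methods can be sketched in a few lines. The function names, the z-score cutoff of 3, and the sample `data` are assumptions for illustration; quartile conventions also differ between tools, and this sketch uses the "inclusive" method of `statistics.quantiles`.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [x for x in values if abs((x - mu) / sigma) > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Nineteen readings between 10 and 13, plus one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12,
        13, 10, 11, 12, 12, 11, 10, 13, 12, 95]
print(zscore_outliers(data))
print(iqr_outliers(data))
```

One caveat worth knowing: an outlier inflates the mean and standard deviation it is measured against, so in very small samples the z-score of even an extreme point may never reach 3. The IQR rule is less affected because quartiles are robust statistics.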


Feature Scaling

Feature scaling puts features on comparable scales so that those with large numeric ranges do not dominate the model. Common methods include:

  • Standard Scaling (Standardization): \( x' = \frac{x - \mu}{\sigma} \), where \( \mu \) is the mean and \( \sigma \) is the standard deviation.

  • Min-Max Scaling: \( x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \). Scales features to a range, typically [0, 1].

  • Robust Scaling: \( x' = \frac{x - \text{median}(x)}{IQR} \). Uses the median and interquartile range (IQR) to reduce the influence of outliers.
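The three scalers can be written directly from their formulas. This is a minimal sketch for a single feature held in a list; the function names and the `incomes` data are illustrative assumptions, and a constant feature (zero spread) would cause a division by zero here.

```python
import statistics

def standard_scale(values):
    """(x - mean) / standard deviation: result has mean 0 and std 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

def min_max_scale(values):
    """(x - min) / (max - min): result lies in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def robust_scale(values):
    """(x - median) / IQR: centre and spread come from robust statistics."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in values]

incomes = [30, 35, 40, 45, 50, 400]  # made-up feature with one extreme value
print([round(v, 2) for v in standard_scale(incomes)])
print([round(v, 2) for v in min_max_scale(incomes)])
print([round(v, 2) for v in robust_scale(incomes)])
```

With the extreme value present, min-max scaling squeezes the five ordinary incomes into a narrow band near 0, while robust scaling keeps them spread out around 0. In practice the scaling parameters should be computed on the training set only and then reused for validation and test data.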


Feature Selection

Feature selection improves model performance by choosing relevant features:

  • Variance-based: Remove features with low variance.
  • Correlation-based: When two features are highly correlated with each other, keep one and remove the other to avoid redundancy.
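Both filters can be combined in a short sketch. The function names, the thresholds (`min_variance=0.01`, `max_abs_corr=0.95`), and the `columns` data are assumptions chosen for the example; suitable thresholds depend on the dataset, and a variance threshold in particular depends on each feature's units.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_features(columns, min_variance=0.01, max_abs_corr=0.95):
    """Drop near-constant features, then drop features redundant with kept ones."""
    # Step 1 (variance-based): remove features that barely vary.
    varying = [name for name, col in columns.items()
               if statistics.pvariance(col) > min_variance]
    # Step 2 (correlation-based): keep a feature only if it is not highly
    # correlated with any feature already selected.
    selected = []
    for name in varying:
        if all(abs(pearson(columns[name], columns[other])) <= max_abs_corr
               for other in selected):
            selected.append(name)
    return selected

columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information as height_cm
    "constant": [1, 1, 1, 1, 1],                  # zero variance
    "age": [35, 22, 51, 29, 44],
}
print(select_features(columns))
```

The correlation step is greedy: which member of a correlated pair survives depends on column order, so it can be worth ordering columns by how much you trust or prefer them.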

Correlation vs. Causation

  • Correlation: Measures the strength and direction of the linear relationship between two variables. A positive correlation means the variables tend to increase together, while a negative correlation means one tends to decrease as the other increases.

    Correlation coefficient formula: \( \rho_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \), where \( Cov(X, Y) \) is the covariance between \( X \) and \( Y \), and \( \sigma_X \) and \( \sigma_Y \) are their standard deviations.

  • Causation: Indicates that one variable directly affects another. Correlation does not imply causation.
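The coefficient can be computed straight from its definition. The sketch below uses made-up data and helper names of its own; it also shows that a perfectly deterministic but non-linear relationship can have a correlation of zero, because the coefficient only captures linear association.

```python
import statistics

def covariance(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))

x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # increases with x
inverse = [-3 * v for v in x]     # decreases as x increases
squared = [v ** 2 for v in x]     # fully determined by x, but not linearly

print(round(correlation(x, linear), 6))
print(round(correlation(x, inverse), 6))
print(round(correlation(x, squared), 6))
```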


Normalization and Transformation

  • Normalization: Adjusts data to a specific range, such as [0, 1] or [-1, 1].

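  • Power Transformation: Applies a power function to the data (for example, the Box-Cox or Yeo-Johnson transform) to stabilize variance and make the distribution more Gaussian-like, which can help models that assume roughly normal inputs.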
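As one concrete instance of a power transformation, here is a sketch of the Box-Cox transform with a fixed parameter. The choice of Box-Cox, the fixed `lam` value, the `skewness` helper, and the `skewed` data are all assumptions for illustration; in practice the parameter is usually estimated from the data rather than fixed by hand.

```python
import math
import statistics

def box_cox(values, lam):
    """Box-Cox transform for strictly positive values and a fixed lambda."""
    if any(x <= 0 for x in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in values]        # limiting case: natural log
    return [(x ** lam - 1) / lam for x in values]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

skewed = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]  # made-up right-skewed data
logged = box_cox(skewed, lam=0)

print(round(skewness(skewed), 2))
print(round(skewness(logged), 2))
```

A skewness closer to zero after the transform indicates a more symmetric, more Gaussian-like distribution.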


Regression and Correlation

  • Regression: Predicts one variable from one or more others by fitting a function that quantifies the relationship between them.

  • Correlation: Measures the strength and direction of the linear relationship between variables but does not imply causation.
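A simple linear regression with one predictor can be fitted by ordinary least squares in closed form. This is a sketch under assumptions: the `fit_line` helper and the house data are hypothetical, and the prices were chosen to lie exactly on a line so the result is easy to check by hand.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - mx) ** 2 for x in xs)                    # variation of x
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

sizes = [50, 60, 80, 100, 120]       # hypothetical house sizes
prices = [150, 180, 240, 300, 360]   # hypothetical prices, exactly 3 * size

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept   # prediction for an unseen size of 90
print(slope, intercept, predicted)
```

The slope is the covariance of the two variables divided by the variance of the predictor, which is how regression and correlation are linked: correlation reports how strong the linear relationship is, while regression gives the line used to make predictions.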


References

For further reading and additional resources, please refer to:


This guide provides a solid foundation for understanding key machine learning concepts. Explore these topics further and apply them to your own projects and problems!
