Decision Trees

  • A decision tree is a supervised machine learning algorithm used for both classification and regression. It mimics human decision-making by splitting data into branches based on feature values, leading to a final decision.

Key concepts:

1) Structure of a Decision Tree

  • A decision tree consists of:

    • Root Node: The starting point of the tree where the first split occurs;
    • Internal Nodes (Decision Nodes): Nodes where decisions are made based on feature values;
    • Leaf Nodes (Terminal Nodes): Nodes representing the final decision (class label or numerical value);
  • How a decision tree works (a minimal training sketch follows this list):

    1- Select the best feature to split the data;

    2- Create a decision node and split the dataset;

    3- Repeat recursively for each subset until:

    • All data points in a node belong to the same class (for classification);
    • A stopping criterion (e.g., max depth, min samples per node) is met.
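As a concrete illustration, here is a minimal sketch using scikit-learn (an assumption; this README does not prescribe a library). The iris dataset and the hyperparameter values are illustrative; max_depth and min_samples_leaf act as the stopping criteria from step 3:

```python
# Minimal sketch: training and inspecting a decision tree with scikit-learn.
# The dataset and hyperparameter values here are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth and min_samples_leaf are stopping criteria (step 3 above).
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5)
clf.fit(X, y)

# Print the learned tree: root node, internal decision nodes, and leaf nodes.
print(export_text(clf, feature_names=load_iris().feature_names))
```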

2) Splitting Criteria in Decision Trees

  • The decision tree selects the best feature to split on using an impurity measure (short computational sketches follow the formulas below):

A) For classification:

  • Gini impurity: measures how often a randomly chosen element would be misclassified if it were labeled according to the node's class distribution:

$Gini = 1 - \sum p_i^2$

where $p_i$ is the probability of class $i$;

  • Entropy: measures the uncertainty in a node's class distribution:

$Entropy = -\sum p_i \log_2(p_i)$
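
As a quick sketch, both classification criteria can be computed directly from class labels (the helper names gini and entropy are illustrative, not part of this README):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of p_i^2 over class probabilities p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum of p_i * log2(p_i) over class probabilities p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ["A", "A", "B", "B", "B", "C"]
print(gini(labels))     # 1 - (2/6)^2 - (3/6)^2 - (1/6)^2 ≈ 0.611
print(entropy(labels))  # ≈ 1.459 bits
```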

B) For regression:

  • Mean Squared Error (MSE): measures variance within a node:

$MSE = \frac{1}{n} \sum (y_i - \bar{y})^2$

where $\bar{y}$ is the mean target value in a node.
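Similarly, a minimal sketch of the node-level MSE (the helper name node_mse is an illustrative assumption):

```python
def node_mse(values):
    """MSE impurity of a node: mean squared deviation from the node mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values) / len(values)

print(node_mse([1.0, 2.0, 3.0, 4.0]))  # 1.25, the variance of the node's targets
```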
