Exponentially faster tree depth #38

Merged
5 commits merged Mar 13, 2024

mayer79 (Contributor) commented Mar 13, 2024

This PR brings an exponential speed-up to the calculation of the minimal depth distribution. The gain is especially large for very deep trees, such as those grown by random forests fitted on larger data sets.

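The trick is to loop over tree depth levels instead of over individual tree nodes, so the number of loop iterations equals the depth of the tree rather than its number of nodes.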
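
A minimal sketch of the level-wise idea, not necessarily the code merged in this PR. It assumes a tree in the layout returned by `randomForest::getTree(fit, k, labelVar = TRUE)`: one row per node, row 1 is the root, and the columns `left daughter` / `right daughter` hold child row indices, with 0 for a leaf. The helper name `tree_depth_by_level()` is hypothetical. Each pass of the `while` loop assigns the depth of a whole level in one vectorized step.

```r
# Hypothetical helper: depth of every node, computed level by level.
# Assumes getTree()-style columns "left daughter"/"right daughter",
# where 0 marks a missing child (leaf) and row 1 is the root (depth 0).
tree_depth_by_level <- function(frame) {
  depth <- rep(NA_integer_, nrow(frame))
  current <- 1L  # nodes on the current level, starting with the root
  d <- 0L
  while (length(current) > 0L) {
    depth[current] <- d
    # Collect the children of the whole level at once
    children <- c(frame[["left daughter"]][current],
                  frame[["right daughter"]][current])
    current <- children[children != 0]
    d <- d + 1L
  }
  depth
}

# Toy tree: root 1 splits into 2 and 3; node 2 splits into 4 and 5
toy <- data.frame(
  `left daughter`  = c(2, 4, 0, 0, 0),
  `right daughter` = c(3, 5, 0, 0, 0),
  check.names = FALSE
)
tree_depth_by_level(toy)
# [1] 0 1 1 2 2
```

The loop body runs once per level (three times for the toy tree), whereas a node-wise loop would run once per node.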

It thus resolves #34 and complements PR #35, which (unfortunately) brought a speed-up mainly for small trees.

```r
library(randomForest)
library(randomForestExplainer)
library(ranger)
library(ggplot2)  # provides the diamonds data

set.seed(12)

# Random forest
fit <- randomForest(price ~ carat + color + cut + clarity, data = diamonds, ntree = 100)
system.time(  # 24s -> 0.6s
  out <- min_depth_distribution(fit)
)
head(out)
#   tree variable minimal_depth
# 1    1    carat             2
# 2    1  clarity             0
# 3    1    color             2
# 4    1      cut             3
# 5    2    carat             2
# 6    2  clarity             3
```
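
In the output, `minimal_depth` is the depth of the node closest to the root that splits on the variable in a given tree (the root split has depth 0, like `clarity` in tree 1). A sketch of that aggregation step for a single tree, assuming a per-tree data frame with a `split var` column (NA for leaves, as with `getTree(..., labelVar = TRUE)`) and an already computed `depth` column. `minimal_depth_per_tree()` is a hypothetical name, not the package's internal function.

```r
# Hypothetical helper: minimal depth of each split variable in one tree.
# Assumes a "split var" column (NA for leaf nodes) and a precomputed
# "depth" column with the root at depth 0.
minimal_depth_per_tree <- function(frame, tree_id) {
  splits <- frame[!is.na(frame[["split var"]]), ]
  # Smallest depth per split variable
  md <- tapply(splits[["depth"]], as.character(splits[["split var"]]), min)
  data.frame(
    tree = tree_id,
    variable = names(md),
    minimal_depth = as.vector(md),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy tree with 9 nodes: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7), 5 -> (8, 9)
toy <- data.frame(
  `split var` = c("clarity", "carat", "carat", NA, "cut", NA, NA, NA, NA),
  depth       = c(0, 1, 1, 2, 2, 2, 2, 3, 3),
  check.names = FALSE
)
minimal_depth_per_tree(toy, tree_id = 1)
#   tree variable minimal_depth
# 1    1    carat             1
# 2    1  clarity             0
# 3    1      cut             2
```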


```r
# Ranger (seems to grow much deeper trees)
fit2 <- ranger(
  price ~ carat + color + cut + clarity, data = diamonds,
  num.trees = 100,
  max.depth = 10, # without this, the original (pre-PR) depth calculation won't finish
  seed = 1
)
system.time(  # 19s -> 0.1s
  out <- min_depth_distribution(fit2)
)
head(out)
#   tree variable minimal_depth
# 1    1    carat             1
# 2    1  clarity             0
# 3    1    color             2
# 4    1      cut             2
# 5    2    carat             2
```
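
The same level-wise loop can be adapted to the node table of a ranger tree. This sketch assumes the layout of `ranger::treeInfo(fit2, tree = 1)`: a 0-based `nodeID` with the root as node 0, and `leftChild` / `rightChild` set to NA for terminal nodes. It runs on a toy data frame mimicking that layout only; `ranger_tree_depth()` is a hypothetical name. Node IDs are mapped to rows with `match()`, so the rows need not be sorted by ID.

```r
# Hypothetical helper: node depths for a treeInfo()-style data frame.
# Assumes 0-based "nodeID" (root = 0) and NA children for terminal nodes.
ranger_tree_depth <- function(info) {
  depth <- rep(NA_integer_, nrow(info))
  current <- match(0, info$nodeID)  # row of the root node
  d <- 0L
  while (length(current) > 0L) {
    depth[current] <- d
    # Child node IDs of the whole level; NA marks a terminal node
    child_ids <- c(info$leftChild[current], info$rightChild[current])
    current <- match(child_ids[!is.na(child_ids)], info$nodeID)
    d <- d + 1L
  }
  depth
}

# Toy tree: node 0 splits into 1 and 2; node 2 splits into 3 and 4
toy_info <- data.frame(
  nodeID     = 0:4,
  leftChild  = c(1, NA, 3, NA, NA),
  rightChild = c(2, NA, 4, NA, NA)
)
ranger_tree_depth(toy_info)
# [1] 0 1 1 2 2
```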

mayer79 requested a review from pbiecek March 13, 2024 17:45
mayer79 self-assigned this Mar 13, 2024
mayer79 marked this pull request as draft March 13, 2024 18:47
mayer79 marked this pull request as ready for review March 13, 2024 18:55
mayer79 marked this pull request as draft March 13, 2024 18:56
mayer79 marked this pull request as ready for review March 13, 2024 19:08

pbiecek (Member) commented Mar 13, 2024

binary search wins!
thanks

pbiecek merged commit 96572e0 into ModelOriented:master Mar 13, 2024
5 checks passed
mayer79 deleted the tree-depth-faster-2 branch March 14, 2024 16:44