Faster DecisionTreeRegressor (numba acceleration) #361

Open
kmedved opened this issue Dec 13, 2024 · 2 comments
Comments

@kmedved
Contributor

kmedved commented Dec 13, 2024

Hi - I've been working on creating a faster version of DecisionTreeRegressor for use with ngboost, specifically through Numba acceleration. I've got an early version of this here.

The idea is that the relative slowness of sklearn's DecisionTreeRegressor as the base learner is one of the main reasons why ngboost is slower than other boosting libraries. In some early testing, the numba-compiled version linked above is between 8x and 20x faster when used as the base learner for ngboost (though I am still verifying robustness across data sizes). When it's ready, I'll put together a Colab notebook for benchmarking.
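The linked implementation isn't reproduced here. As a rough illustration of the kind of inner loop numba can compile, below is a sketch of an exhaustive split search for a single tree node. It assumes a plain squared-error criterion, and the function name `best_split` is hypothetical; neither is taken from the linked code. If numba isn't installed, the decorator falls back to plain Python so the sketch still runs (just slowly).

```python
import numpy as np

try:
    from numba import njit  # optional: compile the hot loop if numba is available
except ImportError:
    # Fallback: a no-op decorator so the sketch still runs as plain Python.
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f


@njit
def best_split(X, y):
    """Hypothetical exhaustive search for the (feature, threshold) pair that
    minimises the children's summed squared error (plain MSE criterion)."""
    n, p = X.shape
    total_sum = y.sum()
    best_feature, best_threshold, best_loss = -1, 0.0, np.inf
    for j in range(p):
        col = X[:, j].copy()
        order = np.argsort(col)
        xs = col[order]
        ys = y[order]
        left_sum = 0.0
        for i in range(n - 1):
            left_sum += ys[i]
            # Only cut between distinct feature values.
            if xs[i] == xs[i + 1]:
                continue
            n_left = i + 1
            n_right = n - n_left
            right_sum = total_sum - left_sum
            # Child SSE = sum(y^2) - sum(y)^2 / n_child. sum(y^2) is the same
            # for every cut, so minimising total SSE is equivalent to
            # maximising left_sum^2/n_left + right_sum^2/n_right.
            loss = -(left_sum * left_sum / n_left + right_sum * right_sum / n_right)
            if loss < best_loss:
                best_loss = loss
                best_feature = j
                best_threshold = 0.5 * (xs[i] + xs[i + 1])
    return best_feature, best_threshold
```

A full tree builder would call something like this recursively on each child's rows. Because ngboost fits one base learner per distribution parameter at every boosting iteration, the per-tree cost of this loop adds up quickly.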

I'm interested in thoughts on whether this makes sense generally and, if so, whether it should be added to ngboost directly or would make more sense as a standalone library. I'm curious about everyone's thoughts, but especially those of @alejandroschuler and @ryan-wolbeck on this point.

@alejandroschuler
Collaborator

This would be a welcome improvement to ngboost, but I agree that the tree library should be its own thing that we'd then import as the default learner.

There are many other things we could do better in the context of tree learners too, like pre-sorting the data based on outcome to make the tree searches more efficient, and all the tricks in LightGBM, etc. But then we're getting very specific about the learner and optimizing the library around it. The current design goes the other way: it keeps the package learner-agnostic and more maintainable, at the cost of losing these efficiencies.
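One possible reading of the pre-sorting idea is sketched below under assumptions. The comment mentions sorting "based on outcome"; the sketch instead uses the common CART-style variant, which sorts each *feature* once at the root and then keeps every node's sample indices in that order, so split searches in child nodes never re-sort. The helper names `presort` and `partition_sorted` are hypothetical.

```python
import numpy as np


def presort(X):
    """Sort every feature once, at the root. Column j of the result lists
    sample indices in ascending order of X[:, j]."""
    return np.argsort(X, axis=0, kind="stable")


def partition_sorted(sorted_idx, goes_left):
    """Split a node's presorted index lists into its two children's lists.

    sorted_idx : (n_node, p) int array; each column is sorted by its feature.
    goes_left  : boolean mask over *all* samples (True -> left child).

    Boolean masking preserves the existing order, so each child's columns
    stay sorted without calling argsort again.
    """
    left_cols, right_cols = [], []
    for j in range(sorted_idx.shape[1]):
        col = sorted_idx[:, j]
        mask = goes_left[col]
        left_cols.append(col[mask])
        right_cols.append(col[~mask])
    return np.column_stack(left_cols), np.column_stack(right_cols)
```

The price is an n × p index array maintained at every tree depth, plus code that only makes sense for trees. That is exactly the kind of learner-specific optimization the current agnostic design avoids.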

@alejandroschuler
Collaborator

A fast tree-specific reimplementation of the whole package is probably a good idea but not something I'll ever get around to.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants