-
I think that's a great idea. I don't know too much about the internals of these libraries, but I want to clarify something about using them with `n_estimators=1`. If you want to see significant speedups along the lines of option (A), it would make the most sense to do the binning/sorting/etc. that the fast boosting algorithms do upfront, that is, before fitting any trees; option (A) repeats this processing every time a tree is fit. Implementing that is basically a matter of reverse-engineering lightgbm, etc. and incorporating those tricks into the core ngboost code in a way that keeps the modularity with respect to base learners (i.e. the optimizations are ignored if the user chooses not to use trees). Alternatively, one might fork the ngboost codebase into something like a fast-ngboost project and rewrite it entirely in C++ (say) with histogram trees baked in as the only option for base learners. The latter is a lot more work, but even the former would require a reasonable amount of work with the internals of the current code.
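To make the "bin upfront" idea concrete, here is a minimal sketch assuming plain scikit-learn. `KBinsDiscretizer` is my stand-in here, not anything from the ngboost or lightgbm codebases, and it mimics only the first step of what the fast libraries do internally (quantile-binning the features once, before any trees are fit), not the full histogram-tree machinery:

```python
# Minimal sketch: bin the features once, up front, instead of letting
# every base-learner fit re-derive its own split candidates.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

from ngboost import NGBRegressor

rng = np.random.RandomState(0)
X = rng.randn(10_000, 20)
y = X[:, 0] + 0.1 * rng.randn(10_000)

# Quantile-bin each feature into at most 255 ordinal buckets, exactly once.
binner = KBinsDiscretizer(n_bins=255, encode="ordinal", strategy="quantile")
X_binned = binner.fit_transform(X)

# The default DecisionTreeRegressor base learner now splits on coarse
# integer codes, so each node sees far fewer candidate thresholds.
ngb = NGBRegressor(verbose=False).fit(X_binned, y)
```

The real win described above would go further: cache the binned/sorted representation inside ngboost itself so no per-tree preprocessing happens at all.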
-
Either or both would be good :)
…On Wed, Jun 16, 2021, 8:15 AM kmedved wrote:
Do you mean a PR suggesting the change for the documentation, or actually
changing the default base learner to be HistGradientBoostingRegressor?
-
Have there been any updates on this? I have been trying to use NGBoost with a 5 GB dataset but have had no success: it times out in Colab Pro, and I have been hesitant to run it locally since it would tie up my PC for (seemingly) a couple of days.
-
I don't have a clear update here.
-
One of the best parts about ngboost is that it exposes the base learner to the user. It has occurred to me that this makes material speedups available in training if you replace the current scikit-learn `DecisionTreeRegressor` default base learner with something faster. The issue is essentially that `DecisionTreeRegressor` performs exact, per-node split searching, which is slow on larger datasets. By using a more optimized, histogram-based base learner written in C++ (with a Python wrapper), it should be possible to achieve significant speedups in training without sacrificing functionality.

In that vein, I have been experimenting with using `LightGBM`, `CatBoost`, and `HistGradientBoostingRegressor`, all with an `n_estimators` setting of 1, to achieve this. The first two are written in C++, while `HistGradientBoostingRegressor` is implemented in Cython within scikit-learn (thus offering C-like speeds behind a pure-Python API). The results, based on my early testing, are as expected, offering speedups of 2.5x (CatBoost) to 5x (LightGBM) relative to `DecisionTreeRegressor` on a 10K-row, 20-feature dataset. There may also be accuracy benefits, but I'm less confident there since I haven't done much tuning in this testing.

`HistGradientBoostingRegressor` in particular seems appealing in this respect, since it's included in scikit-learn already, although each of the three offers some benefits, such as missing-data handling, categorical-variable support, multiprocessing, GPU support (probably not useful here), more loss functions, and a wider range of tunable hyperparameters. `CatBoost`, while the slowest of the three, is also interesting since it has very good/dynamic default hyperparameters, thus potentially offering ngboost strong 'out of the box' performance.

I am interested in whether there are any concerns with this approach (theoretical or practical) that I'm missing, particularly with respect to the histogram nature of these boosting algorithms. Potentially it would make sense to add `HistGradientBoostingRegressor` in particular to the documentation as a way to speed up training for users, if this makes sense. Obviously users are free to arrive at this decision themselves, but it's somewhat counterintuitive to use a boosting library without any boosting, so if this approach has merit, a suggestion may be helpful. This would help address perhaps the most common question I see re: ngboost, which is speed of training. A minimal sketch of the setup is below.

Interested in thoughts/concerns.
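For concreteness, here is a rough sketch of the swap described above, assuming ngboost's `Base` keyword accepts any scikit-learn-style regressor (as it does for the default `DecisionTreeRegressor`); the hyperparameters are illustrative, not tuned:

```python
# Rough sketch: one histogram-based tree per ngboost round instead of the
# default exact DecisionTreeRegressor.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: needed on scikit-learn < 1.0
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import make_regression

from ngboost import NGBRegressor

X, y = make_regression(n_samples=10_000, n_features=20, random_state=0)

# A single tree per boosting round. Note this class calls the round count
# max_iter; LGBMRegressor uses n_estimators and CatBoostRegressor uses
# iterations, so set whichever applies to 1.
base = HistGradientBoostingRegressor(max_iter=1, max_depth=3)

ngb = NGBRegressor(Base=base, n_estimators=500, verbose=False)
ngb.fit(X, y)
print(ngb.predict(X[:5]))  # point predictions from the fitted distribution
```

One caveat, echoing the reply above: each `fit` call still re-bins its inputs internally, so this captures the per-tree speedup but not the bin-once-upfront savings.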