Splittable random numbers for reproducible training #259

Open
bfolie opened this issue Dec 17, 2021 · 3 comments

bfolie commented Dec 17, 2021

Bagger and MultiTaskBagger both train their individual models in parallel. Because the order of training is uncontrolled, Lolo random forests are inherently non-reproducible, even if the bagging and the rngs for the base learners are identical.

There are ways of guaranteeing reproducibility across multiple threads, and we should make use of them.
- SplittableRandom in Java
- A discussion in the context of numpy
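To make the idea concrete, here is a minimal sketch of the pattern that `java.util.SplittableRandom` enables. The object and method names below are hypothetical illustrations, not Lolo's actual `Bagger` API: split one child generator per bag from a single seeded master before any parallel work begins, then hand each bag its own generator, so the result no longer depends on thread scheduling.

```scala
import java.util.SplittableRandom

// Hypothetical sketch, not Lolo's actual API: give each bag its own
// split-off generator so the parallel schedule cannot change the result.
object SplittableRngSketch {

  /** Draw bootstrap row indices for each bag from its own generator. */
  def bootstrapIndices(numBags: Int, numRows: Int, seed: Long): Vector[Vector[Int]] = {
    val master = new SplittableRandom(seed)
    // Splitting happens sequentially, in a fixed order, on a single thread.
    val perBagRngs = Vector.fill(numBags)(master.split())
    // The per-bag sampling below could run in parallel and still produce
    // identical output, because each bag only touches its own generator.
    perBagRngs.map { rng => Vector.fill(numRows)(rng.nextInt(numRows)) }
  }

  def main(args: Array[String]): Unit = {
    // Same seed => same bootstrap samples, independent of thread scheduling.
    println(bootstrapIndices(numBags = 3, numRows = 5, seed = 42L))
  }
}
```

The same pattern should carry over to training the base learners themselves: as long as each learner receives a pre-split generator, the order in which threads pick up the work cannot change the resulting ensemble.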

@iterateccvoelker

Hi, how is it going? Is there any update on this issue? Thanks in advance for a brief message! Best, Christoph

bfolie commented Sep 16, 2022

Thanks for asking @BAMcvoelker. To be honest, we hadn't thought about it in a while, but after seeing your comment we realized we have all of the tools and just need to thread them through.

We open-sourced our splittable random number library, so it's now available to pull into Lolo. I will pull it in soon and use it to make bagged training reproducible.

@iterateccvoelker

Thank you so much @bfolie for the update and for picking up the topic again. I look forward to it!
