Add minDistinctLabels to decision tree to prevent UQ collapse in Bagger #197

Open
maxhutch opened this issue Dec 4, 2019 · 1 comment

Comments

@maxhutch
Contributor

maxhutch commented Dec 4, 2019

If the training labels contain repeated values, it becomes increasingly likely that every tree in the ensemble makes the same prediction, even when the input values differ. This could be prevented by requiring a minimum number of distinct label values in each leaf of the decision trees. That would significantly increase the likelihood that different trees have different pairs of label values in the leaf that a prediction lands in, and therefore make different predictions, so that the ensemble reports some predictive uncertainty.
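
To make the idea concrete, here is a minimal Python sketch, not lolo's actual implementation. The names `split_allowed`, `min_distinct_labels`, and `ensemble_std` are hypothetical, and it assumes each leaf predicts the mean of its labels and that the uncertainty is the spread of the per-tree predictions.

```python
import statistics
from typing import Sequence


def split_allowed(left: Sequence[float], right: Sequence[float],
                  min_distinct_labels: int = 2) -> bool:
    """Hypothetical stopping rule: accept a candidate split only if each
    child would still hold at least `min_distinct_labels` distinct labels."""
    return (len(set(left)) >= min_distinct_labels
            and len(set(right)) >= min_distinct_labels)


def ensemble_std(leaves: Sequence[Sequence[float]]) -> float:
    """Spread of per-tree predictions, assuming each tree predicts the
    mean of the labels in the leaf that the query point lands in."""
    predictions = [statistics.mean(leaf) for leaf in leaves]
    return statistics.pstdev(predictions)


# Leaves reached by the same query point in three bagged trees.
# Without the rule, every leaf can be pure in a repeated label value,
# so all trees agree and the spread collapses to zero.
pure_leaves = [[1.0, 1.0], [1.0], [1.0, 1.0, 1.0]]

# With at least two distinct labels per leaf, the bootstrap changes the
# mix of values in each leaf, so the per-tree means differ.
mixed_leaves = [[1.0, 1.0, 2.0], [1.0, 2.0], [1.0, 2.0, 2.0]]

collapsed = ensemble_std(pure_leaves)
spread = ensemble_std(mixed_leaves)
```

Under these assumptions `collapsed` is exactly zero while `spread` is positive; the rule trades some leaf purity for a nonzero uncertainty estimate.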

cc: @bfolie

@maxhutch
Contributor Author

maxhutch commented Dec 5, 2019

An alternative: simply impose a floor on the predicted uncertainty that depends on the variance of the training labels and the number of training rows.
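
A rough Python sketch of what such a floor might look like. The functional form is not specified above; this assumes, purely for illustration, the standard error of the mean, `sqrt(variance / n)`, and the names `uncertainty_floor` and `floored_uncertainty` are hypothetical.

```python
import math
import statistics
from typing import Sequence


def uncertainty_floor(training_labels: Sequence[float]) -> float:
    """Assumed floor: standard error of the mean of the training labels.
    It grows with the label variance and shrinks with the row count."""
    n = len(training_labels)
    if n < 2:
        raise ValueError("need at least two training rows to estimate variance")
    return math.sqrt(statistics.variance(training_labels) / n)


def floored_uncertainty(predicted_std: float,
                        training_labels: Sequence[float]) -> float:
    """Report the ensemble's uncertainty, but never less than the floor."""
    return max(predicted_std, uncertainty_floor(training_labels))


labels = [1.0, 2.0, 3.0, 4.0]
floor = uncertainty_floor(labels)
collapsed_case = floored_uncertainty(0.0, labels)  # ensemble collapsed to zero
healthy_case = floored_uncertainty(2.0, labels)    # ensemble already above the floor
```

With this choice a collapsed ensemble reports the floor rather than zero, and predictions whose uncertainty already exceeds the floor are left unchanged. Other scalings with the variance and row count would fit the same description.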
