How to interpret xerror of rpart (method="class") when losses are assigned? #65

Open
A-Pai opened this issue Mar 13, 2024 · 4 comments
Labels
help wanted Extra attention is needed

A-Pai commented Mar 13, 2024

How should I interpret the xerror of rpart (method="class") when losses are assigned?

image

Strangely, the initial xerror is extremely small: 0.1000 (I see this is 1/loss of class 0). After that, xerror increases to 2.41, 2.77, and so on. Somehow xerror at the initial node is much better than rel error!
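For comparison, a loss-weighted relative error can be computed by hand. The following is a minimal Python sketch with made-up counts, not rpart's internal code. It assumes the loss matrix has true classes in rows and predicted classes in columns, and that "relative error" means a tree's total loss divided by the total loss of the best single-class (root) prediction. All function names and numbers are hypothetical.

```python
def total_loss(confusion, loss):
    """Sum loss[i][j] * confusion[i][j] over true class i and predicted class j."""
    k = len(loss)
    return sum(loss[i][j] * confusion[i][j] for i in range(k) for j in range(k))


def root_loss(class_counts, loss):
    """Total loss when every observation gets the single cheapest class label."""
    k = len(class_counts)
    return min(
        sum(loss[i][j] * class_counts[i] for i in range(k))
        for j in range(k)
    )


def relative_error(confusion, loss):
    """Total loss of a tree divided by the total loss of the root node."""
    class_counts = [sum(row) for row in confusion]
    return total_loss(confusion, loss) / root_loss(class_counts, loss)


# Hypothetical loss matrix: misclassifying a class 0 case costs 10,
# misclassifying a class 1 case costs 1.
loss = [[0, 10],
        [1, 0]]

# Hypothetical data: 80 cases of class 0 and 20 of class 1.
# The root predicts class 0 for everyone (cost 20 versus 800 for class 1).
root_confusion = [[80, 0],
                  [20, 0]]

# A hypothetical tree that recovers 12 of the 20 class 1 cases.
tree_confusion = [[80, 0],
                  [8, 12]]

print(relative_error(root_confusion, loss))
print(relative_error(tree_confusion, loss))
```

Under this definition the root node scores exactly 1 by construction, so a value of 0.1000 at the root does not fit the usual reading of the column.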

@hmeeks0212

Have you figured out what to do yet? Because of the loss matrix, the 1-SE rule doesn't apply. I don't know how to pick the tree in this case.
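For readers unfamiliar with it, the usual 1-SE rule can be sketched as follows. This is a minimal Python illustration on a hypothetical complexity table, not rpart code. It assumes rows are ordered from the smallest tree to the largest and that each row carries `nsplit`, `xerror`, and `xstd` values. The function name and every number are made up, except the xerror values 0.10, 2.41, and 2.77, which mirror the pattern reported above.

```python
def one_se_rule(cptable):
    """Return the smallest tree whose xerror is within one xstd of the minimum.

    cptable is a list of dicts ordered from fewest splits to most splits.
    """
    best = min(cptable, key=lambda row: row["xerror"])
    threshold = best["xerror"] + best["xstd"]
    for row in cptable:  # smallest tree first
        if row["xerror"] <= threshold:
            return row


# Hypothetical table with the usual shape: xerror falls, then flattens.
typical = [
    {"nsplit": 0, "xerror": 1.00, "xstd": 0.08},
    {"nsplit": 1, "xerror": 0.75, "xstd": 0.07},
    {"nsplit": 3, "xerror": 0.62, "xstd": 0.06},
    {"nsplit": 5, "xerror": 0.60, "xstd": 0.06},
]

# Table shaped like the one in this issue (xstd values are invented).
with_loss = [
    {"nsplit": 0, "xerror": 0.10, "xstd": 0.02},
    {"nsplit": 1, "xerror": 2.41, "xstd": 0.15},
    {"nsplit": 2, "xerror": 2.77, "xstd": 0.16},
]

print(one_se_rule(typical)["nsplit"])
print(one_se_rule(with_loss)["nsplit"])
```

When xerror is smallest at the root, as in the second table, the rule always returns the unsplit tree, which is why it gives no useful guidance here.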

Owner

bethatkinson commented Oct 24, 2024 via email


hmeeks0212 commented Oct 25, 2024

Ah, that's OK. Thank you so much for responding so quickly!

First of all, I would like to thank you for all the extensive work that you've done to maintain rpart. When I asked my previous question, I thought I was communicating with A-Pai!

I'm not sure if this will help you, but an additional problem I've noticed is that including the loss matrix affects not only the xerror but also the variable importance. The rpart manual states that the overall measures of variable importance are scaled to sum to 100. This is not always the case when a loss matrix is included.
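A quick way to check this kind of claim is to rescale the raw importances by hand and compare the total with the printed values. The sketch below is plain Python with invented numbers and hypothetical function and variable names. It only shows the arithmetic of scaling to 100, and how whole-number rounding alone can move the total away from 100. It does not reproduce or explain the loss-matrix behaviour described above.

```python
def scale_importance(raw):
    """Rescale raw importance values so that they sum to 100."""
    total = sum(raw.values())
    return {name: 100 * value / total for name, value in raw.items()}


# Invented raw importances for three hypothetical predictors.
raw = {"x1": 1.0, "x2": 1.0, "x3": 1.0}

scaled = scale_importance(raw)
rounded = {name: round(value) for name, value in scaled.items()}

print(sum(scaled.values()))   # unrounded total
print(sum(rounded.values()))  # total after rounding to whole numbers
```

If the unrounded total is also far from 100 for a fitted tree, rounding cannot be the cause and the discrepancy lies elsewhere.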

Additionally, I have been trying to replicate the trees created in rpart and Minitab. I've read the manuals for both rpart and the Minitab Predictive Analytics software. My understanding from the two manuals is that rpart and Minitab use the same method. However, Minitab applies additional logic (which I couldn't find online) to calculate xerror and xstd when a loss matrix is applied. I thought I would let you know in case you can find or have access to the underlying logic that Minitab uses.

I hope this helps with any future update to rpart.

Best,
M.

Owner

bethatkinson commented Oct 25, 2024 via email
