https://www.mm218.dev/posts/2021/01/model-averaging/ #2
Comments
Thank you very much Mike, excellent article. I learned a lot from you.
Hey @ronandsa! Glad it's useful. The short answer is "yes". You'd want to weight the probability each component model assigns to each class, not just the assigned class, and then take the most likely classification. I personally haven't come across the sort of massive review for classifiers that Dormann et al. did for regressors, but people most certainly do it: https://web.cs.ucdavis.edu/~davidson/Publications/ModelAve.pdf (Of course, how effective any given averaging method is will depend on the specifics of your data and models!)
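A minimal sketch of the probability-weighting idea described in the reply above, in plain Python. The function name `weighted_soft_vote`, the class labels, the probabilities, and the equal weights are all hypothetical, chosen only for illustration; how the weights should be set is not specified in the thread.

```python
def weighted_soft_vote(model_probs, weights):
    """Combine per-model class probabilities into one weighted average.

    model_probs: one dict per model, mapping class label -> probability.
    weights: one non-negative weight per model (normalized internally).
    """
    total = sum(weights)
    combined = {}
    for probs, weight in zip(model_probs, weights):
        for label, p in probs.items():
            combined[label] = combined.get(label, 0.0) + (weight / total) * p
    # The ensemble's classification is the class with the highest averaged probability.
    winner = max(combined, key=combined.get)
    return winner, combined


# Hypothetical predictions from three classifiers for a single observation.
model_probs = [
    {"cat": 0.90, "dog": 0.10},
    {"cat": 0.40, "dog": 0.60},
    {"cat": 0.45, "dog": 0.55},
]
winner, combined = weighted_soft_vote(model_probs, weights=[1.0, 1.0, 1.0])
print(winner, combined)
```

In this made-up example, a majority vote over the assigned classes would pick "dog" (two of three models), while averaging the probabilities favors "cat" (about 0.58), which is why the probabilities are weighted rather than the labels.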
Thank you SO MUCH - this is the most concise explanation on model averaging I have been able to locate. Wonderfully helpful, and I'm more confident in my approach now.
Glad it was useful @gochezkerrth! A disclaimer before this answer: I'm an applied researcher, not a statistician by training, so I'm going to answer more with what I have done rather than what's the right thing to do.

So, in this situation, it sounds like you're doing exploratory/descriptive modeling work -- you've got a set of X potential IVs, and a number of different plausible models combining them to estimate your DV. That means you're not doing confirmatory/attribution modeling work: your conclusions can be "it seems like X, Y, and Z are important variables here" but not "we're very confident that X has an impact of Y%".

For that type of project, I'm a fan of ranking models via AIC (or, well, AICc). You fit all possible models, rank them by AICc, and then "select" any model whose AICc is within a set distance (usually 4, sometimes 2) of the lowest AICc as a well-supported model. Your results section can then focus on which IVs are generally included in your well-supported models, and on the size and direction of their coefficients. I did this, for instance, in Mahoney and Stella 2020 ( https://www.mm218.dev/papers/mahoney_stella_2020.pdf ) -- see table 4 on page 9, section 3.4 for an example.

This approach doesn't really use p-values; a model having an AICc within your threshold means the model is well-supported, regardless of the p-values of its terms. You'd only then average the models (maybe using AICc!) if you were using these models to generate predictions -- generating predictions from each model and then averaging those together. Hope that makes sense!
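The ranking and averaging steps in the reply above can be sketched roughly as follows, in plain Python. The model names, log-likelihoods, parameter counts, sample size, and predictions are hypothetical. Weighting predictions by Akaike weights is one common way to average "using AICc", assumed here for illustration rather than taken from the thread.

```python
import math


def aicc(log_lik, k, n):
    """AICc: AIC plus a small-sample correction (k parameters, n observations)."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def well_supported(scores, threshold=4.0):
    """Return delta AICc for every model within `threshold` of the best model."""
    best = min(scores.values())
    return {name: s - best for name, s in scores.items() if s - best <= threshold}


def akaike_weights(deltas):
    """Turn delta AICc values into weights that sum to 1."""
    raw = {name: math.exp(-d / 2) for name, d in deltas.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


# Hypothetical fitted models: name -> (log-likelihood, number of parameters).
n_obs = 100
fits = {"m1": (-120.0, 3), "m2": (-119.5, 4), "m3": (-125.0, 2)}
scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in fits.items()}

deltas = well_supported(scores, threshold=4.0)  # m3 falls outside the threshold
weights = akaike_weights(deltas)

# Hypothetical predictions for one observation from each well-supported model.
predictions = {"m1": 0.70, "m2": 0.60}
averaged = sum(weights[name] * predictions[name] for name in weights)
print(sorted(deltas), weights, averaged)
```

Note that the selection step never looks at p-values: a model is kept purely because its AICc is close to the lowest one.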
I'm an applied researcher as well; I appreciate you underlining the distinction. I am also reaching out to my local university math department to see if there are any theoretically correct methods I'm missing.

Correct, this is exploratory and descriptive with some predictive results (mostly predicting who will do really well and why, and who will do really poorly and why, to generate ideas for supports going forward), and I have ranked them by AICc (so glad to be on track!). My relationships are generally odds ratios, not coefficients, as most are not linear. Ideally, I would like to report out something like "clients with characteristic X by year 4 were on average 3.2x more likely to achieve model status (the DV) in year 5", and have at least my documentation note that a relationship is reported out only if consistently significant, etc., and how I defined that.

I'm also employing a backwards approach to maximize the significant IV relationships ("Caveats to model selection" from https://uoftcoders.github.io/rcourse/lec09-model-selection.html) before ranking the models and selecting the one with the lowest AICc plus any with delta AICc <= 2. I noticed your 2020 paper used delta <= 4, and I was wondering if there's a specific reason? In sum, it sounds like I'm on the right track? Thank you again.
Sounds like you're on the right track! I believe the threshold of 4 is from Burnham and Anderson 2004 -- https://doi.org/10.1177/0049124104268644 -- but I can't get a copy of that paper right now, and I completely forget why 4 is preferable to 2. My only other suggestion is that you might consider reporting the range of coefficient values -- "across all well-supported models, clients with characteristic X were between 1.8x (1.6-2.0) and 3.2x (3.0-3.4) more likely to achieve model status." But otherwise, sounds like a good approach to me!
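One way the suggested range could be assembled, as a rough sketch in plain Python: take the coefficient for characteristic X from each well-supported model, convert it to an odds ratio with an interval, and report the smallest and largest. The model names, coefficients, and standard errors below are hypothetical, and the Wald-style 95% interval (estimate plus or minus 1.96 standard errors on the log-odds scale) is an assumption, not something specified in the thread.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical log-odds coefficients for characteristic X: model -> (estimate, SE).
estimates = {"m1": (0.60, 0.05), "m2": (1.15, 0.03)}
summary = {name: odds_ratio_ci(beta, se) for name, (beta, se) in estimates.items()}

# Identify the models giving the smallest and largest odds ratio.
lowest = min(summary, key=lambda name: summary[name][0])
highest = max(summary, key=lambda name: summary[name][0])

low_or, low_lo, low_hi = summary[lowest]
high_or, high_lo, high_hi = summary[highest]
print(
    f"Across well-supported models: {low_or:.1f}x ({low_lo:.1f}-{low_hi:.1f}) "
    f"to {high_or:.1f}x ({high_lo:.1f}-{high_hi:.1f})"
)
```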
Aha, bless you - sometimes I forget the simple things. A range works perfectly. Thank you!
Mike Mahoney: Model averaging methods: how and why to build ensemble models
https://www.mm218.dev/posts/2021/01/model-averaging/