Discrepancy between including factors as covariates directly vs hard-coding dummy variables #144
Comments
I'm not going to debug your wall of code.
I agree with Greg, a self-contained example would be helpful. How can you reproduce this issue using included

But if I had to take a guess based on what you wrote, this may relate to how shrinkage is done in the logistic regression (see ?arm::bayesglm and ?MAST::defaultPrior for some details). I expect that setting

Yes, factors can be entered into the formula directly. The formula uses model.matrix to generate the design matrix, like most of R.
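To make the shrinkage guess above concrete, here is a minimal sketch in base R. It is not what MAST or arm::bayesglm do internally: it swaps in a plain ridge penalty as a stand-in for a prior on the coefficients, and the data, the level names "A"/"B"/"C", and the helper `ridge_logit_probs` are all hypothetical. The point is only that once the dummy-variable coefficients are penalized, the fit can depend on which level is the reference, whereas the unpenalized fit does not.

```r
# Simulated data (hypothetical): binary outcome, one binary covariate,
# and a three-level factor standing in for "tissue".
set.seed(1)
n <- 300
tissue <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
condition <- rbinom(n, 1, 0.5)
tissue_effect <- c(A = 0, B = 0.7, C = -0.4)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * condition + tissue_effect[as.character(tissue)]))

# Logistic regression with a ridge penalty on every coefficient except the
# intercept. ASSUMPTION: ridge is only a stand-in for a shrinkage prior; it is
# not the prior used by bayesglm. Returns the fitted probabilities.
ridge_logit_probs <- function(X, y, lambda) {
  nll <- function(beta) {
    eta <- drop(X %*% beta)
    -sum(y * eta - log1p(exp(eta))) + lambda * sum(beta[-1]^2)
  }
  grad <- function(beta) {
    eta <- drop(X %*% beta)
    g <- -drop(crossprod(X, y - plogis(eta)))
    g[-1] <- g[-1] + 2 * lambda * beta[-1]
    g
  }
  fit <- optim(rep(0, ncol(X)), nll, grad, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 1000))
  plogis(drop(X %*% fit$par))
}

# Same model, two different reference levels for the factor.
X_refA <- model.matrix(~ condition + tissue)
X_refB <- model.matrix(~ condition + relevel(tissue, ref = "B"))

# Largest difference in fitted probabilities between the two codings.
diff_unpenalized <- max(abs(ridge_logit_probs(X_refA, y, 0) -
                            ridge_logit_probs(X_refB, y, 0)))
diff_penalized   <- max(abs(ridge_logit_probs(X_refA, y, 5) -
                            ridge_logit_probs(X_refB, y, 5)))
print(c(unpenalized = diff_unpenalized, penalized = diff_penalized))
```

With `lambda = 0` both codings span the same column space, so the fitted probabilities agree up to optimizer tolerance. With `lambda > 0` the penalty acts on differences from whichever level is the reference, so the two codings are shrunk toward different targets.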
Apologies! I guess my question was more high-level rather than a request to debug. My presupposition is that the choice of reference level for an adjustment factor should not matter. Indeed, that seems to be the case if I hard-code the one-hot encodings using model.matrix. On the other hand, if I use the factor as is in the zlm formula, I get different results depending on which level is the reference. I'll try to highlight the relevant code below.

In the following, the covariate that I would like to adjust for is "tissue", which I feed into zlm directly as a factor.
The resulting output changes based on how I set the reference level:
I get the following output:
Mainly I see differences in the coef ranging from 0.15 to 0.46.
I get the same answer no matter how I choose the reference level.
The idea that variation might occur with bayesglm sounds promising, but I was able to achieve invariance to factor encoding when using model.matrix. I guess my question really is: is that expected? Should I avoid feeding factors directly into the zlm formula for bayesglm, even though it should be ok for glm? Thanks so much!
I can't speculate what the "one hot coding" you implemented is actually doing without code that I can run myself, i.e.,

Manually generating a design matrix doesn't do anything that you can't do with a factor for glm or bayesglm.
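For plain glm, the equivalence between a factor and a hand-built design matrix can be checked with a short, self-contained sketch like the one below. Everything in it is hypothetical (the simulated data frame `d`, the tissue names, the effect sizes); it uses only base R and does not call zlm or bayesglm.

```r
# Simulated data (hypothetical): binary outcome, binary covariate, 3-level factor.
set.seed(42)
n <- 200
d <- data.frame(
  tissue = factor(sample(c("lung", "liver", "brain"), n, replace = TRUE)),
  condition = rbinom(n, 1, 0.5)
)
d$y <- rbinom(n, 1, plogis(-0.3 + 0.6 * d$condition + 0.5 * (d$tissue == "lung")))

# 1. Factor entered directly in the formula.
fit_factor <- glm(y ~ condition + tissue, family = binomial(), data = d)

# 2. The same dummy columns built by hand with model.matrix
#    (intercept column dropped, since glm adds its own).
dummies <- model.matrix(~ tissue, data = d)[, -1, drop = FALSE]
fit_manual <- glm(d$y ~ d$condition + dummies, family = binomial())

# 3. Factor entered directly, but with a different reference level.
d$tissue_relevel <- relevel(d$tissue, ref = "lung")
fit_relevel <- glm(y ~ condition + tissue_relevel, family = binomial(), data = d)

# Factor vs hand-built dummies: identical coefficients.
same_coef <- all.equal(unname(coef(fit_factor)), unname(coef(fit_manual)),
                       tolerance = 1e-6)
# Different reference level: individual tissue coefficients change meaning,
# but the fitted values and the condition coefficient do not.
same_fit <- all.equal(unname(fitted(fit_factor)), unname(fitted(fit_relevel)),
                      tolerance = 1e-6)
same_condition <- all.equal(unname(coef(fit_factor)["condition"]),
                            unname(coef(fit_relevel)["condition"]),
                            tolerance = 1e-6)
print(c(same_coef = isTRUE(same_coef), same_fit = isTRUE(same_fit),
        same_condition = isTRUE(same_condition)))
```

Running the same three fits on a real dataset, with the actual model-fitting call substituted for glm, is one way to turn the question into a reproducible example.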
@jmchan88 do you have a
Dear MAST team,
I encountered unexpected behavior when including a covariate (tissue) to adjust for in my formula.
I noticed that changing the reference level of tissue changes the results!
However, when I instead hard-code the dummy variables for tissue, I not only get different results, but these results also don't change based on which reference level is chosen. For example:
Does zlm in MAST allow for factors to be entered in the formula directly? My understanding is that when adjusting for a categorical variable, the reference level chosen should not change results, right? Is hard-coding the dummy variables of the factor the right way to do this? Any help/ideas would be greatly appreciated! Thank you!
Joe