Handling missing data? #4
Hi @kmundnic, thanks for the detailed reproducible example, highly appreciated! I don't think we stated that the method works with missing data, but rather with incomplete samples (that is, a subsample from a larger population). Indeed, the fit_mle function cannot currently estimate a correlation matrix with missing cells. However, you can fit the marginals with missing data (as you did), then drop rows that are not full:

    data = convert(Matrix, df[completecases(df), :])
    G = fit_mle(Copula.GaussianCopula, marginals, data)

Since we only look at pairwise correlations, we could easily adapt the optimisation to run with missing data. A row with at least two non-missing cells always provides more information if it is kept than if it is dropped. Happy to help you with that if you want to contribute?
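To illustrate why pairwise handling keeps more information than dropping incomplete rows, here is a small sketch in plain Julia (no packages). The toy matrix and the helper `pair_rows` are made up for illustration and are not part of CorrectMatch. It compares how many rows survive complete-case filtering with how many rows are usable for each pair of columns.

```julia
# Toy data (hypothetical): 5 rows, 3 discrete columns, some cells missing.
data = [1       2       missing;
        2       missing 1;
        1       1       2;
        missing 2       2;
        2       1       1]

# Rows with no missing cell at all: what remains after dropping incomplete rows.
complete_rows = [all(!ismissing, row) for row in eachrow(data)]

# Rows usable for the pair of columns (i, j): both cells must be present.
# `pair_rows` is a hypothetical helper, not a CorrectMatch function.
pair_rows(data, i, j) = .!ismissing.(data[:, i]) .& .!ismissing.(data[:, j])

d = size(data, 2)
n_complete = count(complete_rows)
n_pair = [count(pair_rows(data, i, j)) for i in 1:d, j in 1:d]

println("complete rows: ", n_complete)
println("rows usable per pair of columns:")
display(n_pair)
```

In this toy example only two rows are complete, while every pair of distinct columns has three rows where both values are observed, so a pairwise estimate would use more of the data.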
Thanks for your quick reply @cynddl. The missing data/incomplete data misunderstanding is clear now :-) (I also re-read those portions of the paper and it makes sense). My data has missing values for many subjects, so I can't afford to remove the rows with missing entries, and my sample is "small" (slightly over 200 subjects). Therefore, I need to adapt the code. When you say:
do you mean that it can, but that it would produce a biased estimate? If I understand correctly, you're using Mutual Information (MI) to estimate the pairwise correlations, so I would need to adapt the estimation of the MI to have a (hopefully) unbiased estimator of \Sigma. However, in my MWE
When this is solved, my understanding is that
Thanks for your help!
Sorry for the late reply. At the moment, the MI matrix is computed here:
A simple trick would be, when iterating over pairs of columns
The plan would be:
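The idea of restricting each pair of columns to the rows where both values are present can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the project's code: it uses a simple plug-in (empirical frequency) estimate of mutual information for discrete columns, which may differ from the estimator CorrectMatch actually uses, and the names `pairwise_mi` and `mi_matrix` are hypothetical.

```julia
# Plug-in (empirical) mutual information, in nats, between two discrete
# vectors, using only the rows where both entries are present.
# Hypothetical helper; assumes discrete values and a plug-in estimator.
function pairwise_mi(x, y)
    keep = .!ismissing.(x) .& .!ismissing.(y)
    xs, ys = x[keep], y[keep]
    n = length(xs)
    n == 0 && return NaN   # no overlapping observations for this pair

    # Count joint and marginal occurrences on the retained rows.
    joint = Dict{Tuple{Any,Any},Int}()
    px = Dict{Any,Int}()
    py = Dict{Any,Int}()
    for (a, b) in zip(xs, ys)
        joint[(a, b)] = get(joint, (a, b), 0) + 1
        px[a] = get(px, a, 0) + 1
        py[b] = get(py, b, 0) + 1
    end

    # I(X;Y) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    mi = 0.0
    for ((a, b), c) in joint
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    end
    return mi
end

# MI for every pair of columns of a matrix that may contain `missing`.
function mi_matrix(data)
    d = size(data, 2)
    return [pairwise_mi(data[:, i], data[:, j]) for i in 1:d, j in 1:d]
end
```

Each entry of the resulting matrix is computed from a different subset of rows. Whether the estimates are unbiased depends on why the values are missing: pairwise deletion of this kind implicitly assumes the cells are missing completely at random.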
I'm trying to follow the examples with my data, which is incomplete, but the function uniqueness doesn't handle Union{Int, Missing}. According to your paper, your method is able to handle missing data, so I'm wondering if this was implemented?
Here's a minimal working example of the code throwing an error:
which throws the following error:
I've installed the latest version using ] add CorrectMatch.
Thanks for your help!