Replies: 3 comments 1 reply
-
Hmm... maybe I was looking for https://insightsengineering.github.io/rbmi all along...
-
Hi @wlandau! Your data looks like long-format clustered data.
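One possible reading of this hint is that `mice` offers multilevel methods for long-format clustered data, where the cluster variable is flagged with `-2` in the predictor matrix. The following is only a sketch under assumptions: it uses `fev_data` from `mmrm`, picks the `2l.pan` method (which needs the `pan` package installed), and chooses the predictor set purely for illustration. Whether this model suits an MMRM analysis is not something the sketch settles.

```r
library(mice)  # the "2l.pan" method additionally requires the 'pan' package

data(fev_data, package = "mmrm")

# Stay in long format. 2l.pan expects the cluster identifier to be an integer.
long <- as.data.frame(fev_data)[, c("USUBJID", "AVISIT", "ARMCD",
                                    "FEV1_BL", "WEIGHT", "FEV1")]
long$USUBJID <- as.integer(factor(long$USUBJID))

# Default predictor matrix: every other column is a fixed-effect predictor (1).
pred <- make.predictorMatrix(long)
# Mark USUBJID as the class (cluster) variable for imputing FEV1.
pred["FEV1", "USUBJID"] <- -2

# Default methods, then switch FEV1 to the multilevel linear model method.
meth <- make.method(long)
meth["FEV1"] <- "2l.pan"

imp <- mice(long, method = meth, predictorMatrix = pred,
            m = 5, maxit = 5, seed = 1, printFlag = FALSE)
```

Here each imputed dataset keeps one row per patient visit, so time-varying columns such as `WEIGHT` remain usable as predictors.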
-
Some generic advice, not knowing how well it applies to your case: if you can, convert the data into wide format before imputation, then convert it back into long format for making plots or for subject-level analyses. Imputing wide-format data (all patient data in one row) with [...]
Converting to wide format can be challenging if measurement times differ widely between patients, because you end up with hundreds of distinct time points. You might try to define time bins, but in my mind, a better solution is to use a broken stick model <doi:10.18637/jss.v106.i07> to convert long to wide (repeated measures). The accompanying software also supports the generation of multiple imputations. Every problem is different, so use whatever suits you.
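A minimal sketch of the wide-impute-long round trip described above, assuming `fev_data` from `mmrm` and `tidyr` for the reshaping. The choice of baseline covariates is illustrative, and the time-varying `WEIGHT` column is simply left out here, which may not be acceptable in every analysis.

```r
library(mice)
library(tidyr)

data(fev_data, package = "mmrm")
fev <- as.data.frame(fev_data)

# One row per patient: spread FEV1 over visits, keep only baseline covariates.
keep <- c("USUBJID", "ARMCD", "SEX", "RACE", "FEV1_BL", "AVISIT", "FEV1")
wide <- as.data.frame(
  pivot_wider(fev[, keep], names_from = AVISIT, values_from = FEV1,
              names_prefix = "FEV1_")
)
visit_cols <- paste0("FEV1_", unique(as.character(fev$AVISIT)))

# The patient ID is a row label here, not a predictor.
pred <- make.predictorMatrix(wide)
pred[, "USUBJID"] <- 0

imp <- mice(wide, m = 5, predictorMatrix = pred, seed = 1, printFlag = FALSE)

# Convert each completed dataset back to long format.
# mice::complete is namespaced because tidyr also exports complete().
long_list <- lapply(seq_len(imp$m), function(i) {
  pivot_longer(mice::complete(imp, i), cols = all_of(visit_cols),
               names_to = "AVISIT", names_prefix = "FEV1_",
               values_to = "FEV1")
})
```

The resulting `long_list` is a plain list of data frames, a form that `brms::brm_multiple()` accepts as its `data` argument. Note that `AVISIT` comes back as a character column and may need converting to a factor.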
-
Background
I work with longitudinal clinical trial data and longitudinal models like the mixed model for repeated measures (MMRM). These multivariate models typically assume independent patients and correlated observations within patients.
There seems to be a consensus among the Clinical Data Interchange Standards Consortium (CDISC), modeling packages like `brms`, and the Tidyverse to represent this longitudinal data in tidy long form.
Data
An example dataset is the FEV dataset from the `mmrm` package, where the response variable `FEV1` is recorded for multiple patients (`USUBJID`) who were randomized to different treatment groups (`ARMCD`) and measured over multiple time points (`AVISIT`).
In this dataset, each independent patient (index variable `USUBJID`) has multiple rows of data, and the rows within a patient (index variable `AVISIT`) are correlated repeated measures.
Model
In `brms`, I can run a multivariate model on this tidy long dataset by adding an `unstr(time = AVISIT, gr = USUBJID)` term to the model formula.
`brms` has incredibly smooth native integration with `mice`, which pools analyses automatically. I would strongly prefer to use this integration for multiple imputation in MNAR scenarios.
Problem
However, I am not sure if it would be appropriate to do so.
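For concreteness, the integration in question might look roughly like the sketch below. The fixed-effects part of the formula, the `mice()` settings, and the helper name `fit_pooled` are illustrative assumptions. The row-wise `mice()` call is exactly the step whose appropriateness is in doubt.

```r
library(mice)
library(brms)

data(fev_data, package = "mmrm")
fev <- as.data.frame(fev_data)

# Row-wise imputation of the long data, with the patient ID excluded as a
# predictor. This treats each row as a separate "case".
pred <- make.predictorMatrix(fev)
pred[, "USUBJID"] <- 0
imp <- mice(fev, m = 5, predictorMatrix = pred, seed = 1, printFlag = FALSE)

# Long-format model with an unstructured within-patient correlation term.
mmrm_formula <- bf(
  FEV1 ~ ARMCD * AVISIT + FEV1_BL + unstr(time = AVISIT, gr = USUBJID)
)

# Hypothetical helper: fit the model to every imputed dataset and combine
# the posterior draws. brm_multiple() accepts a mids object directly.
fit_pooled <- function(imputations) {
  brm_multiple(mmrm_formula, data = imputations,
               chains = 2, iter = 1000, seed = 1)
}
```

Calling `fit_pooled(imp)` runs one Stan fit per imputed dataset, so it is left uncalled here.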
From chapter 3 and chapter 4 of Flexible Imputation of Missing Data, `mice` seems to have different ideas about how to represent univariate vs. multivariate data. According to the book, a univariate missing data problem is one where only one column in the data has missing values, and a multivariate one is where multiple columns have missing values. Throughout the book, each row in a dataset is implicitly referred to as a "case", whereas my colleagues and I think of a "case" as a patient with multiple rows. All this implies a non-tidy wide format for the data, rather than the conventional tidy long one we prefer to work with.
In my line of work, there is only one partially missing column in the dataset, but the underlying statistical problem is still multivariate. I am wondering what the most appropriate way is to use `mice` in this ubiquitous scenario.
Experiments
When I naively plug the full FEV dataset into `mice`, I get warnings about "logged events".
From the messages, `mice` was looking at collinearity with the patient ID variable `USUBJID`, which is incorrect for the situation. Dropping it removes the warnings, but then the model is univariate, where presumably each row is a "case".
Unless I am missing something about how PMM works, it seems like an appropriate imputation method should at least have some awareness of `USUBJID`. To tell `mice` that the data is really multivariate, it seems like I need to pivot it to wide form first.
But this does not work in my case because it is no longer possible to regress on time-varying predictors. For example, if I pivot `FEV1` to wide form, I need to drop the variable `WEIGHT` (or try to convert `WEIGHT` to multiple columns as well, which does not really make sense). In addition, I would need to switch to a much more complicated multivariate modeling syntax in `brms`, which may make it difficult or impossible to specify the precise covariance and correlation structures I need. It is tempting to hack into the `imputed` object and pivot each imputed dataset back to long form, but it seems unwise to try.
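The first two experiments can be sketched as follows, assuming `fev_data` from `mmrm`. The small `m` and `maxit` values are only there to keep the run short, and the exact logged events will depend on the `mice` version.

```r
library(mice)

data(fev_data, package = "mmrm")
fev <- as.data.frame(fev_data)

# Experiment 1: plug the full long dataset into mice as is.
imp_naive <- mice(fev, m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# Any logged events are stored on the returned object
# (NULL if nothing was logged).
events <- imp_naive$loggedEvents
print(head(events))

# Experiment 2: keep USUBJID in the data but stop using it as a predictor.
pred <- make.predictorMatrix(fev)
pred[, "USUBJID"] <- 0
imp_flat <- mice(fev, m = 2, maxit = 2, predictorMatrix = pred,
                 seed = 1, printFlag = FALSE)
```

Zeroing the `USUBJID` column of the predictor matrix has the same modeling effect as dropping the variable, so `imp_flat` still treats every row as an independent case. Its advantage is only that the ID stays attached to the completed datasets.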