Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using projpred with brm_multiple and mice #275

Open
HenrikEckermann opened this issue Feb 14, 2022 · 4 comments
Open

Using projpred with brm_multiple and mice #275

HenrikEckermann opened this issue Feb 14, 2022 · 4 comments
Labels
enhancement Enhancements of existing features, but also new feature requests.

Comments

@HenrikEckermann
Copy link

HenrikEckermann commented Feb 14, 2022

Hi,

I tried to use projpred after multiple imputation using the mice package. It did not produce results as expected (see here for plots of the output. I suspected that this was due to a combination of lack of predictive power and because of multiple imputation outside of the statistical model with mice may not be supported. @avehtari confirmed to me that multiple imputation might not yet be supported and to open an issue here.

In case it is in fact supported already: Below code reproduces the output. I fit random noise with and without using multiple imputation. I would expect that RMSE is ~equal between models thanks to the horseshoe prior. This is the case without imputations but when using multiple imputations RMSE increases when more variables are included.

One more thing: Note that models have divergent transitions. In my real dataset I circumvented that by using different priors that are less regularizing (normal(0, 0.2) for all betas). I am aware that the horseshoe prior is important to prevent overfitting but I did not find a way yet to fit the models without divergent transitions when using it in my dataset (any hints are welcome if there are better ways to circumvent that).

  library(tidyverse)
  library(brms)
  library(projpred)
  library(mice)


  # simulate data where all 25 beta are 0
  set.seed(1)
  n <- 75
  n_feat <- 25
  X <- map_dfc(1:n_feat, function(x) {
    df <- tibble(rnorm(n, 0, 2))
    colnames(df) <- glue::glue("var_{x}")
    df
  })
  y <- rnorm(n, 0, 1)
  df <- bind_cols(tibble(y = y), X)

  # projpred 
  n <- nrow(df) # 75
  D <- ncol(df[, -1]) # 25
  p0 <- 3 # prior guess for the number of relevant variables
  tau0 <- p0/(D-p0) * 1/sqrt(n) 
  m1 <- brm(
    y ~ ., 
    family = gaussian(), 
    data = df,
    prior = prior(horseshoe(scale_global = tau0, scale_slab = 1), class = b),
    seed = 1, 
    chains = 2, 
    iter = 2000,
    # control = list(adapt_delta = 0.8)
  )
  refmodel <- get_refmodel(m1)
  vs <- cv_varsel(refmodel, method = "forward", cv_method = "LOO")
  plot(vs, stats = c("elpd", "rmse"))



  # create ~ 30% random missingness per var
  for (row in seq_along(df$var_1)) {
    for (col in seq_along(colnames(df))) {
      # ~ 30% missingness
      if (runif(0, 1, n = 1) > 0.7) {
        df[row, col] <- NA
      }
    }
  }

  # 10 imputations (predictive mean matching)
  dfimp <- mice(df, m = 10)

  # use multple imputed datasets with projpred 
  m2 <- brm_multiple(
    y ~ ., 
    family = gaussian(), 
    data = dfimp,
    prior = prior(horseshoe(scale_global = tau0, scale_slab = 1), class = b),
    seed = 1, 
    chains = 2, 
    iter = 2000,
    # control = list(adapt_delta = 0.8)
  )

  refmodel2 <- get_refmodel(m2)
  vs2 <- cv_varsel(refmodel2, method = "forward", cv_method = "LOO")
  plot(vs2, stats = c("elpd", "rmse"))

}
@avehtari
Copy link
Collaborator

Thanks for creating the issue

@fweber144
Copy link
Collaborator

Yes, thanks for the issue. Output from brm_multiple() is indeed not supported by projpred and brms, at least when using the output from brm_multiple() directly as input to get_refmodel(). So this issue is about raising an appropriate message/warning/error. When running line refmodel2 <- get_refmodel(m2) from your example, I already get

Warning message:
Using only the first imputed data set. Please interpret the results with caution until a more principled approach has been implemented.

For now, I would consider this as sufficient. I don't know if you got that warning, too. Might depend on the brms version you are using. I am using brms's current GitHub version (commit 8ba9e53). A Git Blame for the relevant lines in brms (https://github.com/paul-buerkner/brms/blob/8ba9e5333729e9d27e72a6fc81d746b4bb0d6251/R/brm_multiple.R#L231-L239 and https://github.com/paul-buerkner/brms/blob/8ba9e5333729e9d27e72a6fc81d746b4bb0d6251/R/prepare_predictions.R#L22) shows that brms should have thrown such a warning for about 3 years, but there might have been other lines I am currently not aware of which inhibited this or caused prepare_predictions.brmsfit() not to be called in get_refmodel.brmsfit(). Anyway, the important point is that the upcoming CRAN version of brms should throw this warning.

Thanks again and I'll think about documenting the missing support for brm_multiple().

@fweber144
Copy link
Collaborator

Sorry, I should have left this open as a feature request! Re-opening now and adding the correct label.

@fweber144 fweber144 reopened this Jan 5, 2023
@fweber144 fweber144 added the enhancement Enhancements of existing features, but also new feature requests. label Jan 5, 2023
@fweber144
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancements of existing features, but also new feature requests.
Projects
None yet
Development

No branches or pull requests

3 participants