Unexpected behaviour when incompatible full formulas included in search_terms argument #362

Open
sor16 opened this issue Oct 25, 2022 · 2 comments
Labels
enhancement Enhancements of existing features, but also new feature requests.

Comments


sor16 commented Oct 25, 2022

Hi,

I noticed that when full formulas are passed to the search_terms argument and some of them are incompatible, cv_varsel doesn't check for the incompatibility and includes variables from two incompatible formulas. This can be seen in the following reprex:

library(brms)
library(projpred)
set.seed(2)
N <- 100
p <- 5 #number of parameters
dat <- as.data.frame(matrix(rnorm(N*p), nrow = N, ncol = p)) #initialize data frame with p covariates
names(dat) <- paste0('x', 1:p)

betas <- rnorm(p) # simulate effect values
dat$y <- rnorm(N, mean=as.matrix(dat[, paste0('x', 1:p)]) %*% betas) #y is a noisy observation of the linear combination of covariates

formula_all <- as.formula(paste0('y~', paste(paste0('x', 1:p), collapse = '+')))
ref_mod <- brm(formula_all, data = dat, refresh = 0)
cv_select_prj <- cv_varsel(ref_mod, method = 'forward', cv_method = 'LOO', refit_prj = F, search_terms = c('x1+x2','x1+x3','x1+x2+x4'))

Here, I would assume that either x1, x2, x4 would be included or x1 and x3 only, as x1+x2+x4 is not compatible with x1+x3. However, cv_varsel returns:

Selection Summary:
 size solution_terms elpd.loo  se  diff diff.se
    0           <NA>   -181.9 6.0 -36.8     7.0
    1        x1 + x3   -161.3 6.0 -16.2     5.3
    2             x2   -149.1 6.1  -4.1     3.1
    3             x4   -147.4 5.8  -2.3     2.5

Is this the intended behaviour, or should we try to resolve this? The problem seems to lie in select_possible_terms_size in formula.R.

@fweber144
Collaborator

Thanks, I'll take a look at this.

@fweber144
Collaborator

fweber144 commented Oct 26, 2022

I think this behavior is correct given projpred's current implementation of the search_terms argument: After x1 + x3 was chosen, x2 from x1 + x2 is a candidate because the required x1 term is already included in the chosen terms. And after x2 was chosen additionally, x4 from x1 + x2 + x4 is a candidate because the required x1 and x2 terms are already included in the chosen terms.
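To make the reasoning above concrete, here is a toy sketch in base R. It is not projpred's code: `remaining_terms` is a hypothetical helper that only assumes a search term contributes whichever of its parts have not been chosen yet, and it ignores the size-based filtering done by the real search.

```r
# Toy model only; `remaining_terms` is a hypothetical helper, not projpred code.
remaining_terms <- function(search_terms, chosen) {
  # Split each formula string such as "x1+x2" into its individual terms.
  parts <- strsplit(gsub(" ", "", search_terms, fixed = TRUE), "+", fixed = TRUE)
  names(parts) <- search_terms
  # Drop the terms that the search has already selected.
  lapply(parts, setdiff, y = chosen)
}

search_terms <- c("x1+x2", "x1+x3", "x1+x2+x4")

# After "x1 + x3" was chosen, "x1+x2" still contributes x2.
after_step_1 <- remaining_terms(search_terms, chosen = c("x1", "x3"))
print(after_step_1)

# After x2 was chosen as well, "x1+x2+x4" still contributes x4.
after_step_2 <- remaining_terms(search_terms, chosen = c("x1", "x3", "x2"))
print(after_step_2)
```

Under this simplified view, nothing prevents a term from being completed by parts that came from a different formula, which is what the selection summary in the report shows.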

Despite this correctness, I think I know what you intended: At size 3 (including the intercept), x1 + x2 and x1 + x3 should be candidates (thereby forcing the inclusion of x1). At size 4, x1 + x2 + x4 should be a candidate only if x1 + x2 has been chosen at size 3. I don't think this is possible with the current implementation of the search_terms argument. I tried

cv_select_prj <- cv_varsel(ref_mod, method = 'forward', cv_method = 'LOO', refit_prj = F, search_terms = c('x1+x2','x1+x3','x1+x2+x4-x3'))

and

cv_select_prj <- cv_varsel(ref_mod, method = 'forward', cv_method = 'LOO', refit_prj = F, search_terms = c('x1+x2-x3','x1+x3','x1+x2+x4-x3'))

but neither gives the desired result. So I'll label this as a feature request (currently, I think the - syntax is the way to go).
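For the feature request, one possible reading of the desired rule is sketched below in base R. This is purely an assumption for discussion: `compatible_terms` is a hypothetical helper that does not exist in projpred. It keeps a search term as a candidate only if the term contains everything chosen so far and still adds something new. The size-based restriction (only x1 + x2 and x1 + x3 at the first step) is left out.

```r
# Hypothetical rule, not implemented in projpred: a search term stays eligible
# only if it contains every chosen term and still adds at least one new term.
compatible_terms <- function(search_terms, chosen) {
  parts <- strsplit(gsub(" ", "", search_terms, fixed = TRUE), "+", fixed = TRUE)
  keep <- vapply(
    parts,
    function(p) all(chosen %in% p) && length(setdiff(p, chosen)) > 0,
    logical(1)
  )
  search_terms[keep]
}

search_terms <- c("x1+x2", "x1+x3", "x1+x2+x4")

# If "x1 + x3" was chosen first, no further search term is compatible.
after_x1_x3 <- compatible_terms(search_terms, chosen = c("x1", "x3"))
print(after_x1_x3)  # character(0)

# If "x1 + x2" was chosen first, "x1+x2+x4" remains a candidate.
after_x1_x2 <- compatible_terms(search_terms, chosen = c("x1", "x2"))
print(after_x1_x2)  # "x1+x2+x4"
```

With such a rule, the search would end with either x1, x2, x4 or x1, x3, which matches the expectation in the original report.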

BTW (just for the record, because I first thought like you that this was a bug): Before merging #360, we had the same behavior (but in a slightly different manner): There, within search_forward(), we got

cands
# [1] "x2 + x3"

at size 4 (including the intercept), i.e., after x1 + x3 has been chosen. This was also correct from projpred's point of view, in the same sense as now that #360 has been merged. So #360 (more precisely, the efficiency improvement mentioned here) doesn't seem to have affected this.

fweber144 added the enhancement label (Enhancements of existing features, but also new feature requests.) Oct 26, 2022