Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up fitting process when including random effect with lme4? #130

Open
hoholee opened this issue Apr 8, 2020 · 5 comments
Open

Speed up fitting process when including random effect with lme4? #130

hoholee opened this issue Apr 8, 2020 · 5 comments

Comments

@hoholee
Copy link

hoholee commented Apr 8, 2020

Hi there!

I'm trying to fit a model with a random effect term following your tutorial on Bioconductor and this issue: #107

library(lme4)
zlmCond <- zlm(~disease + cn_genes_on + cn_nCount_SCT + cn_percent_mt + sex + (1|subject),
               sca, 
               method = 'glmer', 
               ebayes = FALSE)

My sca object contains ~5k genes and ~3k cells. But this fitting process is taking forever to finish, even with options(mc.cores = 16).

There are some discussions over the internet on this and the following seems to be one of the solutions:
https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html

I was wondering how should I set these parameters in the zlm function. I've tried fitArgsC and fitArgsD like this but got errors:

zlmCond <- zlm(~disease + cn_genes_on + cn_nCount_SCT + cn_percent_mt + cn_percent_mt + sex + (1|subject),
               sca, 
               method = 'glmer', 
               ebayes = FALSE,
               fitArgsC = list(control = glmerControl(calc.derivs = FALSE)),
               fitArgsD = list(control = glmerControl(calc.derivs = FALSE))
               )
Warning in (function (formula, data = NULL, REML = TRUE, control = lmerControl(),  :
  passing control as list is deprecated: please use lmerControl() instead
Error in (function (optimizer = "nloptwrap", restart_edge = TRUE, boundary.tol = 1e-05,  : 
  unused arguments (tolPwrss = 1e-07, compDev = TRUE, nAGQ0initStep = TRUE, checkControl = list("ignore", "stop", "ignore", "stop", "stop", "message+drop.cols", "warning", "stop", "stop"), checkConv = list(list("warning", 0.001, NULL), list("message", 1e-04), list("warning", 1e-06)))

Any suggestions on this? Thanks in advance!

Junhao.

@amcdavid
Copy link
Member

amcdavid commented Apr 9, 2020

  1. For the fitArgsC, I think you need to be passing a call to lmerControl not glmerControl, at least that seems to be what the error is telling you.
  2. You might also set nAGQ=0 in the fitArgsD, I often get very similar results when I have set this, and it can greatly speed things, in fact I would try that first before messing with calc.derivs.
  3. If you run into problems, make sure you the command line you are using (being passed with fitArgsC etc) works when you manually call lmer/glmer "by hand" on a chunk of the data.
  4. How many subjects? I haven't had great luck unless it's greater than 6 or 7.

@hoholee
Copy link
Author

hoholee commented Apr 9, 2020

Thanks! I'll try and report back later.

As for your 4th comment, I have 3 conditions, 6 subjects each. This model includes all three conditions so 18 subjects in total. In the end, I want to perform the DE test pairwisely between the 3 disease conditions. I guess I can do this with lrTest and Hypothesis. Do you think it's easier (and maybe faster) to just subset the data to just include cells from the two conditions in consideration and make 3 different models? I'm not sure what are the differences between the two approaches.

@hoholee
Copy link
Author

hoholee commented Apr 9, 2020

OK it ran pretty fast when I set nAGQ=0:

tic()
zlmCond_test <- zlm(~disease + cn_genes_on + cn_nCount_SCT + cn_percent_mt + sex + (1|subject),
                sca, 
                method = 'glmer', 
                ebayes = FALSE,
                fitArgsD = list(nAGQ = 0)
                )
Done!
Warning message:
In rbind(...) :
  number of columns of result is not a multiple of vector length (arg 5205)
toc()
73.739 sec elapsed

But all the p-values turn out to be NA?

res <- summary(zlmCond_test, doLRT = 'diseaseControl')
summaryDt <- res$datatable
fcHurdle <- merge(summaryDt[contrast=='diseaseControl' & component=='H',.(primerid, `Pr(>Chisq)`)], #hurdle P values
summaryDt[contrast=='diseaseControl' & component=='logFC', .(primerid, coef, ci.hi, ci.lo)], by='primerid') #logFC coefficients
fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')]
fcHurdle
primerid Pr(>Chisq)          coef       ci.hi        ci.lo fdr
   1:     AAK1         NA -0.0292236628 0.037794270 -0.096241596  NA
   2:    AAMDC         NA  0.0142594428 0.043659344 -0.015140458  NA
   3:    AASDH         NA -0.0141089370 0.020589697 -0.048807571  NA
   4: AASDHPPT         NA  0.0004993562 0.049432784 -0.048434072  NA
   5:     AASS         NA  0.0056627210 0.084370582 -0.073045141  NA
  ---                                                               
5201:   ZSWIM6         NA  0.1900799932 0.330113927  0.050046059  NA
5202:   ZSWIM8         NA -0.0303135106 0.001770501 -0.062397522  NA
5203:     ZXDC         NA -0.0022684545 0.054137967 -0.058674876  NA
5204:   ZYG11B         NA  0.0090092703 0.043784030 -0.025765490  NA
5205:    ZZEF1         NA  0.0454279335 0.086347874  0.004507993  NA

@amcdavid
Copy link
Member

amcdavid commented Apr 9, 2020

Not sure. If you provide a reproducible example and output from BiocManager::valid() I can take a look.

@gfinak
Copy link
Member

gfinak commented Nov 19, 2020

I believe this is related to #143 and was fixed recently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants