Speed up fitting process when including random effect with `lme4`? #130

hoholee · 2020-04-08T23:47:42Z

Hi there!

I'm trying to fit a model with a random effect term following your tutorial on Bioconductor and this issue: #107

library(lme4)
zlmCond <- zlm(~disease + cn_genes_on + cn_nCount_SCT + cn_percent_mt + sex + (1|subject),
               sca, 
               method = 'glmer', 
               ebayes = FALSE)

My sca object contains ~5k genes and ~3k cells. But this fitting process is taking forever to finish, even with options(mc.cores = 16).

There are some discussions over the internet on this and the following seems to be one of the solutions:
https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html

I was wondering how should I set these parameters in the zlm function. I've tried fitArgsC and fitArgsD like this but got errors:

zlmCond <- zlm(~disease + cn_genes_on + cn_nCount_SCT + cn_percent_mt + cn_percent_mt + sex + (1|subject),
               sca, 
               method = 'glmer', 
               ebayes = FALSE,
               fitArgsC = list(control = glmerControl(calc.derivs = FALSE)),
               fitArgsD = list(control = glmerControl(calc.derivs = FALSE))
               )

Warning in (function (formula, data = NULL, REML = TRUE, control = lmerControl(),  :
  passing control as list is deprecated: please use lmerControl() instead
Error in (function (optimizer = "nloptwrap", restart_edge = TRUE, boundary.tol = 1e-05,  : 
  unused arguments (tolPwrss = 1e-07, compDev = TRUE, nAGQ0initStep = TRUE, checkControl = list("ignore", "stop", "ignore", "stop", "stop", "message+drop.cols", "warning", "stop", "stop"), checkConv = list(list("warning", 0.001, NULL), list("message", 1e-04), list("warning", 1e-06)))

Any suggestions on this? Thanks in advance!

Junhao.

The text was updated successfully, but these errors were encountered:

amcdavid · 2020-04-09T18:16:42Z

For the fitArgsC, I think you need to be passing a call to lmerControl not glmerControl, at least that seems to be what the error is telling you.
You might also set nAGQ=0 in the fitArgsD, I often get very similar results when I have set this, and it can greatly speed things, in fact I would try that first before messing with calc.derivs.
If you run into problems, make sure you the command line you are using (being passed with fitArgsC etc) works when you manually call lmer/glmer "by hand" on a chunk of the data.
How many subjects? I haven't had great luck unless it's greater than 6 or 7.

hoholee · 2020-04-09T18:34:09Z

Thanks! I'll try and report back later.

As for your 4th comment, I have 3 conditions, 6 subjects each. This model includes all three conditions so 18 subjects in total. In the end, I want to perform the DE test pairwisely between the 3 disease conditions. I guess I can do this with lrTest and Hypothesis. Do you think it's easier (and maybe faster) to just subset the data to just include cells from the two conditions in consideration and make 3 different models? I'm not sure what are the differences between the two approaches.

hoholee · 2020-04-09T19:26:41Z

OK it ran pretty fast when I set nAGQ=0:

tic()
zlmCond_test <- zlm(~disease + cn_genes_on + cn_nCount_SCT + cn_percent_mt + sex + (1|subject),
                sca, 
                method = 'glmer', 
                ebayes = FALSE,
                fitArgsD = list(nAGQ = 0)
                )

Done!
Warning message:
In rbind(...) :
  number of columns of result is not a multiple of vector length (arg 5205)

toc()

73.739 sec elapsed

But all the p-values turn out to be NA?

res <- summary(zlmCond_test, doLRT = 'diseaseControl')
summaryDt <- res$datatable
fcHurdle <- merge(summaryDt[contrast=='diseaseControl' & component=='H',.(primerid, `Pr(>Chisq)`)], #hurdle P values
summaryDt[contrast=='diseaseControl' & component=='logFC', .(primerid, coef, ci.hi, ci.lo)], by='primerid') #logFC coefficients
fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')]

fcHurdle

primerid Pr(>Chisq)          coef       ci.hi        ci.lo fdr
   1:     AAK1         NA -0.0292236628 0.037794270 -0.096241596  NA
   2:    AAMDC         NA  0.0142594428 0.043659344 -0.015140458  NA
   3:    AASDH         NA -0.0141089370 0.020589697 -0.048807571  NA
   4: AASDHPPT         NA  0.0004993562 0.049432784 -0.048434072  NA
   5:     AASS         NA  0.0056627210 0.084370582 -0.073045141  NA
  ---                                                               
5201:   ZSWIM6         NA  0.1900799932 0.330113927  0.050046059  NA
5202:   ZSWIM8         NA -0.0303135106 0.001770501 -0.062397522  NA
5203:     ZXDC         NA -0.0022684545 0.054137967 -0.058674876  NA
5204:   ZYG11B         NA  0.0090092703 0.043784030 -0.025765490  NA
5205:    ZZEF1         NA  0.0454279335 0.086347874  0.004507993  NA

amcdavid · 2020-04-09T20:41:11Z

Not sure. If you provide a reproducible example and output from BiocManager::valid() I can take a look.

gfinak · 2020-11-19T16:51:39Z

I believe this is related to #143 and was fixed recently.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up fitting process when including random effect with `lme4`? #130

Speed up fitting process when including random effect with `lme4`? #130

hoholee commented Apr 8, 2020 •

edited

Loading

amcdavid commented Apr 9, 2020

hoholee commented Apr 9, 2020

hoholee commented Apr 9, 2020

amcdavid commented Apr 9, 2020

gfinak commented Nov 19, 2020

Speed up fitting process when including random effect with lme4? #130

Speed up fitting process when including random effect with lme4? #130

Comments

hoholee commented Apr 8, 2020 • edited Loading

amcdavid commented Apr 9, 2020

hoholee commented Apr 9, 2020

hoholee commented Apr 9, 2020

amcdavid commented Apr 9, 2020

gfinak commented Nov 19, 2020

Speed up fitting process when including random effect with `lme4`? #130

Speed up fitting process when including random effect with `lme4`? #130

hoholee commented Apr 8, 2020 •

edited

Loading