[Independent Enrichment Analysis] Bug in background? #740

kvittingseerup · 2021-12-09T14:15:14Z

I've recently started using your Appyter for Independent Enrichment Analysis to analyze the Enrichr catalog with a custom background.

But because I kept getting very large odds ratios and very small p-values, I got suspicious. I therefore tested the first 10 genes of the 2019 Human WikiPathway NRF2 pathway WP2884, using the first 20 genes of that gene set as the background. The result can be found here. As seen in the notebook, the odds ratio for the NRF2 pathway WP2884 is calculated to be Inf and the p-value is 6.56e-32. That does not seem like it should be the case if the background was considered?

Did I enter the genes incorrectly, or is something else going on?

lachmann12 · 2021-12-10T02:10:19Z

Inf odds ratios are possible if the input gene set is a subset of a gene set in the gene set library. This follows from the odds ratio formula for a 2x2 contingency table:

a b
c d

odds ratio = ad/bc

In the case of a subset, bc will be 0. In the Enrichr code this is handled by dividing by max(1, bc), which will result in a very large value.
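As a rough illustration of the guard described above, here is a minimal R sketch. The function name `odds_ratio_guarded` and the cell counts are hypothetical and chosen only for illustration; this is not the Enrichr source code.

```r
# Hypothetical helper illustrating the max(1, bc) guard; not the Enrichr code.
# Cells follow the table above:
#   a b
#   c d
odds_ratio_guarded <- function(a, b, c, d) {
  (a * d) / max(1, b * c)
}

# Illustrative counts for an input that is a subset of the library gene set,
# so b = 0 and therefore bc = 0.
a <- 10; b <- 0; c <- 10; d <- 19980

plain   <- (a * d) / (b * c)              # division by zero: Inf in R
guarded <- odds_ratio_guarded(a, b, c, d) # finite, equal to a * d

print(plain)
print(guarded)
```

With the guard, the ratio is finite but still very large, because it is simply a * d whenever bc is 0.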

Not sure if there are some other issues with the background correction, though.

kvittingseerup · 2021-12-10T09:57:26Z

I think the problem is better illustrated by the p-value. With the dataset I mention above, the fisher.test call (in R code) would look something like:

m1 <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = FALSE)
broom::tidy( fisher.test(m1) )
  estimate  p.value conf.low conf.high method                             alternative
         0       1        0       Inf Fisher's Exact Test for Count Data two.sided

If, on the other hand, the background was not used, you would end up with something like:

m2 <- matrix(c(2e4, 0, 10, 10), ncol = 2, byrow = FALSE)
broom::tidy( fisher.test(m2) )
  estimate  p.value conf.low conf.high method                             alternative
       Inf 6.50e-32    3103.       Inf Fisher's Exact Test for Count Data two.sided

I've tested this with some of my real data, and the odds ratios and p-values reported by Independent Enrichment Analysis are very similar to what I get when using Fisher's exact test with all known genes as background instead of the provided subset.

You can also see this by running the example dataset you provide as both foreground and background (Appyter found here). There I still get very significant results with very high odds ratios, even though there should be no enrichment.
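For reference, the behavior I would expect from a background-aware test can be sketched as follows. This is only a sketch under my own assumptions, not the Appyter's implementation: the function name `enrich_with_background`, the placeholder gene names, and the choice to restrict both the input and the gene set to the background before counting are all hypothetical.

```r
# Hypothetical sketch of a background-restricted Fisher's exact test.
# Assumption: genes outside the background are dropped from both the input
# and the gene set before the 2x2 table is built.
enrich_with_background <- function(input, gene_set, background) {
  background <- unique(background)
  input      <- intersect(unique(input), background)
  gene_set   <- intersect(unique(gene_set), background)

  a <- length(intersect(input, gene_set))  # in input and in gene set
  b <- length(setdiff(input, gene_set))    # in input, not in gene set
  c <- length(setdiff(gene_set, input))    # in gene set, not in input
  d <- length(background) - a - b - c      # in neither

  # Column-wise fill gives rows (a, b) and (c, d).
  tab <- matrix(c(a, c, b, d), nrow = 2)
  fisher.test(tab)
}

# Placeholder gene names standing in for a 20-gene set used as the background.
genes <- paste0("GENE", 1:20)
res <- enrich_with_background(input = genes[1:10], gene_set = genes,
                              background = genes)
print(res$p.value)
```

When every background gene belongs to the gene set, the table has an empty column and the test has nothing to distinguish, so the p-value comes out as 1, in line with the m1 example above.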

AviMaayan assigned mjjeon and lachmann12 Dec 9, 2021

u8sand changed the title ~~Bug in background of Independent Enrichment Analysis?~~ [Independent Enrichment Analysis] Bug in background? Jul 27, 2022
