I've recently started using your Appyter for Independent Enrichment Analysis to analyze the Enrichr catalog with a custom background.
But because I kept getting very large odds ratios and very small p-values, I got suspicious. I therefore tested the first 10 genes of the WikiPathways 2019 Human gene set "NRF2 pathway WP2884", using the first 20 genes of that gene set as background. The result can be found here. As seen in the notebook, the odds ratio for NRF2 pathway WP2884 is calculated to be Inf and the p-value is 6.56e-32. That does not seem right if the background had actually been taken into account.
Did I input the genes incorrectly, or is something else going on?
Inf odds ratios are possible if the input gene set is a subset of a gene set in the gene set library. This follows from the odds ratio formula for a 2x2 contingency table:
a b
c d
odds ratio = ad/bc
In the case of a subset, bc will be 0. In the Enrichr code this is handled by dividing by max(1, bc), which results in a very large value.
Not sure if there are some other issues with the background correction, though.
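To make the arithmetic concrete, here is a minimal R sketch of the odds ratio with and without the max(1, bc) guard described above. The function name `odds_ratio`, its `guard` argument, and the counts are made up for illustration; this is not taken from the Enrichr source.

```r
# Hypothetical helper (not the Enrichr code): odds ratio for the 2x2 table
#   a b
#   c d
# optionally guarding the denominator with max(1, bc) as described above.
odds_ratio <- function(a, b, c, d, guard = TRUE) {
  denom <- b * c
  if (guard) denom <- max(1, denom)
  (a * d) / denom
}

# Illustrative counts with b = 0, so bc = 0.
or_guarded   <- odds_ratio(10, 0, 10, 20000)                 # 10 * 20000 / 1
or_unguarded <- odds_ratio(10, 0, 10, 20000, guard = FALSE)  # division by zero

print(or_guarded)    # large but finite
print(or_unguarded)  # Inf
```

With the guard the value is just ad, so it grows with the size of the table; without it, R's division by zero yields Inf.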
I think the problem is better illustrated by the p-value. With the dataset I mention above, the Fisher test would (in R code) look something like:
# 2x2 table filled column by column: rows are (0, 10) and (0, 10)
m1 <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = FALSE)
broom::tidy(fisher.test(m1))
estimate  p.value  conf.low  conf.high  method                              alternative
       0        1         0        Inf  Fisher's Exact Test for Count Data  two.sided
If, on the other hand, the background was not used, you would end up with something like:
# 2x2 table filled column by column: rows are (20000, 10) and (0, 10)
m2 <- matrix(c(2e4, 0, 10, 10), ncol = 2, byrow = FALSE)
broom::tidy(fisher.test(m2))
estimate  p.value   conf.low  conf.high  method                              alternative
     Inf  6.50e-32  3103.           Inf  Fisher's Exact Test for Count Data  two.sided
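As a sanity check on that second p-value: given its margins, the observed table is the single most extreme one, so the two-sided p-value should equal the hypergeometric probability of that one table. A quick hand calculation (my own arithmetic, using only base R's `choose`):

```r
# Probability that all 10 drawn genes fall inside a 20-gene set
# when drawing from 20020 genes in total (20000 + 10 + 10).
p_by_hand <- choose(20, 10) / choose(20020, 10)
print(p_by_hand)  # roughly 6.5e-32, in line with the fisher.test output above
```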
I've tested this with some of my real data, and the odds ratios and p-values reported by Independent Enrichment Analysis are very similar to what I get from a Fisher test that uses all known genes as background instead of the provided subset.
You can also see this by running the example dataset you provide as both foreground and background (Appyter found here). There I still get very significant results with very high OR even though there should be no enrichment.
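For reference, this is roughly what I would expect a background-aware test to do. It is a minimal sketch under my own assumptions, not the Appyter's implementation: the helper `enrich_with_background` is hypothetical, the gene symbols are made up, and I use a one-sided Fisher test after restricting both the query and the gene set to the background.

```r
# Hypothetical helper (not the Appyter's code): one-sided Fisher test of
# `query` against `gene_set`, with both lists restricted to `background`.
enrich_with_background <- function(query, gene_set, background) {
  query    <- intersect(query, background)
  gene_set <- intersect(gene_set, background)
  in_both    <- length(intersect(query, gene_set))
  query_only <- length(setdiff(query, gene_set))
  set_only   <- length(setdiff(gene_set, query))
  neither    <- length(background) - in_both - query_only - set_only
  tab <- matrix(c(in_both, query_only, set_only, neither),
                nrow = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}

set20  <- paste0("GENE", 1:20)     # made-up symbols standing in for a 20-gene set
query  <- set20[1:10]              # first 10 genes of that set
genome <- paste0("GENE", 1:20020)  # made-up genome-wide background

p_custom <- enrich_with_background(query, set20, background = set20)
p_genome <- enrich_with_background(query, set20, background = genome)
print(p_custom)  # no enrichment when the background is the gene set itself
print(p_genome)  # tiny p-value when 20000 unrelated genes are added
```

When the background is the gene set itself, every background gene is in the set, so the test has nothing to distinguish and the p-value is 1. The tiny p-value only appears once the 20,000 unrelated genes are counted, which is the pattern I see in the Appyter output.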
u8sand changed the title from "Bug in background of Independent Enrichment Analysis?" to "[Independent Enrichment Analysis] Bug in background?" on Jul 27, 2022