Why scaling the smaller dataset to the same depth as the larger dataset results in more false positive peaks. #694
bentyeh
started this conversation in
Show and tell
In the original 2008 MACS paper by Zhang et al., the authors write
This is actually to be expected given MACS's Poisson model. I did not find a thorough explanation elsewhere (GitHub Issues, GitHub Discussions, or MACS Google Group), so I'm posting this here in case anyone else finds it useful. (Or if my understanding is incorrect, please let me know!)
Let $X$ be the coverage at some peak in the ChIP track. Let $\lambda$ be the coverage at that peak in the control track. Let $f$ be the sequencing depth ratio between samples, i.e., total number of reads in the ChIP sample / total number of reads in the control sample. If the ChIP track is scaled to match the control track, then the scaled coverage ratio at the peak is $r = (X / f) / \lambda = X / (\lambda f)$. If the control track is scaled to match the ChIP track, then the ratio is $r = X / (\lambda f)$. The scaled ratio is therefore identical regardless of the direction in which the scaling is performed.
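As a quick numeric check of the ratio argument, here is a minimal Python sketch. The function names and the example values ($X = 40$, $\lambda = 5$, $f = 4$) are made up for illustration and are not taken from MACS.

```python
def ratio_scaling_chip(x, lam, f):
    """Ratio after dividing the ChIP coverage by f to match the control depth."""
    return (x / f) / lam


def ratio_scaling_control(x, lam, f):
    """Ratio after multiplying the control coverage by f to match the ChIP depth."""
    return x / (lam * f)


# Hypothetical values: ChIP coverage 40, control coverage 5,
# and a ChIP sample with 4x as many reads as the control.
x, lam, f = 40, 5, 4

r_chip = ratio_scaling_chip(x, lam, f)        # compares 10 against 5
r_control = ratio_scaling_control(x, lam, f)  # compares 40 against 20

print(r_chip, r_control)
```

Both directions give the same ratio, but the control value being compared against differs: 5 in the first case and 20 in the second.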
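For a constant ratio $r > 1$ of a sample value to the mean of a Poisson distribution, the p-value (or 1 - CDF) decreases as the mean increases. While this can be observed by simulation, it can also be intuitively understood as follows: a Poisson distribution with mean $\lambda$ has standard deviation $\sqrt{\lambda}$, so a value $X = r\lambda$ lies $(r - 1)\sqrt{\lambda}$ standard deviations above the mean, and that distance grows with $\lambda$.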
Consequently, using a higher control coverage value $\lambda$ (whether by scaling up a low-depth control to match a high-depth ChIP sample, or by scaling up a low-depth ChIP sample to match a high-depth control sample) results in lower p-values for all peaks and therefore more peaks passing the cutoff.
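The effect can also be checked directly rather than by simulation. The sketch below uses only the Python standard library and computes the Poisson upper-tail probability $P(X \ge k)$ at a fixed ratio of 2 for increasing control means. The helper `poisson_upper_tail` and the chosen means are illustrative assumptions; this is not MACS code, and MACS's exact tail convention may differ.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with integer k >= 0 and lam > 0.

    The tail is summed term by term instead of computing 1 - CDF,
    which would lose precision for very small p-values.
    """
    # Poisson pmf at k, computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Past the mean the terms shrink monotonically, so stop once they
    # no longer change the running total.
    while i <= lam or term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # pmf(i) = pmf(i - 1) * lam / i
    return total


ratio = 2  # fixed ChIP / control ratio (illustrative)
control_means = [5, 20, 80]  # the same peak at increasing scaled depth
p_values = [poisson_upper_tail(ratio * lam, lam) for lam in control_means]

for lam, p in zip(control_means, p_values):
    print(f"lambda={lam:>3}  X={ratio * lam:>3}  p={p:.3g}")
```

The ratio is 2 in every row, yet the p-value falls by orders of magnitude as $\lambda$ grows. A peak that misses a given cutoff at low depth can therefore pass it once both tracks are expressed at the higher depth.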