-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
preprocessNoob using incorrect row indices to Meth and Unmeth matrices #264
Comments
I created a proof of concept PR. I think a similar issue may be affecting FunNorm as well. I haven't looked at other normalization methods. |
This could be a serious bug. However, I am not sure it is. Since I haven't yet had time to investigate, perhaps you could do some leg work. The code - as currently written - assumes that the output of
If this mismatch is happening you should be able to see it by taking the default objects in the |
The issue actually arises when the manifest is smaller than the MSet (not bigger). We observe EPIC v1 idats in the wild that have more probes than what is contained in the B4/B5 EPIC manifest. I have already looked at the minfiData (450k as I remember) and it has the same number of probes as the manifest, so we won't be able to replicate the issue with this test data. But you can see example output from one of our datasets below.
I've done some comparisons of my PR with the latest devel branch. I have taken an RGSet with 866091 probes and made a copy of the RGSet that is subset to the bead addresses found in the B5 manifest (excluding the manufacture change probes). I then ran preprocessNoob() on both RGSets using both the official version and my fixed version of minfi. When comparing the Noob-adjusted betas for the two RGSets, the official version produces a lot of discordance while the differences in the fixed version of minfi are negligible (see attached plots). Excluding a few hundred out-of-band signals from the background estimate shouldn't make much of a difference, so I suspect that the discordance seen in the official version of minfi is because changing the number of rows in the MSet affects which probes get Noob-adjusted. |
Oh, that is interesting. I could imagine we have an assumption that the
MSet is always a subset of the Manifest. I could see the potential for a
bug here.
The right fix however, is to deal with this inside of `getAnnotation` to
make sure the dimensions match up.
Could you share a couple of samples from your dataset, so any fixes could
be checked.
…On Thu, Feb 29, 2024 at 1:07 PM Jonathon LeFaive ***@***.***> wrote:
You're right in saying that a full manifest object can be much bigger than
the size of the MSet.
The issue actually arises when the manifest is smaller than the MSet (not
bigger). We observe EPIC v1 idats in the wild that have more probes than
what is contained in the B4/B5 EPIC manifest. I have already looked at the
minfiData (450k as I remember) and it has the same number of probes as the
manifest, so we won't be able to replicate the issue with this test data.
But you can see example output from one of our datasets below.
> MSet <- preprocessRaw(rgset)
Loading required package: IlluminaHumanMethylationEPICmanifest
> dim(MSet)
[1] 866091 95
> probe.type <- getProbeType(MSet, withColor = TRUE)
Loading required package: IlluminaHumanMethylationEPICanno.ilm10b4.hg19
> length(probe.type)
[1] 865859
I've done some comparisons of my PR with the latest devel branch. I have
taken an RGSet with 866091 probes and made a copy of the RGSet that is
subset to the bead addresses found in the B5 manifest (excluding the
manufacture change probes). I then ran preprocessNoob() on both RGSets
using both the official version and my fixed version of minfi. When
comparing the Noob-adjusted betas for the two RGSets, the official version
produces a lot of discordance while the differences in the fixed version of
minfi are negligible (see attached plots). Excluding a few hundred
out-of-band signals from the background estimate shouldn't make much of a
difference, so I suspect that the discordance seen in the official version
of minfi is because changing the number of rows in the MSet affects which
probes get Noob-adjusted.
minfi_noob_plots.png (view on web)
<https://github.com/hansenlab/minfi/assets/4069894/337496b6-f143-47ea-a4da-4614d3058e83>
—
Reply to this email directly, view it on GitHub
<#264 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABF2DH6IDMLVMN5XHZQZ563YV5W47AVCNFSM6AAAAABDZNWLW6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZRGY4DEMBVGQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
--
Best,
Kasper
|
I cannot share samples from our data but I could share the bead address list. I'm assuming you can generate an RGChannelSet object with fake (random values) red/green matrices using these addresses. Would that be useful to you? |
Perhaps you could generate such a simulated RGChannelSet or perhaps - instead of simulating - you just have integer values in the channels representing different types of probes. That would be useful. I am still surprised by this, because the called to |
I have created a test case with a couple samples from GEO. Another thing I discovered is that when using Running
Note: I had to use LZMA to compress this tar archive because the gzip version was too big for Github. |
I just realized that this is because the set of probes that overlap between these two samples are all in the manifest. So if the intersection of probes across all samples are in the manifiest, then we don't have a problem. The issue can still exist if the idats have different sizes as long as the intersection of probes in the idats is larger than the manifest.
|
I can replicate the issue. The core of the issue is because of the (in hindsight wrong) to have separate Manifest and Annotation packages. This means that these packages may not be in sync. The issues happens when you use This affects output from I need to do a more careful analysis of the potential impact of this bug, including a clear description (for users) of how this could happen with various types of IDAT files and array versions. It might be nice to have a have a call with the OP with some more details and their observations. For this reason, I am waiting on merging the pull request. |
Is this still waiting on a call with the OP? |
minfi/R/preprocessNoob.R
Lines 386 to 392 in 77d3345
The preprocessNoob function uses indices from the
getProbeType()
function to select probes by type using their row index (see code linked above). The problem is thatgetProbeType()
returns a vector based on the manifest which is unlikely to have the same length as the number of probes in the MethylSet. For example, the indices ind2.probes <- which(probe.type == "II")
do not map to indices for the same probes in the return matrices fromgetMeth(MSet)
andgetUnmeth(MSet)
, but are in fact used to select the signals for type II probes.I suspect that fixing this bug will improve the effectiveness of ssNoob in minfi.
The text was updated successfully, but these errors were encountered: