Issue in Inosine detection #373
Hello @Salvobioinfo,
A couple of things. (1) Distributions that look like this, with a downward-sloping line from the left (which I'm assuming is the density of low-confidence calls), usually indicate that many of the probabilities in the plot come from false positives. If you look at just the frequency of very high-confidence inosine calls, do you see much of a difference between the KO and Ctrl? (2) Roughly, what is the expected frequency of inosine in your samples? It appears that the levels are close to the model's false positive rate at a global level, but that may not be the case. Since you have orthogonal data, what levels do you expect?
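The high-confidence check suggested in (1) could be sketched like this. It's a minimal sketch, not modkit's own tooling: the tab-separated column names (`code`, `probability`, `count`) and the use of the ChEBI code `17596` for inosine are assumptions about the `sample-probs` table layout, so adjust them to the real output.

```python
# Sketch: fraction of calls at or above a probability threshold, per mod code,
# from a (assumed) tab-separated `modkit sample-probs` count table.
import csv

def high_conf_fraction(path, mod_code="17596", threshold=0.95):
    """Fraction of `mod_code` calls whose probability >= threshold."""
    total = high = 0
    with open(path) as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["code"] != mod_code:  # "17596" assumed for inosine
                continue
            n = int(row["count"])
            total += n
            if float(row["probability"]) >= threshold:
                high += n
    return high / total if total else 0.0
```

Running this on the KO and Ctrl tables separately and comparing the two fractions is one way to see whether the high-confidence mass actually differs.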
How many FNs and FPs did you get? Could you use
Hello @ArtRand
Approximately 163 sites differ between KO and CTRL (using a 0.99 threshold). Given that I have triplicates for each condition, I consider an editing site valid only if it is detected in at least 2 of 3 replicates. Additionally, the same sites should be well covered in the KO to ensure reliability.
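The replicate-consistency filter described above could be sketched as follows. This is a hypothetical layout (replicate name mapped to per-site coverage), not modkit output, and the coverage cutoff is an illustrative parameter:

```python
# Sketch: keep only editing sites seen in at least `min_detect` of the
# replicates, counting a site only where it has at least `min_cov` coverage.
# `replicates` maps replicate name -> {(chrom, pos): coverage} (assumed layout).
from collections import Counter

def consistent_sites(replicates, min_detect=2, min_cov=20):
    hits = Counter()
    for sites in replicates.values():
        for site, cov in sites.items():
            if cov >= min_cov:
                hits[site] += 1
    return {site for site, n in hits.items() if n >= min_detect}
```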
Since we are discussing physiological RNA editing, inosine frequency typically ranges between 5% and 15% in my cell lines under steady-state conditions. After treatment, it increases to 10%–30%, with some sites reaching 40%–50%. Both C→U and A→I physiological modifications generally occur at low frequencies. This makes me question the validity of the A→I detection model, especially if it was trained on synthetic modified oligos rather than proper biological samples (as I suppose). I'm not sure how reliable its calls are in this context. From Illumina sequencing, approximately 4,000 editing sites have been detected. Of course, I don't expect a perfect overlap, due to technical factors including the large difference in coverage as well as several other methodological differences.
Yes, I could. 👍🏻👍🏻
Hello @Salvobioinfo,
Are you looking at the percent-modified column in the pileup bedMethyls? In general, I would recommend using the bedMethyl when looking for changes in modification at specific positions; it seems like you're already doing this. When you were looking at the sample-probs output before, it got me thinking that you're looking for read-level changes that might not all concentrate on a specific reference position. You can also use
For changes on the order of 5–15% you will probably need relatively high coverage to know that a site differs between the two samples/conditions. The effect-size model describes some of the intuition. I may have led you down the wrong path with
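The coverage intuition above can be made concrete with a quick back-of-the-envelope test. This is a sketch using a plain two-proportion z-test (normal approximation) on modified/total counts at a single site; it is not modkit's effect-size model, just a rough stand-in for it:

```python
# Sketch: two-sided p-value for H0: the modification frequency is the same
# in both conditions, given (modified, total) counts at one site.
# Normal approximation; rough guidance only, breaks down at very low counts.
import math

def two_prop_pvalue(k1, n1, k2, n2):
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)          # pooled frequency under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0                      # no variation, nothing to test
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, 10% vs 0% modified is borderline at ~30x coverage per condition but clearly separable at ~100x, which is why 5–15% changes need relatively deep coverage per site.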
We created an ADAR KO cell line, meaning no inosine should be detected in its RNA. This expectation, along with the reliability of our knockout, was confirmed by Illumina sequencing. We then performed nanopore sequencing on the same set of samples. I basecalled our library using Dorado 0.8 with the hac,inosine_m6A model. As suggested by the Modkit manual, I ran sample-probs to fine-tune the filtering threshold for inosine and m6A detection. However, when I used the output file to generate a density plot of the total counts at each probability level, I was surprised to find no significant differences between the ADAR KO and control samples.
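A complement to the density plot described above is to compare the two probability distributions numerically, e.g. the fraction of calls above a set of thresholds in each sample (a KS-style gap on the binned counts). This is a sketch; the `(probability, count)` pair layout is an assumption about the `sample-probs` output, not its actual format:

```python
# Sketch: compare binned call-probability distributions from two samples.
# `bins` is a list of (probability, count) pairs (assumed layout).
def tail_fractions(bins, thresholds):
    """Fraction of all calls with probability >= t, for each threshold t."""
    total = sum(c for _, c in bins)
    return [sum(c for p, c in bins if p >= t) / total for t in thresholds]

def max_tail_gap(ko_bins, ctrl_bins, thresholds=(0.5, 0.8, 0.9, 0.95, 0.99)):
    """Largest difference in tail mass between the two samples."""
    ko = tail_fractions(ko_bins, thresholds)
    ctrl = tail_fractions(ctrl_bins, thresholds)
    return max(abs(a - b) for a, b in zip(ko, ctrl))
```

A gap near zero would quantify the "no significant differences" impression from the plot, while a large gap at high thresholds would point to a real KO/Ctrl difference hidden in the tail.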
I also attempted to use the inosine sites identified by Illumina sequencing as ground truth, but this approach resulted in many false negatives in CTRLs and false positives in KOs. Is there any planned solution to address this issue? Thanks in advance.
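The FN/FP tally against the Illumina ground truth can be sketched as a simple set comparison. The `(chrom, pos)` site sets are an assumed representation; in an ADAR KO, every called site counts as a false positive, while in the control every missed Illumina site counts as a false negative:

```python
# Sketch: confusion counts for nanopore calls against an Illumina-derived
# ground-truth site list. Both arguments are sets of (chrom, pos) tuples.
def confusion_counts(truth_sites, called_sites):
    return {
        "TP": len(truth_sites & called_sites),   # sites both methods agree on
        "FN": len(truth_sites - called_sites),   # Illumina sites missed
        "FP": len(called_sites - truth_sites),   # calls with no Illumina support
    }
```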