
Issue in Inosine detection #373

Open
Salvobioinfo opened this issue Feb 10, 2025 · 3 comments
Labels
question: Looking for clarification on inputs and/or outputs

Comments

Salvobioinfo commented Feb 10, 2025

We created an ADAR KO cell line, meaning no inosine should be detected in its RNA. This expectation, along with the reliability of our knockout, was confirmed by Illumina sequencing. We then performed nanopore sequencing on the same set of samples. I basecalled our library using Dorado 0.8 with the hac,inosine_m6A model. As suggested by the modkit manual, I ran sample-probs to fine-tune the filtering threshold for inosine and m6A detection. However, when I used the output file to generate a density plot of the total counts at each probability level, I was surprised to find no significant differences between ADAR KO and control samples.
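A sketch of the pipeline described above (paths illustrative; flags per the dorado and modkit docs and may differ by version):

```bash
# basecall with the hac model plus the inosine_m6A modified-base model
dorado basecaller hac,inosine_m6A pod5_dir/ > calls.bam

# sample per-read modification probabilities to pick pass thresholds;
# --hist writes histograms of the sampled probabilities to the output directory
modkit sample-probs calls.bam --hist -o sample_probs_out/
```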

I also attempted to use the inosine sites identified from Illumina sequencing as ground truth, but this approach resulted in many false negatives in CTRLs and false positives in KOs. Is there any planned solution to address this issue? Thanks in advance.

[Figure: density plots of total call counts at each probability level for ADAR KO vs. control samples]

ArtRand commented Feb 12, 2025

Hello @Salvobioinfo,

> However, when I used the output file to generate a density plot of the total counts at each probability level, I was surprised to find no significant differences between ADAR KO and control samples.

A couple of things.

(1) Distributions that look like this, with a downward-sloping line from the left (which I'm assuming is the density of low-confidence calls), usually indicate that a lot of the probabilities in the plot are due to false positives. If you look at just the frequency of very high-confidence inosine calls, do you see much of a difference between the KO and Ctrl?
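One way to check this is to regenerate the pileup with a stricter pass threshold, for example (value illustrative; --filter-threshold sets the per-base pass threshold):

```bash
# count only calls where the model's probability at A positions clears 0.99
# (assumes calls.bam is aligned, sorted, and indexed)
modkit pileup calls.bam high_conf.pileup.bed --filter-threshold A:0.99
```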

(2) What is the expected frequency of Inosine in your samples, roughly? It appears that the levels are close to the false positive rate of the model at a global level. But that may not be the case. Since you have orthogonal data, what levels do you expect?

> I also attempted to use the inosine sites identified from Illumina sequencing as ground truth, but this approach resulted in many false negatives in CTRLs and false positives in KOs. Is there any planned solution to address this issue? Thanks in advance.

How many FNs and FPs did you get? Could you use modkit validate to check?
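For example (file names hypothetical; each --bam-and-bed pairs a BAM with a BED of positions whose modification status is taken as ground truth; see the modkit docs for the expected BED format):

```bash
# hypothetical invocation: score calls against ILMN-derived ground-truth sites
modkit validate \
  --bam-and-bed ctrl.bam illumina_inosine_sites.bed \
  --bam-and-bed ko.bam illumina_canonical_sites.bed
```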

ArtRand added the question label Feb 12, 2025
Salvobioinfo reopened this Feb 14, 2025

Salvobioinfo commented Feb 14, 2025

Hello @ArtRand


> (1) Distributions that look like this, with a downward-sloping line from the left (which I'm assuming is the density of low-confidence calls), usually indicate that a lot of the probabilities in the plot are due to false positives. If you look at just the frequency of very high-confidence inosine calls, do you see much of a difference between the KO and Ctrl?

(Using 0.99) Approximately 163 sites differ between KO and CTRL. Given that I have triplicates for each condition, I consider an editing site valid only if it is detected in at least 2 of the 3 replicates. Additionally, the same sites should be well covered in the KO to ensure reliability.
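Concretely, that 2-of-3 filter looks roughly like this over the per-replicate bedMethyl pileups (cutoffs and file names illustrative; this assumes bedMethyl column 10 is valid coverage and column 11 is percent modified):

```bash
# per replicate: keep sites with decent coverage and a non-trivial modified fraction,
# then require the same site in at least 2 of the 3 replicates
for rep in ctrl_rep1 ctrl_rep2 ctrl_rep3; do
  awk '$10 >= 20 && $11 >= 5' "${rep}.pileup.bed" | cut -f1-3
done | sort | uniq -c | awk '$1 >= 2 { print $2 "\t" $3 "\t" $4 }' > ctrl_consensus_sites.bed
```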

> (2) What is the expected frequency of Inosine in your samples, roughly? It appears that the levels are close to the false positive rate of the model at a global level. But that may not be the case. Since you have orthogonal data, what levels do you expect?

Since we are discussing physiological RNA editing, inosine frequency typically ranges between 5% and 15% in my cell lines under steady-state conditions. After treatment, it increases to 10%–30%, with some sites reaching 40%–50%.

Both C→U and A→I physiological modifications generally occur at low frequencies. This makes me question the validity of the A→I detection model, especially if it hasn't been trained on proper biological samples and is instead based on modified oligos (I suppose); I'm not sure how reliable its claims are in this context. From Illumina sequencing, approximately 4,000 editing sites have been detected. Of course, I don't expect a perfect overlap, due to a series of technical factors including the large difference in coverage as well as several other methodological differences.


> How many FNs and FPs did you get? Could you use modkit validate to check?

Yes I could. 👍🏻👍🏻


ArtRand commented Feb 14, 2025

Hello @Salvobioinfo,

> (Using 0.99) Approximately 163 sites differ between KO and CTRL. Given that I have triplicates for each condition, I consider an editing site valid only if it is detected in at least 2 of the 3 replicates. Additionally, the same sites should be well covered in the KO to ensure reliability.

Are you looking at the percent-modified column in the pileup bedMethyls? In general, I would recommend using the bedMethyl when looking for changes in modifications at specific positions; it sounds like you're already doing this. When you were looking at the sample-probs output before, it got me thinking that you might be looking for read-level changes that don't all concentrate on a specific reference position. You can also use dmr pair to perform comparisons at reference positions, as sketched below.
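A sketch of a dmr pair run over two pileups (file names illustrative; flags per the modkit dmr docs; the bedMethyls must be position-sorted, bgzip-compressed, and tabix-indexed):

```bash
# compress and index the per-condition pileups
bgzip -k ctrl.pileup.bed && tabix -p bed ctrl.pileup.bed.gz
bgzip -k ko.pileup.bed && tabix -p bed ko.pileup.bed.gz

# per-position comparison of modification levels between the two conditions
modkit dmr pair \
  -a ctrl.pileup.bed.gz \
  -b ko.pileup.bed.gz \
  -o ctrl_vs_ko_dmr.bed \
  --ref ref.fa \
  --base A
```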

> Since we are discussing physiological RNA editing, inosine frequency typically ranges between 5% and 15% in my cell lines under steady-state conditions. After treatment, it increases to 10%–30%, with some sites reaching 40%–50%.

For changes on the order of 5–15% you will probably need relatively high coverage to know that a site differs between the two samples/conditions. The effect size model describes some of the intuition.
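As a back-of-envelope illustration of that intuition (not the effect size model itself): the binomial standard error of an observed modified fraction p at depth n is sqrt(p(1-p)/n), and the SE needs to be small relative to the shift you want to detect:

```bash
# rough standard error of a 15% modified fraction at increasing read depths
awk 'BEGIN { p = 0.15; for (n = 25; n <= 400; n *= 2) printf "depth %3d -> SE %.3f\n", n, sqrt(p * (1 - p) / n) }'
```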

I may have led you down the wrong path with modkit validate; the assumption with that command is that the provided sites are known to be entirely one modification state. If your ILMN data shows that a site has 30% A→I editing, you could label it as "I" and expect the accuracy to be ~30%, but I don't know if that helps you get to your research question. On the other hand, if your ILMN data suggests that a site is entirely inosine, the command should work as intended.
