Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

m6A sites identified on transcript basis #377

Open
baibhav-bioinfo opened this issue Feb 19, 2025 · 3 comments
Open

m6A sites identified on transcript basis #377

baibhav-bioinfo opened this issue Feb 19, 2025 · 3 comments
Labels
question Looking for clarification on inputs and/or outputs

Comments

@baibhav-bioinfo
Copy link

Hello,
I am using DRS dataset to identify m6A sites in plant dataset using dorado+modkit tools.

If i want to get the m6A sites identified on the transcript locations, can i align the unaligned basecalled output bam files to the transcript.fasta instead of genome.fasta. and then use modkit to get bedmethyl files of m6A sites and do downstream analysis.

and then later find out the genomic locations of the sites for representation and visualisation etc.

@ArtRand
Copy link
Contributor

ArtRand commented Feb 20, 2025

Hello @baibhav-bioinfo,

Two options I can think of off-the-top are:

  1. Align to the transcriptome and the genome, then use their pileups for individual tasks.
  2. Align to the genome, then make a BED of the transcripts and use the --include-bed argument to modkit pileup so that you only have m6A positions included. Of course, you'll have to make sure to get the BED file right so that you get the isoforms you want.

You could convert transcript coordinates to genome coordinates with LiftOver or a similar tool, but Modkit doesn't have any such functionality.

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Feb 20, 2025
@baibhav-bioinfo
Copy link
Author

baibhav-bioinfo commented Feb 25, 2025

hi,
i tried the alignment with the transcriptome.
Then i made bedmethyl file using modkit and now i see that every site is in + strand.

why is that?.....when i did align to genome earlier i was getting half and half on both the strands.

is there any particular reason for that?
(pPS: for alignment i have selected only full length reads which align to reference transcriptome)

@baibhav-bioinfo
Copy link
Author

hi @ArtRand
I think i will choose the second option from what you suggested (as the mapping to transcriptome is too much confusing)

so i will map the reads to genome and then while modkit pileup i will provide the bed file of transcripts (--include-bed option) to get the m6A sites only in those regions.
i will use the gtf file to get the transcripts.bed file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Looking for clarification on inputs and/or outputs
Projects
None yet
Development

No branches or pull requests

2 participants