understanding the dup isoforms in the classification file #216

Open
alexyfyf opened this issue Aug 23, 2023 · 2 comments
Labels
QC SQ3 Quality Control related issues question Further information is requested

Comments

alexyfyf commented Aug 23, 2023

Hi team,

I recently ran SQANTI3 with FASTA input, and the resulting classification table contains some duplicated isoforms.
A quick check showed that the duplicates first appear in the corrected.gtf file, which seems to be generated after aligning with minimap2: the isoform is split because of supplementary alignments.

I could not find much detail about this. Could you explain a bit? Does it make sense to consider the duplicates separately? Sometimes the duplicate is far away, for example on a different chromosome, but sometimes it is quite close.

I can give an example here:
On the same chromosome (430 kb apart):
Screen Shot 2023-08-23 at 13 57 59
Across chromosomes:
Screen Shot 2023-08-23 at 13 59 16
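
One way to list the affected isoforms is to scan the corrected GTF for transcript IDs that appear at more than one locus. The sketch below is illustrative only: it assumes the GTF contains `transcript` feature lines with a `transcript_id` attribute, and the function name `split_isoforms` and the demo records are hypothetical, not part of SQANTI3.

```python
import re
from collections import defaultdict


def split_isoforms(gtf_lines):
    """Return {transcript_id: [(chrom, start, end), ...]} for every
    transcript ID that has more than one 'transcript' record."""
    loci = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        # Assumption: one 'transcript' feature line per aligned segment.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            loci[match.group(1)].append(
                (fields[0], int(fields[3]), int(fields[4]))
            )
    return {tid: recs for tid, recs in loci.items() if len(recs) > 1}


# Made-up records: PB.1.1 appears on two chromosomes, PB.2.1 only once.
demo_gtf = [
    'chr1\tsq\ttranscript\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "PB.1.1";\n',
    'chr1\tsq\texon\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "PB.1.1";\n',
    'chr5\tsq\ttranscript\t900\t1200\t.\t-\t.\tgene_id "g1"; transcript_id "PB.1.1";\n',
    'chr2\tsq\ttranscript\t10\t50\t.\t+\t.\tgene_id "g2"; transcript_id "PB.2.1";\n',
]
duplicated = split_isoforms(demo_gtf)
```

For a real file, the same function could be called on an open file handle, for example `split_isoforms(open(path))`, where `path` points to the corrected GTF.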

Cheers,
Alex

aarzalluz added the question (Further information is requested) label Aug 23, 2023
@aarzalluz
Member

Hi @alexyfyf,

I am not sure why this is happening. SQANTI3 runs minimap2 with --secondary=no, but I guess there could still be some supplementary alignments. Have you tried any of the other implemented mappers?

If you want more control over the process, I would suggest mapping the isoforms outside of SQANTI3, using more stringent parameters or filtering out supplementary alignments, and then running the QC script with the recommended GTF input.
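
As a rough illustration of that filtering step, the following sketch drops secondary (flag bit 0x100) and supplementary (flag bit 0x800) records from SAM text, keeping header lines and primary alignments. The function name `keep_primary` and the demo records are made up for the example, and it reads plain SAM lines rather than BAM.

```python
SECONDARY = 0x100      # SAM flag bit 256: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit 2048: supplementary (chimeric) alignment


def keep_primary(sam_lines):
    """Yield header lines and alignment records that are neither
    secondary nor supplementary."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a valid alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # drop secondary and supplementary records
        yield line


# Made-up SAM records: iso1 has a primary and a supplementary segment.
demo_sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "iso1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\n",
    "iso1\t2048\tchr5\t900\t60\t50M\t*\t0\t0\t*\t*\n",
    "iso2\t16\tchr2\t10\t60\t40M\t*\t0\t0\t*\t*\n",
    "iso2\t256\tchr3\t70\t0\t40M\t*\t0\t0\t*\t*\n",
]
kept = list(keep_primary(demo_sam))
```

With samtools, `samtools view -F 2304` excludes the same two flag bits. Note that this keeps only the primary segment of a chimeric alignment, so such an isoform would then be represented by a partial alignment.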

Best,

Ángeles

aarzalluz added the QC SQ3 Quality Control related issues label Aug 23, 2023
Author

alexyfyf commented Aug 23, 2023

Thank you for your prompt reply and suggestion. I think setting --secondary=no does not prevent supplementary alignments. I will try to process them outside SQANTI3. Also, from my point of view, these should be considered fusion genes.

However, I do see fusion genes in the SQANTI3 classification output, but they are always two nearby genes chained together. I am wondering whether that is the appropriate interpretation. To me, they are more likely to be either read-through transcripts or the result of overlapping gene annotations. BTW, I use GENCODE as suggested in your wiki.

Screen Shot 2023-08-23 at 21 54 55

I also ran the GENCODE GTF itself through SQANTI3, and it mistakenly classifies some annotated transcripts as fusion.
The first one looks like a read-through; the second arises because GLYATL1 (ENST00000534063) and GLYATL1P4 (ENST00000529326) share an exon.
Screen Shot 2023-08-23 at 21 59 46
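
To check whether two annotated transcripts from different genes share exonic sequence, as in the GLYATL1 / GLYATL1P4 case, their exon intervals can be compared directly. This is a small sketch with invented coordinates (not the real exon positions of these transcripts), and `shared_exons` is a hypothetical helper; it does not reproduce SQANTI3's fusion classification logic.

```python
def shared_exons(exons_a, exons_b):
    """Return (exon_a, exon_b) pairs whose 1-based inclusive
    intervals overlap by at least one base."""
    pairs = []
    for a_start, a_end in exons_a:
        for b_start, b_end in exons_b:
            if a_start <= b_end and b_start <= a_end:
                pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


# Invented exon coordinates for two transcripts on the same chromosome.
tx_a_exons = [(100, 200), (300, 400)]
tx_b_exons = [(300, 400), (500, 600)]
overlaps = shared_exons(tx_a_exons, tx_b_exons)
```

An empty result would suggest the two transcripts do not overlap at the exon level, in which case a read-through explanation may be more plausible than a shared-exon one.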

I'd like some suggestions on how to interpret this.

Cheers,
Alex
