data matrix, annotation and cds problems #16

BeatrizdeToledo · 2020-11-27T15:57:26Z

Good afternoon,
I have been trying to use tappAS, but I noticed some things that I was not expecting. I used SQANTI3 to generate the GFF3 input file.

1 - To generate my data matrix I used RSEM+STAR and took the read count column, which is not normalized. I selected TMM normalization. However, the data matrix that appears in tappAS is exactly the same as the input data matrix, so it seems it was not TMM normalized. Is this expected? Should I perform some kind of normalization myself?

2 - I observed many cases in which two isoforms were very similar, so I would expect to see the same features in both of them. However, only one of the isoforms had the features, and the other had 0 features. In many cases this occurs in "novel transcripts". Does this annotation rely only on database information, so that novel isoforms will not have any information?

3 - I also observed situations in which, although the isoforms are very similar and neither of them is novel, only one of them had the features. Is it possible that the database used for the annotation is missing information?

4 - My tappAS interface often closes when I switch to another application (such as PowerPoint or Firefox). This message appears in the terminal: libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6
Is there something I can do to avoid this?

5 - I observed that transcripts that have the same CDS are not "merged" in the DIU CDS analysis. Basically, the DIU CDS analysis excludes non-protein-coding transcripts. I would have expected transcripts with the same CDS to be merged into one "CDS-transcript".

aarzalluz · 2020-12-02T12:14:55Z

Hi @BeatrizdeToledo,

I'm sorry to hear you're experiencing issues with tappAS. We might need some additional info to look into some of them (see below), but I'll try to answer as best as I can.

1 - We've never had a report that TMM normalization was not working correctly. Have you compared all values and confirmed that no normalization was applied, or did you detect an issue in the project creation log? At any rate, you're welcome to try performing TMM normalization outside of tappAS and create the project excluding the normalization step. Doing this should be equivalent to tappAS-implemented TMM normalization, but if you see any different results, please let us know.
2 (and 3) - The functional annotations in long read-generated transcriptomes output by SQANTI3 are generated using the IsoAnnotLite script, which relies on a previously generated, tappAS-compatible annotation as a template to transfer the functional features onto the new transcriptome. This is the annotation that you will have provided as an input to SQANTI3. My colleague @psalguerog has been working on improving the algorithm, but we are aware that it has limitations. In particular, this issue with missing features in both novel and non-novel isoforms has been reported before. While we're trying to solve it, I cannot provide an exact date for when a new, improved version of IsoAnnotLite will be released.

*Of note, we are currently working on a scalable, standardized pipeline for de novo functional annotation of isoforms, but it is not ready for release yet (see issue #12).

4 - Thank you for reporting this issue. We have not experienced it before, but we will look into it and try to solve it. In order to best address it, could you tell me which OS (Mac, Windows, Ubuntu?) you're working with, and which exact version? Also, which version of Java 8 are you running tappAS under?
5 - I believe this may be related to what I wrote about questions 2 and 3. However, just to clarify, tappAS CDS DIU analysis (or DCU analysis) does exclude non-coding transcripts, but it should detect cases where several transcripts have the same CDS, and aggregate their expression when estimating CDS-level counts.
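
To illustrate the kind of aggregation described for CDS-level counts, here is a minimal sketch. It is not the tappAS implementation: the function name `aggregate_by_cds`, the input dictionaries, and the use of a tuple as the "same CDS" key are all assumptions made for the example, and the counts are made up.

```python
def aggregate_by_cds(transcript_counts, transcript_cds):
    """Sum per-sample counts of transcripts that share a CDS.

    transcript_counts: {transcript_id: [count_sample1, count_sample2, ...]}
    transcript_cds:    {transcript_id: hashable CDS key, or None if non-coding}

    How "same CDS" is decided is up to the caller; here it is simply
    whatever key is stored in transcript_cds (an assumption of this sketch).
    """
    cds_counts = {}
    for transcript_id, counts in transcript_counts.items():
        cds_key = transcript_cds.get(transcript_id)
        if cds_key is None:
            # Non-coding transcripts are excluded from CDS-level counts.
            continue
        if cds_key not in cds_counts:
            cds_counts[cds_key] = list(counts)
        else:
            cds_counts[cds_key] = [
                a + b for a, b in zip(cds_counts[cds_key], counts)
            ]
    return cds_counts


# Toy data (hypothetical IDs and counts): two coding isoforms share one CDS,
# and a third isoform is non-coding.
counts = {"PB.1.1": [10, 20], "PB.1.2": [5, 5], "PB.1.3": [7, 1]}
cds = {
    "PB.1.1": ("1", "+", 100, 400),
    "PB.1.2": ("1", "+", 100, 400),
    "PB.1.3": None,
}
cds_level = aggregate_by_cds(counts, cds)
print(cds_level)
```

With this toy input, the two coding isoforms collapse into a single CDS-level row and the non-coding isoform is dropped.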

I hope this helps,

Ángeles

aarzalluz · 2021-01-14T12:24:12Z

Hi @BeatrizdeToledo,

A new version of IsoAnnotLite is now available on our website. Feel free to try it out and see if your annotation problems are solved after annotating your isoforms using the new script. We have specifically worked on improving some of the issues that you reported here.

If so, do let us know! Feedback is really useful for us to keep our software up and running.

Best,

Ángeles

BeatrizdeToledo · 2021-01-18T11:03:41Z

That's great!
Should I use the sqanti files after the filtering step or the one after the classification step, without the filtering?
Thank you!

aarzalluz · 2021-01-18T12:03:50Z

We always recommend using the filtered output files because it's the best way to ensure the removal of artifacts/false positive isoforms, and therefore the quality of your transcriptome. As for compatibility, the post-filter files shouldn't give you any problems when using IsoAnnotLite. If they do, let me know and we'll look into it!

BeatrizdeToledo · 2021-01-18T12:51:41Z

I tried to use the filtered SQANTI output files, but it gave me an error:

Running IsoAnnot Lite 2.0...

Reading SQANTI 3 Files and creating an auxiliar GFF...
Reading reference annotation file and creating data variables...
Transforming CDS local positions to genomic position...
Transforming feature local positions to genomic position in GFF3...
Generating Transcriptome per each gene.....
Mapping transcript features betweeen GFFs...

    ?Not annoted a total of 100.00 % (11882) of transcripts because they do not have gene information in the classification file.
    ?Not annoted a total of 0.00 % (0) of novel transcripts because they do not have information in the GFF3 file.
    ?Recovered annotation for a total of 0.00 % (0) of novel transcripts.

    ?Annoted a total of 0 annotation features from reference GFF3 file.

Traceback (most recent call last):
  File "IsoAnnotLite_v2.0_SQ3.py", line 1908, in <module>
    main()
  File "IsoAnnotLite_v2.0_SQ3.py", line 1785, in main
    run(args)
  File "IsoAnnotLite_v2.0_SQ3.py", line 1846, in run
    mappingFeatures(dc_SQexons, dc_SQcoding, dc_SQtransGene, dc_SQgeneTrans, dc_SQstrand, dc_GFF3exonsTrans, dc_GFF3transExons, dc_GFF3_Genomic, dc_GFF3Gene_Genomic, dc_GFF3coding, dc_GFF3geneTrans, filename) #edit tappAS_annotation_from_Sqanti file
  File "IsoAnnotLite_v2.0_SQ3.py", line 1314, in mappingFeatures
    perct = featuresAnnotated/totalAnotations*100
ZeroDivisionError: division by zero

When I used the unfiltered files it gave me this:
Running IsoAnnot Lite 2.0...

Reading SQANTI 3 Files and creating an auxiliar GFF...
Reading reference annotation file and creating data variables...
Transforming CDS local positions to genomic position...
Transforming feature local positions to genomic position in GFF3...
Generating Transcriptome per each gene.....
Mapping transcript features betweeen GFFs...

    51.70 % of transcripts annotated...

    ?Not annoted a total of 0.00 % (0) of transcripts because they do not have gene information in the classification file.
    ?Not annoted a total of 48.30 % (99977) of novel transcripts because they do not have information in the GFF3 file.
    ?Recovered annotation for a total of 0.00 % (0) of novel transcripts.

    ?Annoted a total of 4407709 annotation features from reference GFF3 file.
    ?Annoted a total of 65.71 % of the reference GFF3 file annotations.

Adding extra information to GFF3 columns...
Reading GFF3 to sort it correctly...
Generating final GFF3...
Time used to generate new GFF3: 366.90 seconds.

Exportation complete.
Your GFF3 result is: 'tappAS_annot_from_SQANTI3.gff3'

**The head of my filtered classification is:**
isoform chrom strand length exons structural_category associated_gene associated_transcript ref_length ref_exons diff_to_TSS diff_to_TTS diff_to_gene_TSS diff_to_gene_TTS subcategory RTS_stage all_canonical min_sample_cov min_cov min_cov_pos sd_cov FL n_indels n_indels_junc bite iso_exp gene_exp ratio_exp FSM_class coding ORF_length CDS_length CDS_start CDS_end CDS_genomic_start CDS_genomic_end predicted_NMD perc_A_downstream_TTS seq_A_downstream_TTS dist_to_cage_peak within_cage_peak dist_to_polya_site within_polya_site polyA_motif polyA_dist FL.bc1001_5p--bc1001_3p FL.bc1002_5p--bc1002_3p FL.bc1003_5p--bc1003_3p
PB.1000.1 1 - 4217 1 full-splice_match ENSMUSG00000040265 ENSMUST00000193110 4500 1 4494 -4211 4494 -2 mono-exon FALSE NA NA NA NA NA NA NA NA NA 0.8544444444444443 6.0377777777777775 0.1415163783584836 C non_coding NA NA NA NA NA NA NA 25.0 AACCCCCAATTCCTCATTCT NA False NA NA AATAAA -17 0 0 2
PB.1000.2 1 - 5714 8 novel_not_in_catalog ENSMUSG00000040265 novel 4601 20 NA NA 63864 -2 at_least_one_novel_splicesite FALSE canonical 2 92 junction_4 29.461147018971776 NA NA NA TRUE 0.0 6.0377777777777775 0.0 C coding 304 915 82 996 162134548 161992022 FALSE 25.0 AACCCCCAATTCCTCATTCT NA False NA NA AATAAA -17 0 0 2

So it contains the gene information. Do you have any advice?

aarzalluz · 2021-01-18T13:03:35Z

Hi @BeatrizdeToledo,

Thank you for reporting this. From your log, it seems that the sqanti3_RulesFilter.py script from SQANTI3 modifies the original _classification.txt file in a way that removes/changes gene information and prevents running IsoAnnotLite using the filtered outputs.

I'm going to include my colleagues @FJPardoPalacios @psalguerog in the issue so that they are aware of it and can work on solving the problem as soon as possible.

In the meantime, I suggest that you manually filter your original _classification.txt file to keep only the isoforms present in the filtered output generated by sqanti3_RulesFilter.py, and run IsoAnnotLite using it. That should do the trick!
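
For reference, the manual filtering could look something like the sketch below. It assumes that both classification files are tab-separated and have an `isoform` column, as in the header posted above. The function names, file names and rows are hypothetical and only serve the demonstration.

```python
import os
import tempfile


def read_isoform_ids(path):
    """Collect the values of the 'isoform' column of a tab-separated file."""
    ids = set()
    with open(path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        col = header.index("isoform")
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            ids.add(line.rstrip("\n").split("\t")[col])
    return ids


def filter_classification(original_path, filtered_path, output_path):
    """Copy rows of the original file whose isoform is in the filtered file.

    Rows are written unchanged, so every column of the original
    classification (including gene information) is preserved.
    Returns the number of rows kept.
    """
    keep = read_isoform_ids(filtered_path)
    kept = 0
    with open(original_path) as src, open(output_path, "w") as dst:
        header_line = src.readline()
        col = header_line.rstrip("\n").split("\t").index("isoform")
        dst.write(header_line)
        for line in src:
            if not line.strip():
                continue
            if line.rstrip("\n").split("\t")[col] in keep:
                dst.write(line)
                kept += 1
    return kept


# Toy demonstration with hypothetical file names and made-up rows.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original_classification.txt")
filtered = os.path.join(workdir, "filtered_classification.txt")
output = os.path.join(workdir, "manually_filtered_classification.txt")

with open(original, "w") as fh:
    fh.write("isoform\tassociated_gene\n")
    fh.write("PB.1.1\tgeneA\n")
    fh.write("PB.1.2\tgeneA\n")
    fh.write("PB.2.1\tgeneB\n")
with open(filtered, "w") as fh:
    fh.write("isoform\tassociated_gene\n")
    fh.write("PB.1.1\t\n")
    fh.write("PB.2.1\t\n")

kept = filter_classification(original, filtered, output)
with open(output) as fh:
    result_lines = fh.read().splitlines()
print(kept, result_lines)
```

Because the rows are taken from the original file rather than from the filtered one, any column that the filtering step may have changed is left as SQANTI3 first wrote it.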

Ángeles

BeatrizdeToledo · 2021-01-18T13:25:20Z

Thank you for your quick reply.
Additionally, I previously mentioned that tappAS would close abruptly. This time it happened when I was trying to create a project and opened another program. This is the message that appears in the Unix terminal:

added log txt(1): 5:20 - Returning 52921 transcript expression data rows after filtering and normalization.

2021-01-18 05:20:44.401 java[55454:8433458] unrecognized type is 4294967295
2021-01-18 05:20:44.401 java[55454:8433458] *** Assertion failure in -[NSEvent _initWithCGEvent:eventRef:], /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/AppKit/AppKit-1894.60.100/AppKit.subproj/NSEvent.m:1960
2021-01-18 05:20:44.404 java[55454:8433458] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Invalid parameter not satisfying: _type > 0 && _type <= kCGSLastEventType'
*** First throw call stack:
(
0 CoreFoundation 0x00007fff38733b57 __exceptionPreprocess + 250
1 libobjc.A.dylib 0x00007fff713e65bf objc_exception_throw + 48
2 CoreFoundation 0x00007fff3875cd08 +[NSException raise:format:arguments:] + 88
3 Foundation 0x00007fff3ae4ee9d -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 191
4 AppKit 0x00007fff35acab0f -[NSEvent _initWithCGEvent:eventRef:] + 2951
5 AppKit 0x00007fff35c7e595 +[NSEvent eventWithCGEvent:] + 106
6 libglass.dylib 0x000000011edd717b listenTouchEvents + 59
7 SkyLight 0x00007fff67864a0a _ZL19processEventTapDataPvjjjPhj + 157
8 SkyLight 0x00007fff679c6b2e _XPostEventTapData + 277
9 SkyLight 0x00007fff6786490f _ZL22eventTapMessageHandlerP12__CFMachPortPvlS1_ + 147
10 CoreFoundation 0x00007fff386e6b05 __CFMachPortPerform + 250
11 CoreFoundation 0x00007fff386b8304 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 41
12 CoreFoundation 0x00007fff386b8250 __CFRunLoopDoSource1 + 541
13 CoreFoundation 0x00007fff386b6d79 __CFRunLoopRun + 2270
14 CoreFoundation 0x00007fff386b5e3e CFRunLoopRunSpecific + 462
15 Foundation 0x00007fff3ad511c8 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 212
16 libglass.dylib 0x000000011edc4ad5 +[GlassApplication enterNestedEventLoopWithEnv:] + 165
17 libglass.dylib 0x000000011edc551a Java_com_sun_glass_ui_mac_MacApplication__1enterNestedEventLoopImpl + 74
18 ??? 0x0000000109452667 0x0 + 4450494055
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6

aarzalluz · 2021-01-19T10:42:30Z

Hi @BeatrizdeToledo,

According to this issue in the SQANTI3 repository, it seems that IsoAnnotLite works correctly after filtering with sqanti3_RulesFilter.py. There must be some issue with the formatting of your inputs, most likely missing gene information, judging by the printed outputs you sent. Would you be willing to share your files with @psalguerog and myself so that we can troubleshoot?
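
As a quick first check on the inputs, something like the sketch below can count how many rows of a classification file lack gene information. It does not reproduce the checks that IsoAnnotLite performs internally. It assumes a tab-separated file with an `associated_gene` column, and it treats an empty value or `NA` as missing, which is also an assumption. The function name and the toy rows are hypothetical.

```python
import os
import tempfile


def summarize_gene_info(classification_path):
    """Return (total_rows, rows_without_gene) for a classification file."""
    total = 0
    missing = 0
    with open(classification_path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        if "associated_gene" not in header:
            # A header read as a single field often means that the file
            # is not tab-separated.
            raise ValueError(
                "column 'associated_gene' not found; is the file tab-separated?"
            )
        gene_col = header.index("associated_gene")
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            total += 1
            gene = fields[gene_col] if gene_col < len(fields) else ""
            if gene.strip() in ("", "NA"):
                missing += 1
    return total, missing


# Toy demonstration with a hypothetical file and made-up rows.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "toy_classification.txt")
with open(path, "w") as fh:
    fh.write("isoform\tassociated_gene\n")
    fh.write("PB.1.1\tgeneA\n")
    fh.write("PB.1.2\t\n")
    fh.write("PB.2.1\tgeneB\n")
    fh.write("PB.3.1\tNA\n")

total, missing = summarize_gene_info(path)
print(f"{missing} of {total} isoforms have no gene information")
```

Running this on both the filtered and the unfiltered classification files would show whether the gene column differs between them.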

As for the issue where tappAS closes unexpectedly, we have it in mind and will get to it as soon as we can.

Best,

Ángeles

BeatrizdeToledo · 2021-01-21T15:10:51Z

Hi,
I ended up filtering manually as you suggested before.
There is also another problem that happens to me quite often.

Often, when I run FEA or other types of analysis, the task fails and I get a message like the one below. Today I was trying to run FEA using the SQANTI-filtered GFF file, but I requested no filtering by low count or high variance, because I wanted all the transcripts/genes detected by PacBio to be used as the background list. I noticed that when you filter by low count and high variance, the filtered genes/transcripts are no longer used in the background list.

FEAnalysis failed - task aborted. Exception: GC overhead limit exceeded

aarzalluz · 2021-01-21T16:16:35Z

Hi @BeatrizdeToledo,

I'm glad that you could find a way to use IsoAnnotLite with your filtered transcriptome. I therefore understand that your output GFF3 is now correctly formatted and that you're not having any additional issues with it when loading it on tappAS.

As for the FEA error, it would be best to open a separate issue for it, with a specific title that summarizes your problem well.

As a general rule, we recommend opening dedicated issues for different software problems and to try to add descriptive titles. This makes it easier for future/current users of the tool to find solutions to their problems, to add to the discussion (for instance if they've run into a similar problem), and for us to keep track and solve them.

Since the main topic of this issue, i.e. IsoAnnotLite annotation problems, has been solved, I will close it and wait for you to open a new one regarding FEA. You're welcome to do the same for any pending problem, such as the issue that you reported with tappAS closing unexpectedly.

Best,

Ángeles

aarzalluz mentioned this issue Jan 18, 2021

Add --isoAnnotLite flag to sqanti3_RulesFilter.py? ConesaLab/SQANTI3#30

Closed

aarzalluz closed this as completed Jan 21, 2021
