Using a non-ENSEMBL GFF3 file #12

Open
joelnitta opened this issue Sep 8, 2020 · 4 comments
Labels
enhancement New feature or request IsoAnnotLite Bugs and questions related to IsoAnnotLite

Comments

@joelnitta

My study organism (Daphnia magna) is not in ENSEMBL. I have a custom transcriptome from PacBio isoseq, and am interested in running the functional analysis in tappAS. Could you provide more documentation about how to prepare a tappAS GFF3 file in this situation? At a minimum, I would just like to know gene names and functions.

Alternatively, it would be great if you could add this species to the reference GFF3 list. I don't know if you require ENSEMBL annotation for that, but it is in UniProt.

Thanks!

@aarzalluz
Member

Hi @joelnitta,

If your organism is not already annotated in tappAS, you can use IsoAnnotLite to re-format your GTF file and make a tappAS-compatible GFF3 file. However, bear in mind that this file will not contain functional labels, so you will only be able to run the tappAS analyses from the Differential Module.

At the moment, we do not require a transcriptome to be in ENSEMBL to annotate it or use it in tappAS; in principle, any organism can be annotated at the functional level. However, adding functional labels is quite complex, and while we are currently working on a robust pipeline for de novo transcriptome annotation, it is unfortunately not ready for public use yet. As you can see from the website link above, IsoAnnotLite contains an option to positionally transfer functional labels from an already-annotated GFF3 file, but cross-species usage is very likely to fail.

I apologize for the inconvenience, and will make sure that our team takes your request for a functional annotation for Daphnia magna into consideration for the future.

Ángeles

@joelnitta
Author

joelnitta commented Sep 8, 2020

Thanks for the prompt reply Ángeles!

I am glad to hear that you are working on a pipeline for de novo transcriptome annotation; it sounds quite useful.

Sorry if this wasn't clear from my original post, but I already did the first part of your answer: I used IsoAnnotLite within SQANTI3 with the --isoAnnotLite flag to produce a tappAS-compatible GFF3 file, but without any functional annotations. I am interested in adding the functional annotations.

You say "adding functional labels is quite complex" -- my question is, can you provide any more details on this? I see in the paper that many different sources and software were used, but it's not clear to me how they were combined into the final GFF3 file. If the complexity of the problem makes it pointless to provide general documentation, that's fine. I will try to annotate the features I'm interested in, but I may come back with more specific questions, if you don't mind.

@aarzalluz
Member

Hi Joel,

I see! So you already have the GFF3 output by --isoAnnotLite and want to add the functional annotations.

I referred to the complexity of this step because it requires parsing information from several databases (depending on the organism) and running different predictors. We gather information from InterProScan, UTRscan, ScanForMotifs, and a few other sources, all of which provide functional information in different formats that then needs to be integrated and reformatted. All of this can be (and previously has been, at least for current tappAS reference GFF3 files) done manually, but it is a very time-consuming process and requires advanced programming skills, which makes it inaccessible to many people interested in using tappAS. This is why our lab is now putting together a series of scripts that will run these predictors for your transcriptome and convert this information into a tappAS-compatible GFF3, but as you can imagine, it is quite a complex pipeline to build.
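
To give a rough idea of what one such reformatting step can look like, here is a minimal sketch that converts InterProScan's tab-separated output into generic GFF3 feature lines. It is not the tappAS pipeline and does not produce the tappAS-specific GFF3 layout, which is not described in this thread. The function names (`escape_gff3`, `interpro_tsv_to_gff3`), the choice of `protein_match` as the feature type, and the `ID`/`Name` attributes are all assumptions made for illustration. Coordinates stay in protein space, exactly as InterProScan reports them.

```python
import csv
import io


def escape_gff3(value):
    """Percent-encode characters that are reserved in GFF3 attribute values."""
    # "%" must be replaced first so later escapes are not double-encoded.
    for char, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"),
                       ("&", "%26"), (",", "%2C")):
        value = value.replace(char, code)
    return value


def interpro_tsv_to_gff3(tsv_text):
    """Convert InterProScan TSV rows into generic GFF3 lines (hypothetical helper).

    Only the first nine InterProScan columns are used: sequence ID, MD5,
    length, analysis, signature accession, signature description, start,
    stop, and score.
    """
    gff3_lines = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 9:
            continue  # skip blank or truncated rows
        seq_id, _md5, _length, analysis, acc, desc, start, stop, score = row[:9]
        # InterProScan writes "-" for a missing score; GFF3 uses ".".
        score = "." if score == "-" else score
        # Assumed attribute layout; tappAS expects its own, different one.
        attributes = f"ID={escape_gff3(acc)};Name={escape_gff3(desc)}"
        # Strand and phase are left undefined for protein-level matches.
        gff3_lines.append("\t".join([seq_id, analysis, "protein_match",
                                     start, stop, score, ".", ".", attributes]))
    return gff3_lines


if __name__ == "__main__":
    # Made-up example row, for illustration only.
    example = ("PB.1.1\td41d8cd9\t300\tPfam\tPF00069\t"
               "Protein kinase domain\t10\t250\t1.2E-50\tT\t01-01-2020\n")
    for line in interpro_tsv_to_gff3(example):
        print(line)
```

Each predictor would need its own parser along these lines, and the results would still have to be merged per transcript and mapped onto the attribute layout tappAS expects, which is where most of the effort goes.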

Since this is a work in progress and is unpublished, I cannot share these scripts or provide any more details about the pipeline for now; I hope you understand. That being said, if you are willing to try manually annotating some functional layers, I am of course happy to keep in touch and try to help you in any way I can. Feel free to drop me an email and we can discuss any problems you may encounter.

Ángeles

@joelnitta
Author

Thanks for your offer to help with this! I sent you an email with a link to my data files.

@aarzalluz aarzalluz added enhancement New feature or request IsoAnnotLite Bugs and questions related to IsoAnnotLite labels Oct 5, 2021