Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue with grabbing taxonomy for blast hits using blast input file #6

Open
maottino opened this issue Jan 19, 2023 · 2 comments
Open

Comments

@maottino
Copy link

Hello! I'm extremely new to bioinformatics/python/github, so I apologize if this is a super easy fix! Thank you in advance for any help/input.

I am trying assign taxonomy to trnL (plant chloroplast) OTUs with blast results already run and output in the required format (outfmt '6 qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore staxids'). I'll provide a file example of what my blast output looks like below in case that is the issue, but it's a tab delimited file with 10 top hits per OTU.

The code I use when running is:
python3 taxonomy_assignment_BLAST.py TLotus.fa ./ncbi_taxonomy/expanded_ncbi_taxonomy.tsv --blast_database IGNORE --blast_file TLotus_taxhits.txt

where TLotus.fa is my file with my OTU sequences (not using since I already have a blast file)
expanded_ncbi_taxonomy.tsv is taxonomy file, built as instructed (preview of what file looks like attached)
TLotus_taxhits.txt is my blast_output with custom formatting

The command runs without error, but the returned taxonomy is the default for when there is no match in the taxonomy file. For example, this is what every OTU looks like this for every OTU:

#BLAST LINE : Otu4 94 NC_047481.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MN308055.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MK105463.1 100 94 1 94 52769 52862 1.52E-39 174 94 3512

ASSIGNING TAXONOMY FOR Otu4 total hits passing initial filters = 3
NC_047481.1 100 --> CAPTURED after percent sway filter
MN308055.1 100 --> CAPTURED after percent sway filter
MK105463.1 100 --> CAPTURED after percent sway filter

Providing consensus taxonomy up to level 14 : tmp6

X100_1 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_2 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_3 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6

Taxonomy Assignment for Otu4 = superkingdom:subkingdom:sub_subkingdom:kingdom:tmp1:tmp2:phylum:class:family:genus:species:tmp3:tmp4:tmp5:tmp6

It looks like it's reading my blast file correctly, as it's pulling the accession number and percent ID correctly. When I search the taxonomy file for the taxids in the blast file, they are present with normal taxonomy information.

Is this some sort of basic formatting error? I've tried looking over the python code but have little to no knowledge of python coding and cannot find the issue myself!

ncbi_expanded_taxonomy_EXAMPLE.txt

TLotus_taxhits_EXAMPLE.txt

@kStarr24
Copy link

Hi @Joseph7e, I am the exact same issue. I tried both giving the script blast results in the required format and allowing the script to perform the blastn command. Both ways I encountered the generic output with no taxonomy information (exactly as shown above). Not sure what's going on here but it seems it must be something related to the expanded_ncbi_taxonomy.tsv file? I created the expanded_ncbi_taxonomy.tsv following the instructions in the README file.

I would be very grateful for any advice on how to fix this.

@kStarr24
Copy link

Hi @maottino, I've been digging into this, and after lots of trouble shooting I have the program working well! There are a couple things to check that might fix your issue.

As of today, the readme file for this program states that if you bring in your own blast file it should have the following format '6 qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore'. However, the taxonomy_assignment_BLAST.py uses the last column of the blast table as the taxonomy id to look up in the taxonomy database. Ensure the the blast output's final column is staxids. This is the blast output format I used: "6 qseqid qlen sseqid pident length qstart qend sstart send evalue staxids". However, your post indicates that you do have staxids as the final column. I'm wondering if there was a change in the readme file between now and when you were attempting to use it?

Here is what is more likely your problem. If you look at the readme file in the usage section you can find that there is a flag called --ncbi_nt. When you include it it gets set to 'True'. You have to include this flag in the function call if you are using the ncbi database because it sends the function into an if statement that assigns the correct indexing for the NCBI taxonomy levels.

You can check to see if this is your problem by looking in the log_file.txt output. Near the beginning there is a section to check to make sure the taxonomy indexing is correct.
If it is incorrect you will see something like this:
#Taxonomy Categories
superkingdom:subkingdom:sub_subkingdom:kingdom:tmp1:tmp2:phylum:class:family:genus:species:tmp3:tmp4>#The below levels should match the target, i.e. species_level = species
Species_level = 14 = tmp6
Family_level = 10 = species
Phylum_level = 7 = class

And if it is correct you will see this:
#Taxonomy Categories
superkingdom:kingdom:phylum:subphylum:superclass:class:subclass:superorder:order:superfamily:family:>#The below levels should match the target, i.e. species_level = species
Species_level = 13 = species
Family_level = 10 = family
Phylum_level = 2 = phylum

Here is the function call that worked for me:

python3 Assign-Taxonomy-with-BLAST-master/taxonomy_assignment_BLAST.py ASVstest.fa /scratch/ncbi_taxonomy/expanded_ncbi_taxonomy.tsv --ncbi_nt --blast_file=blast_table.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants