Each database should include the following files:
Kraken2 database:
- hash.k2d
- taxo.k2d
- opts.k2d
- seqid2taxid.map
Bracken databases (built for use with various read lengths N):
- databaseNmers.kmer_distrib
Additional files required for the pipeline to run:
- inspect.out
- taxonomy/nodes.dmp
- taxonomy/names.dmp
- library/species_genome_size.txt
For use with post-processing scripts:
- host_prediction_to_genus.tsv
- species_name_to_vir_score.txt
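
As a quick sanity check after unpacking a database, the file list above can be turned into a small script. This is only a minimal sketch and not part of Phanta itself: the function name `missing_database_files` and its `read_length` parameter are hypothetical, and it assumes the files sit at the relative paths shown above inside a single database directory.

```python
from pathlib import Path

# File names copied from the list above.
KRAKEN2_FILES = ["hash.k2d", "taxo.k2d", "opts.k2d", "seqid2taxid.map"]
PIPELINE_FILES = [
    "inspect.out",
    "taxonomy/nodes.dmp",
    "taxonomy/names.dmp",
    "library/species_genome_size.txt",
]
POSTPROCESSING_FILES = [
    "host_prediction_to_genus.tsv",
    "species_name_to_vir_score.txt",
]


def missing_database_files(db_dir, read_length=None):
    """Return the expected relative paths that are absent from db_dir.

    Hypothetical helper for illustration; Phanta does not ship this function.
    """
    db_dir = Path(db_dir)
    expected = KRAKEN2_FILES + PIPELINE_FILES + POSTPROCESSING_FILES
    missing = [name for name in expected if not (db_dir / name).is_file()]

    if read_length is None:
        # Accept any Bracken distribution file, whatever read length N it was built for.
        if not any(db_dir.glob("database*mers.kmer_distrib")):
            missing.append("databaseNmers.kmer_distrib")
    else:
        # Require the Bracken file for one specific read length.
        bracken = f"database{read_length}mers.kmer_distrib"
        if not (db_dir / bracken).is_file():
            missing.append(bracken)
    return missing
```

Calling `missing_database_files("/path/to/database", read_length=150)` returns an empty list when every listed file, plus the Bracken file for 150 bp reads, is present. The path and the read length here are placeholders.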
Note: Phanta was developed with human gut metagenomes in mind, and its default database was built from human-gut viral and bacterial genomes. If you wish to apply Phanta to metagenomes from other environments, you will probably need to supply a custom database. In that case, please open a new discussion so that we can work out the best way to help or collaborate.
The tar.gz file should be about 20-25 GB, depending on the exact version.
- Default database (as described in our manuscript): http://ab_phanta.os.scg.stanford.edu/Phanta_DBs/database_V1.tar.gz
- Prophage-masked database (as described in our manuscript): http://ab_phanta.os.scg.stanford.edu/Phanta_DBs/masked_db_v1.tar.gz
- Default database that uses the GTDB taxonomy for bacteria and archaea (instead of the NCBI taxonomy): http://ab_phanta.os.scg.stanford.edu/Phanta_DBs/unmasked_db_v1_gtdb.tar.gz
  This taxonomy is equivalent to that provided by HumGut, except that taxonomic IDs for GTDB nodes start at 5,000,000 rather than 4,000,000. See the HumGut documentation.
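
One way to fetch and unpack one of the archives listed above is sketched below using only the Python standard library. The dictionary keys (`default`, `masked`, `gtdb`) and the function names are hypothetical labels chosen for this example; only the URLs come from the list above. The layout of the extracted archive is not assumed, so check where the files land after extraction.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs copied from the list above; the short keys are hypothetical labels.
DATABASE_URLS = {
    "default": "http://ab_phanta.os.scg.stanford.edu/Phanta_DBs/database_V1.tar.gz",
    "masked": "http://ab_phanta.os.scg.stanford.edu/Phanta_DBs/masked_db_v1.tar.gz",
    "gtdb": "http://ab_phanta.os.scg.stanford.edu/Phanta_DBs/unmasked_db_v1_gtdb.tar.gz",
}


def archive_path(name, dest_dir):
    """Return the local path where the archive for `name` would be saved."""
    url = DATABASE_URLS[name]
    return Path(dest_dir) / url.rsplit("/", 1)[-1]


def download_and_extract(name, dest_dir):
    """Download the chosen archive into dest_dir and unpack it there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = archive_path(name, dest_dir)
    # The archives are about 20-25 GB, so make sure enough disk space is free.
    urllib.request.urlretrieve(DATABASE_URLS[name], archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    return archive
```

For example, `download_and_extract("default", "phanta_dbs")` would save `database_V1.tar.gz` under a `phanta_dbs` directory (a placeholder name) and unpack it in place.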