Commit

Update README.md
josuebarrera authored Aug 16, 2022
1 parent df6d683 commit 9cf7d59
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -115,7 +115,7 @@ Arguments and input files

- __-d__     The location of the NCBI taxonomy files, colloquially known as the “taxdump”. Make sure the taxdump directory contains the files “nodes.dmp” and “names.dmp”. These files are used by NCBItax2lin to create an “ncbi_lineages” file.

- __-r__     The location of the uncompressed “ncbi_lineages” file generated by NCBItax2lin. Once the user runs GenEra for the first time (or runs NCBItax2lin independently), this file can be used to save some time during step 2 of the pipeline. This file is a comma-delimited table with each row representing a specific lineage, and each column representing the taxonomic hierarchies to which that lineage belongs. This file will be automatically modified by GenEra to rearrange the columns in hierarchical order.
- __-r__     The location of the uncompressed “ncbi_lineages” file generated by NCBItax2lin. Once the user runs GenEra for the first time (or runs NCBItax2lin independently), this file can be used to save some time during step 2 of the pipeline. This file is a comma-delimited table with each row representing a specific lineage, and each column representing the taxonomic hierarchies to which that lineage belongs. This file will be automatically modified by GenEra to rearrange the columns in hierarchical order. IMPORTANT: Some users have experienced memory errors while using the __-d__ argument. If your analysis stops after the message "Preparings all lineages into a dataframe to be written to disk ..." is written to STDOUT, you are likely experiencing the same issue, which is related to the dependency ncbitax2lin. We uploaded a compressed "ncbi_lineages" file that can be used with __-r__ after uncompressing it. This should give you exactly the same result as running the pipeline from scratch with __-d__.

- __-c__     Custom “ncbi_lineages” file that is already tailored for the query species. GenEra modifies the raw “ncbi_lineages” file so that the taxid of the query species appears in the first column, and the taxonomic hierarchies of the query species are rearranged from the species level all the way back to the last universal common ancestor (termed “cellular organisms” by the NCBI taxonomy). The file is also modified by GenEra to collapse the phylostrata (i.e., the taxonomic levels) that lack the necessary genomic data to be useful in the analysis. Once this file is generated, the user can reuse it to run GenEra on the same species with different parameters. By default, GenEra will search for the correct hierarchical order of the query organism’s phylostrata directly from the NCBI webpage. If the user's machine is unable to access the NCBI webpage, GenEra will attempt to infer the correct order of the phylostrata directly from the “ncbi_lineages” file. However, there are some taxonomic hierarchies that the NCBI labels as “clades” (e.g., Embryophyta in the plant lineage), which GenEra cannot automatically assign to their correct taxonomic level when working offline. This option is also useful if the user wants to modify the taxonomic hierarchies that were originally assigned by the NCBI (e.g., an outdated taxonomy that does not reflect the phylogenetic relationships of the query species). This custom table can be easily generated by printing the desired columns of the “ncbi_lineages” file with awk:
```console
