This repository contains scripts that you can use at different stages to attempt to find more clock-like genes. Generally, you would use these for dating analyses with another package.
The results you get from these analyses are just suggestions and tools to help explore your dataset. You should be sure to examine the results carefully and not use them blindly.
You will minimally need python
to run the scripts. Other than python
, you will probably want two packages from phyx
(see installation below) in order to run all parts of SortaData.
The expected input is a set of genes that have been aligned and for which there are gene trees. Generally, we would expect these to be rooted as well. The input filetype would be fasta. If you need help converting these files, you can use phyx
or put a feature request and we can add scripts to convert files.
With the input of a directory of gene alignments and corresponding gene trees, the steps of these analyses include
- Get the tip-to-root variation with the
get_var_length.py
- Get the bipartition support with
get_bp_genetrees.py
- Combine the results from these two runs with
combine_results.py
- Sort and get the list of the good genes with
get_good_genes.py
. You can give the order that you prefer the sorting so--order 3,1,2
would mean that you want the bipartition sorted first (3=bipartition), then root-to-tip variance (1=root-to-tip variance), and then tree length (2=treelength).
These steps are separated out with the intention that you can examine the results of each step. In each case you can type python NAMEOFSCRIPT.py -h
to get what each argument is. The example below should also help.
To install and use SortaDate you will need python (it is probably already on your computer if you have a Mac or Linux machine) and you will need two programs from the phyx
set of programs (found https://github.com/FePhyFoFum/phyx). While phyx
has many different programs, we will only need 3 and to install these, you will not need any other software other than a compiler. Because you can install only part of the phyx
package, instructions for this will be placed here. If you would like to install all of the programs, consult the phyx
website.
The two bits that you will need from phyx
are pxlstr
, pxrmt
and pxbp
. To install this in Mac or Linux
- Go to https://github.com/FePhyFoFum/phyx then click the “Clone or download” and download the zip. Unzip the resulting file.
- Open the Terminal and change directory to the
src
directory (if you downloaded and unzipped the phyx-master.zip in your Downloads directory, on Mac you will probably run the commandcd Downloads/phyx-master/src
) - Run
./configure
. You will get a couple errors, probably, but you can ignore. This would be important only if you needed all of the programs inphyx
. - Run
make pxlstr
, thenmake pxrmt
and thenmake pxbp
. Then copy the programs into your PATH (so probablysudo cp pxlstr pxrmt pxbp /usr/local/bin/
. You will have to type your password.
This is trivial. Because the package is a python package, all you will need is to download the scripts here, and just run them directory from wherever you download them. You can see how to use the commands in the example below. You can grab the repository by clicking the “Clone or download” and download the zip. Unzip the resulting file and within you will find all the files.
There is an example dataset included in the repository. This is a partial dataset from the Jarvis et al. (2014) genomic bird paper. Once you have downloaded or cloned the repository, you can run these analyses:
- Get the root-to-tip variance with
python src/get_var_length.py examples/genes_trees/ --flend .tre.rr --outf examples/var --outg Struthio_camelus,Tinamou_guttatus
- Get the bipartition support with
python src/get_bp_genetrees.py examples/genes_trees/ examples/Chrono_Tent_Bird_study.new --flend .tre.rr --outf examples/bp
- Combine the results from these two runs with
python src/combine_results.py examples/var examples/bp --outf examples/comb
- Sort and get the list of the good genes with
python src/get_good_genes.py examples/comb --max 3 --order 3,1,2 --outf examples/gg
If you need trees to be rooted, you can perform this in many different ways. One way that you can root these trees is with the phyx
program pxrr
(found https://github.com/FePhyFoFum/phyx). Since Sorta date requires 3 other phyx packages, installing the 4th should be relatively trivial by running make pxrr
in the source directory of phyx when the other programs were made. To then move this to the path you can run sudo cp pxrr /usr/local/bin/
.
If you would like to conduct likelihood ratio tests for the clock for each of the gene trees, you may do this with paup
and you can find that here http://phylosolutions.com/paup-test/. There are scripts included in SortaDate
to conduct these analyses that will add the necessary paup block to the file.
The scripts and procedures discussed here are presented in Smith et al. 2018 So many genes, so little time: A practical approach to divergence-time estimation in the genomic era. Plos One.