Input Format

Katie Siewert edited this page Mar 8, 2019 · 12 revisions

SNP File format

BetaScan takes a whitespace-separated file with three columns. The first column contains the coordinate of each variant. The second contains the frequency of the derived allele, expressed as a count of haploid individuals (note: this is the opposite of the BALLET software). In practice, for folded Beta only, it does not matter whether the derived, ancestral, or already-folded allele frequency is given in the second column, because BetaScan folds the frequency anyway. The third column contains the sample size, in number of haploid individuals, used to calculate the frequency of that variant. The file should be sorted by position (the Unix command sort -g will do this for you). If you are using the Beta2 statistic, substitutions should be coded as SNPs with a frequency equal to the sample size. The scan should be run on each chromosome separately. An example file is below:

14  2 99  
15 99 99
25  1 100  
47  99  100
48  82  95
98 100 100
103 10  100
245 93  96
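
As a rough illustration of the rules above, the following Python sketch parses lines in the three-column format and checks them before a scan. It is not part of BetaScan: the names read_snp_records and fold are hypothetical, the checks (integer coordinates, counts between 0 and the sample size, blank lines skipped) are assumptions drawn from the description, and fold uses the standard definition of a folded frequency, which BetaScan's own folding may or may not match in detail.

```python
from typing import Iterable, List, Tuple

# (position, allele count, sample size); all assumed to be integers here.
Record = Tuple[int, int, int]


def read_snp_records(lines: Iterable[str]) -> List[Record]:
    """Parse three-column SNP lines and run basic sanity checks (hypothetical helper)."""
    records: List[Record] = []
    previous_position = None
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, got {len(fields)}"
            )
        position, count, sample_size = (int(field) for field in fields)
        # The file should be sorted by position.
        if previous_position is not None and position < previous_position:
            raise ValueError(f"line {line_number}: positions are not sorted")
        # A count equal to the sample size is allowed (a substitution for Beta2).
        if not 0 <= count <= sample_size:
            raise ValueError(
                f"line {line_number}: count {count} outside 0..{sample_size}"
            )
        records.append((position, count, sample_size))
        previous_position = position
    return records


def fold(count: int, sample_size: int) -> int:
    """Standard folded (minor-allele) count: the smaller of the two allele counts."""
    return min(count, sample_size - count)
```

Calling read_snp_records(open(path)) on a file such as the example above would return one tuple per variant, and raise a ValueError naming the first offending line otherwise.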

Generating BetaScan format

The BetaScan format can be generated from several file formats, including VCF and PLINK, using the glactools toolkit. Thank you to Gabriel Renaud for this feature! This tutorial walks you through how to do this.

Mutation Rate Map

BetaScan lets you specify local mutation rates for use with the -std flag. To have BetaScan use such a map, give its path with the -thetaMap flag. The map file has three columns: the beginning coordinate of the window (inclusive), the end coordinate of the window (exclusive), and the mutation rate. The mutation rate should be expressed as 2*p*Ne*u, where u is the per-base-pair mutation rate, Ne is the effective population size, and p is the ploidy. Note that every SNP position in your SNP input file must fall within a window. An example is below.

1 100 .01
100 190 .005
190 250 .001
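
The two requirements above (rates scaled as 2*p*Ne*u, and every SNP position covered by a window) can be checked ahead of time with a small Python sketch like the one below. It is only an illustration and not part of BetaScan: the function names are hypothetical, coordinates are assumed to be integers, and the coverage check assumes the windows do not overlap.

```python
import bisect
from typing import Iterable, List, Sequence, Tuple

# (start, inclusive; end, exclusive; scaled mutation rate)
Window = Tuple[int, int, float]


def scaled_mutation_rate(ploidy: int, effective_size: float, per_base_rate: float) -> float:
    """Return 2*p*Ne*u, the scaling described for the third column."""
    return 2 * ploidy * effective_size * per_base_rate


def parse_theta_map(lines: Iterable[str]) -> List[Window]:
    """Parse three-column mutation rate map lines, sorted by window start."""
    windows: List[Window] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        windows.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return sorted(windows)


def uncovered_positions(positions: Iterable[int], windows: Sequence[Window]) -> List[int]:
    """List positions that fall in no window (assumes sorted, non-overlapping windows)."""
    starts = [window[0] for window in windows]
    missing: List[int] = []
    for position in positions:
        # Index of the last window whose start is <= position, or -1 if none.
        index = bisect.bisect_right(starts, position) - 1
        # The end coordinate is exclusive, so position == end is not covered.
        if index < 0 or position >= windows[index][1]:
            missing.append(position)
    return missing
```

An empty list from uncovered_positions means every SNP position has a local mutation rate; any returned position would need its own window added to the map.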