
Input Format

Katie Siewert edited this page Jan 23, 2019 · 12 revisions

SNP File Format

BetaScan takes as input a whitespace-separated file with three columns. The first column contains the coordinate of each variant. The second column contains the frequency of the derived allele (note: this is the opposite of the BALLET software), given as a count of haploid individuals. In practice, however, for folded Beta only, it doesn't matter whether the derived, ancestral, or already-folded allele frequency is used in the second column, because BetaScan folds the frequency anyway. The third column contains the sample size, in number of haploid individuals, that was used to calculate the frequency of that variant. The file should be sorted by position (the Unix command sort -g will do this for you). If you are using the Beta2 statistic, substitutions should be coded as SNPs whose frequency equals the sample size. Run the scan on each chromosome separately. An example file is below:

14 2 99
15 99 99
25 1 100
47 99 100
48 82 95
98 100 100
103 10 100
245 93 96
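Before running a scan, it can help to sanity-check an input file against the rules above. The following is a minimal Python sketch, not part of BetaScan itself; the names parse_betascan, check_sorted and fold are hypothetical. It assumes every variant has between 1 and n derived copies (so a substitution, with count equal to the sample size, is accepted as described for Beta2), and it uses the conventional folded count min(k, n - k) to show why derived and ancestral counts give the same folded value. It does not reproduce BetaScan's own code.

```python
from typing import List, Tuple

# (position, derived-allele count, sample size); counts are in haploid individuals
Record = Tuple[int, int, int]


def parse_betascan(text: str) -> List[Record]:
    """Parse whitespace-separated BetaScan SNP lines into (pos, count, n) tuples."""
    records: List[Record] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        pos, count, n = (int(f) for f in fields)
        # Assumption of this sketch: a variant has 1..n derived copies.
        # count == n encodes a substitution (meaningful for Beta2).
        if not 0 < count <= n:
            raise ValueError(f"line {lineno}: count {count} outside 1..{n}")
        records.append((pos, count, n))
    return records


def check_sorted(records: List[Record]) -> None:
    """Raise if positions are not in non-decreasing order (as sort -g would produce)."""
    for (prev, _, _), (cur, _, _) in zip(records, records[1:]):
        if cur < prev:
            raise ValueError(f"position {cur} appears after {prev}; sort the file first")


def fold(count: int, n: int) -> int:
    """Folded (minor-allele) count; fold(k, n) == fold(n - k, n)."""
    return min(count, n - count)


example = """\
14 2 99
15 99 99
25 1 100
47 99 100
48 82 95
98 100 100
103 10 100
245 93 96
"""

records = parse_betascan(example)
check_sorted(records)
print([fold(c, n) for _, c, n in records])  # [2, 0, 1, 1, 13, 0, 10, 3]
```

Note that folding maps a substitution (count equal to sample size) to 0, which is one reason substitutions only carry information for the unfolded Beta2 statistic.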

Generating BetaScan Format

The BetaScan format can be generated from several file formats, including VCF, BAM and PLINK, using the glactools toolkit. Thank you to Gabriel Renaud for this feature!

To use glactools with BetaScan, first convert your VCF, BAM or PLINK file to acf format (see the glactools documentation). In practice, you can pipe the output of the acf conversion step directly into glactools acf2betascan, but below I assume you've generated an acf file named yourfile.acf.gz. The second step is to run acf2betascan to convert it to BetaScan format:

  1. If you are using an unfolded site frequency spectrum (because you used the --apo flag or the usepopsrootanc program), use either the --useanc or the --useroot option. The command below uses --useanc:
     glactools acf2betascan --useanc yourfile.acf.gz > 1kg_betascanform.txt
  2. If you are using a folded site frequency spectrum (remember to use the -fold flag in BetaScan as well!):
     glactools acf2betascan --fold yourfile.acf.gz > 1kg_betascanform_folded.txt

Mutation Rate Map

BetaScan allows the user to specify local mutation rates for use with the -std flag. To have BetaScan use such a map, pass its path with the -thetaMap flag. The map has three columns: the beginning coordinate of the window (inclusive), the end coordinate of the window (exclusive), and the mutation rate. The mutation rate should be scaled as 2*p*Ne*u, where u is the per-base-pair mutation rate, Ne is the effective population size and p is the ploidy. Every SNP position in your SNP input file must fall within one of the windows. An example is below:

1 100 .01
100 190 .005
190 250 .001
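Finding the rate that applies to each SNP, and checking that every SNP is covered, can be sketched in Python as below. This is a hypothetical helper, not BetaScan's own parser: the names parse_theta_map, theta_at, uncovered and scaled_rate are invented for illustration, and rejecting overlapping windows is an assumption of this sketch rather than a documented BetaScan rule. Each window is treated as half-open, [start, end), as described above.

```python
import bisect
from typing import Iterable, List, Tuple

# (start inclusive, end exclusive, scaled mutation rate 2*p*Ne*u)
Window = Tuple[int, int, float]


def parse_theta_map(text: str) -> List[Window]:
    """Parse a three-column theta map; this sketch assumes windows do not overlap."""
    windows: List[Window] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start, end, theta = int(fields[0]), int(fields[1]), float(fields[2])
        if start >= end:
            raise ValueError(f"empty window {start}-{end}")
        windows.append((start, end, theta))
    windows.sort()
    for (_, prev_end, _), (start, _, _) in zip(windows, windows[1:]):
        if start < prev_end:
            raise ValueError(f"window starting at {start} overlaps the previous one")
    return windows


def theta_at(windows: List[Window], pos: int) -> float:
    """Return the rate of the window with start <= pos < end, or raise KeyError."""
    starts = [start for start, _, _ in windows]
    i = bisect.bisect_right(starts, pos) - 1  # last window starting at or before pos
    if i >= 0 and pos < windows[i][1]:
        return windows[i][2]
    raise KeyError(f"position {pos} is not inside any window")


def uncovered(windows: List[Window], positions: Iterable[int]) -> List[int]:
    """List SNP positions that fall outside every window."""
    missing = []
    for pos in positions:
        try:
            theta_at(windows, pos)
        except KeyError:
            missing.append(pos)
    return missing


def scaled_rate(ploidy: int, ne: float, u: float) -> float:
    """Compute 2 * p * Ne * u for ploidy p, effective size Ne and per-bp rate u."""
    return 2 * ploidy * ne * u


theta_map = parse_theta_map("1 100 .01\n100 190 .005\n190 250 .001\n")
snp_positions = [14, 15, 25, 47, 48, 98, 103, 245]  # positions from the SNP example
print(uncovered(theta_map, snp_positions))  # []
print(theta_at(theta_map, 100))  # 0.005, because end coordinates are exclusive
```

For instance, with purely illustrative values p = 2, Ne = 10,000 and u = 1.25e-8 per base pair, scaled_rate(2, 10_000, 1.25e-8) gives about 0.0005.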