phaseLD

Alexandre Marand, 2017

A simple LD-based method to quickly phase pseudo-oneway test cross populations by identifying window-based haplotypes via Bayesian inference.

Citation

Alexandre P. Marand, Shelley H. Jansky, Hainan Zhao, Courtney P. Leisner, Xiaobiao Zhu, Zixian Zeng, Emily Crisovan, Linsey Newton, Andy J. Hamernik, Richard E. Veilleux, C. Robin Buell, Jiming Jiang. (2017). "Meiotic crossovers are associated with open chromatin and enriched with Stowaway transposons in potato." Genome Biology 18:203

Dependencies phaseLD.pl

phaseLD.pl requires users to have the following perl modules available to @INC:

All modules are available from CPAN, and can be accessed with cpan/cpanm.

Click here for a link to cpanm.

Dependencies extract_crossovers.pl

extract_crossovers.pl additionally requires the moduels:

All modules are available from CPAN, and can be accessed with cpan/cpanm.

Installation

After installing dependencies, simply add PATH=$PATH:/path/to/phaseLD/bin to your PATH in .bashrc or .zshrc.

Preparing Data

phaseLD expects pseudo-oneway test cross populations, theoretical segregation ratios of ~1:1 of homozygous to heterozygous genotypes. vcf2gen.pl, which is included in the directory bin, will convert homozygous reference genotypes (0/0) to 'a' calls, and heterozygous genotypes (0/1) to 'A'. All homozygous alternate allele calls (1/1) are converted to heterozygous genotypes, 'A'. Users can adjust the vcf2gen script to accomodate markers which segregate 1:1 for homozygous alternate alleles (1/1) and heterozygous genotypes (0/1).

Convert VCF file to gen file format.

perl vcf2gen.pl file.vcf > file.gen

Example of gen file format:

chr01	1000	A	-	-	a	A	a	a	A
chr01	1100	a	-	A	A	a	A	-	-
chr01	1200	a	a	A	A	a	-	A	a

Sample data can be found in the directory examples

Example of filter file format:

chr01_15879     
chr01_104598    
chr01_158589    
chr01_158620

IMPORTANT NOTE: only run phaseLD.pl on single chromosome at a time, otherwise the algorithm will be extremely slow as it saves a fair amount of information in memory. Running on more than one chromosome will also result in the haplotyping window to consider windows spanning multiple chromosomes as it assumes only a single chromosome at a time.

Output

phaseLD will output 3 files, with the suffix '.bayes', '.log', '.raw_hap'. The '.bayes' file contains the probabilities of both haplotypes for each individual in a given window. The '.log' file contains all the information from the LD calculations. The '.raw_hap' file contains the non-window haplotype, with the normalized probability of the correct phase assignment.

Usage

Name:

    phaseLD v.0.01, developed by Alexandre Marand, 2017


Usage:

    phaseLD.pl --in <file.gen> [OPTIONS]


Description:

    A complete list of parameter options. See 
    'https://github.com/plantformatics/phaseLD' for more information.


  Input/Output:
  
    --in|-i             Genotype file [String|REQUIRED]

    --out|-o            Prefix for output files [String|Default='phased']


  Performance:
  
    --win|-w            Set minimum window size for estimating Pairwise LD
                        (when --quick_mode or --fast_mode are False). If
                        --quick_mode or --fast_mode are enabled, this
                        distance becomes the search area for linked markers.
                        Windows are dynamically extended if no markers
                        within the specified window are in linkage with the
                        current iteration. Markers which are not linked with
                        neighboring markers are typically false positives.
                        See prefix.log to identify markers with low linkage.
                        [Int|Default=100]

    --step|-s           Set minimum step size for LD calculations in a
                        sliding window, defaults to 1. [Int|Default=1]

    --bwin|-n           Set minimum window size for bayesian haplotype calling.
                        [Int|Default=50]

    --bstep|-t          Set minimum step size for bayesian haplotype calling.
                        [Int|Default=5]

    --rpen|-r           Set minimum r2 value needed to retain SNPs. This
                        threshold helps to remove false positive SNP calls.
                        Defaults to 0.25. [Int|Default=0.25]


  Run Time Options:
  
    --threads|-p        Set number of threads. [Int|Default=1]

    --quick_mode|-f     Instead of calculating LD for every marker in the
                        window, calculate LD for window_size/10. The next
                        position is the first marker above the R2 threshold.
                        This makes the computation 10-20X faster for the
                        default window size. Time saved increases
                        exponentially as window sizes get larger.
                        [Flag]

    --fast_mode|-q      Rather than use pairwise LD calculations within a
                        window, use LD chaining instead. Starting at marker
                        X, the algorithm searches for the nearest marker, y,
                        with R2 above the threshold. The next iteration
                        starts at marker y and continues until all markers
                        are exhausted. [Flag]

    --filter|-f         Optionally use a file to mask markers. This file
                        should contain the chromosome and position of the
                        marker (example: chr01_1000) on a single line.
                        Useful to cleaning haplotype switch problems or
                        thinning data to specific markers. [String|Optional]


  Misc:
  
    --help|-h           Print a brief help message and exit.

Usage extract_crossovers.pl

extract_crossovers.pl foo.bayes foo.raw_hap > foo.crossoverintervals

Output header information

Column 1 = chromosome

Column 2 = start

Column 3 = end

Column 4 = haplotype transition

Column 5 = crossover probability

Column 6 = progeny index

Column 7 = crossover interval length (bp)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

phaseLD

Citation

Dependencies phaseLD.pl

Dependencies extract_crossovers.pl

Installation

Preparing Data

Output

Usage

Usage extract_crossovers.pl

Output header information

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
bin		bin
examples		examples
README.md		README.md

plantformatics/phaseLD

Folders and files

Latest commit

History

Repository files navigation

phaseLD

Citation

Dependencies phaseLD.pl

Dependencies extract_crossovers.pl

Installation

Preparing Data

Output

Usage

Usage extract_crossovers.pl

Output header information

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages