Mapping short reads with Giraffe

This tutorial will explain how to use vg giraffe to map short reads to a pangenome graph.

The easiest way to start mapping to your own data is to begin with a single-copy reference FASTA and one or more phased VCFs, covering several samples, that describe the variation you want to include. Your FASTA should not include alternative loci: to use them properly you would also need to provide an alignment of the alternative loci to the main chromosome contigs, and vg cannot make use of such an alignment alongside a VCF. Your VCF file(s) should be self-consistent and should not include contradictory or overlapping variants. They should also be restricted to VCF 4.2 features; the * allele of VCF 4.3 is not yet supported.

To turn your reference and VCFs into a graph, you can use vg autoindex:
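A minimal sketch of such an invocation, wrapped in a small function so the inputs are explicit. The file names `reference.fa` and `variants.vcf.gz`, the prefix `my-index`, and the function name `build_giraffe_indexes` are hypothetical placeholders; check `vg autoindex --help` for the options available in your version of vg.

```shell
# Build Giraffe indexes from a reference FASTA and a phased VCF.
# $1: reference FASTA, $2: phased VCF, $3: prefix for the output index files
build_giraffe_indexes() {
    vg autoindex --workflow giraffe -r "$1" -v "$2" -p "$3"
}

# Hypothetical inputs; substitute your own files.
REF=reference.fa
VCF=variants.vcf.gz
PREFIX=my-index

# Run only when vg is installed and both inputs exist.
if command -v vg >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$VCF" ]; then
    build_giraffe_indexes "$REF" "$VCF" "$PREFIX"
fi
```

The output files are written next to the given prefix, so with the values above they would all start with `my-index`.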

This will build all the files needed for Giraffe to run: a GBWT haplotype index subsampled to a reasonable number of local haplotypes in the .giraffe.gbwt file, a GBWTGraph that provides node sequences in the .gg file, a minimizer index for seed finding in the .min file, and a minimum distance index in the .dist file.

If you have trouble building indexes, you may need more memory. You can control the amount of memory that the autoindexing process seeks to use at any given time with the --target-mem option, but be aware that this is a target, and the heuristics used to estimate the memory requirements for building various partial indexes may not work well on your data.
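As a sketch, the memory target can be passed alongside the other options. The value `32G`, the file names, and the function name `build_with_memory_target` are assumptions for illustration; confirm the accepted value format with `vg autoindex --help`.

```shell
# Build Giraffe indexes while asking autoindex to aim for a memory budget.
# $1: reference FASTA, $2: phased VCF, $3: output prefix, $4: memory target
build_with_memory_target() {
    vg autoindex --workflow giraffe -r "$1" -v "$2" -p "$3" --target-mem "$4"
}

# Hypothetical inputs and an assumed 32G target; this is a target, not a hard cap.
REF=reference.fa
VCF=variants.vcf.gz

# Run only when vg is installed and both inputs exist.
if command -v vg >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$VCF" ]; then
    build_with_memory_target "$REF" "$VCF" my-index 32G
fi
```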

Once your indexes are built, you can then map reads using vg giraffe. For example, to map paired-end reads:
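A minimal sketch of a paired-end mapping run, assuming the index files described above share the hypothetical prefix `my-index` and the reads are in `reads1.fq` and `reads2.fq`. The function name `map_paired_reads` is also a placeholder, and the index file names may differ between vg versions, so check `vg giraffe --help` before relying on it.

```shell
# Map paired-end reads with Giraffe and write the alignments to a GAM file.
# $1: index prefix, $2: first-mate FASTQ, $3: second-mate FASTQ, $4: output GAM
map_paired_reads() {
    vg giraffe -H "$1.giraffe.gbwt" -g "$1.gg" -m "$1.min" -d "$1.dist" \
        -f "$2" -f "$3" > "$4"
}

# Hypothetical inputs; substitute your own files.
PREFIX=my-index
READS1=reads1.fq
READS2=reads2.fq

# Run only when vg is installed and the indexes and reads exist.
if command -v vg >/dev/null 2>&1 && [ -f "$PREFIX.min" ] \
    && [ -f "$READS1" ] && [ -f "$READS2" ]; then
    map_paired_reads "$PREFIX" "$READS1" "$READS2" mapped.gam
fi
```

Giraffe writes alignments to standard output, which is why the sketch redirects it into a file.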
