Mapping short reads with Giraffe
This tutorial explains how to use vg giraffe to map short reads to a pangenome graph.
The easiest way to start mapping to your own data is to have a single-copy reference FASTA and one or more phased VCFs, covering several samples, that describe the variation you want to include. Your FASTA should not include alternative loci, because to use them properly you would also need to provide an alignment of the alternative loci to the main chromosome contigs, and vg cannot make use of such an alignment alongside a VCF. Your VCF file(s) should be self-consistent and should not include contradictory or overlapping variants. They should also be restricted to VCF 4.2 features; the * allele of VCF 4.3 is not yet supported.
To turn your reference and VCFs into a graph, you can use vg autoindex:
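A minimal sketch of such an invocation, assuming a reference named reference.fa, a phased VCF named variants.vcf.gz, and an output prefix of graph. All three names are placeholders rather than files from this tutorial, and the guard around the command is only there so the snippet does nothing harmful when vg or the inputs are missing.

```shell
# Placeholder inputs (assumptions; substitute your own files).
ref="reference.fa"        # single-copy reference FASTA
vcf="variants.vcf.gz"     # phased VCF for the samples to include
prefix="graph"            # outputs are written as graph.<extension>

if command -v vg >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$vcf" ]; then
    # Build the indexes Giraffe needs from the reference and the VCF.
    vg autoindex --workflow giraffe -r "$ref" -v "$vcf" -p "$prefix"
else
    echo "Skipping: vg is not on PATH or an input file is missing." >&2
fi
```

If you have several VCFs, the -v option can be given more than once.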
This will build all the files needed for Giraffe to run: a GBWT haplotype index subsampled to a reasonable number of local haplotypes in the .giraffe.gbwt file, a GBWTGraph that provides node sequences in the .gg file, a minimizer index for seed finding in the .min file, and a minimum distance index in the .dist file.
If you have trouble building indexes, you may need more memory. You can control the amount of memory that the autoindexing process seeks to use at any given time with the --target-mem option, but be aware that this is only a target, and the heuristics used to estimate the memory requirements for building various partial indexes may not work well on your data.
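As a rough illustration, the memory target can be added to the same kind of autoindex call. The 32G value below is an arbitrary example, not a recommendation, and the file names are placeholders; check vg autoindex --help on your installed version for the accepted size format.

```shell
# Placeholder inputs and an arbitrary memory target (all assumptions).
ref="reference.fa"
vcf="variants.vcf.gz"
prefix="graph"
target_mem="32G"          # an integer followed by a unit suffix such as G

if command -v vg >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$vcf" ]; then
    # Ask autoindex to aim for roughly this much memory; it is a target, not a hard cap.
    vg autoindex --workflow giraffe -r "$ref" -v "$vcf" -p "$prefix" \
        --target-mem "$target_mem"
else
    echo "Skipping: vg is not on PATH or an input file is missing." >&2
fi
```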
Once your indexes are built, you can map reads using vg giraffe. For example, to map paired-end reads:
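A sketch of a paired-end run, assuming the indexes were built with the prefix graph and that the two mate files are called reads_1.fq and reads_2.fq (both names are placeholders). It passes the four index files described above and writes alignments in vg's default GAM format to a file named mapped.gam, which is also an assumed name.

```shell
prefix="graph"            # prefix used when building the indexes (assumed)
reads1="reads_1.fq"       # first mates (placeholder name)
reads2="reads_2.fq"       # second mates (placeholder name)
out="mapped.gam"          # output alignments (placeholder name)

if command -v vg >/dev/null 2>&1 && [ -f "$prefix.min" ] \
        && [ -f "$reads1" ] && [ -f "$reads2" ]; then
    # One -f per mate file tells Giraffe to treat the reads as pairs.
    vg giraffe \
        -H "$prefix.giraffe.gbwt" \
        -g "$prefix.gg" \
        -m "$prefix.min" \
        -d "$prefix.dist" \
        -f "$reads1" -f "$reads2" \
        > "$out"
else
    echo "Skipping: vg is not on PATH or an index or read file is missing." >&2
fi
```

For single-end reads, the same command with only one -f option should apply.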