BamToFastq
Pierre Lindenbaum edited this page Nov 20, 2013 · 11 revisions
##Motivation
Implementation of the tip from https://twitter.com/DNAntonie/status/402909852277932032:
"Shrink your FASTQ.bz2 files by 40+% using this one weird tip -> order them by alignment to reference before compression"
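The effect behind this tip can be reproduced on synthetic data. The sketch below is a toy illustration, not the tool's implementation: it assumes a random 200 kb "reference", 20000 simulated 100 bp reads and gzip as the compressor, and all names in it are made up for the demonstration. Reads that overlap on the reference end up next to each other once sorted, so the compressor finds long repeats inside its window.

```shell
#!/bin/sh
# Toy demonstration on synthetic data (assumed sizes, hypothetical names).
tmp=$(mktemp -d)

# Build a random reference, then emit "position<TAB>sequence" for each
# simulated read. Reads come out in random order, like an unsorted FASTQ.
awk 'BEGIN {
    srand(42)
    split("A C G T", base, " ")
    for (i = 1; i <= 200000; i++) ref[i] = base[int(rand() * 4) + 1]
    for (r = 0; r < 20000; r++) {
        pos = int(rand() * (200000 - 100)) + 1
        seq = ""
        for (k = 0; k < 100; k++) seq = seq ref[pos + k]
        printf "%d\t%s\n", pos, seq
    }
}' > "$tmp/reads.tsv"

# Compress the sequences in their original (random) order ...
shuffled_bytes=$(cut -f2 "$tmp/reads.tsv" | gzip -c | wc -c | tr -d ' ')
# ... and again after ordering them by position on the reference.
sorted_bytes=$(sort -k1,1n "$tmp/reads.tsv" | cut -f2 | gzip -c | wc -c | tr -d ' ')

echo "random order: $shuffled_bytes bytes"
echo "sorted order: $sorted_bytes bytes"
rm -rf "$tmp"
```

The exact byte counts depend on the awk and gzip implementations, but the sorted stream should come out smaller than the random-order one.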
##Compilation
See also Compilation.
$ ant bam2fastq
##Options
Name | Description |
---|---|
-v | print version and exit. |
-E (name) | restrict to that enzyme. Can be called multiple times. Optional. |
-t (dir) | set temporary directory. Optional. |
-F (fastq) | Save fastq_R1 to file (default: stdout). Optional. |
-R (fastq) | Save fastq_R2 to file (default: interlaced with forward). Optional. |
-r | repair: insert missing read. |
-N (int) | max records in memory. Optional. |
##Example
$ bwa mem -M human_g1k_v37.fasta Sample1_L001_R1_001.fastq.gz Sample2_S5_L001_R2_001.fastq.gz |\
java -jar dist/bam2fastq.jar -F tmpR1.fastq.gz -R tmpR2.fastq.gz
before:
$ ls -lah Sample1_L001_R1_001.fastq.gz Sample2_S5_L001_R2_001.fastq.gz
-rw-r--r-- 1 lindenb lindenb 181M Jun 14 15:20 Sample1_L001_R1_001.fastq.gz
-rw-r--r-- 1 lindenb lindenb 190M Jun 14 15:20 Sample1_L001_R2_001.fastq.gz
after:
$ ls -lah tmpR1.fastq.gz tmpR2.fastq.gz
-rw-rw-r-- 1 lindenb lindenb 96M Nov 20 17:10 tmpR1.fastq.gz
-rw-rw-r-- 1 lindenb lindenb 106M Nov 20 17:10 tmpR2.fastq.gz
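To put a number on the reduction, the sizes printed by `ls` can be compared directly. This is a rough sketch: the helper name `saving` is hypothetical, and because `ls -h` rounds sizes to whole megabytes the percentages are approximate.

```shell
#!/bin/sh
# Hypothetical helper: percentage saved when a file shrinks from $1 to $2.
# Both sizes must be in the same unit (here the rounded megabytes from ls -h).
saving() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - after / before) }'
}

saving 181 96     # R1: 181M -> 96M
saving 190 106    # R2: 190M -> 106M
```

With the sizes listed above this gives roughly 47% for R1 and 44% for R2.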
check the number of reads (`wc -l` counts lines; a FASTQ record is four lines, so equal line counts mean equal read counts):
$ gunzip -c Sample1_L001_R1_001.fastq.gz | wc -l
5824676
$ gunzip -c tmpR1.fastq.gz | wc -l
5824676
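The line counts can be turned into read counts by dividing by four. This is a minimal sketch, assuming well-formed four-line records (no wrapped sequence lines); the helper name `count_reads` and the demo file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of reads in a gzipped FASTQ file,
# assuming every record is exactly four lines long.
count_reads() {
    lines=$(gunzip -c "$1" | wc -l | tr -d ' ')
    echo $((lines / 4))
}

# Self-contained demo on a tiny two-read file (made-up data).
demo_dir=$(mktemp -d)
demo="$demo_dir/demo.fastq.gz"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$demo"
count_reads "$demo"
```

Applied to the counts above, 5824676 lines correspond to 1456169 reads in each file.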
from a BAM file:
$ java -jar dist/bam2fastq.jar \
-F tmpR1.fastq.gz -R tmpR2.fastq.gz file.bam
(...)
-rw-r--r-- 1 lindenb lindenb 565M Nov 18 10:44 Sample_S1_L001_R1_001.fastq.gz
-rw-r--r-- 1 lindenb lindenb 649M Nov 18 10:45 Sample_S1_L001_R2_001.fastq.gz
-rw-rw-r-- 1 lindenb lindenb 470M Nov 20 16:17 tmpR1.fastq.gz.fastq.gz
-rw-rw-r-- 1 lindenb lindenb 554M Nov 20 16:17 tmpR2.fastq.gz.fastq.gz