BioAlcidae
##Motivation
A JavaScript-based reformatter for bioinformatics files, built on the Java Rhino engine ( http://en.wikipedia.org/wiki/Rhino_%28JavaScript_engine%29 ). Think of it as awk for VCF, BAM, SAM, FASTQ, FASTA, etc.
##Compilation
See also Compilation.
```bash
$ make bioalcidae
```
##Synopsis
```bash
$ java -jar dist/bioalcidae.jar [options] (stdin|file1 file2 ... fileN|file.list)
```
##Options
Option | Description |
---|---|
-f (file) | javascript file |
-e (expression) | javascript expression |
-o (file) | output file. Default: stdout |
-F (format) | input format, one of [VCF, SAM, BAM, FASTA, FASTQ]. Optional, but required when reading from stdin |
-h | get help (this screen) and exit. |
-v | print version and exit. |
-L (level) | log level. One of java.util.logging.Level . Optional. |
##Variables
The program injects the following variables:
- `out` a java.io.PrintWriter ( https://docs.oracle.com/javase/7/docs/api/java/io/PrintWriter.html ) for output
- `FILENAME` a string, the name of the current input
- `format` a string, the format of the current input ("VCF"...)
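
As a minimal sketch of how these variables fit together, the script below counts the records of the current input and writes one tab-delimited summary line. It uses `out`, `FILENAME`, `format` and the per-format `iter` variable described in the following sections. The function name `countRecords` is made up for this illustration, and the `typeof` guard at the end is an assumption that lets the same file load outside BioAlcidae without failing.

```javascript
// Hypothetical helper (not part of BioAlcidae): counts the records of one input.
// The injected variables are passed in as arguments so the function can also
// be exercised outside the tool with stand-in objects.
function countRecords(iter, out, filename, format) {
    var n = 0;
    while (iter.hasNext()) {
        iter.next();
        n++;
    }
    // one summary line: input name, format, number of records
    out.println(filename + "\t" + format + "\t" + n);
    return n;
}

// Inside BioAlcidae the injected variables exist, so the summary is printed.
if (typeof iter !== "undefined" && typeof out !== "undefined") {
    countRecords(iter, out, FILENAME, format);
}
```

Saved under a hypothetical name such as `count.js`, it could be run with `java -jar dist/bioalcidae.jar -f count.js input.vcf`, where `input.vcf` stands for any supported input file.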
###VCF
For VCF, the program injects the following variables:
- `header` a htsjdk.variant.vcf.VCFHeader https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/vcf/VCFHeader.html
- `iter` a java.util.Iterator<htsjdk.variant.variantcontext.VariantContext> https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/VariantContext.html
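
A short sketch of walking the VCF iterator: the script below tallies sites by their number of ALT alleles, using the `VariantContext.getAlternateAlleles()` getter from htsjdk. The function name `tallyAltAlleleCounts` and the `N_ALT`/`N_SITES` column labels are invented for this example, and the guard at the end is an assumption so that the file can be loaded outside the tool.

```javascript
// Hypothetical helper: counts how many sites carry 1, 2, 3... ALT alleles.
function tallyAltAlleleCounts(iter, out) {
    var counts = {};   // number of ALT alleles -> number of sites
    while (iter.hasNext()) {
        var ctx = iter.next();
        var nAlt = ctx.getAlternateAlleles().size();
        counts[nAlt] = (counts[nAlt] || 0) + 1;
    }
    // collect the keys and sort them numerically (for-in yields strings)
    var keys = [];
    for (var k in counts) {
        keys.push(Number(k));
    }
    keys.sort(function (a, b) { return a - b; });
    out.println("N_ALT\tN_SITES");
    for (var i = 0; i < keys.length; ++i) {
        out.println(keys[i] + "\t" + counts[keys[i]]);
    }
    return counts;
}

// Only runs when BioAlcidae has injected 'iter' and 'out'.
if (typeof iter !== "undefined" && typeof out !== "undefined") {
    tallyAltAlleleCounts(iter, out);
}
```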
###Fasta
- `iter` a java.util.Iterator over Fasta objects, which expose the following methods:

```java
public class Fasta
{
    public String getSequence();
    public String getName();
    public void print();
    public int getSize();
    public char charAt(int i);
}
```
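
For illustration, a script along the following lines would print each sequence name with its length, relying only on the `getName()` and `getSize()` methods listed above. The function name `printSequenceLengths` is hypothetical, and the guard at the end is an assumption that keeps the file loadable outside BioAlcidae.

```javascript
// Hypothetical helper: prints "name<TAB>length" for every FASTA record
// and returns the total number of bases seen.
function printSequenceLengths(iter, out) {
    var total = 0;
    while (iter.hasNext()) {
        var seq = iter.next();
        out.println(seq.getName() + "\t" + seq.getSize());
        total += seq.getSize();
    }
    return total;
}

// Only runs when BioAlcidae has injected 'iter' and 'out'.
if (typeof iter !== "undefined" && typeof out !== "undefined") {
    printSequenceLengths(iter, out);
}
```

Saved under a hypothetical name such as `lengths.js`, it could be run with `java -jar dist/bioalcidae.jar -F FASTA -f lengths.js input.fa`, where `input.fa` is a placeholder.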
###BAM or SAM
- `header` a htsjdk.samtools.SAMFileHeader http://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMFileHeader.html
- `iter` a htsjdk.samtools.SAMRecordIterator https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/SAMRecordIterator.html
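
As a rough sketch, the script below counts mapped and unmapped reads. It assumes the iterator yields htsjdk `SAMRecord` objects and calls their `getReadUnmappedFlag()` getter. The function name `countMappedReads` and the output labels are invented for the example, and the guard at the end is an assumption so that the file can be loaded outside the tool.

```javascript
// Hypothetical helper: counts reads by their "unmapped" flag.
function countMappedReads(iter, out) {
    var mapped = 0;
    var unmapped = 0;
    while (iter.hasNext()) {
        var rec = iter.next();
        if (rec.getReadUnmappedFlag()) {
            unmapped++;
        } else {
            mapped++;
        }
    }
    out.println("mapped\t" + mapped);
    out.println("unmapped\t" + unmapped);
    return { mapped: mapped, unmapped: unmapped };
}

// Only runs when BioAlcidae has injected 'iter' and 'out'.
if (typeof iter !== "undefined" && typeof out !== "undefined") {
    countMappedReads(iter, out);
}
```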
###FASTQ
- `iter` a java.util.Iterator<htsjdk.samtools.fastq.FastqRecord> https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/fastq/FastqRecord.html
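
A possible FASTQ script, sketched under the same assumptions: it summarizes the number of reads, the total number of bases and the mean read length, using `FastqRecord.getReadString()` from htsjdk. The function name `summarizeReadLengths` and the output labels are hypothetical. Wrapping the bases in `String(...)` is a deliberate choice so that `.length` behaves the same whether the value is a Java string or a JavaScript string.

```javascript
// Hypothetical helper: basic read-length statistics for a FASTQ input.
function summarizeReadLengths(iter, out) {
    var n = 0;
    var totalBases = 0;
    while (iter.hasNext()) {
        var rec = iter.next();
        // convert to a JavaScript string before taking its length
        totalBases += String(rec.getReadString()).length;
        n++;
    }
    var mean = n > 0 ? totalBases / n : 0;
    out.println("reads\t" + n);
    out.println("bases\t" + totalBases);
    out.println("mean_length\t" + mean);
    return { reads: n, bases: totalBases, mean: mean };
}

// Only runs when BioAlcidae has injected 'iter' and 'out'.
if (typeof iter !== "undefined" && typeof out !== "undefined") {
    summarizeReadLengths(iter, out);
}
```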
##Source Code
Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/bioalcidae/BioAlcidae.java
##Example
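Reformatting a VCF: we want to convert a VCF into a tab-delimited table with the header

```
CHROM POS REF ALT GENOTYPE_SAMPLE1 GENOTYPE_SAMPLE2 ... GENOTYPE_SAMPLEN
```

We use the following JavaScript file, which codes each genotype as 0 (hom-ref), 1 (het), 2 (hom-var) or -9 (anything else):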
```javascript
// sample names, in the order of the VCF header columns
var samples = header.sampleNamesInOrder;
// print the header line
out.print("CHROM\tPOS\tREF\tALT");
for(var i=0;i< samples.size();++i)
    {
    out.print("\t"+samples.get(i));
    }
out.println();
while(iter.hasNext())
    {
    var ctx = iter.next();
    // skip variants that do not have exactly one ALT allele
    if(ctx.alternateAlleles.size()!=1) continue;
    out.print(ctx.chr +"\t"+ctx.start+"\t"+ctx.reference.displayString+"\t"+ctx.alternateAlleles.get(0).displayString);
    // one genotype code per sample
    for(var i=0;i< samples.size();++i)
        {
        var g = ctx.getGenotype(samples.get(i));
        out.print("\t");
        if(g.isHomRef())
            {
            out.print("0");
            }
        else if(g.isHomVar())
            {
            out.print("2");
            }
        else if(g.isHet())
            {
            out.print("1");
            }
        else
            {
            out.print("-9");
            }
        }
    out.println();
    }
```
```
$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | \
    gunzip -c | java -jar ./dist/bioalcidae.jar -f jeter.js -F vcf | head -n 5 | cut -f 1-10
CHROM POS REF ALT HG00096 HG00097 HG00099 HG00100 HG00101 HG00102
22 16050075 A G 0 0 0 0 0 0
22 16050115 G A 0 0 0 0 0 0
22 16050213 C T 0 0 0 0 0 0
22 16050319 C T 0 0 0 0 0 0
```
For the 1000 Genomes data, print CHROM/POS/REF/ALT/AF (Europe):
```
$ curl "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.vcf.gz" | gunzip -c |\
    java -jar dist/bioalcidae.jar -F VCF -e 'while(iter.hasNext()) {var ctx=iter.next(); if(!ctx.hasAttribute("EUR_AF") || ctx.alternateAlleles.size()!=1) continue; out.println(ctx.chr+"\t"+ctx.start+"\t"+ctx.reference.displayString+"\t"+ctx.alternateAlleles.get(0).displayString+"\t"+ctx.getAttribute("EUR_AF"));}'
1 10177 A AC 0.4056
1 10235 T TA 0
1 10352 T TA 0.4264
1 10505 A T 0
1 10506 C G 0
1 10511 G A 0
1 10539 C A 0.001
1 10542 C T 0
1 10579 C A 0
1 10616 CCGCCGTTGCAAAGGCGCGCCG C 0.994
(...)
```
- Issue Tracker: http://github.com/lindenb/jvarkit/issues
- Source Code: http://github.com/lindenb/jvarkit
##See also
- BioAwk : https://github.com/lh3/bioawk
- https://www.biostars.org/p/152016/
- https://www.biostars.org/p/152720/
- https://www.biostars.org/p/152820/
##History
- 2015 : Creation
The project is licensed under the MIT license.