BioAlcidae
Pierre Lindenbaum edited this page May 13, 2015
##Motivation
BioAlcidae is a JavaScript-based reformatter for bioinformatics files, built on the Java Rhino engine ( http://en.wikipedia.org/wiki/Rhino_%28JavaScript_engine%29 ). It works something like awk for VCF, BAM, SAM, FASTQ, FASTA, etc.

The program injects the following variables:
- out : a java.io.PrintWriter ( https://docs.oracle.com/javase/7/docs/api/java/io/PrintWriter.html ) for output
- FILENAME : a string, the name of the current input
- format : a string, the format of the current input ("VCF"...)
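
As a minimal sketch of how these three variables might be used together, the hypothetical helper `describeInput` below writes a one-line comment naming the current input and its format. The function name and the layout of the comment line are assumptions made for illustration; only `out`, `FILENAME` and `format` come from the program.

```javascript
// Hypothetical helper: writes a one-line summary of the current input.
// `writer` is expected to behave like java.io.PrintWriter (only println is used).
function describeInput(writer, filename, fmt) {
    writer.println("# input=" + filename + " format=" + fmt);
}

// Inside a BioAlcidae script the injected variables would be passed in directly:
//   describeInput(out, FILENAME, format);
```

Passing the injected variables as arguments keeps the helper free of global state, so it can also be tried outside the tool with a stand-in writer.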
###VCF
For VCF, the program injects the following variables:
- header : a htsjdk.variant.vcf.VCFHeader ( https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/vcf/VCFHeader.html )
- iter : a java.util.Iterator<htsjdk.variant.variantcontext.VariantContext> ( https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/VariantContext.html )
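
A minimal sketch of consuming the variant iterator: the hypothetical functions `countByContig` and `printCounts` below tally variants per chromosome and print one line per chromosome. The sketch assumes the iterator is visible to the script under the name `iter`, as in the example script on this page, and it reads `ctx.chr` the same way that script does.

```javascript
// Hypothetical helper: counts variants per chromosome.
// `variants` must offer hasNext()/next(), like java.util.Iterator.
function countByContig(variants) {
    var counts = {};
    while (variants.hasNext()) {
        var ctx = variants.next();
        var contig = String(ctx.chr); // turn a possible Java string into a JS string
        if (!Object.prototype.hasOwnProperty.call(counts, contig)) {
            counts[contig] = 0;
        }
        counts[contig]++;
    }
    return counts;
}

// Hypothetical helper: prints one "contig<TAB>count" line per chromosome.
function printCounts(writer, counts) {
    for (var contig in counts) {
        if (Object.prototype.hasOwnProperty.call(counts, contig)) {
            writer.println(contig + "\t" + counts[contig]);
        }
    }
}

// Assumed usage inside a BioAlcidae script:
//   printCounts(out, countByContig(iter));
```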
##Compilation
See also Compilation.

    $ make bioalcidae
##Synopsis

    $ java -jar dist/bioalcidae.jar [options] (stdin|file1 file2 ... fileN|file.list)
##Options
Option | Description |
---|---|
-f (file) | javascript file |
-e (expression) | javascript expression |
-o (file) | output file. Default: stdout |
-F (format) | input format, one of VCF, SAM, BAM, FASTA, FASTQ. Optional for files; required when reading from stdin |
-h | get help (this screen) and exit. |
-v | print version and exit. |
-L (level) | log level. One of java.util.logging.Level . Optional. |
##Source Code
Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/bioalcidae/BioAlcidae.java
##Example
###Reformatting a VCF
We want to reformat a VCF into a tab-delimited table with the header:

    CHROM POS REF ALT GENOTYPE_SAMPLE1 GENOTYPE_SAMPLE2 ... GENOTYPE_SAMPLEN

We use the following JavaScript file (jeter.js in the command below):

    // sample names, in the order they appear in the VCF header
    var samples = header.sampleNamesInOrder;
    // print the header line of the table
    out.print("CHROM\tPOS\tREF\tALT");
    for(var i=0;i< samples.size();++i)
    {
        out.print("\t"+samples.get(i));
    }
    out.println();
    while(iter.hasNext())
    {
        var ctx = iter.next();
        // keep only variants with exactly one ALT allele
        if(ctx.alternateAlleles.size()!=1) continue;
        out.print(ctx.chr +"\t"+ctx.start+"\t"+ctx.reference.displayString+"\t"+ctx.alternateAlleles.get(0).displayString);
        // one column per sample: 0 = hom-ref, 2 = hom-var, 1 = het, -9 = anything else
        for(var i=0;i< samples.size();++i)
        {
            var g = ctx.getGenotype(samples.get(i));
            out.print("\t");
            if(g.isHomRef())
            {
                out.print("0");
            }
            else if(g.isHomVar())
            {
                out.print("2");
            }
            else if(g.isHet())
            {
                out.print("1");
            }
            else
            {
                out.print("-9");
            }
        }
        out.println();
    }

Running the script on a 1000 Genomes chromosome 22 VCF:

    $ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | \
      gunzip -c | java -jar ./dist/bioalcidae.jar -f jeter.js -F vcf | head -n 5 | cut -f 1-10
    CHROM POS REF ALT HG00096 HG00097 HG00099 HG00100 HG00101 HG00102
    22 16050075 A G 0 0 0 0 0 0
    22 16050115 G A 0 0 0 0 0 0
    22 16050213 C T 0 0 0 0 0 0
    22 16050319 C T 0 0 0 0 0 0
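
The genotype encoding used by the script above (0 for homozygous reference, 1 for heterozygous, 2 for homozygous variant, -9 for anything else, such as a no-call) can be isolated in a small function. This is only a sketch: `genotypeCode` is a hypothetical name, and its argument is assumed to expose the same `isHomRef()`, `isHomVar()` and `isHet()` methods the script already calls on a genotype.

```javascript
// Hypothetical helper mirroring the if/else chain of the script above.
// Returns the code as a string, ready to be passed to out.print().
function genotypeCode(g) {
    if (g.isHomRef()) return "0";
    if (g.isHomVar()) return "2";
    if (g.isHet()) return "1";
    return "-9"; // none of the three states applies
}

// Assumed usage inside the per-sample loop of the script:
//   out.print("\t" + genotypeCode(ctx.getGenotype(samples.get(i))));
```

Keeping the mapping in one place makes it easier to switch to another encoding without touching the loop.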
- Issue Tracker: http://github.com/lindenb/jvarkit/issues
- Source Code: http://github.com/lindenb/jvarkit
##See also
- BioAwk : https://github.com/lh3/bioawk
##History
- 2015 : Creation
The project is licensed under the MIT license.