Java utilities for Bioinformatics
Since 2015-12-10, I'm slowly moving to an XML-based description of my tools and I'm now using Java 8.
I do my best to change all those tools. The documentation in the wiki might be out of date. If you think you have found an error, leave a message at https://github.com/lindenb/jvarkit/issues .
Furthermore, I'm now using the "Apache Commons CLI library" to parse the command line. For options that take more than one value, you might have to
add a double dash '--' to separate those values from the input files.
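
To illustrate why the double dash matters, here is a minimal sketch in plain Java. It is not the Apache Commons CLI implementation and not code from jvarkit; the class `DoubleDashDemo`, the option `-s` and the file name `in.vcf` are hypothetical and only mimic the general convention: a multi-valued option keeps consuming tokens until it meets `--`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DoubleDashDemo {
    /**
     * Toy parser (assumption: one multi-valued option, no other options).
     * Returns two lists: index 0 holds the values captured by the option,
     * index 1 holds the remaining tokens, treated as input files.
     */
    static List<List<String>> parse(String multiOpt, String... argv) {
        List<String> values = new ArrayList<>();
        List<String> files = new ArrayList<>();
        boolean inValues = false;     // currently collecting values for multiOpt
        boolean optionsEnded = false; // a '--' has been seen
        for (String tok : argv) {
            if (optionsEnded) {
                files.add(tok);       // everything after '--' is an input file
            } else if (tok.equals("--")) {
                optionsEnded = true;
                inValues = false;
            } else if (tok.equals(multiOpt)) {
                inValues = true;
            } else if (inValues) {
                values.add(tok);      // swallowed as an option value
            } else {
                files.add(tok);
            }
        }
        return Arrays.asList(values, files);
    }

    public static void main(String[] args) {
        // Without '--' the input file is swallowed as a third option value.
        System.out.println(parse("-s", "-s", "S1", "S2", "in.vcf"));
        // With '--' the values and the input file are separated.
        System.out.println(parse("-s", "-s", "S1", "S2", "--", "in.vcf"));
    }
}
```

In this sketch the first call reports `in.vcf` among the option values and no input file, while the second call reports `S1`, `S2` as values and `in.vcf` as the only input file.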
Pierre Lindenbaum PhD
http://plindenbaum.blogspot.com
see Cite
## Tools
Tool | Description |
---|---|
SplitBam | Split a BAM by chromosome group. Creates EMPTY bams if no reads were found for a given group. |
SamJS | Filtering a SAM/BAM with javascript (rhino). |
VCFFilterJS | Filtering a VCF with javascript (rhino) |
SortVCFOnRef | Sort a VCF using the order of the chromosomes in a REFerence index. |
Illuminadir | Create a structured (**JSON** or **XML**) representation of a directory containing some Illumina FASTQs. |
BamStats04 | Coverage statistics for a BED file. It uses the Cigar string instead of the start/end to compute the coverage |
BamStats05 | same as BamStats04 but group by gene |
BamStats01 | Statistics about the reads in a BAM. |
VCFBed | Annotate a VCF with the content of a BED file indexed with tabix. |
VCFPolyX | Number of repeated REF bases around POS. |
VCFBigWig | Annotate a VCF with the data of a bigwig file. |
VCFTabixml | Annotate a value from a vcf+xml file. The 4th column of the BED indexed with TABIX is an XML string. |
GroupByGene | Group VCF data by gene/transcript. |
VCFPredictions | Basic variant prediction using UCSC knownGenes. |
FindCorruptedFiles | Reads filename from stdin and prints corrupted NGS files (VCF/BAM/FASTQ). |
VCF2XML | Transforms a VCF to XML. |
VCFAnnoBam | Annotate a VCF with the Coverage statistics of a BAM file + BED file of capture. It uses the Cigar string instead of the start/end to get the coverage |
VCFTrio | Check for mendelian incompatibilities in a VCF. |
SamGrep | Search reads in a BAM |
VCFFixIndels | Fix samtools INDELS for @SolenaLS |
NgsFilesSummary | Scan folders and generate a summary of the files (SAMPLE/BAM SAMPLE/VCF etc..). |
NoZeroVariationVCF | creates a VCF containing one fake variation if the input is empty. |
HowManyBamDict | for @abinouze : quickly find the number of distinct BAM Dictionaries from a set of BAM files. |
ExtendBed | Extends a BED file by 'X' bases. |
CmpBams | Compare two or more BAMs. |
IlluminaFastqStats | Statistics on Illumina Fastqs |
Bam2Raster | Save a BAM alignment as a PNG image. |
VcfRebase | Finds restriction sites overlapping variants in a VCF file |
FastqRevComp | Reverse complement a FASTQ file for mate-pair alignment |
PicardMetricsToXML | Convert a Picard metrics file to XML. |
Bam2Wig | Bam to Wiggle converter |
TViewWeb | CGI/Web based version of samtools tview |
VcfRegistryWeb | CGI/Web tool printing all variants at a given position for a collection of VCFs |
BlastMapAnnots | Maps uniprot/genbank annotations on a blast result. See http://www.biostars.org/p/76056 |
VcfViewGui | Simple java-Swing-based VCF viewer. |
BamViewGui | Simple java-Swing-based BAM viewer. |
Biostar81455 | Defining precisely the genomic context based on a position http://www.biostars.org/p/81455/ |
MapUniProtFeatures | map Uniprot features on reference genome. |
Biostar86363 | Set genotype of specific sample/genotype comb to unknown in multisample vcf file. |
FixVCF | Fix a VCF HEADER when I forgot to declare a FILTER or an INFO field in the HEADER |
Biostar78400 | Add the read group info to the sam file on a per lane basis |
Biostar78285 | Extract regions of genome that have 0 coverage See http://www.biostars.org/p/78285/ |
Biostar77288 | Low resolution sequence alignment visualization http://www.biostars.org/p/77288/ |
Biostar77828 | Divide the human genome among X cores, taking into account gaps See http://www.biostars.org/p/77828/ |
Biostar76892 | Fix strand of two paired reads close but on the same strand http://www.biostars.org/p/76892/ |
VCFCompareGT | VCF : compare genotypes of two or more callers for the same samples. |
SAM4WebLogo | Creates an Input file for BAM + WebLogo. |
SAM2Tsv | Tabular view of each base of the reads vs the reference. |
Biostar84786 | Table transposition |
VCF2SQL | Generate the SQL code to insert a VCF into a database |
VCFStripAnnotations | Removes one or more field from the INFO column from a VCF. |
VCFGeneOntology | Finds and filters the GO terms for VCF annotated with SNPEFF or VEP |
Biostar86480 | Genomic restriction finder See http://www.biostars.org/p/86480/ |
BamToFastq | Shrink your FASTQ.bz2 files by 40+% using this one weird tip by ordering them by alignment to reference |
PadEmptyFastq | Pad empty fastq sequence/qual with N/# |
SamFixCigar | Replace 'M'(match) in SAM cigar by 'X' or '=' |
FixVcfFormat | Fix PL format in VCF. Problem is described in http://gatkforums.broadinstitute.org/discussion/3453 |
VcfToRdf | Convert a VCF to RDF. |
VcfShuffle | Shuffle a VCF. |
DownSampleVcf | Down sample a VCF. |
VcfHead | Print the first variants of a VCF. |
VcfTail | Print the last variants of a VCF |
VcfCutSamples | Select/Exclude some samples from a VCF |
VcfStats | Generate some statistics from a VCF |
VcfSampleRename | Rename Samples in a VCF. |
VcffilterSequenceOntology | Filter a VCF on Sequence Ontology (SO). |
Biostar59647 | position of mismatches per read from a sam/bam file (XML) See http://www.biostars.org/p/59647/ |
VcfRenameChromosomes | Rename chromosomes in a VCF (eg. convert hg19/ucsc to grch37/ensembl) |
BamRenameChromosomes | Rename chromosomes in a BAM (eg. convert hg19/ucsc to grch37/ensembl) |
BedRenameChromosomes | Rename chromosomes in a BED (eg. convert hg19/ucsc to grch37/ensembl) |
BlastnToSnp | Map variations from a BLASTN-XML file. |
Blast2Sam | Convert a BLASTN-XML input to SAM |
VcfMapUniprot | Map uniprot features on VCF annotated with VEP or SNPEff. |
VcfCompare | Compare two VCF files. |
VcfBiomart | Annotate a VCF with the data from Biomart. |
VcfLiftOver | LiftOver a VCF file. |
BedLiftOver | LiftOver a BED file. |
VcfConcat | Concatenate VCF files. |
MergeSplittedBlast | Merge Blast hits from a split database |
FindMyVirus | Virus+host cell : split BAM into categories. |
Biostar90204 | linux split equivalent for BAM file. |
VcfJaspar | Finds JASPAR profiles in VCF |
GenomicJaspar | Finds JASPAR profiles in Fasta |
VcfTreePack | Create a TreeMap from one or more VCF |
BamTreePack | Create a TreeMap from one or more Bam. |
FastqRecordTreePack | Create a TreeMap from one or more Fastq files. |
WorldMapGenome | Map bed file to Genome + geographic data. |
AddLinearIndexToBed | Use a Sequence dictionary to create a linear index for a BED file. Can be used as a X-Axis for a chart. |
VCFComm | Compare multiple VCF files, output a new VCF file. |
VcfIn | Prints variants that are contained/not contained into another VCF |
Biostar92368 | Binary interactions depth See also http://www.biostars.org/p/92368 |
VCFCombineTwoSnvs | TODO |
FastqGrep | Finds reads in fastq files |
VcfCadd | Annotate a VCF with Combined Annotation Dependent Depletion (CADD) data. |
SortVCFOnInfo | sort a VCF using a field in the INFO column |
SamChangeReference | TODO |
SamExtractClip | TODO |
GCAndDepth | Extracts GC% and depth for multiple bam using a sliding window. |
MsaToVcf | Getting a VCF file from a CLUSTALW or FASTA alignment |
CompareBamAndBuild | Compare two BAM files mapped on two different builds. Requires a liftover chain file. |
KnownGenesToBed | Convert UCSC KnownGene to BED. |
Biostar95652 | Drawing a schematic genomic context tree. See also http://www.biostars.org/p/95652/ |
SamToPsl | Convert SAM/BAM to PSL or BED12. |
BWAMemNOp | merge the SA:Z:* attributes of a read mapped with bwa-mem and prints a read containing a cigar string with 'N' (Skipped region from the REF). |
FastqEntropy | Compute the Entropy of a Fastq file (distribution of the length(gzipped(sequence))) |
NgsFilesScanner | Build a persistent database of NGS file. Dump as XML. |
SigFrame | GUI displaying CGH data |
Biostar103303 | Calculate Percent Spliced In (PSI) |
VCFComparePredictions | Compare the variant predictions of VCFs |
BackLocate | Map a position in a protein back to the genomic coordinates. |
FindAVariation | Search for variations in a set of VCF files. |
AlleleFrequencyCalculator | VCF: Allele Frequency Calculator |
BuildWikipediaOntology | Build a simple RDFS/XML ontology from Wikipedia Categories. |
AlmostSortedVcf | Sort an 'almost' sorted VCF using an in-memory buffer. |
Biostar105754 | bigwig: peak distance from specific genomic BED region |
VcfRegulomeDB | Annotate a VCF with the RegulomeDB data (http://regulome.stanford.edu/) |
Biostar106668 | unmark duplicates (deprecated) |
BatchIGVPictures | GUI: Batch pictures with IGV |
PubmedDump | Dump pubmed data as XML. |
BamIndexReadNames | Build a dictionary of read names to be searched with BamQueryReadNames. |
BamQueryReadNames | Query a Bam file indexed with BamIndexReadNames. |
FastqShuffle | Shuffle Fastq files. |
FastqSplitInterleaved | Split interleaved Fastq files |
PubmedFilterJS | Filters pubmed XML using javascript. |
ReferenceToVCF | Creates a VCF containing all possible substitutions in a Reference Genome. |
VcfEnsemblReg | Annotate a VCF with the UCSC genome hub tracks for Ensembl Regulation. |
FastqJS | Filters a FASTQ file using javascript. |
Bam2SVG | Convert a BAM to SVG |
LiftOverToSVG | Convert UCSC LiftOver chain files to animated SVG |
VCFMerge | Combines VCF files. |
FixVcfMissingGenotypes | Use BAM to fill missing genotypes in merged VCFs |
NcbiTaxonomyToXml | Dump NCBI taxonomy tree as a hierarchical XML document |
BamCmpCoverage | Creates the figure of a comparative view of the depths sample vs sample |
FindAllCoveragesAtPosition | Find depth at specific position in a list of BAM files |
VcfMultiToOne | Convert VCF with multiple samples to a VCF with one SAMPLE |
Evs2Xml | Download data from Exome Variant Server as XML. |
VcfRemoveGenotypeIfInVcf | Reset Genotypes in VCF if they've been found in another VCF indexed with tabix |
Biostar130456 | Generate one VCF file for each sample from a multi-samples VCF |
UniprotFilterJS | Filter Uniprot XML with a javascript expression. |
SkipXmlElements | Filter XML elements with a javascript expression. |
MiniCaller | Simple and Stupid Variant Caller designed for @AdrienLeger2 |
VcfCompareCallersOneSample | For my colleague Julien. Compare VCF callers with VCF with one sample. |
SamRetrieveSeqAndQual | Is there a tool to add seq and qual to BAM? for @sjackman |
VcfEnsemblVepRest | Annotate a VCF with Ensembl REST API. |
VcfCompareCallers | Compare two VCFs and print common/exclusive information for each sample/genotype |
BamStats02 | Generate and explore statistics about the reads in a BAM (Sample/File/Flags/chroms/MAPQ) |
BamTile | Bam tiling Path. |
XContaminations | for @AdrienLeger2 : test for cross contamination between samples in same flowcell/runlane. |
VCFJoinVcfJS | Join two VCF files using javascript. |
Biostar139647 | Convert Clustal/Fasta alignment to SAM/BAM |
BioAlcidae | Reformat bioinformatics files using javascript/rhino (~ awk) |
VCFBedSetFilter | Set FILTER for VCF having intersection with BED |
VCFReplaceTag | Replace the key in INFO/FORMAT/FILTER |
VcfIndexTabix | sort, Compress (bgz) a VCF and create tabix index on the fly. |
VcfPeekVcf | Peek INFO Tag and ID from another VCF |
VcfGetVariantByIndex | Access a (plain or tabix-indexed) VCF file by the i-th index. |
VcfMultiToOneAllele | VCF: "one variant with N ALT alleles" to "N variants with one ALT" |
VcfMultiToOneInfo | VCF: "one variant having INFO with N values" to "N variants having INFO with one value" |
BedIndexTabix | Index and sort a BED on the fly with Tabix |
VcfToHilbert | Plot a Hilbert Curve from a VCF file. |
Biostar145820 | Shuffle BAM/Subsample BAM to fixed number of alignments |
PcrClipReads | Soft clip BAM files based on PCR target regions https://www.biostars.org/p/147136/ |
ExtendReferenceWithReads | Extending ends of REF sequence with the help of reads in BAM https://www.biostars.org/p/148089/ |
PcrSliceReads | Mark PCR reads to their PCR amplicon https://www.biostars.org/p/149687/ |
SamJmx | Monitor/interrupt/break a BAM/SAM stream with java JMX |
VcfJmx | Monitor/interrupt/break a VCF stream with java JMX |
Gtf2Xml | convert gff to XML in order to be processed with XSLT |
SortSamRefName | Sort a SAM/BAM on REF/contig and then on read/query name |
Biostar154220 | Cap BAM to a given coverage. see https://www.biostars.org/p/154220 |
VcfToBam | create a BAM from a VCF. |
Biostar165777 | Split a XML file (e.g: blast) |
BlastFilterJS | Filters a XML Blast Output with a javascript expression |
Biostar170742 | SAM to AXT converter |
Biostar172515 | Convert Bam Index to XML |
Biostar173114 | make a bam file smaller by removing unwanted information see also https://www.biostars.org/p/173114 |
SamSlop | extends the SAM record in 5' and 3' |
Biostar175929 | Construct a combination set of fasta sequences from a vcf |
VcfCalledWithAnotherMethod | Add flags in VCF to tell whether Genotype/variant was called in another VCF |
Biostar178713 | split bed file into several bed files where each region is separated from any other by N bases |
VcfRemoveGenotypeJs | Reset Genotypes in VCF using a javascript expression |