Gotranseq

Translate nucleic acid sequences to their corresponding peptide sequences. Like EMBOSS transeq, but written in Go.

Purpose

EMBOSS transeq is a great tool, but it can be quite painful for some use cases, because it silently truncates the sequence ID if it contains characters like ':', and renames the sequence ID if it contains characters like '|'.

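This tool is an attempt to solve this problem. It is also much faster than EMBOSS transeq because the translation can be parallelized across several workers (about 5.5 s versus 49.7 s of wall-clock time in the benchmark below):

Benchmark on Ubuntu 16.04, on a machine with 2 CPUs (Intel(R) Core(TM)2 Duo CPU 3.00GHz), with a 189MB FASTA file: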

#EMBOSS transeq
time transeq -sequence file.fna -outseq out.faa -frame 6  
41,82s user 0,76s system 85% cpu 49,696 total

#gotranseq
time ./gotranseq --sequence file.fna --outseq out.faa --frame 6 -n 2
7,75s user 0,98s system 159% cpu 5,472 total

Works on Linux, Mac and Windows.

Installation

Download the binary from the release page

or

Build from source:

First, make sure that Go is installed on your machine (see the Go installation instructions for details). Then clone the repo and build it:

git clone https://github.com/feliixx/gotranseq.git
cd gotranseq
go build

Usage

Use gotranseq --help to print the help:

gotranseq version 0.3.0

Usage:
  gotranseq --sequence file.fna --outseq out.faa

required:
  -s, --sequence=<filename>    Nucleotide sequence(s) filename
  -o, --outseq=<filename>      Protein sequence filename

optional:
  -f, --frame=<code>           Frame to translate. Possible values:
                               [1, 2, 3, F, -1, -2, -3, R, 6]
                               F: forward three frames
                               R: reverse three frames
                               6: all 6 frames
                               (default: 1)
  -t, --table=<code>           NCBI code to use, see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1 for
                               details. Available codes:
                               0: Standard code
                               2: The Vertebrate Mitochondrial Code
                               3: The Yeast Mitochondrial Code
                               4: The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
                               5: The Invertebrate Mitochondrial Code
                               6: The Ciliate, Dasycladacean and Hexamita Nuclear Code
                               9: The Echinoderm and Flatworm Mitochondrial Code
                               10: The Euplotid Nuclear Code
                               11: The Bacterial, Archaeal and Plant Plastid Code
                               12: The Alternative Yeast Nuclear Code
                               13: The Ascidian Mitochondrial Code
                               14: The Alternative Flatworm Mitochondrial Code
                               16: Chlorophycean Mitochondrial Code
                               21: Trematode Mitochondrial Code
                               22: Scenedesmus obliquus Mitochondrial Code
                               23: Thraustochytrium Mitochondrial Code
                               24: Pterobranchia Mitochondrial Code
                               25: Candidate Division SR1 and Gracilibacteria Code
                               26: Pachysolen tannophilus Nuclear Code
                               29: Mesodinium Nuclear
                               30: Peritrich Nuclear
                               (default: 0)
  -c, --clean                  Replace stop codon '*' by 'X'
  -a, --alternative            Define frame '-1' as using the set of codons starting with the last codon of the sequence
  -T, --trim                   Removes all 'X' and '*' characters from the right end of the translation. The trimming process starts at the
                               end and continues until the next character is not a 'X' or a '*'
  -n, --numcpu=<n>             Number of worker to use (default: number of CPU)

general:
  -h, --help                   Show this help message
  -v, --version                Print the tool version and exit
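
To make the `--frame`, `--clean` and `--trim` options more concrete, here is a minimal sketch of what they describe, using only the standard genetic code. It is an illustration under assumptions, not the code used by gotranseq: all function names are hypothetical, a trailing incomplete codon is simply dropped, and reverse frames are read from the start of the reverse complement, which corresponds to the wording of `--alternative` (the tool's default definition of frame -1 may differ).

```go
package main

import (
	"fmt"
	"strings"
)

// NCBI standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
const standardCode = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// baseIndex returns the position of a nucleotide in the T, C, A, G ordering,
// or -1 for any other character.
func baseIndex(b byte) int {
	switch b {
	case 'T', 't', 'U', 'u':
		return 0
	case 'C', 'c':
		return 1
	case 'A', 'a':
		return 2
	case 'G', 'g':
		return 3
	}
	return -1
}

// translate reads codons starting at offset (0, 1 or 2 for frames 1, 2, 3).
// Codons containing an unknown base become 'X'. Assumption of this sketch:
// a trailing incomplete codon is ignored.
func translate(seq string, offset int) string {
	var sb strings.Builder
	for i := offset; i+3 <= len(seq); i += 3 {
		b1, b2, b3 := baseIndex(seq[i]), baseIndex(seq[i+1]), baseIndex(seq[i+2])
		if b1 < 0 || b2 < 0 || b3 < 0 {
			sb.WriteByte('X')
			continue
		}
		sb.WriteByte(standardCode[16*b1+4*b2+b3])
	}
	return sb.String()
}

// reverseComplement handles upper-case A, C, G, T; anything else becomes 'N'.
func reverseComplement(seq string) string {
	comp := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		c, ok := comp[seq[len(seq)-1-i]]
		if !ok {
			c = 'N'
		}
		out[i] = c
	}
	return string(out)
}

// clean mirrors the description of --clean: replace stop codon '*' by 'X'.
func clean(pep string) string { return strings.ReplaceAll(pep, "*", "X") }

// trim mirrors the description of --trim: drop trailing 'X' and '*'.
func trim(pep string) string { return strings.TrimRight(pep, "X*") }

func main() {
	seq := "ATGGCCTAA"
	for offset := 0; offset < 3; offset++ {
		fmt.Printf("frame %d: %s\n", offset+1, translate(seq, offset))
	}
	fmt.Printf("reverse:  %s\n", translate(reverseComplement(seq), 0))
	fmt.Printf("clean:    %s\n", clean(translate(seq, 0)))
	fmt.Printf("trim:     %s\n", trim(translate(seq, 0)))
}
```

With the 9-base input above, frame 1 reads the codons ATG, GCC and TAA, giving `MA*`; cleaning turns it into `MAX` and trimming into `MA`.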
