use seqtk to extract subsequences from FASTA files #53

Open
ctb opened this issue Feb 4, 2023 · 0 comments

ctb commented Feb 4, 2023

a quick tutorial!

hackmd: https://hackmd.io/VYrGL_i8SA6WIALlHs16tA?view

reproduced below:

Using seqtk subseq to subselect sequences

This is a quick mini-tutorial on using seqtk to extract a set of sequences from a gzipped FASTA file; it should also work with uncompressed files and FASTQ files (compressed or not).

Installation of seqtk below requires an installation of conda & mamba; I recommend using mambaforge.


On farm, in a datalab-XX account:

First, install seqtk in a conda environment named seq:

mamba create -y -n seq seqtk

Activate seq:

mamba activate seq
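
To confirm that the environment is active and seqtk is on your PATH, a small check along these lines may help. This is only a sketch: the helper name check_tool is made up for illustration, and it relies on nothing beyond the shell's built-in command -v.

```shell
# Hypothetical helper: report whether a program can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

# After activating the seq environment, this should print "found".
check_tool seqtk
```

If it prints "missing", the environment is probably not activated in the current shell.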

Make a new working directory:

mkdir -p ~/extract-from-fasta
cd ~/extract-from-fasta

Download a FASTA file of contigs:

curl -L https://osf.io/download/7bzrc -o contigs.fasta.gz
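
A download can fail quietly and leave an error page saved under the .gz name, so it can be worth testing the file before going further. The sketch below uses gzip -t, which checks the integrity of a compressed file; the toy files and the helper name is_gzip_ok are made up so the example runs on its own. In the tutorial you would run gzip -t contigs.fasta.gz instead.

```shell
# Make one valid gzipped FASTA and one file that only looks like one
# (both are invented stand-ins for a real download).
workdir=$(mktemp -d)
printf '>toy_1\nACGT\n' | gzip > "$workdir/good.fasta.gz"
printf '<html>not found</html>\n' > "$workdir/bad.fasta.gz"

# Hypothetical helper: gzip -t exits non-zero when the file is not valid gzip.
is_gzip_ok() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "broken"
    fi
}

good_status=$(is_gzip_ok "$workdir/good.fasta.gz")
bad_status=$(is_gzip_ok "$workdir/bad.fasta.gz")
echo "good file: $good_status"
echo "bad file: $bad_status"
```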

List names of sequences in FASTA file:

gunzip -c contigs.fasta.gz | grep '^>'
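
If you want just the names, without the leading '>' or the rest of the header line, or a count of how many sequences there are, the same idea extends with standard tools. This is a sketch on a toy file (toy.fasta.gz and its records are invented so the commands run anywhere); for the tutorial data, substitute contigs.fasta.gz.

```shell
# Build a tiny gzipped FASTA to work on (made-up records).
workdir=$(mktemp -d)
printf '>k117_180 flag=1 len=8\nACGTACGT\n>k117_181 flag=0 len=4\nGGCC\n>k117_1800 flag=1 len=4\nTTAA\n' \
    | gzip > "$workdir/toy.fasta.gz"

# Count the records: header lines are the ones that start with '>'.
n_records=$(gunzip -c "$workdir/toy.fasta.gz" | grep -c '^>')

# Keep only the sequence name: strip the leading '>' and
# everything after the first space.
names=$(gunzip -c "$workdir/toy.fasta.gz" | grep '^>' | sed 's/^>//' | cut -d ' ' -f 1)

echo "$n_records"
echo "$names"
```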

Make a text file containing the names of a few contigs from that file:

echo k117_180 > extract-list.txt
echo k117_181 >> extract-list.txt

:::warning
Here, the names to extract only need to be the first word of each header line (the text between '>' and the first space), not the whole line. As far as I can tell, seqtk matches that first word in full rather than by prefix, but I'm not 100% sure.
:::
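
One way to sidestep the matching question is to check, before extracting, that every name in the list appears exactly as a sequence name in the FASTA file. The sketch below does this with grep alone and says nothing about how seqtk itself matches; the toy files are invented so that it runs on its own. For the tutorial you would use contigs.fasta.gz and extract-list.txt.

```shell
# Toy inputs: note that k117_180 is a prefix of k117_1800,
# and k117_999 is not in the FASTA at all (all made up).
workdir=$(mktemp -d)
printf '>k117_180 flag=1\nACGT\n>k117_1800 flag=1\nTTAA\n' | gzip > "$workdir/toy.fasta.gz"
printf 'k117_180\nk117_999\n' > "$workdir/toy-list.txt"

# Collect the sequence names (first word of each header, without '>').
gunzip -c "$workdir/toy.fasta.gz" | grep '^>' | sed 's/^>//' | cut -d ' ' -f 1 \
    | sort > "$workdir/names.txt"

# -F: fixed strings, -x: whole line must match, -f: read patterns from a file.
found=$(grep -Fx -f "$workdir/toy-list.txt" "$workdir/names.txt")

# Adding -v and swapping the files lists requested names with no exact match.
missing=$(grep -Fxv -f "$workdir/names.txt" "$workdir/toy-list.txt")

echo "found: $found"
echo "missing: $missing"
```

With whole-line matching, k117_180 in the list pairs only with the sequence named k117_180 and not with k117_1800.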

Extract using seqtk subseq:

seqtk subseq contigs.fasta.gz extract-list.txt > extracted-contigs.fasta
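
A quick sanity check afterwards is to compare the number of names you asked for with the number of records that came out. The sketch below runs on made-up stand-in files so it can be tried without seqtk; in the tutorial the two files would be extract-list.txt and extracted-contigs.fasta. It assumes the list has no duplicate names.

```shell
# Stand-ins for the name list and the extracted output (invented contents).
workdir=$(mktemp -d)
printf 'k117_180\nk117_181\n' > "$workdir/toy-list.txt"
printf '>k117_180 flag=1\nACGT\n>k117_181 flag=0\nGGCC\n' > "$workdir/toy-extracted.fasta"

# Non-empty lines in the list = names requested.
n_wanted=$(grep -c . "$workdir/toy-list.txt")

# Header lines in the output = records extracted.
n_found=$(grep -c '^>' "$workdir/toy-extracted.fasta")

if [ "$n_wanted" -eq "$n_found" ]; then
    result="all $n_wanted requested sequences were extracted"
else
    result="requested $n_wanted but extracted $n_found"
fi
echo "$result"
```

A mismatch usually means a name in the list does not correspond to any sequence name in the FASTA file.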

et voila!
