SB Extract feature sequences

Steve Bond edited this page Dec 13, 2016 · 7 revisions

--extract_feature_sequences, -efs

Implemented in version 1.2

Description

Pull out the sequences of specific features from annotated sequences.

Argument

One or more search strings (regex)

Supply as many simple strings or regular expressions as you want. To keep the shell from interpreting special characters, make a habit of wrapping each search term in 'single quotes'.

You can also include ranges that grab all sequence between two features. Use a colon between the search patterns to extract a range (see usage example 3).
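
To make the matching rules concrete, here is a minimal Python sketch of how a search term could be resolved against a feature table. It is an illustration only, not SeqBuddy's implementation: the `select_spans` helper is hypothetical, it matches against the feature key alone, and it assumes any term containing a colon is a range.

```python
import re

# Feature table of the example record on this page:
# (feature key, start, end), with 1-based inclusive coordinates.
FEATURES = [
    ("CDS", 1, 403),
    ("TMD1", 28, 48),
    ("TMD2", 131, 151),
    ("TMD3", 215, 235),
    ("TMD4", 299, 329),
]


def select_spans(term, features=FEATURES):
    """Return the (start, end) spans selected by one search term (hypothetical helper)."""
    if ":" in term:
        # Range syntax "pattern1:pattern2": one span from the first match
        # of pattern1 through the last match of pattern2.
        left, right = term.split(":", 1)
        left_hits = [f for f in features if re.search(left, f[0])]
        right_hits = [f for f in features if re.search(right, f[0])]
        if not left_hits or not right_hits:
            return []
        start = min(f[1] for f in left_hits)
        end = max(f[2] for f in right_hits)
        return [(start, end)]
    # Plain string or regex: one span per matching feature.
    return [(f[1], f[2]) for f in features if re.search(term, f[0])]


print(select_spans("TMD1"))       # [(28, 48)]
print(select_spans("TMD[1234]"))  # [(28, 48), (131, 151), (215, 235), (299, 329)]
print(select_spans("TMD2:TMD3"))  # [(131, 235)]
```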

Example

Input file: Mle-Panxα12.gb

LOCUS       Mle-Panxα12              403 aa                     UNA 02-JAN-2015
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE
  ORGANISM  . . .
            .
FEATURES             Location/Qualifiers
     CDS             1..403
                     /label="ML25997a"
                     /created_by="User"
     TMD1            28..48
     TMD2            131..151
     TMD3            215..235
     TMD4            299..329
ORIGIN
        1 mvidilsgfk gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk
       61 ysgsfsedyc wtqglytike aydlltmnvp ypgvipedmp tciereling grvscpdpet
      121 vkpptrvyhl wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc
      181 maekasmwlf yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltdemfhi
      241 gsfvsygsew atslpegdne ttlvkdklfp kmvaceikrw gptgleeeqg mcvlapnvin
      301 qylflilwfa iifciacncl svlfaltklv fvlgsykrll asaflkdelh ykhmffnigt
      361 sgrvllqiva tnvsprvfes imanlatkli aerlkgngkg sv*
//

Usage example 1

$: sb Mle-Panxα12.gb -efs "TMD1"

Output

LOCUS       Mle-Panxα12              21 aa                      UNA 02-JAN-2015
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE
  ORGANISM  . . .
            .
FEATURES             Location/Qualifiers
     CDS             1..21
                     /created_by="User"
                     /label="ML25997a"
     TMD1            1..21
ORIGIN
        1 nrsfmfvlcv lmgtvvtvrq y
//
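
The coordinates in the feature table are 1-based and inclusive, so TMD1 at 28..48 covers 21 residues. A short Python check of that arithmetic, using only the first 60 residues of the example record (the `extract` helper is hypothetical and is not part of SeqBuddy):

```python
# First 60 residues of the example record, copied from its ORIGIN block.
first_60 = "mvidilsgfkgitpfkgitlddgwdqinrsfmfvlcvlmgtvvtvrqyaggiiscdgftk"


def extract(seq, start, end):
    """Slice a 1-based, inclusive GenBank-style location out of a string."""
    return seq[start - 1:end]


tmd1 = extract(first_60, 28, 48)
print(tmd1)       # nrsfmfvlcvlmgtvvtvrqy
print(len(tmd1))  # 21
```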

Usage example 2

$: sb Mle-Panxα12.gb -efs "TMD[1234]"

Output

LOCUS       Mle-Panxα12              94 aa                      UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..94
                     /created_by="User"
                     /label="ML25997a"
     TMD1            1..21
     TMD2            22..42
     TMD3            43..63
     TMD4            64..94
ORIGIN
        1 nrsfmfvlcv lmgtvvtvrq ywyqwvpfyf wlaaaafffp ylivmlvkvl yliisilaly
       61 ltdinqylfl ilwfaiifci acnclsvlfa ltkl
//
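
In this output the four matched segments are joined end to end and each feature is renumbered relative to the new, shorter sequence. The sketch below reproduces that renumbering from the original coordinates. It is an assumption-laden illustration (the `remap_concatenated` name is made up for this example) rather than the code SeqBuddy runs.

```python
def remap_concatenated(spans):
    """Renumber features after their segments are joined end to end.

    spans: list of (name, start, end) in original 1-based inclusive coordinates.
    Returns the renumbered features and the total length of the joined sequence.
    """
    remapped = []
    offset = 0  # residues already placed in the joined sequence
    for name, start, end in spans:
        length = end - start + 1
        remapped.append((name, offset + 1, offset + length))
        offset += length
    return remapped, offset


matches = [("TMD1", 28, 48), ("TMD2", 131, 151), ("TMD3", 215, 235), ("TMD4", 299, 329)]
features, total = remap_concatenated(matches)
print(features)  # [('TMD1', 1, 21), ('TMD2', 22, 42), ('TMD3', 43, 63), ('TMD4', 64, 94)]
print(total)     # 94
```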

Usage example 3

Grab all sequence between two patterns using the range syntax "pattern1:pattern2"

$: sb Mle-Panxα12.gb -efs "TMD2:TMD3"

Output

LOCUS       Mle-Panxα12              105 aa                     UNK 01-JAN-1980
DEFINITION  cDNA - ML25997a.
ACCESSION   Mle-Panxα12
VERSION     Mle-Panxα12
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             1..105
                     /created_by="User"
                     /label="ML25997a"
     TMD2            1..21
     TMD3            85..105
ORIGIN
        1 wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc maekasmwlf
       61 yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltd
//
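
The range 131..235 keeps 105 residues, and every feature that overlaps it is clipped to the range and shifted so that position 131 becomes position 1. A small Python sketch of that bookkeeping follows. The `clip_to_range` helper is hypothetical, and the clipping rule is inferred from the output shown here, not taken from SeqBuddy's source.

```python
def clip_to_range(features, range_start, range_end):
    """Clip features to a 1-based inclusive range and renumber them from 1.

    Features that do not overlap the range are dropped.
    Returns the renumbered features and the length of the range.
    """
    kept = []
    for name, start, end in features:
        if end < range_start or start > range_end:
            continue  # no overlap with the extracted range
        new_start = max(start, range_start) - range_start + 1
        new_end = min(end, range_end) - range_start + 1
        kept.append((name, new_start, new_end))
    return kept, range_end - range_start + 1


original = [
    ("CDS", 1, 403),
    ("TMD1", 28, 48),
    ("TMD2", 131, 151),
    ("TMD3", 215, 235),
    ("TMD4", 299, 329),
]
clipped, length = clip_to_range(original, 131, 235)
print(clipped)  # [('CDS', 1, 105), ('TMD2', 1, 21), ('TMD3', 85, 105)]
print(length)   # 105
```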
