Steve Bond edited this page Dec 12, 2016 · 2 revisions

--extract_regions, -er

Description

Pull out sub-alignments by column position. If the input is in a richly annotated format, such as GenBank, features are deleted or adjusted to match the extracted columns.

Arguments

Positions ( str )

AlignBuddy uses a custom syntax to specify what regions should be extracted from each alignment, and multiple regions can either be passed in as separate arguments or combined into a single comma-separated string.

Single positions: This is the simplest syntax, consisting of a comma-separated list of each column you want extracted.

e.g., "1,2,4,45,79,305"
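As a rough illustration of the statement that regions may be passed as separate arguments or as one comma-separated string, the sketch below flattens both forms into the same list of tokens. The helper `split_positions` is hypothetical and is not taken from AlignBuddy's source; it only shows the equivalence described above.

```python
def split_positions(args):
    """Flatten position arguments into individual tokens.

    Hypothetical helper for illustration only (not AlignBuddy's code).
    """
    tokens = []
    for arg in args:
        for token in arg.split(","):
            token = token.strip()
            if token:
                tokens.append(token)
    return tokens


# Two separate arguments, as in usage example 5
separate = split_positions(["32,34,35,37,38,42,43", "135,141,151"])
# The same positions combined into a single comma-separated string
combined = split_positions(["32,34,35,37,38,42,43,135,141,151"])

print(separate == combined)  # True
print(len(separate))         # 10
```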

Ranges: Use two numbers separated by a colon to designate a range of columns, similar to Python slice notation. Unlike Python slices, columns are numbered from 1 and both ends of the range are included (e.g., "11:100" returns 90 columns in usage example 1). If the left side of the range is left blank, the range starts at the first column, and if the right side is left blank, the range extends to the final column. Negative numbers count columns back from the end of the alignment.

e.g., "5:200" "400:-1" ":245"
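The range rules can be sketched in a few lines of Python. This is a minimal model inferred from the outputs in the usage examples on this page, not AlignBuddy's implementation: it assumes 1-based inclusive columns, that -1 refers to the final column, and that a reversed pair (as in "100:-100") is put back into ascending order. The function name `resolve_range` is hypothetical.

```python
def resolve_range(token, num_cols):
    """Return 1-based, inclusive (start, end) columns for a 'start:end' token.

    Illustrative sketch only; behavior is inferred from the documented examples.
    """
    left, right = token.strip().split(":")
    start = int(left) if left else 1          # blank left side: first column
    end = int(right) if right else num_cols   # blank right side: final column
    # Assumption: negative values count back from the end, with -1 as the last column
    if start < 0:
        start = num_cols + start + 1
    if end < 0:
        end = num_cols + end + 1
    # Assumption: a reversed pair is reordered rather than rejected
    if start > end:
        start, end = end, start
    return start, end


def width(token, num_cols):
    """Number of columns a range token selects."""
    start, end = resolve_range(token, num_cols)
    return end - start + 1


# Column counts for the two alignments in Panxs.phyr (158 and 165 columns)
print(width("11:100", 158), width("11:100", 165))      # 90 90
print(width("100:", 158), width("100:", 165))          # 59 66
print(width("100:-100", 158), width("100:-100", 165))  # 42 35
```

The printed widths match the column counts reported in usage examples 1, 3, and 4.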

Every Nth residue: Use a forward slash to select ordered, but non-contiguous, columns (for example, every 10th column). The left side of the slash can also accept the colon notation to specify a sub-range within each block of N columns.

e.g., "1/10" "1:10/100"
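One way to read the slash syntax is "take these columns from every block of N". The sketch below models that reading for the two forms shown in the usage examples ("1/10" and "1:3/10"). It is an assumption based on the documented outputs, the name `every_nth` is hypothetical, and other left-hand values may behave differently in AlignBuddy itself.

```python
def every_nth(token, num_cols):
    """Return the 1-based columns selected by a 'left/N' token.

    Illustrative sketch only: 'left' is a single column or a 'lo:hi' sub-range
    that is repeated in every block of N columns.
    """
    left, step = token.strip().split("/")
    step = int(step)
    if ":" in left:
        lo, hi = (int(x) for x in left.split(":"))
    else:
        lo = hi = int(left)
    cols = []
    for block_start in range(0, num_cols, step):
        for offset in range(lo, hi + 1):
            col = block_start + offset
            if col <= num_cols:  # do not run past the end of the alignment
                cols.append(col)
    return cols


print(every_nth("1/10", 158)[:4])      # [1, 11, 21, 31]
print(len(every_nth("1/10", 158)))     # 16
print(len(every_nth("1/10", 165)))     # 17
print(every_nth("1:3/10", 158)[:6])    # [1, 2, 3, 11, 12, 13]
print(len(every_nth("1:3/10", 158)))   # 48
print(len(every_nth("1:3/10", 165)))   # 51
```

The lengths agree with the column counts in usage examples 6 and 7 for the 158- and 165-column alignments.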

Example

Input file: Panxs.phyr

 3 158
Mle-Panx9  -MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGSVISCDGFKKFGSTFAEDYCWTQGLYTVLEGYDQPSQNIPYPGLLPDEAPPCTPVRLKDGTRLKCPDPDQLLSPTRISHLWYQWVPFYFWLAAAAFFMPYLLYKNFGM
Mle-Panx8  MVLEVLALFPRLAPFKVITLDDVWDQWNRSFMFIMTVLFGSIVTIRSYTGSVIECDGFLKVPVEFAKDYCWTQGIYTLREGYDYHSSLLPYPGVFPEDAPGCLDKVLDNGGRVICPMDKKYRKYQRVYHSWYQFTAFYFWTASCAFFLPYMMFKFFGM
Mle-Panx6  MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTGGIIACDGLTKFSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAPPCLSRRLVSGGRIECPPADLYLEPTRVHHTWYQWIPFYFWVISIAFIGPYIVYKQLGV

 5 165
Ael_PanxA  -MVVIRELKDILSMKIKTRHDGFCDQFNRMIMTKILIIMSVIVGFNYFYDEVSCMVFKKSDLQKEFISSSCWISGFYIFEEMKTRL-DKSSYYGIPYTINHDGIRKD-GTLCATRDR-LGLVEGCAPMTKVYYLQYQWMPFYIGSLSTFYYMPYIVFKMVNRDLM
Ael_PanxB  -MVVIRGLKDILSIKMKTRHDSICDQFNRLFMTRVLLIMSVIMGFDYYSDKVSCMVLGESHLGKDFIHAACWISGFYIYEEMKTRL-DKSSYYGIPYTIDNDGIEYD-GSLCPTRDK-NGKIPGCNPMTKVYYLQYQWMPFYVGSLAIFYYIPYIIFRMVNTDLV
Ael_PanxD  ----MEVLKDILSVQLKSRDDSYSDQFNRIFMCKLFLMSSIIMSVDYFSDNVNCMIPDNAQHSSSFFHSACWINGFYIFDEMRSRL-EKSGYYGIPQRVDFDGINRVTGELCITKNL-FGEAADCEPMTRIYYLHYQWMPVYMVSLGMFFYLPYIVFRFVNTDMI
Ael_PanxE  --MIGDAISNIISIKIKHRDDGVTDQYNRILMVKMIIMLSAIVGYNYYSDKVSCIVANEDDGIDGFVADTCWIQGFYVFKEMKKRL-GESAYLGLPRNMDYDGLDSN-GVLCSTTDRGSDSIQTCQKMKKVYYLQYQYFPFLLAGLAMLFYFPYIVFKVTNTDLV
Ael_PanxF  MGPFEDSIGKIFSFNIKRRADGITDQYNRILMVKICIIFTFVLGIDYFTNKTTCITPDMMRID---PTRTCWNEGFYIYPELENLPAKESSYYGIPKQIDNDGIDEN-GSPCTTKNI-FIKFLSCKPLKKQYYRQYQFMPFLIAVYGIIFYIPHIMFMVINTDII

Usage example 1

Extract a range of columns, using the colon (:) operator.

$: alb Panxs.phyr -er "11:100"

Output

 3 90
Mle-Panx9  GVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGSVISCDGFKKFGSTFAEDYCWTQGLYTVLEGYDQPSQNIPYPGLLPDEAP
Mle-Panx8  RLAPFKVITLDDVWDQWNRSFMFIMTVLFGSIVTIRSYTGSVIECDGFLKVPVEFAKDYCWTQGIYTLREGYDYHSSLLPYPGVFPEDAP
Mle-Panx6  GATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTGGIIACDGLTKFSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAP

 5 90
Ael_PanxA  ILSMKIKTRHDGFCDQFNRMIMTKILIIMSVIVGFNYFYDEVSCMVFKKSDLQKEFISSSCWISGFYIFEEMKTRL-DKSSYYGIPYTIN
Ael_PanxB  ILSIKMKTRHDSICDQFNRLFMTRVLLIMSVIMGFDYYSDKVSCMVLGESHLGKDFIHAACWISGFYIYEEMKTRL-DKSSYYGIPYTID
Ael_PanxD  ILSVQLKSRDDSYSDQFNRIFMCKLFLMSSIIMSVDYFSDNVNCMIPDNAQHSSSFFHSACWINGFYIFDEMRSRL-EKSGYYGIPQRVD
Ael_PanxE  IISIKIKHRDDGVTDQYNRILMVKMIIMLSAIVGYNYYSDKVSCIVANEDDGIDGFVADTCWIQGFYVFKEMKKRL-GESAYLGLPRNMD
Ael_PanxF  IFSFNIKRRADGITDQYNRILMVKICIIFTFVLGIDYFTNKTTCITPDMMRID---PTRTCWNEGFYIYPELENLPAKESSYYGIPKQID

Usage example 2

Leave the left side of the range empty to begin extracting from the start of the alignment.

$: alb Panxs.phyr -er ":100"

Output

 3 100
Mle-Panx9  -MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGSVISCDGFKKFGSTFAEDYCWTQGLYTVLEGYDQPSQNIPYPGLLPDEAP
Mle-Panx8  MVLEVLALFPRLAPFKVITLDDVWDQWNRSFMFIMTVLFGSIVTIRSYTGSVIECDGFLKVPVEFAKDYCWTQGIYTLREGYDYHSSLLPYPGVFPEDAP
Mle-Panx6  MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTGGIIACDGLTKFSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAP

 5 100
Ael_PanxA  -MVVIRELKDILSMKIKTRHDGFCDQFNRMIMTKILIIMSVIVGFNYFYDEVSCMVFKKSDLQKEFISSSCWISGFYIFEEMKTRL-DKSSYYGIPYTIN
Ael_PanxB  -MVVIRGLKDILSIKMKTRHDSICDQFNRLFMTRVLLIMSVIMGFDYYSDKVSCMVLGESHLGKDFIHAACWISGFYIYEEMKTRL-DKSSYYGIPYTID
Ael_PanxD  ----MEVLKDILSVQLKSRDDSYSDQFNRIFMCKLFLMSSIIMSVDYFSDNVNCMIPDNAQHSSSFFHSACWINGFYIFDEMRSRL-EKSGYYGIPQRVD
Ael_PanxE  --MIGDAISNIISIKIKHRDDGVTDQYNRILMVKMIIMLSAIVGYNYYSDKVSCIVANEDDGIDGFVADTCWIQGFYVFKEMKKRL-GESAYLGLPRNMD
Ael_PanxF  MGPFEDSIGKIFSFNIKRRADGITDQYNRILMVKICIIFTFVLGIDYFTNKTTCITPDMMRID---PTRTCWNEGFYIYPELENLPAKESSYYGIPKQID

Usage example 3

Leave the right side of the range empty to extract until the end of the alignment.

$: alb Panxs.phyr -er "100:"

Output

 3 59
Mle-Panx9  PPCTPVRLKDGTRLKCPDPDQLLSPTRISHLWYQWVPFYFWLAAAAFFMPYLLYKNFGM
Mle-Panx8  PGCLDKVLDNGGRVICPMDKKYRKYQRVYHSWYQFTAFYFWTASCAFFLPYMMFKFFGM
Mle-Panx6  PPCLSRRLVSGGRIECPPADLYLEPTRVHHTWYQWIPFYFWVISIAFIGPYIVYKQLGV

 5 66
Ael_PanxA  NHDGIRKD-GTLCATRDR-LGLVEGCAPMTKVYYLQYQWMPFYIGSLSTFYYMPYIVFKMVNRDLM
Ael_PanxB  DNDGIEYD-GSLCPTRDK-NGKIPGCNPMTKVYYLQYQWMPFYVGSLAIFYYIPYIIFRMVNTDLV
Ael_PanxD  DFDGINRVTGELCITKNL-FGEAADCEPMTRIYYLHYQWMPVYMVSLGMFFYLPYIVFRFVNTDMI
Ael_PanxE  DYDGLDSN-GVLCSTTDRGSDSIQTCQKMKKVYYLQYQYFPFLLAGLAMLFYFPYIVFKVTNTDLV
Ael_PanxF  DNDGIDEN-GSPCTTKNI-FIKFLSCKPLKKQYYRQYQFMPFLIAVYGIIFYIPHIMFMVINTDII

Usage example 4

Use negative numbers to count columns back from the end of the alignment.

$: alb Panxs.phyr -er "100:-100"

Output

 3 42
Mle-Panx9  KKFGSTFAEDYCWTQGLYTVLEGYDQPSQNIPYPGLLPDEAP
Mle-Panx8  LKVPVEFAKDYCWTQGIYTLREGYDYHSSLLPYPGVFPEDAP
Mle-Panx6  TKFSAAFAEDYCWTQGLYTIKEAYDIVDNSLPYPGLLPEDAP

 5 35
Ael_PanxA  FISSSCWISGFYIFEEMKTRL-DKSSYYGIPYTIN
Ael_PanxB  FIHAACWISGFYIYEEMKTRL-DKSSYYGIPYTID
Ael_PanxD  FFHSACWINGFYIFDEMRSRL-EKSGYYGIPQRVD
Ael_PanxE  FVADTCWIQGFYVFKEMKKRL-GESAYLGLPRNMD
Ael_PanxF  -PTRTCWNEGFYIYPELENLPAKESSYYGIPKQID

Usage example 5

Pull out a group of specific columns from both alignments.

$: alb Panxs.phyr -er "32,34,35,37,38,42,43" "135,141,151"

Output

 3 10
Mle-Panx9  MVLVVTVVLL
Mle-Panx8  MIMVLIVTTM
Mle-Panx6  MLLVIVVIVI

 5 10
Ael_PanxA  MKIIIIVQFY
Ael_PanxB  MRVLIIMQFY
Ael_PanxD  MKLLMIMHVY
Ael_PanxE  MKMIMIVQFY
Ael_PanxF  MKIIIVLQFY

Usage example 6

Extract every tenth column using the forward-slash (/) operator (starting at column #1).

$: alb Panxs.phyr -er "1/10"

Output

 3 16
Mle-Panx9  -GDFTSFWGYPTLWLL
Mle-Panx8  MRDFSSVWGYGGYWTM
Mle-Panx6  MGDYTGFWAYPGYWVI

 5 17
Ael_PanxA  -IDIVEDCESHLLVFYN
Ael_PanxB  -IDFVKHCESNLKVFYN
Ael_PanxD  -IDFINQCEGFLEIVYN
Ael_PanxE  -IDLAKDCEAYLSVFYN
Ael_PanxF  MIDLFKRCESNPKQFYN

Usage example 7

Extract the first three columns of every ten by mixing the colon (:) and forward-slash (/) operators.

$: alb Panxs.phyr -er "1:3/10"

Output

 3 48
Mle-Panx9  -MLGVTDDGFMFTTVSVIFGSWTQGYDYPGPCTTRLLLSWYQLAALLY
Mle-Panx8  MVLRLADDVFMFSIVSVIVPVWTQGYDYPGGCLGRVYRKWYQTASMMF
Mle-Panx6  MLLGATDDKYMFTVVGIIFSAWTQAYDYPGPCLGRIYLEWYQVISIVY

 5 51
Ael_PanxA  -MVILSDGFIMTVIVEVSDLQCWIEMKSYYHDGLCALVEVYYFYIYMPNRD
Ael_PanxB  -MVILSDSIFMTVIMKVSHLGCWIEMKSYYNDGLCPKIPVYYFYVYIPNTD
Ael_PanxD  ---ILSDSYFMCIIMNVNQHSCWIEMRGYYFDGLCIEAAIYYVYMYLPNTD
Ael_PanxE  --MIISDGVLMVAIVKVSDGICWIEMKAYLYDGLCSSIQVYYFLLYFPNTD
Ael_PanxF  MGPIFSDGILMVFVLKTTRIDCWNELESYYNDGPCTKFLQYYFLIYIPNTD

Usage example 8

Wacky example to illustrate how flexible the syntax is. NOTE! If an argument starts with a minus sign (-), make sure there is a space between the opening quotation mark and the minus. Otherwise Python's argument parser treats the argument as a new flag.

$: alb Panxs.phyr -er " -5:8/10,45,124" "60:-100,5:42,78,-5" "1/50"

Output

 3 83
Mle-Panx9  -ILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVRQYSDGFKKAEDYTVSQNPDEPRLKPDPPRISPFYFFMLKFGM
Mle-Panx8  MVLALFPRLAPFKVITLDDVWDQWNRSFMFIMTVLFGSIIRSYSDGFLKAKDYTLSSLPEDGVLDPMDYRVYAFYFFLMKFGM
Mle-Panx6  MILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVFRQYGDGLTKAEDYTIDNSPEDPRLVPPAPRVHPFYFIGIKLGV

 5 87
Ael_PanxA  -IRELKDILSMKIKTRHDGFCDQFNRMIMTKILIIMSVIFNYFEVFKSDLQKEFISFYIL-DPYTHKD-DR-GAPMYQWLSTYVFKN
Ael_PanxB  -IRGLKDILSIKMKTRHDSICDQFNRLFMTRVLLIMSVIFDYYKVLGSHLGKDFIHFYIL-DPYTNYD-DK-GNPMYQWLAIYIFRN
Ael_PanxD  -MEVLKDILSVQLKSRDDSYSDQFNRIFMCKLFLMSSIIVDYFNIPDAQHSSSFFHFYIL-EPQRFRVTNL-DEPMYQWLGMYVFRN
Ael_PanxE  -GDAISNIISIKIKHRDDGVTDQYNRILMVKMIIMLSAIYNYYKVANDDGIDGFVAFYVL-GPRNYSN-DRGTQKMYQYLAMYVFKN
Ael_PanxF  MEDSIGKIFSFNIKRRADGITDQYNRILMVKICIIFTFVIDYFKTPDMRID---PTFYIPAKPKQNEN-NI-SKPLYQFYGIYMFMN
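The note about the leading space in usage example 8 can be reproduced with Python's standard `argparse` module. The parser below is a stand-in built for this illustration (the program name `alb_sketch` and the exact option setup are assumptions, not AlignBuddy's actual configuration); it only shows why a value beginning with a minus sign is rejected while the same value with a leading space is accepted.

```python
import argparse
import contextlib
import io


def accepts(argv):
    """Return True if the stand-in parser accepts argv, False if it exits with an error."""
    # Hypothetical parser for illustration; not AlignBuddy's real argument setup
    parser = argparse.ArgumentParser(prog="alb_sketch")
    parser.add_argument("-er", "--extract_regions", nargs="+")
    try:
        # Silence the usage/error text argparse writes to stderr on failure
        with contextlib.redirect_stderr(io.StringIO()):
            parser.parse_args(argv)
    except SystemExit:
        return False
    return True


# Value starts with '-': argparse treats it as an unknown flag, so -er gets no value
print(accepts(["-er", "-5:8/10,45,124"]))   # False
# Same value with a leading space: parsed as an ordinary argument
print(accepts(["-er", " -5:8/10,45,124"]))  # True
```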
