SB Concatenate sequences

H. Mendes-Soares edited this page Jun 3, 2016 · 1 revision

--concat_seqs, -cts

Description

Concatenate all input sequences, in the order they appear, into a single record. If the output format is GenBank (the default) or EMBL, the start and end position of each input sequence is annotated as a feature in the new record.
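
As a rough illustration of the coordinate bookkeeping described above, the following Python sketch joins a list of sequences and records a 1-based, inclusive span for each one, in the same style as the FEATURES table of a GenBank record. It is not SeqBuddy's implementation; the function name `concat_with_spans` and the toy record names are hypothetical and chosen only for this example.

```python
def concat_with_spans(records):
    """Join sequences and track where each one lands in the result.

    records: list of (name, sequence) tuples (hypothetical input shape).
    Returns (merged_sequence, spans), where spans is a list of
    (name, start, end) with 1-based, inclusive coordinates.
    """
    parts = []
    spans = []
    offset = 0  # number of residues already placed in the merged sequence
    for name, seq in records:
        start = offset + 1
        offset += len(seq)
        spans.append((name, start, offset))
        parts.append(seq)
    return "".join(parts), spans


# Toy input, not taken from Drosophila.fa
records = [("seqA", "GFIKID"), ("seqB", "MDVF"), ("seqC", "MAAVKPL")]
merged, spans = concat_with_spans(records)

print(merged)
for name, start, end in spans:
    print(f"{name}\t{start}..{end}")
```

Each span starts one position after the previous span ends, which is the same pattern seen in the feature coordinates of the usage examples on this page.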

Argument

'clean'

Optional. Pass the word 'clean' to run the clean_seq tool first, which removes all non-sequence characters (e.g., gaps) from each sequence before concatenation.
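
To show what stripping non-sequence characters means in practice, here is a minimal Python sketch. It assumes that only ASCII letters count as sequence characters, which is a simplification made for this example; the exact character set kept by clean_seq may differ. The helper name `strip_non_sequence` is hypothetical.

```python
import re


def strip_non_sequence(seq):
    """Remove everything that is not an ASCII letter (gaps, '?', spaces, digits).

    Assumption for this sketch: letters are the only sequence characters.
    """
    return re.sub(r"[^A-Za-z]", "", seq)


# Toy aligned fragments with gap ('-') and missing ('?') characters
aligned = ["-mrlse--kst", "mlllg?slg--"]
cleaned = [strip_non_sequence(s) for s in aligned]

print(cleaned)
print(sum(len(s) for s in cleaned))  # length of the concatenated result
```

Because gaps are removed before joining, the concatenated record is shorter than the sum of the aligned row lengths, and the annotated spans refer to ungapped positions.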

Examples

Input file: Drosophila.fa

>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFCWITYTYTVAG
PGLEKHSYYQWVPFVLFFQGLMFYVPHWVWKMDGKIRMITGVDDRDRILKYFVNNTHNGY
SFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQDRFDPMIEIFPRLTKC
TFHKFGPSGSVQKHDTLCVLALNILNEKIYIFLWFWFIILATISGVAVLYSVVITRTIRK
EGDFLILHFLSQNLSTRSYSDMLQ
>Dme-Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVM
DTYCWIYSTFTVPEGRDVQPGSEKYHKYYQWVCFVLFFQAILFYVPRYLWKSWEGGRLKM
LVDLSVNDKDRKIVDYFGNLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGS
DVLKFTELEPDERIDPMARVFPKVTKCTFHKYGPSGSVQTHDGLCVLPLNIVNEKIYVFL
WFWFIILSIMSISLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLI
YKEVISDLEMG
>Dme-Panxδ4
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDA
FCWIYGAYLQCAVSKVVENYITYYQWVVLVLLLESFVFYMPAFLWKIWEGGRLKHLCDFK
RTHRVLVNYFETHFRYFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALYNQWIAVF
PKCAKCEYKGGPSGSSNIYDYLCLLPLNILNEKIFAFLWIWFILAMLISLKFLYRLAVLY
PMRLQLLRPKKHLQVALNCSFGDWFVLMRVGNNISPELFRKLLEEL
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPISCIVGVPHVV
NTFCWIHSTFTMPDRREVHPGVDFKYYTYYQWVCFVLFFQAMACYTPKFLWNKFEGGLMR
MIVGLNITRKRDALLDYLIKHVKRHKLYAYWACEFLCCINIIVQMYLMNRFFDGEFLSYG
TNIMKLSDVPQEQRVDPMVYVFPRVTKCTFHKYGPSGSLQKHDSLCILPLNIVNEKTYVF
IWFWFWILLVLLGLVFRCIIFPKFRPRLLNASNRIPMECRLDIGDWWLIYMLGRNLDPVI
YKDVMSEFQVP

Usage example 1

$: sb Drosophila.fa -cts

Output

LOCUS       concatination           1172 aa                     UNK 01-JAN-1980
DEFINITION  
ACCESSION   concatination
VERSION     concatination
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     Dme-Panxδ3      1..264
     Dme-Panxδ2      265..575
     Dme-Panxδ4      576..861
     Dme-Panxδ1      862..1172
ORIGIN
        1 gfikidnmvf rchyritail ftcciivtan nligdpisci ipmhvintfc witytytvag
       61 pglekhsyyq wvpfvlffqg lmfyvphwvw kmdgkirmit gvddrdrilk yfvnnthngy
      121 sfyffcelln finvivnifm vdkflggafm sygtdvlkfs nmdqdrfdpm ieifprltkc
      181 tfhkfgpsgs vqkhdtlcvl alnilnekiy iflwfwfiil atisgvavly svvitrtirk
      241 egdflilhfl sqnlstrsys dmlqmdvfgs vkgllkidqv dnnvfrmhyk atviiliafs
      301 llvtsrqyig dpidciveip lgvmdtycwi ystftvpegr dvqpgsekyh kyyqwvcfvl
      361 ffqailfyvp rylwkswegg rlkmlvdlsv ndkdrkivdy fgnlnrhnfy afffvcealn
      421 fvnvigqiyf vdffldgefs tygsdvlkft elepderidp marvfpkvtk ctfhkygpsg
      481 svqthdglcv lplnivneki yvflwfwfii lsimsisliy riavapklrh lllrarsrae
      541 sevevaigdw fllyqlgkni dpliykevis dlemgmaavk plskylqfkv hiydaiftlh
      601 skvtvallla ctfllsskqy fgdpiqcfgd kdmdafcwiy gaylqcavsk vvenyityyq
      661 wvvlvllles fvfympaflw kiweggrlkh lcdfkrthrv lvnyfethfr yfvyvfceil
      721 nlsisilnfl lldvffggfw gryrnalynq wiavfpkcak ceykggpsgs sniydylcll
      781 plnilnekif aflwiwfila mlislkflyr lavlypmrlq llrpkkhlqv alncsfgdwf
      841 vlmrvgnnis pelfrkllee lykllgslks ylkwqiqtdn avfrlhnsft tvllltcsli
      901 itatqyvgqp iscivgvphv vntfcwihst ftmpdrrevh pgvdfkyyty yqwvcfvlff
      961 qamacytpkf lwnkfegglm rmivglnitr krdalldyli khvkrhklya ywaceflcci
     1021 niivqmylmn rffdgeflsy gtnimklsdv pqeqrvdpmv yvfprvtkct fhkygpsgsl
     1081 qkhdslcilp lnivnektyv fiwfwfwill vllglvfrci ifpkfrprll nasnripmec
     1141 rldigdwwli ymlgrnldpv iykdvmsefq vp
//

Usage example 2

$: sb Drosophila.fa -cts -o fasta

Output

>concatination
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFCWITYTYTVAG
PGLEKHSYYQWVPFVLFFQGLMFYVPHWVWKMDGKIRMITGVDDRDRILKYFVNNTHNGY
SFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQDRFDPMIEIFPRLTKC
TFHKFGPSGSVQKHDTLCVLALNILNEKIYIFLWFWFIILATISGVAVLYSVVITRTIRK
EGDFLILHFLSQNLSTRSYSDMLQMDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFS
LLVTSRQYIGDPIDCIVEIPLGVMDTYCWIYSTFTVPEGRDVQPGSEKYHKYYQWVCFVL
FFQAILFYVPRYLWKSWEGGRLKMLVDLSVNDKDRKIVDYFGNLNRHNFYAFFFVCEALN
FVNVIGQIYFVDFFLDGEFSTYGSDVLKFTELEPDERIDPMARVFPKVTKCTFHKYGPSG
SVQTHDGLCVLPLNIVNEKIYVFLWFWFIILSIMSISLIYRIAVAPKLRHLLLRARSRAE
SEVEVAIGDWFLLYQLGKNIDPLIYKEVISDLEMGMAAVKPLSKYLQFKVHIYDAIFTLH
SKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDAFCWIYGAYLQCAVSKVVENYITYYQ
WVVLVLLLESFVFYMPAFLWKIWEGGRLKHLCDFKRTHRVLVNYFETHFRYFVYVFCEIL
NLSISILNFLLLDVFFGGFWGRYRNALYNQWIAVFPKCAKCEYKGGPSGSSNIYDYLCLL
PLNILNEKIFAFLWIWFILAMLISLKFLYRLAVLYPMRLQLLRPKKHLQVALNCSFGDWF
VLMRVGNNISPELFRKLLEELYKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLI
ITATQYVGQPISCIVGVPHVVNTFCWIHSTFTMPDRREVHPGVDFKYYTYYQWVCFVLFF
QAMACYTPKFLWNKFEGGLMRMIVGLNITRKRDALLDYLIKHVKRHKLYAYWACEFLCCI
NIIVQMYLMNRFFDGEFLSYGTNIMKLSDVPQEQRVDPMVYVFPRVTKCTFHKYGPSGSL
QKHDSLCILPLNIVNEKTYVFIWFWFWILLVLLGLVFRCIIFPKFRPRLLNASNRIPMEC
RLDIGDWWLIYMLGRNLDPVIYKDVMSEFQVP

Input file: Drosophila.nex

#NEXUS
begin data;
	dimensions ntax=4 nchar=656;
	format datatype=protein missing=? gap=-;
matrix
'Mle-Panxα10A' -mrlsekstshdckacitrshnedcarrwgitiddgwdqlnrsfmfgllvvmgttvtvr-qytgsviscdgfkkfg--stfaedycwtqgqytvlegydqpnqnipcplpaafapypgifpeelshclvgarkagqsedlingtrlkcpdpdqllsptrishlwyqwvpfyfwlaaaaffmpyllyknfgigdikplvrflhn--pvesdqelkkmtdkaatwlfykfdlymseqsllasltnkhglglsvvfvkilyaavsfgcflltadmfsigdfktygsewinklklednlateekdklfpkmvacevkrwgasgieeeqgmcvlapnvinqylflilwfclvfvmfcnivsifaslikllftygsyrrllsta-flrddsa---ikhmyfnvgssgrlilhvlanntaprvfedilltlapkliqrkl-rak------------------------------------------------------------------------------------------------------------------------------------------------------dy------------------------------------------------------------------d
'Mle-Panxα7A'  -mgveilfpi----------inratapiksvniddlssqlnrtfmfylsltfaititirqqlggayiacdgfsrdeeyerfaeewcwssgiytikeayemsnrvsp---------ypgiipenlpaci--------emelisggrvecpeekdvkpftriyqswypfvmfyywltalmfflpyqlykvfgfedvkavvamlqn--pvedgfekkelikrgsvwlylkstmtlsnpsvyssfivkhslafyaltvkvmylgntllmywlthkmfkfgsfaeygllwdtrnp-lnnvqslvqeklfpkvaacevkrfgasgleedqgmcmlalnvlnqylflifwfcllfvtivntisllltllniispcfmlqqfllas-sldrspavgvisklyldcgsslrfimtifawnvdpklfgeilvqlnsllakdespraevlkrrskkvkvpsprkpkllfheeikkklikrterkddnltnftnmskiskkfeglkkrnllqtksiinvsvpkkmseleveedfiltpteesgiqnnpdtkyaqedvldseyvvveqsvpetmteqesveesvpeiskaeqeggssdhidveetppasdvdrevnspivqheyqvqidlvsddgsahrlssdealfplripivklngdvlslrseslq
'Mle-Panxα3'   mlllgslgti------------knlsifkdlslddwldqmnrtfmflllcfmgtivavs-qytgkniscdgftkfg--edfsqdycwtqglytikeaydlpesqip---------ypgiipenvpacr--------ehalknggkivcppedqvkpltrarhlwyqwipfyfwviapvfylpymfvkrmgldrmkpllkimsdyyhcttetpseeiivkcadwvynsivdrlsegsswtswrnrhglglavlvskfmylggsvlvmmmttlmfqvgdfktygiewlrqfpnpenystsvkhklfpkmvaceikrwgttgleeengmcvlapnviyqyiflimwfalaitictnfgniffylfkltatrytynklvatghfshkhpg---wkfmyyrigtsgrvllnivaqntnpiifgaimekltpsvikhlr-ighvpge-------------------------------------------------------------------------------------------------------yltdpa----------------------------------------------------------------------------------------------------------
'Mle-Panxα6'   -mlleilanf------------kgatpfkeivlddkwdqinrcymfllcvifgtvvtfr-qytggiiacdgltkfs--aafaedycwtqglytikeaydivdnslp---------ypgllpedappcl--------srrlvsggriecppadlyleptrvhhtwyqwipfyfwvisiafigpyivykqlgvnelkpilamlhn--pvdgddvtkdqiskvsrwlaiklnifiqekstyakitqshrmfilifltkifylgvslatmyftdtmfesgryltygsewfasldkqsnytsfvrdrlfpkmvaceikrwgpsgmeeeqgmcvlapnvmnqylflifwfalvftifsntfsiffsvsthcfidggyqrfiqsc-flkensk---lkfiyfncgttgrtylhliaknvnprifeqliiklsadlveekn-kqhlkgskd-----------------------------------------------------------------------------------------------------ilv-------------------------------------------------------------------------------------------------------------
;
end;

Usage example 3

$: sb Drosophila.nex -cts clean

Output

LOCUS       concatination           1870 aa                     UNK 01-JAN-1980
DEFINITION
ACCESSION   concatination
VERSION     concatination
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     Mle-Panxα10A    1..429
     Mle-Panxα7A     430..1053
     Mle-Panxα3      1054..1464
     Mle-Panxα6      1465..1870
ORIGIN
        1 mrlsekstsh dckacitrsh nedcarrwgi tiddgwdqln rsfmfgllvv mgttvtvrqy
       61 tgsviscdgf kkfgstfaed ycwtqgqytv legydqpnqn ipcplpaafa pypgifpeel
      121 shclvgarka gqsedlingt rlkcpdpdql lsptrishlw yqwvpfyfwl aaaaffmpyl
      181 lyknfgigdi kplvrflhnp vesdqelkkm tdkaatwlfy kfdlymseqs llasltnkhg
      241 lglsvvfvki lyaavsfgcf lltadmfsig dfktygsewi nklklednla teekdklfpk
      301 mvacevkrwg asgieeeqgm cvlapnvinq ylflilwfcl vfvmfcnivs ifaslikllf
      361 tygsyrrlls taflrddsai khmyfnvgss grlilhvlan ntaprvfedi lltlapkliq
      421 rklrakdydm gveilfpiin ratapiksvn iddlssqlnr tfmfylsltf aititirqql
      481 ggayiacdgf srdeeyerfa eewcwssgiy tikeayemsn rvspypgiip enlpacieme
      541 lisggrvecp eekdvkpftr iyqswypfvm fyywltalmf flpyqlykvf gfedvkavva
      601 mlqnpvedgf ekkelikrgs vwlylkstmt lsnpsvyssf ivkhslafya ltvkvmylgn
      661 tllmywlthk mfkfgsfaey gllwdtrnpl nnvqslvqek lfpkvaacev krfgasglee
      721 dqgmcmlaln vlnqylflif wfcllfvtiv ntisllltll niispcfmlq qfllassldr
      781 spavgviskl yldcgsslrf imtifawnvd pklfgeilvq lnsllakdes praevlkrrs
      841 kkvkvpsprk pkllfheeik kklikrterk ddnltnftnm skiskkfegl kkrnllqtks
      901 iinvsvpkkm seleveedfi ltpteesgiq nnpdtkyaqe dvldseyvvv eqsvpetmte
      961 qesveesvpe iskaeqeggs sdhidveetp pasdvdrevn spivqheyqv qidlvsddgs
     1021 ahrlssdeal fplripivkl ngdvlslrse slqmlllgsl gtiknlsifk dlslddwldq
     1081 mnrtfmflll cfmgtivavs qytgkniscd gftkfgedfs qdycwtqgly tikeaydlpe
     1141 sqipypgiip envpacreha lknggkivcp pedqvkpltr arhlwyqwip fyfwviapvf
     1201 ylpymfvkrm gldrmkpllk imsdyyhctt etpseeiivk cadwvynsiv drlsegsswt
     1261 swrnrhglgl avlvskfmyl ggsvlvmmmt tlmfqvgdfk tygiewlrqf pnpenystsv
     1321 khklfpkmva ceikrwgttg leeengmcvl apnviyqyif limwfalait ictnfgniff
     1381 ylfkltatry tynklvatgh fshkhpgwkf myyrigtsgr vllnivaqnt npiifgaime
     1441 kltpsvikhl righvpgeyl tdpamlleil anfkgatpfk eivlddkwdq inrcymfllc
     1501 vifgtvvtfr qytggiiacd gltkfsaafa edycwtqgly tikeaydivd nslpypgllp
     1561 edappclsrr lvsggriecp padlyleptr vhhtwyqwip fyfwvisiaf igpyivykql
     1621 gvnelkpila mlhnpvdgdd vtkdqiskvs rwlaiklnif iqekstyaki tqshrmfili
     1681 fltkifylgv slatmyftdt mfesgrylty gsewfasldk qsnytsfvrd rlfpkmvace
     1741 ikrwgpsgme eeqgmcvlap nvmnqylfli fwfalvftif sntfsiffsv sthcfidggy
     1801 qrfiqscflk ensklkfiyf ncgttgrtyl hliaknvnpr ifeqliikls adlveeknkq
     1861 hlkgskdilv
//
