SB Rename IDs
Modify record identifiers by searching for simple strings or more complex regular expressions. Each match will be replaced with your substitution string within the ID.
The query is a regular expression that searches inside every ID for any sub-string matches. Only the part that is matched will be replaced, not the entire ID. If you would like to match the entire ID, prefix the search with ^ and suffix it with $; these are the 'start of string' and 'end of string' anchors, respectively (see example 3).
The substitution is the exact string that replaces every match to the query. To retain part of the match in the substitution, enclose that part of the query in parentheses () and refer to it with a backslash followed by a number: '\1' for the first set of parentheses, '\2' for the second, etc. (see example 4).
Optional. If a pattern occurs more than once in an ID but only some of those matches should be replaced, give a maximum number of replacements after the substitution string (see example 5). The default is '0', which replaces all matches. To match/replace from right to left instead of left to right, provide a negative number (see example 6).
Optional. Keep a copy of the original ID in the definition/description line. Include the exact string 'store' after the query and substitution strings to turn on this option (see example 7).
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panxδ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Panxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
Simple replacement
$: sb C-terms.fa -ri 'Mle' 'Mnemiopsis'
>Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panxδ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Panxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mnemiopsis-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mnemiopsis-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mnemiopsis-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mnemiopsis-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
Incorporate a regular expression
$: sb C-terms.fa -ri 'Panx[αδ]1' 'Panx?'
>Dme-Panx?
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panx?1
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Panxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Panx?
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
Match an ID exactly
$: sb C-terms.fa -ri '^Dme-Panxδ1$' 'Unknown_Panx'
>Unknown_Panx
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Panxδ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Panxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
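The difference between a sub-string match and an anchored match can be sketched with Python's re module. This is only an illustration of the regular-expression semantics described above, under the assumption that the query behaves like a standard Python pattern; it does not call SeqBuddy itself.

```python
import re

ids = ["Dme-Panxδ1", "Dme-Panxδ11"]

# Unanchored: the pattern also matches the leading 'Dme-Panxδ1' part of 'Dme-Panxδ11',
# so the trailing '1' of the second ID is left behind after the replacement.
unanchored = [re.sub("Dme-Panxδ1", "Unknown_Panx", seq_id) for seq_id in ids]

# Anchored with ^ and $: only an ID that matches in its entirety is replaced.
anchored = [re.sub("^Dme-Panxδ1$", "Unknown_Panx", seq_id) for seq_id in ids]

print(unanchored)  # ['Unknown_Panx', 'Unknown_Panx1']
print(anchored)    # ['Unknown_Panx', 'Dme-Panxδ11']
```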
Keep part of the match in the replacement
$: sb C-terms.fa -ri '^(..)e-Panx([αδ][0-9]+)$' '\1-Inx\2'
>Dm-Inxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dm-Inxδ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dm-Inxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dm-Inxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Ml-Inxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Ml-Inxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Ml-Inxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Ml-Inxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
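The same capture-group substitution can be tried outside of SeqBuddy. The sketch below uses Python's re.sub with the query and substitution from the command above, assuming the two follow the same group-numbering rules; the variable names are illustrative only.

```python
import re

query = r"^(..)e-Panx([αδ][0-9]+)$"   # group 1: first two letters, group 2: Greek letter plus number
substitution = r"\1-Inx\2"            # reuse both groups around the new 'Inx' label

ids = ["Dme-Panxδ1", "Dme-Panxδ11", "Mle-Panxα9"]
renamed = [re.sub(query, substitution, seq_id) for seq_id in ids]

print(renamed)  # ['Dm-Inxδ1', 'Dm-Inxδ11', 'Ml-Inxα9']
```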
Limit the number of matches
$: sb C-terms.fa -ri '[a-z]' '?' 2
>D??-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>D??-Panxδ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>D??-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>D??-Panxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>M??-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>M??-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>M??-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>M??-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
Match from right-to-left
$: sb C-terms.fa -ri '[a-z]' '?' -2
>Dme-Pa??δ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Pa??δ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Pa??δ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Pa??δ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Pa??α1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Pa??α5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Pa??α6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Pa??α9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
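One way to picture the maximum-replacements argument, including the negative right-to-left form, is the sketch below. The helper replace_limited is hypothetical and is not SeqBuddy's implementation; it assumes that a negative number selects the last matches found by an ordinary left-to-right scan, which may differ from the real tool for overlapping or zero-length patterns.

```python
import re

def replace_limited(seq_id, query, substitution, max_count=0):
    """Hypothetical helper: replace up to max_count matches of query in seq_id.

    0 replaces every match, a positive number counts from the left,
    and a negative number counts from the right.
    """
    matches = list(re.finditer(query, seq_id))
    if max_count > 0:
        matches = matches[:max_count]
    elif max_count < 0:
        matches = matches[max_count:]
    # Splice from the right so the offsets of earlier matches stay valid.
    result = seq_id
    for match in reversed(matches):
        result = result[:match.start()] + match.expand(substitution) + result[match.end():]
    return result

print(replace_limited("Dme-Panxδ1", "[a-z]", "?", 2))   # D??-Panxδ1
print(replace_limited("Dme-Panxδ1", "[a-z]", "?", -2))  # Dme-Pa??δ1
print(replace_limited("Dme-Panxδ1", "[a-z]", "?"))      # D??-P???δ1
```

Note that [a-z] does not match the Greek letter δ, which is why it survives in every case.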
Include the original ID in the description line
$: sb C-terms.fa -ri '[a-z]' '?' -2 'store'
>Dme-Pa??δ1 Dme-Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme-Pa??δ11 Dme-Panxδ11
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme-Pa??δ3 Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme-Pa??δ4 Dme-Panxδ4 Description line
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Pa??α1 Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Pa??α5 Mle-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Pa??α6 Mle-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Pa??α9 Mle-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
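The layout of the resulting header (new ID, then the original ID, then any existing description) can be sketched as follows. The function rename_header is a hypothetical illustration written for this page, not part of SeqBuddy, and it assumes the ID is everything before the first space in the header.

```python
import re

def rename_header(header, query, substitution, store=False):
    """Hypothetical helper: rename the ID of a FASTA header line.

    When store is True, the original ID is placed at the start of the description.
    """
    seq_id, _, description = header.lstrip(">").partition(" ")
    new_id = re.sub(query, substitution, seq_id)
    parts = [new_id]
    if store:
        parts.append(seq_id)       # keep a copy of the original ID
    if description:
        parts.append(description)  # any existing description follows
    return ">" + " ".join(parts)

print(rename_header(">Dme-Panxδ4 Description line", "^Dme", "Dmel", store=True))
# >Dmel-Panxδ4 Dme-Panxδ4 Description line
```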