SB Group by regex
Group sequences together based on shared characteristics in record IDs using regular expressions. The groups are written to files in the current working directory or some other pre-existing directory.
One or more regular expressions can be used to specify how to group IDs. If there are multiple matches in the ID, only the first match is used, and any records that do not contain a match will be sent to a separate file called 'Unknown'.
The pattern "^.*$" can be used to separate every record into its own file.
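The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SeqBuddy's actual implementation: the function name `group_ids` is hypothetical, and the IDs are taken from the example file used on this page.

```python
import re

def group_ids(ids, pattern):
    """Hypothetical helper: map each group name to the IDs that produced it."""
    groups = {}
    for rec_id in ids:
        # re.search stops at the first match, so later matches in the ID are ignored
        match = re.search(pattern, rec_id)
        # IDs with no match at all are collected under 'Unknown'
        key = match.group(0) if match else "Unknown"
        groups.setdefault(key, []).append(rec_id)
    return groups

ids = ["Dme~Panxδ1", "Dme~Panxδ2", "Dme~Panxδ3", "Dme~Panxδ4",
       "Mle-Panxα1", "Mle-Panxα5", "Mle-Panxα6", "Mle-Panxα9"]

groups = group_ids(ids, "Panx.[1-3]")
for name, members in groups.items():
    print(name, members)
```

With this pattern, the four IDs ending in 4, 5, 6, and 9 fall into 'Unknown', while each of the remaining IDs forms its own group.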
Optional. By default, all new files will be written to the current working directory. If you wish to send the output elsewhere, provide a path to an existing directory (new directories will not be created for you).
>Dme~Panxδ1
YKLLGSLKSYLKWQIQTDNAVFRLHNSFTTVLLLTCSLIITATQYVGQPI
>Dme~Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPID
>Dme~Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFC
>Dme~Panxδ4
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPI
>Mle-Panxα1
MYWIFEICQEIKRAQSCRKFAIDGPFDWTNRIIMPTLMVICCFLQTFTFM
>Mle-Panxα5
MIYWVWAVFKRMAPFKVVTLDDRWDQMNRSFMMPLTMSFAYLIDYGIIAG
>Mle-Panxα6
MLLEILANFKGATPFKEIVLDDKWDQINRCYMFLLCVIFGTVVTFRQYTG
>Mle-Panxα9
MLDILSKFKGVTPFKGITIDDGWDQLNRSFMFVLLVVMGTTVTVRQYTGS
Simple regular expression matching the characters "P", "a", "n", "x", followed by one more character (dot operator '.')
$: sb C-terms.fa -gbr "Panx."
New file: /path/to/cwd/Panxδ.fa
New file: /path/to/cwd/Panxα.fa
Regular expression that does not match all IDs
$: sb C-terms.fa -gbr "Panx.[1-3]"
New file: /path/to/cwd/Unknown.fa
New file: /path/to/cwd/Panxδ1.fa
New file: /path/to/cwd/Panxδ2.fa
New file: /path/to/cwd/Panxδ3.fa
New file: /path/to/cwd/Panxα1.fa
Multiple regular expressions
$: sb C-terms.fa -gbr "Dme.*δ" "Panx[αδ]"
New file: /path/to/cwd/Dme~Panxδ.fa
New file: /path/to/cwd/Panxα.fa
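One way to reproduce the file names above is to try each pattern in the order given and keep the first one that matches. The sketch below assumes that ordering (the page does not state how SeqBuddy combines patterns internally), and `first_group_name` is a hypothetical helper.

```python
import re

def first_group_name(rec_id, patterns):
    """Hypothetical helper: return the text matched by the first pattern that hits."""
    for pattern in patterns:
        match = re.search(pattern, rec_id)
        if match:
            return match.group(0)
    # No pattern matched this ID
    return "Unknown"

patterns = ["Dme.*δ", "Panx[αδ]"]
names = {rec_id: first_group_name(rec_id, patterns)
         for rec_id in ["Dme~Panxδ1", "Mle-Panxα5"]}
print(names)
```

Here "Dme~Panxδ1" is claimed by the first pattern and grouped as "Dme~Panxδ", while "Mle-Panxα5" only matches the second pattern and is grouped as "Panxα".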
Use parentheses notation to extract parts of your match (results from multiple sets of parentheses are concatenated)
$: sb C-terms.fa -gbr "([MD]).*([αδ])"
New file: /path/to/cwd/Mα.fa
New file: /path/to/cwd/Dδ.fa
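The concatenation of parenthesized groups can be illustrated with Python's `re` module. This is a minimal sketch of the behavior shown above, not SeqBuddy's code, and `group_name` is a hypothetical helper.

```python
import re

def group_name(rec_id, pattern):
    """Hypothetical helper: derive a group name from one pattern."""
    match = re.search(pattern, rec_id)
    if match is None:
        return "Unknown"
    if match.groups():
        # Join the text captured by every set of parentheses, in order
        return "".join(g for g in match.groups() if g is not None)
    # No parentheses in the pattern: use the whole match
    return match.group(0)

print(group_name("Dme~Panxδ1", "([MD]).*([αδ])"))  # Dδ
print(group_name("Mle-Panxα1", "([MD]).*([αδ])"))  # Mα
```

Only the captured pieces ("D" and "δ", or "M" and "α") end up in the name; the text consumed by `.*` between them is discarded.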
Write every record out to its own file by passing in the pattern "^.*$"
$: sb C-terms.fa -gbr "^.*$"
New file: /path/to/cwd/Dme~Panxδ1.fa
New file: /path/to/cwd/Dme~Panxδ2.fa
New file: /path/to/cwd/Dme~Panxδ3.fa
New file: /path/to/cwd/Dme~Panxδ4.fa
New file: /path/to/cwd/Mle-Panxα1.fa
New file: /path/to/cwd/Mle-Panxα5.fa
New file: /path/to/cwd/Mle-Panxα6.fa
New file: /path/to/cwd/Mle-Panxα9.fa
Specify a pre-existing folder to change where the files are written to
$: sb C-terms.fa -gbr "~/foo/bar/" "Mle"
New file: /home/foo/bar/Unknown.fa
New file: /home/foo/bar/Mle.fa
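Because new directories are not created for you, it can be useful to confirm that the target exists before running the command. The check below is a sketch using only the Python standard library; `resolve_out_dir` is a hypothetical helper and says nothing about how SeqBuddy itself validates the path.

```python
import os

def resolve_out_dir(path=None):
    """Hypothetical helper: return an existing output directory or raise."""
    if path is None:
        # Default: the current working directory
        return os.getcwd()
    # Expand a leading '~' and make the path absolute
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        raise NotADirectoryError(f"Output directory does not exist: {full}")
    return full

print(resolve_out_dir())
```

Calling `resolve_out_dir("~/foo/bar/")` would return the expanded path only if that directory already exists, and raise otherwise.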