Assigns short human readable descriptions (HRD) to query biological sequences using reference candidate descriptions. In this, prot-scriber consumes sequence similarity search (Blast, Diamond, or similar) results in tabular format. A customized lexical analysis is carried out on the descriptions of these Blast Hits and a resulting HRD is assigned to the query sequences. prot-scriber can also apply the same methodology to produce HRDs for sets of biological sequences, i.e. gene families.
(This section is for the lazy 😉 TL;DR)
prot-scriber can be used to generate human readable descriptions (HRDs) for either query biological sequences (proteins) or for gene-families. We will walk you through both use cases below with ready to use example input files.
Depending on your operating system, download the ready to use executable from the table in section "Installation".
Independent of your use-case, sequence or gene-family annotation, you need to run a sequence similarity search of your query biological sequences against reference databases. We recommend searching UniProt Swissprot (uniprot_sprot.fasta.gz) and trEMBL (uniprot_trembl.fasta.gz). You will need to format these UniProt reference databases (Blast: makeblastdb, Diamond: diamond makedb) and search them with either Blast
blastp -db uniprot_sprot.fasta -query my_prots.fasta -num_threads 10 -out my_prots_vs_sprot.txt -outfmt "6 delim=<TAB> qacc sacc stitle"
(Note that the above <TAB> actually needs to be a tab character. Typically you type that in with "Ctrl+v" followed by "Tab".)
or Diamond
diamond blastp -p 10 --quiet -d uniprot_sprot.fasta.dmnd -q my_prots.fasta -o my_prots_vs_sprot.txt -f 6 qseqid sseqid stitle
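If typing a literal tab with Ctrl+v is inconvenient, the shell can produce the tab character itself. The following is a minimal sketch, assuming a POSIX shell; the variable names TAB and OUTFMT are illustrative and not part of Blast or prot-scriber.

```shell
# Store a literal TAB character in a variable (works in any POSIX shell).
TAB="$(printf '\t')"

# Assemble the value for blastp's -outfmt argument with the real TAB inside.
OUTFMT="6 delim=${TAB} qacc sacc stitle"

# Dump the bytes to confirm that a TAB (\t) sits right after 'delim='.
printf '%s\n' "$OUTFMT" | od -c
```

The variable can then be passed as -outfmt "$OUTFMT" in the blastp command shown above.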
(See the manual section "2.3 Example Blast or Diamond commands" for details.) For a quick test run you can skip the searches and use the example output tables below (all files are taken from this repository's misc directory):
To generate HRDs for twelve example biological sequences (proteins) use:
To generate HRDs for two gene-families, comprising four and three proteins, respectively, use:
Please read section "2.4 Gene Family preparation and analysis" of the manual for a recipe on how to cluster biological sequences into gene-families.
Note that on Windows the commands below would use "prot-scriber.exe" instead of just "prot-scriber".
To annotate biological sequences, e.g. proteins, with HRDs, use:
prot-scriber -s Twelve_Proteins_vs_Swissprot_blastp.txt -s Twelve_Proteins_vs_trembl_blastp.txt -o Twelve_Proteins_HRDs.txt
Find prot-scriber's output in file Twelve_Proteins_HRDs.txt.
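Before running prot-scriber on your own search results, it can help to confirm that every row of a table really has the three required TAB-separated columns (qacc, sacc, stitle). The following is a minimal sketch using awk; the file name example_table.txt and both rows are made up for illustration and do not refer to real UniProt entries.

```shell
# Two hypothetical rows in the 'qacc sacc stitle' layout, TAB separated.
printf 'Prot1\tX00001\tsp|X00001|EXMP1_FAKE Putative example kinase OS=Examplus fictus\n' > example_table.txt
printf 'Prot2\tX00002\ttr|X00002|X00002_FAKE Example transporter OS=Examplus fictus\n' >> example_table.txt

# Count rows that have fewer than three TAB-separated fields.
bad_rows="$(awk -F '\t' 'NF < 3 { n++ } END { print n + 0 }' example_table.txt)"
echo "rows with missing columns: $bad_rows"
```

A count of zero means every row carries at least the three required fields; it does not check that the columns appear in the expected order.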
To annotate gene-families with HRDs, use:
prot-scriber -f families.txt -s family_prots_vs_Swissprot.txt -s family_prots_vs_trEMBL.txt -o families_HRDs.txt
Find prot-scriber's output in file families_HRDs.txt.
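The gene-families file passed with -f holds one family per line in the layout 'fam-name TAB gene1,gene2,gene3', and no gene should appear in more than one family. The following is a minimal sketch of how such a file could be written and checked with standard tools; the file name example_families.txt and the gene identifiers are hypothetical.

```shell
# Hypothetical families file: family name, a TAB, then a comma-separated gene list.
printf 'FamA\tprot1,prot2,prot3,prot4\nFamB\tprot5,prot6,prot7\n' > example_families.txt

# Print one gene per line, then report every gene identifier that occurs more than once.
duplicated_genes="$(awk -F '\t' '{ n = split($2, genes, ","); for (i = 1; i <= n; i++) print genes[i] }' example_families.txt | sort | uniq -d)"

if [ -z "$duplicated_genes" ]; then
  echo "no gene is listed more than once"
else
  echo "genes listed more than once: $duplicated_genes"
fi
```

This check assumes the default separators (TAB between family name and gene list, comma between genes); it would need adapting if you use the -i or -g options.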
You can choose to download a pre-built binary, ready to be executed, from the table below, if you want the latest stable version. Other versions can be downloaded from the Releases page. Consult the table below to find the version you need for your operating system and platform.
We strongly recommend renaming the downloaded release file to a simple prot-scriber (or prot-scriber.exe on Windows).
Note that on Mac OS and Unix / Linux operating systems you need to make the downloaded and renamed prot-scriber file executable. In order to achieve this, open a terminal shell, navigate (cd) to the directory where you saved prot-scriber, and execute chmod a+x ./prot-scriber.
Operating System | CPU-Architecture | Release-Name (click to download) | Comment |
---|---|---|---|
Windows 7 or higher | any | windows_prot-scriber.exe | to be used in a terminal (cmd or Power-Shell) |
any GNU-Linux | any Intel x86, 64 bits | x86_64-unknown-linux-gnu_prot-scriber | requires libm.so.6 (compiled with glibc version 2.27) and libc.so.6 (compiled with glibc 2.18) installed as is the case e.g. in Ubuntu >= 22.04 |
any GNU-Linux | any aarch, 64 bits | aarch64-unknown-linux-gnu_prot-scriber | e.g. for Raspberry Pi; requires libm.so.6 (compiled with glibc version 2.27) and libc.so.6 (compiled with glibc 2.18) installed as is the case e.g. in Ubuntu >= 22.04 |
Apple / Mac OS | any Mac Computer with Mac OS 10 | x86_64-apple-darwin_prot-scriber | |
prot-scriber is written in Rust. That makes it extremely performant and, once compiled for your operating system (OS), it can be used on any machine with that particular OS environment. To compile it for your platform you first need to have Rust and cargo installed. Follow the official instructions.
Download the latest stable release of prot-scriber here.
Unzip it, e.g. by double clicking it or by using the command line:
unzip latest-stable.zip
Change into the directory of the downloaded prot-scriber code, e.g. in a Mac OS or Linux terminal cd prot-scriber-latest-stable after having unpacked the latest stable release (see above).
Now, compile prot-scriber with
cargo build --release
The above compilation command has generated an executable binary file at ./target/release/prot-scriber, relative to the current directory. You can just go ahead and use prot-scriber now that you have compiled it successfully (see Usage below).
If you are familiar with installing self compiled tools on a system wide level, this section will provide no news to you. It is convenient to make the compiled executable prot-scriber program available from anywhere on your system. To achieve this, copy it to any place you typically have your programs installed, or add its directory to your, or all users', $PATH environment variable. In doing so, e.g. in case you are a system administrator, you make prot-scriber available for all users of your infrastructure. Make sure you and your group have executable access rights to the file. You can adjust these access rights with chmod ug+x ./target/release/prot-scriber. You, and possibly other users of your system, are now ready to run prot-scriber.
In case you are using conda to manage your packages, prot-scriber is available on bioconda. Install via
conda install -c bioconda prot-scriber
or create and activate a new conda environment via
conda create -n prot-scriber -c bioconda -c conda-forge prot-scriber
conda activate prot-scriber
prot-scriber is a command line tool and must be used in a terminal application. On Windows that will be cmd or PowerShell, on Mac OS X or any Linux / Unix system that will be a standard terminal shell.
Please read the manual of the latest stable version.
prot-scriber version 0.1.4
PLEASE USE '--help' FOR MORE DETAILS!
prot-scriber assigns human readable descriptions (HRD) to query biological sequences or sets of them
(a.k.a gene-families).
USAGE:
prot-scriber [OPTIONS] --output <output> --seq-sim-table <seq-sim-table>
OPTIONS:
-a, --annotate-non-family-queries
Use this option only in combination with --seq-families (-f), i.e. when prot-scriber is
used to generate human readable descriptions for gene families. If in that context this
flag is given, queries for which there are sequence similarity search (Blast) results
but that are NOT member of a sequence family will receive an annotation (human readable
description) in the output file, too. Default value of this setting is 'OFF' (false).
-b, --blacklist-regexs <blacklist-regexs>
A file with regular expressions (Rust syntax), one per line. Any match to any of these
regular expressions causes sequence similarity search result descriptions ('stitle' in
Blast terminology) to be discarded from the prot-scriber annotation process. If multiple
--seq-sim-table (-s) args are provided make sure the --blacklist-regexs (-b) args appear
in the correct order, e.g. the first -b arg will be used for the first -s arg, the
second -b will be used for the second -s and so on. Set to 'default' to use the hard
coded default. An example file can be downloaded here:
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/blacklist_stitle_regexs.txt
- Note that this is an expert option.
-c, --capture-replace-pairs <capture-replace-pairs>
A file with pairs of lines. Within each pair the first line is a regular expression
(fancy-regex syntax) defining one or more capture groups. The second line of a pair is
the string used to replace the match in the regular expression with. This means the
second line contains the capture groups (fancy-regex syntax). These pairs are used to
further filter the sequence similarity search result descriptions ('stitle' in Blast
terminology). In contrast to the --filter-regexs (-l) matches are not deleted, but
replaced with the second line of the pair. Filtering is used to process descriptions
('stitle' in Blast terminology) and prepare the descriptions for the prot-scriber
annotation process. If multiple --seq-sim-table (-s) args are provided make sure the
--capture-replace-pairs (-c) args appear in the correct order, e.g. the first -c arg
will be used for the first -s arg, the second -c will be used for the second -s and so
on. Set to 'default' to use the hard coded default. An example file can be downloaded
here:
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/capture_replace_pairs.txt
- Note that this is an expert option.
-d, --polish-capture-replace-pairs
The last step of the process generating human readable descriptions (HRDs) for the
queries (proteins or sequence families) is to 'polish' the selected HRDs. Polishing is
done by iterative application of regular expressions (fancy-regex) and replace
instructions (capture-replace-pairs). If you do not want to use the default polishing
capture replace pairs specify a file in which pairs of lines are given. Of each pair the
first line holds a regular expression (fancy-regex syntax) and the second the replacement
instructions providing access to capture groups. Set to 'none' or provide an empty file,
if you want to suppress polishing. If you want to have a template file for your custom
polishing capture-replace-pairs please refer to
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/polish_capture_replace_pairs.txt
- Note that this is an expert option.
-e, --header <header>
Header of the --seq-sim-table (-s) arg. Separated by space (' ') the names of the
columns in order of appearance in the respective table. Required and default columns are
'qacc sacc stitle'. Note that this option only understands Blast terminology, i.e. even
if you ran Diamond, please provide 'qacc' instead of 'qseqid' and 'sacc' instead of
'sseqid'. Luckily 'stitle' is 'stitle' in Diamond, too. You can have additional columns
that will be ignored, as long as the required columns appear in the correct order.
Consider this example: 'qacc sacc evalue bitscore stitle'. If multiple --seq-sim-table
(-s) args are provided make sure the --header (-e) args appear in the correct order,
e.g. the first -e arg will be used for the first -s arg, the second -e will be used for
the second -s and so on. Set to 'default' to use the hard coded default.
-f, --seq-families <seq-families>
A file in which families of biological sequences are stored, one family per line. Each
line must have format 'fam-name TAB gene1,gene2,gene3'. Make sure no gene appears in
more than one family.
-g, --seq-family-gene-ids-separator <seq-family-gene-ids-separator>
A regular expression (Rust syntax) used to split the list of gene-identifiers in the
argument --seq-families (-f) gene families file. Default is '(\s*,\s*|\s+)'.
-h, --help
Print help information
-i, --seq-family-id-genes-separator <seq-family-id-genes-separator>
A string used as separator in the argument --seq-families (-f) gene families file. This
string separates the gene-family-identifier (name) from the gene-identifier list that
family comprises. Default is '<TAB>' ("\t").
-l, --filter-regexs <filter-regexs>
A file with regular expressions (Rust syntax), one per line. Any match to any of these
regular expressions causes the matched sub-string to be deleted, i.e. filtered out.
Filtering is used to process descriptions ('stitle' in Blast terminology) and prepare
the descriptions for the prot-scriber annotation process. In case of UniProt sequence
similarity search results (Blast result tables), this removes the Blast Hit identifier
(`sacc`) from the description (`stitle`) and also removes the taxonomic information
starting with e.g. 'OS=' at the end of the `stitle` strings. If multiple --seq-sim-table
(-s) args are provided make sure the --filter-regexs (-l) args appear in the correct
order, e.g. the first -l arg will be used for the first -s arg, the second -l will be
used for the second -s and so on. Set to 'default' to use the hard coded default. An
example file can be downloaded here:
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/filter_stitle_regexs.txt
- Note that this is an expert option.
-n, --n-threads <n-threads>
The maximum number of parallel threads to use. Default is the number of logical cores.
Required minimum is two (2). Note that at most one thread is used per input sequence
similarity search result (Blast table) file. After parsing these, the annotation may use up
to this number of threads to generate human readable descriptions.
-o, --output <output>
Filename in which the tabular output will be stored.
-p, --field-separator <field-separator>
Field-Separator of the --seq-sim-table (-s) arg. The default value is the '<TAB>'
character. Consider this example: '-p @'. If multiple --seq-sim-table (-s) args are
provided make sure the --field-separator (-p) args appear in the correct order, e.g. the
first -p arg will be used for the first -s arg, the second -p will be used for the
second -s and so on. You can provide '-p default' to use the hard coded default (TAB).
-q, --center-inverse-word-information-content-at-quantile <center-inverse-word-information-content-at-quantile>
The quantile (percentile) to be subtracted from calculated inverse word information
content to center these values. Consequently, this must be a value between zero and one
or literal 50, which is interpreted as mean instead of a quantile. Default is 50,
implying centering at the mean. Note that this is an expert option.
-r, --description-split-regex <description-split-regex>
A regular expression in Rust syntax to be used to split descriptions (`stitle` in Blast
terminology) into words. Default is '([~_\-/|\;,':.\s]+)'. Note that this is an expert
option.
-s, --seq-sim-table <seq-sim-table>
File in which to find sequence similarity search results in tabular format (SSST). Use
e.g. Blast or Diamond to produce them. Required columns are: 'qacc sacc stitle' (Blast)
or 'qseqid sseqid stitle' (Diamond). (See section '2. prot-scriber input preparation'
for more details.) If the required columns, or more, appear in different order than
shown here you must use the --header (-e) argument. If any of the input SSSTs uses a
different field-separator than the '<TAB>' character, you must provide the
--field-separator (-p) argument. You can provide multiple SSSTs, simply by repeating the
-s argument, e.g. '-s queries_vs_swissprot_diamond_out.txt -s
queries_vs_trembl_diamond_out.txt'. Providing multiple --seq-sim-table (-s) arguments
might imply the order in which you give other arguments like --header (-e) and
--field-separator (-p). See there for more details.
-v, --verbose
Print informative messages about the annotation process.
-V, --version
Print version information
-w, --non-informative-words-regexs <non-informative-words-regexs>
The path to a file in which regular expressions (regexs) are stored, one per line. These
regexs are used to recognize non-informative words, which will only receive a minimum
score in the prot-scriber process that generates human readable description. There is a
default list hard-coded into prot-scriber. An example file can be downloaded here:
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/non_informative_words_regexs.txt
- Note that this is an expert option.
-x, --exclude-not-annotated-queries
Exclude results from the output table that could not be annotated, i.e. 'unknown
protein' or 'unknown sequence family', respectively.
MANUAL
======
1. Summary
----------
'prot-scriber' uses reference descriptions ('stitle' in Blast terminology) from sequence similarity
search results (Blast Hits) to assign short human readable descriptions (HRD) to query biological
sequences or sets of them (a.k.a gene, or sequence, families). In this, prot-scriber consumes
sequence similarity search (Blast, Diamond, or similar) results in tabular format. A customized
lexical analysis is carried out on the descriptions ('stitle' in Blast terminology) of these Blast
Hits and a resulting HRD is assigned to the query sequences or query families, respectively.
2. prot-scriber input preparation
---------------------------------
This section explains how to run your favorite sequence similarity search tool, so that it produces
tabular results in the format prot-scriber needs them. You can run sequence similarity searches with
Blast [McGinnis, S. & Madden, T. L. BLAST: at the core of a powerful and diverse set of sequence
analysis tools. Nucleic Acids Res 32, W20–W25 (2004).] or Diamond [Buchfink, B., Xie, C. & Huson,
D. H. Fast and sensitive protein alignment using DIAMOND. Nat Meth 12, 59–60 (2015).]. Note that
there are other tools to carry out sequence similarity searches which can be used to generate the
input for prot-scriber. As long as you have a tabular text file with the three required columns
holding the query identifier, the subject ('Hit') identifier, and the subject ('Hit') description
('stitle' in Blast terminology) prot-scriber will accept this as input.
Depending on the type of your query sequences the search method and searched reference databases
vary. For amino acid queries search protein reference databases, for nucleotide query sequences
search nucleotide reference databases. If you have protein coding nucleotide query sequences you can
choose to either search protein reference databases using translated nucleotide queries with
'blastx' or 'diamond blastx' or search reference nucleotide databases with 'blastn' (Diamond offers
no 'blastn' mode). Note, that before carrying out any sequence similarity searches you need to format
your reference databases. This is achieved by either the 'makeblastdb' (Blast) or 'makedb' (Diamond)
commands, respectively. Please see the respective tool's (Blast or Diamond) manual for details on
how to format your reference sequence database.
2.1 A note on TAB characters
----------------------------
TAB is often used as a field separator, e.g. by default in Diamond sequence similarity search result
tables, or to separate gene-family identifiers from their respective gene-lists. Consequently,
prot-scriber has several arguments that could be a TAB, e.g. the --field-separator (-p) or the
--seq-family-id-genes-separator (-i) (please see below for more details on these arguments).
Unfortunately providing the TAB character as a command line argument can be tricky. It is even more
tricky to write it into a manual like this, because it appears as a blank whitespace and cannot
easily be distinguished from other whitespace characters. We thus write '<TAB>' whenever we mean
the TAB character. To type it in the command line and provide it as an argument to prot-scriber you
can (i) either use $'\t' (e.g. -p $'\t') or (ii) hit Ctrl+v and subsequently hit the TAB key on your
keyboard (e.g. -p '<TAB>').
2.2 Which reference databases to search
---------------------------------------
For amino acid (protein) or protein coding nucleotide query sequences we recommend searching
UniProt's Swissprot and trEMBL. For nucleotide sequences UniRef100 and/or UniParc might be good
choices. Note that you can search _any_ database you deem to hold valuable reference sequences.
However, you might have to provide custom blacklist, filter, and capture-replace arguments for Blast
or Diamond output tables stemming from searches in these non UniProt databases (see section '3.
Technical manual' on the arguments --blacklist-regexs (-b), --filter-regexs (-l), and
--capture-replace-pairs (-c) for further details). If you want to search any NCBI reference
database, please see section 2.2.1 for more details.
2.2.1 NCBI reference databases
------------------------------
The National Center for Biotechnology Information (NCBI) has excellent reference databases to be
searched by Blast or Diamond, too. Note that NCBI and UniProt update each other's databases very
frequently. So, by searching UniProt only you should not lose information. Anyway, NCBI has e.g.
the popular non redundant ('NR') database. However, NCBI has a different description ('stitle' in
Blast terminology) format. To make sure prot-scriber parses sequence similarity search result (Blast
or Diamond) tables (SSSTs) correctly, you should use a tailored --filter-regexs (-l) argument. A
file containing such a list of regular expressions specifically tailored for parsing SSSTs produced
by searching NCBI reference databases, e.g. NR, is provided with prot-scriber. You can download it,
and edit it if necessary, here:
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/filter_stitle_regexs_NCBI_NR.txt
2.2.2 UniRef reference databases
--------------------------------
The UniRef databases (UniProt Reference Clusters) provide clustered sets of sequences from the
UniProt Knowledgebase and selected UniParc records to obtain complete coverage of sequence space at
several resolutions (100%, 90% and 50% identity) while hiding redundant sequences. The UniRef100
database combines identical sequences and subfragments from any source organism into a single UniRef
entry (i.e. cluster). UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90%
or 50% sequence identity levels. To make sure prot-scriber parses sequence similarity search result
(Blast or Diamond) tables (SSSTs) correctly, you should use a tailored --filter-regexs (-l)
argument. A file containing such a list of regular expressions specifically tailored for parsing
SSSTs produced by searching UniRef databases is provided with prot-scriber. You can download it, and
edit it if necessary, here:
https://raw.githubusercontent.com/usadellab/prot-scriber/master/misc/filter_stitle_regexs_UniRef.txt
2.3 Example Blast or Diamond commands
-------------------------------------
Note that the following instructions on how to execute your sequence similarity searches with Blast
or Diamond only include the information - in terms of selected output table columns - absolutely
required by 'prot-scriber'. You are welcome, of course, to have more columns in your tabular output,
e.g. 'bitscore' or 'evalue' etc. Note that you need to search each of your reference databases with
a separate Blast or Diamond command, respectively.
2.3.1 Blast
-----------
Generate prot-scriber input with Blast as follows. The following example uses 'blastp'; replace it
with 'blastn' or 'blastx' if your query sequence type makes that necessary.
blastp -db <reference_database.fasta> -query <your_query_sequences.fasta> -num_threads
<how-many-do-you-want-to-use> -out <queries_vs_reference_db_name_blastout.txt> -outfmt "6
delim=<TAB> qacc sacc stitle"
It is important to note, that in the above 'outfmt' argument the 'delim' set to '<TAB>' means you
need to actually type in a TAB character. (We write '<TAB>' here, so you see something, not only
whitespace.) Typically you can type it by hitting Ctrl+v followed by the TAB key in the terminal.
2.3.2 Diamond
-------------
Generate prot-scriber input with Diamond as follows. The following example uses 'blastp'; replace
it with 'blastx' if your queries are nucleotide sequences (Diamond offers no 'blastn' mode).
diamond blastp -p <how-many-threads-do-you-want-to-use> --quiet -d <reference-database.dmnd> -q
<your_query_sequences.fasta> -o <queries_vs_reference_db_name_diamondout.txt> -f 6 qseqid sseqid
stitle
Note that diamond by default uses the '<TAB>' character as a field-separator for its output tables.
2.4 Gene Family preparation and analysis
----------------------------------------
Assume you have the proteomes of eight crucifer plant species and want to cluster the respective
amino acid sequences into gene families. Note that the following example provides code to be
executed in a BASH Shell (also available on Windows). We provide a very basic procedure to perform
the clustering:
(i) "All versus all" Blast or Diamond
Assume all amino acid sequences of the eight example proteomes stored in a single file
'all_proteins.fasta'
Run:
diamond makedb --in all_proteins.fasta -d all_proteins.fasta
diamond blastp --quiet -p <how-many-threads-do-you-want-to-use?> -d all_proteins.fasta.dmnd -q
all_proteins.fasta -o all_proteins_vs_all.txt -f 6 qseqid sseqid pident
(ii) Run markov clustering
Note that 'mcl' is a command line tool implementing the original Markov Clustering algorithm [Stijn
van Dongen, A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute
for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000]. On most systems you
can install the 'mcl' binary using the respective package manager, e.g. 'sudo apt-get update && sudo
apt-get install -y mcl' (Debian / Ubuntu).
mcl all_proteins_vs_all.txt -o all_proteins_gene_clusters.txt --abc -I 2.0
(iii) Add gene family names to mcl output and filter out singleton clusters
Note that we use the GNU tools 'sed' and 'awk' to do some basic post-processing of the 'mcl' output.
sed -e 's/\t/,/g' all_proteins_gene_clusters.txt | awk -F "," 'BEGIN{i=1}{if (NF > 1){print
"Seq-Fam_" i "\t" $0; i=i+1}}' > all_proteins_gene_families.txt
Congratulations! You have now clustered your eight crucifer plant proteomes into gene families (file
'all_proteins_gene_families.txt').
(iv) Run prot-scriber
We assume that you ran either 'blastp' or 'diamond blastp' (see section 2.3 for details) to search
your selected reference databases with the 'all_proteins.fasta' queries. Here, we assume you have
searched UniProt's Swissprot and trEMBL databases.
prot-scriber -f all_proteins_gene_families.txt -s all_proteins_vs_Swissprot_blastout.txt -s
all_proteins_vs_trEMBL_blastout.txt -o all_proteins_gene_families_HRDs.txt
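Step (iii) of manual section 2.4 can be tried out without running Diamond or mcl. The following is a minimal sketch on a made-up cluster file; the file names and protein identifiers are hypothetical, and 'tr' is used in place of the manual's 'sed' call only because '\t' in a sed pattern is a GNU extension.

```shell
# Hypothetical mcl output: one cluster per line, members separated by TABs.
printf 'protA\tprotB\tprotC\nprotD\nprotE\tprotF\n' > example_clusters.txt

# Turn TABs into commas, drop singleton clusters, and prepend a family name plus a TAB.
tr '\t' ',' < example_clusters.txt |
  awk -F ',' 'BEGIN { i = 1 } NF > 1 { print "Seq-Fam_" i "\t" $0; i++ }' > example_gene_families.txt

cat example_gene_families.txt
```

The singleton cluster holding only protD is filtered out, so the resulting file contains two families in the 'fam-name TAB gene1,gene2,...' layout expected by --seq-families (-f).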
Note that you can get the manual directly from prot-scriber. In the command prompt (cmd or PowerShell on Windows) or the Terminal (Mac OS, Linux, or Unix) use
prot-scriber --help
to get it printed.
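The manual describes the --filter-regexs (-l) option as removing the Blast Hit identifier and the taxonomic information starting with 'OS=' from UniProt descriptions. The following sketch imitates that effect with sed on a made-up description; the two patterns are illustrative assumptions and not prot-scriber's built-in default regular expressions, and the 'stitle' value does not refer to a real UniProt entry.

```shell
# Hypothetical UniProt-style description ('stitle' in Blast terminology).
stitle='sp|X00001|EXMP1_FAKE Putative example kinase OS=Examplus fictus OX=0 GN=EX1 PE=4 SV=1'

# 1st expression: delete the leading identifier up to the first space.
# 2nd expression: delete everything from ' OS=' to the end of the line.
cleaned="$(printf '%s\n' "$stitle" | sed -E -e 's/^[^ ]+ +//' -e 's/ +OS=.*$//')"

printf '%s\n' "$cleaned"
```

What remains is the descriptive part of the title, which is the kind of text the lexical analysis works on.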
Happy prot-scribing!
prot-scriber is blazingly fast and has low memory requirements. Consider the following two standard use cases, in which prot-scriber generated human readable descriptions (HRDs) for (i) a single species and (ii) gene families.
On a standard Laptop with 4 cores prot-scriber took approx. 7 seconds and used a little under 50 MB RAM to generate human readable descriptions for a complete plant proteome with Blast search Hits for 32,567 distinct query proteins (input: Blast result table from searches in UniProt Swissprot 66 MB, Blast result table from searches in UniProt trEMBL 144 MB).
On a standard Laptop with 4 cores prot-scriber took approx. 15 seconds and used a little under 180 MB RAM to generate human readable descriptions for 24,072 gene families with Blast search Hits for 71,610 distinct query proteins (input: Blast result table from searches in UniProt Swissprot 126 MB, Blast result table from searches in UniProt trEMBL 273 MB).
prot-scriber comes with a simple and small R script to generate a word-cloud plot from any prot-scriber results. To use it, you must have R and the following packages installed (click here to learn how to install R packages):
RColorBrewer
wordcloud
wordcloud2
htmlwidgets
webshot
You find the script prot-scriber-word-cloud.R in the misc directory or download it directly from here.
In your Terminal (cmd or Power-Shell on Windows) you can invoke the script as follows:
Rscript prot-scriber-word-cloud.R input-prot-scriber-table.txt output-files-name
Note that the first argument is the output-table generated by prot-scriber and the second is a file-name, without file extension (e.g. .pdf). Several output files will be created, two PDFs and one HTML.
Happy word-clouding!
prot-scriber is open source. Please feel free and invited to contribute.
This repository is set up to use GitHub Actions (see .github/workflows/push.yml for details). We use GitHub Actions to trigger compilation of prot-scriber every time a Git Tag is pushed to this repository. So, if you, after writing new code and committing it, do the following in your local repo:
git tag -a 'version-Foo-Bar-Baz' -m "My fancy new version called Foo Bar Baz"
git push origin master --tags
GitHub will automatically compile prot-scriber and provide executable binary versions of prot-scriber for the platforms and operating systems mentioned above. The resulting binaries are then made available for download on the releases page.
In short, you do not need to worry about compiling your latest version to make it available for download for the different platforms and operating systems; GitHub Actions takes care of this for you.