readme updated
rampasek committed Feb 17, 2014
1 parent da6bc32 commit be3514b
Showing 2 changed files with 22 additions and 20 deletions.
Binary file modified manual/readme.pdf
Binary file not shown.
42 changes: 22 additions & 20 deletions manual/readme.tex
@@ -9,11 +9,13 @@
%

\documentclass[11pt]{article}
%\topmargin=-2.5cm
%\usepackage{a4wide}
\usepackage{fullpage}
\topmargin=-0.5cm
\textheight=1.05\textheight
%\usepackage{a4wide}

\usepackage[english]{babel}
\usepackage[IL2]{fontenc}
%\usepackage[IL2]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[none]{hyphenat}
\usepackage{amsmath}
@@ -52,25 +54,25 @@ \section*{RNArobo 2.1.0 -- Quick Start Guide}
The format of an RNArobo descriptor is an extension of the descriptor format used by RNABob \cite{eddy1996}, thus RNABob descriptors are compatible with RNArobo. A descriptor consists of three parts:
\begin{enumerate}
\item a \textbf{motif map} -- a list of individual \emph{structural elements}
ordered from 5' to 3' end along the sequence
ordered from 5$\mathbb{'}$ to 3$\mathbb{'}$ end along the sequence
\item a detailed \textbf{specification} of each structural element
\item an optional \emph{search order}
\end{enumerate}
Each structural element is either single stranded (denoted by \texttt{s}) or helical (denoted by \texttt{h} or \texttt{r}). Detailed specification of each element consists of (the fields in bold are \textbf{mandatory}, while fields in italic are \emph{optional}):
\begin{enumerate}
\item number of \textbf{mismatches} allowed (in helical elements mismatches are allowed only in the positive strand)
\item[(1b.)] number of \textbf{mispairs} allowed (for helical elements only)
\item number of \emph{insertions} allowed
\item \textbf{primary sequence} constraints: a string composed of IUPAC codes and wild cards ``\texttt{*}'' that allow to match any character or to be skipped; alternatively, an abbreviation for e.g. 10 wild cards can be written as ``\texttt{[10]}''
\item[(3b.)] primary \textbf{sequence constraints} for the \textbf{negative strand} of a helical element; in helical elements wild cards can occur only in pairs, i.e. for every wild card there must be a corresponding wild card in the other strand at the exactly opposite position
\item IUPAC code for \emph{allowed insertions}
\item[(5.)] a \textbf{transformation} string specifying pairings allowed in the \emph{relational} element \texttt{r}
\item number of single nucleotide \emph{insertions} allowed
\item \textbf{primary sequence} constraints: a string composed of IUPAC nucleotide codes and wild cards ``\texttt{*}'' that allow matching any nucleotide or none; alternatively, an abbreviation for e.g. 10 wild cards can be written as ``\texttt{[10]}''
\item[(3b.)] primary \textbf{sequence constraints} for the \textbf{negative strand} of a helical element. In helical elements wild cards can occur only in pairs, i.e. for every wild card there must be a corresponding wild card in the other strand at the exactly opposite position
\item IUPAC nucleotide code for \emph{allowed insertions}
\item[(5.)] a \textbf{transformation} string specifying pairings allowed in the \emph{relational} element \texttt{r}, for example to permit only canonical base pairs or to also permit wobble pairs.
\end{enumerate}
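
As an illustration of the ``\texttt{[10]}'' abbreviation in the list above, the following Python sketch expands every bracketed count into the equivalent run of wild cards. It is not taken from the RNArobo source; the function name is hypothetical and only the notation itself comes from this guide.

```python
import re


def expand_wildcards(constraint: str) -> str:
    """Replace each "[n]" abbreviation with n "*" wild cards.

    Hypothetical helper for illustration only; RNArobo's own parser
    may handle the abbreviation differently.
    """
    return re.sub(r"\[(\d+)\]", lambda m: "*" * int(m.group(1)), constraint)


if __name__ == "__main__":
    # "NNN[2]CC" is shorthand for the positive strand used in the examples.
    print(expand_wildcards("NNN[2]CC"))
```

Under this reading, \texttt{NNN[2]CC} and \texttt{NNN**CC} describe the same constraint.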

\section*{Example Descriptors}
See the following simple motif composed of two elements -- a helix \texttt{h1} and a single strand \texttt{s1}:
See the following simple motif composed of two elements -- a helix \texttt{h1} capped by a single strand \texttt{s1}:
\begin{quote}
$\overbrace{\texttt{h1 s1 h1'}}^{\text{motif map}}$\\
$\overbrace{\texttt{h1 s1 h1'}}^{\text{motif map}}$\\\\
\texttt{h1} $\overbrace{\texttt{1}}^{\text{\# mismatches}}$ \texttt{:} $\overbrace{\texttt{0}}^{\text{\# mispairs}}$ ~
$\overbrace{\texttt{NNN**CC}}^{\text{positive strand}}$\texttt{:}
$\overbrace{\texttt{GG**NNN}}^{\text{negative strand}}$ \\
@@ -80,32 +82,32 @@ \section*{Example Descriptors}

\end{quote}

Unlike RNABob, RNArobo enables you to allow nucleotide insertions in a structural element. Syntax of allowing insertions is similar to specification of the maximal number of mismatches (or mispairs). You can specify the maximal number of insertions and what nucleotides are allowed as an insertion. To specify the nucleotide constraints, you can use IUPAC code as for any other primary sequence constraints. Insertions are not allowed at the very beginning and end of the matched regions and in a helix insertions must not be adjacent nor opposite. Usage should be clear from the example descriptor:
Unlike RNAbob, RNArobo allows nucleotide insertions in a structural element. The syntax for insertions is similar to the specification of the maximum number of mismatches (or mispairs): you can specify the maximum number of insertions and which nucleotides may be inserted. To specify the nucleotide constraints, use IUPAC codes as for any other primary sequence constraint. Insertions are not allowed at the very beginning or end of a matched region, and in a helix insertions must be neither adjacent nor opposite to each other. The following example descriptor illustrates the usage:

\begin{quote}
\texttt{h1 s1 h1'}\\
\texttt{h1 0:0\textbf{:2} ~NNN**CC:GG**NNN\textbf{:A}}\\
\texttt{s1 0\textbf{:1} ~~~ACCRNNT\textbf{:Y}}

In the helix we allow up to 2 insertions of Adenosine, while in the single strand only one insertion of a pyrimidine nucleotide is allowed (`\texttt{Y}' stands for Cytosine or Thymine/Uracil).
In the \texttt{h1} helix we allow up to 2 insertions of adenosine, while in the single strand \texttt{s1} only one insertion of a pyrimidine nucleotide is allowed (`\texttt{Y}' stands for cytosine or thymine/uracil).
\end{quote}

Note, RNArobo doesn't discriminate Thymine and Uracil, you can use them interchangeably in both the descriptor and searched FASTA sequence.
Note that RNArobo does not distinguish between thymine and uracil; they can be used interchangeably in both the descriptor and the searched FASTA sequence.

To specify custom pairing function for a helical element, you can use an \emph{relational} element instead for standard helix, e.g.:
To specify a custom pairing function for a helical element, use a \emph{relational} element instead of a standard helix:
\begin{quote}
\texttt{r1 s1 r1'}\\
\texttt{r1 0:0:2 ~NNN**CC:GG**NNN:A ~TGCA}\\
\texttt{s1 0\textbf{:1} ~~~ACCRNNT:Y} \\
\texttt{\textbf{R s1 h1}}

This variation of the previous descriptor allows only canonical base-pairs \texttt{A-T} and \texttt{C-G} in the relational element \texttt{r1}. The individual IUPAC codes in the \emph{transformation} string \texttt{TGCA} define nucleotides that can pair with \texttt{A}, \texttt{C}, \texttt{G}, and \texttt{T}, respectively in this order. For default helical elements (e.g. \texttt{h1}) RNArobo allows also Watson-Crick base pairs, as the default \emph{transformation} string is \texttt{TGYR}.
This variation of the previous descriptor allows only the canonical base pairs \texttt{A-T} and \texttt{C-G} in the relational element \texttt{r1}. The individual IUPAC codes in the \emph{transformation} string \texttt{TGCA} define the nucleotides that can pair with \texttt{A}, \texttt{C}, \texttt{G}, and \texttt{T}, respectively. For default helical elements (e.g. \texttt{h1}), RNArobo also allows the \texttt{G-U} wobble pair, as the default \emph{transformation} string is \texttt{TGYR}.
\end{quote}
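
The way a transformation string is read can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RNArobo's implementation: the function \texttt{can\_pair} and the small IUPAC table are hypothetical, only the concrete nucleotides \texttt{A}, \texttt{C}, \texttt{G} and \texttt{T}/\texttt{U} are handled, and the position-to-nucleotide order \texttt{A}, \texttt{C}, \texttt{G}, \texttt{T} follows the description above.

```python
# Subset of IUPAC nucleotide codes needed for this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any nucleotide
}


def can_pair(x: str, y: str, transformation: str = "TGYR") -> bool:
    """Return True if nucleotide y may pair with nucleotide x.

    The i-th code of the transformation string lists the partners allowed
    for the i-th nucleotide of "ACGT". U is treated as T, as in the guide.
    Hypothetical helper; assumes upper-case, unambiguous nucleotides.
    """
    x = "T" if x == "U" else x
    y = "T" if y == "U" else y
    partner_code = transformation["ACGT".index(x)]
    return y in IUPAC[partner_code]


if __name__ == "__main__":
    print(can_pair("G", "U"))          # default string TGYR
    print(can_pair("G", "U", "TGCA"))  # canonical pairs only
```

With the default string \texttt{TGYR} the code for \texttt{G} is \texttt{Y}, so both \texttt{C} and \texttt{T}/\texttt{U} are accepted as partners; with \texttt{TGCA} only \texttt{C} is.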

(\textbf{Optional}) The last line of the example descriptor above illustrates usage of an optional reorder command which specifies the order in which elements are internally searched by the RNArobo algorithm, similarly to RNAMot \cite{gautheret1990}. If this command is absent or does not contain all elements, an automatic data-driven method is used to determine the best possible ordering of all remaining elements. This command has no principal impact on the actual results of the search, but defining a previously trained order can speed up the search by few seconds.
(\textbf{Optional}) The last line of the example descriptor above illustrates the optional reorder command, which specifies the order in which elements are internally searched by the RNArobo algorithm, similarly to RNAMot \cite{gautheret1990}. If this command is absent or does not list all elements, an automatic data-driven method determines the best possible ordering of the remaining elements. This command does not, in principle, affect the results of the search, but supplying a previously trained order can speed up the search by a few seconds.

\section*{Installation / Usage}
To run RNArobo on your system, you need GCC C++ compiler (tested with version 4.4.5) or for 64-bit Linux systems we directly provide an executable binary.
To run RNArobo on your system, you need the GCC C++ compiler (tested with version 4.4.5); for 64-bit Linux systems we also provide an executable binary directly. For the best run-time performance, we highly recommend a system whose CPU supports the SSE2 instruction set (manufactured in 2003 or later).

\begin{enumerate}
\item[\textbf{1.}] \textbf{Download} the most recent version of RNArobo at \url{http://compbio.fmph.uniba.sk/rnarobo}. There you can download the executable binary for 64-bit Linux systems as well as the source code package.
@@ -140,7 +142,7 @@ \section*{Installation / Usage}
\end{verbatim}
\end{enumerate}

Output of an RNArobo run is printed on the standard output and consists of a header and of a list of found matches. Matches in the list are in the order as they were found in the database file from its beginning to its end. Every match is compounded of two lines. The first line gives the name and description (if any) of the sequence where this match occurs, the beginning position where the match starts in the sequence and the ending position where the match ends. This line is followed by a line containing the match itself, that is, the substring of the sequence defined by the starting and ending positions. A symbol of pipe ``\textbar'' delimits individual elements of the match.
The output of an RNArobo run is printed to standard output and consists of a header and a list of found matches. Matches are listed in the order in which they were found in the database file, from its beginning to its end. Every match is composed of two lines. The first line gives the name and description (if any) of the sequence in which the match occurs, as well as the positions at which the match starts and ends in that sequence. The second line contains the match itself, that is, the substring of the sequence defined by the starting and ending positions. A pipe symbol ``\textbar'' delimits the individual elements of the match.
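
For post-processing, the second line of a match can be split on the pipe symbol to recover the individual elements. The Python sketch below shows one way to do this; the helper name and the sample line are hypothetical (the sample was constructed by hand to fit the \texttt{h1 s1 h1'} example descriptor, not copied from real RNArobo output), and the exact spacing around the pipes is an assumption.

```python
def split_match(match_line: str) -> list[str]:
    """Split the sequence line of a match into its structural elements.

    Hypothetical helper; surrounding whitespace is stripped in case the
    real output pads the elements.
    """
    return [element.strip() for element in match_line.strip().split("|")]


if __name__ == "__main__":
    # Hand-made sample line for the descriptor "h1 s1 h1'".
    elements = split_match("GCACC|ACCAGGT|GGTGC\n")
    print(elements)
    # Joining the elements gives back the matched substring.
    print("".join(elements))
```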

\setdescription{leftmargin=\parindent,labelindent=\parindent}
\subsection*{Available RNArobo Options:}
@@ -157,7 +159,7 @@ \subsection*{Available RNArobo Options:}
\end{description}

\subsection*{Advanced Options}
To override default search order training parameters (not recommended though):
To override default search order training parameters (not recommended):
\begin{description}
\item[\textbf{-{}-k INT}] ~~set length of tuples used in training
\item[\textbf{-{}-limit INT}] ~~set max size of training set (max number of tuples)