For genome-wide sRNA target predictions, we assume that the set of energy values predicted for all targets is sufficiently diverse to serve as a background energy model for the minimum energies of putative target sequences. Thus, we can fit a generalized extreme value (GEV) distribution to the data, which is subsequently used to estimate a p-value for each energy.
The IntaRNA_CSV_p-value.R script takes an IntaRNA CSV output file (assumed to
be sufficiently large and sane enough for GEV fitting) and computes the
respective p-value estimates. The output consists of the input table extended
with a p-value column. If no output file is given, or if the input and output
file names are equal, the input file is overwritten!
You can (optionally) specify the name of the column for which p-values are to
be estimated. Example calls are given below.
# overwriting the input file with p-value-extended table
Rscript --vanilla IntaRNA_CSV_p-value.R IntaRNA-output.csv
# creating a new output file for p-value extension
Rscript --vanilla IntaRNA_CSV_p-value.R IntaRNA-output.csv IntaRNA-output-with-pValue.csv
# computing p-values for normalized energies (has to be present in file IN.csv)
Rscript --vanilla IntaRNA_CSV_p-value.R IN.csv IN-pValue.csv E_norm
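The idea behind the p-value estimation can be illustrated with a minimal Python sketch. This is not the implementation of IntaRNA_CSV_p-value.R. As a simplification, it fits a Gumbel distribution (the GEV special case with shape parameter zero) by the method of moments instead of a full GEV fit, and it assumes that the p-value of an energy is the probability of observing a background energy that is as low or lower. The function names and the energy values are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(values):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Returns (mu, beta), i.e. location and scale.
    Simplification: a full GEV fit would also estimate a shape parameter.
    """
    beta = statistics.stdev(values) * math.sqrt(6) / math.pi
    mu = statistics.fmean(values) - EULER_GAMMA * beta
    return mu, beta


def energy_p_values(energies):
    """Estimate P(background energy <= observed energy) for each energy.

    Energies are negated so that the most stable (lowest) energies
    become the maxima described by the extreme value distribution.
    """
    negated = [-e for e in energies]
    mu, beta = fit_gumbel(negated)
    # Gumbel CDF: F(x) = exp(-exp(-(x - mu) / beta)); p-value = 1 - F(x)
    return [-math.expm1(-math.exp(-(x - mu) / beta)) for x in negated]


# hypothetical energies in kcal/mol, not taken from a real prediction
energies = [-5.2, -6.1, -4.8, -7.3, -5.9, -6.4, -5.5, -12.7]
for energy, p in zip(energies, energy_p_values(energies)):
    print(f"{energy:7.2f}  p = {p:.4f}")
```

Under these assumptions, the lowest energy receives the smallest p-value. A real data set should be much larger than this toy list for the fit to be meaningful.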
To visualize sequences' regions covered by RNA-RNA interactions predicted by
IntaRNA, you can use IntaRNA_plotRegions.R by providing the following arguments
(in the given order)
- CSV-IntaRNA output file (semicolon separated) covering the columns
  start,end,id with suffix 1 or 2 to plot target or query regions, respectively
- 1 or 2 to select whether to plot target or query regions
- output file name with a file-format-specific suffix from
  .pdf, .png, .svg, .eps, .ps, .jpeg, .tiff
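To make the expected input layout concrete, the following Python sketch reads a semicolon-separated table and picks the id, start and end columns for a given suffix. It only illustrates the column convention described above and is not part of IntaRNA_plotRegions.R. The function name read_regions is hypothetical.

```python
import csv
import io


def read_regions(csv_text, suffix):
    """Return (id, start, end) tuples from the columns id/start/end + suffix.

    suffix "1" selects the target columns, suffix "2" the query columns.
    """
    if suffix not in ("1", "2"):
        raise ValueError("suffix must be '1' (target) or '2' (query)")
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    return [
        (row["id" + suffix], int(row["start" + suffix]), int(row["end" + suffix]))
        for row in reader
    ]


# small sample in the semicolon-separated layout expected by the script
sample = """id1;start1;end1;id2;start2;end2
b0001;266;273;query;116;123
b0002;204;231;query;85;111
"""

for seq_id, start, end in read_regions(sample, "1"):
    print(seq_id, start, end)
```

A table that lacks one of the three columns for the chosen suffix raises a KeyError in this sketch.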
An example is given below, when calling
Rscript --vanilla IntaRNA_plotRegions.R pred.csv 1 plotRegions.example.png
with pred.csv containing
id1;start1;end1;id2;start2;end2
b0001;266;273;query;116;123
b0002;204;231;query;85;111
b0003;229;262;query;96;125
b0004;265;300;query;10;38
b0005;281;295;query;5;22