NetSeekR

A network analysis pipeline for RNASeq time series data.

NetSeekR is a network analysis R package that analyzes time series of RNASeq data, infers correlation and regulatory networks, and uses network analysis methods to summarize the results of a comparative genomics study.

Authors: Himangi Srivastava, Drew Ferrell, and George V. Popescu.

The NetSeekR code requires the specific package versions listed below.

Package	Version
pacman	0.5.1
BiocManager	1.30.10
magrittr	1.5
readr	1.3.1
purrr	0.4.2
stringr	0.3.3
ggplot2	1.4.0
devtools	3.2.1
flashClust	2.2.1
tidyr	1.01-2
networkD3	1.0.0
igraph	0.4
limma	1.2.4.2
edgeR	3.42.0
topGO	2.37.0
WGCNA	1.68
biomaRt	2.42.0
Rgraphviz	2.30.0
dplyr	0.8.3
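
Before running the pipeline, it can help to compare the installed package versions against the table above. The following is a minimal sketch, not part of NetSeekR: `check_versions` is a hypothetical helper, and only three rows of the table are copied in for brevity.

```r
# Required versions copied from the table above (abbreviated; extend as needed).
required <- c(WGCNA = "1.68", biomaRt = "2.42.0", dplyr = "0.8.3")

# Hypothetical helper (not a NetSeekR function): compare installed and
# required versions without loading the packages.
check_versions <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
  ok <- !is.na(installed) & installed == required
  data.frame(
    package = names(required),
    required = unname(required),
    installed = unname(installed),
    matches = unname(ok),
    stringsAsFactors = FALSE
  )
}

print(check_versions(required))
```

Rows where `matches` is `FALSE` point to packages that are missing or installed at a different version.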

Below are the steps to run NetSeekR.

Set the working directory to the NetSeekR path.

setwd(<<path/to/NetSeekR>>)

Unzip the NetSeekR file.

unzip('NetSeekR.zip')

Load packages and source functions for NetSeekR.

source('scripts/NetSeekR.R')

Edit the configuration file and the sample comparison matrix.

Note: Below is a template configuration file, which must be edited for each use.

analysis_type	custom tag
design_matrix	path to experimental design matrix
edger_adjustment_method	edgeR: p-value adjustment method
edger_lfc	limma: minimum log2-fold-change that is considered scientifically meaningful
edger_method	NOT USED
feature_counts_path	path to feature counts software
kallisto	Boolean value for Kallisto execution decision
kallisto_bias	sequence based bias correction
kallisto_bootstrap_samples	bootstrap sample number
kallisto_chromosomes	tab separated file with chromosome names and lengths
kallisto_fasta_files	path to genome annotation file
kallisto_fastq_files	reads to quantify
kallisto_fr_stranded	strand specific reads, first read forward
kallisto_fragment_length	estimated average fragment length
kallisto_fusion	search for fusions for Pizzly
kallisto_genomebam	project pseudoalignments to genome sorted BAM file
kallisto_gtf	GTF file for transcriptome information (required for --genomebam)
kallisto_index	location to write genome index from Kallisto (required for Kallisto alignment)
kallisto_kmer_size	k-mer (odd) length (default: 31, max value: 31)
kallisto_make_unique	replace repeated target names with unique names
kallisto_output_dir	directory to write quantification output to
kallisto_path	path to Kallisto software
kallisto_plaintext	output plaintext instead of HDF5
kallisto_pseudobam	save pseudoalignments to transcriptome to BAM file
kallisto_rf_stranded	strand specific reads, first read reverse
kallisto_sd	estimated standard deviation of fragment length (default: -l, -s values are estimated from paired end data, but are required when using --single)
kallisto_seed	seed for the bootstrap sampling (default: 42)
kallisto_single	quantify single-end reads
kallisto_single_overhang	include reads where unobserved rest of fragment is predicted to lie outside a transcript
kallisto_threads	number of threads to use (default: 1)
sample_comparisons_file	path to the sample comparison file for differential gene expression testing
sample_covariates	experimental design matrix column names to be used as covariates with Sleuth
significance_cutoff	a cutoff value for determining significance
sleuth_gene_mode	Boolean value for Sleuth gene mode execution decision
sleuth_transcript_mode	Boolean value for Sleuth transcript mode execution decision
star	Boolean value for STAR execution decision
star_genomeDir	path to the directory where the genome indices are stored
star_genomeFastaFiles	path to a FASTA file with the genome reference sequences
star_path	path to STAR software
star_readFilesIn	path to the folder containing the sequences to be mapped
star_runThreadN	number of threads to be used for genome generation; it has to be set to the number of available cores on the server node
star_sjdbGTFfile	path to the file with annotated transcripts in the standard GTF format
star_sjdbOverhang	length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database
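
To inspect an edited configuration file before passing it to the pipeline, one option is to read it into a named list. This is only a sketch under a stated assumption: it treats the file as two tab-separated columns (parameter name, value) with no header, matching the template layout above. `read_config` is a hypothetical helper, and the values written here are placeholders rather than NetSeekR defaults.

```r
# Write a tiny illustrative configuration file (placeholder values only).
config_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "analysis_type\tdemo",
  "kallisto\tFALSE",
  "star\tTRUE",
  "star_runThreadN\t4"
), config_path)

# Hypothetical helper (not a NetSeekR function).
# ASSUMPTION: each row is "parameter<TAB>value" and there is no header row.
read_config <- function(path) {
  tab <- read.delim(path, header = FALSE, sep = "\t", quote = "",
                    col.names = c("parameter", "value"),
                    colClasses = "character", stringsAsFactors = FALSE)
  stats::setNames(as.list(tab$value), tab$parameter)
}

config <- read_config(config_path)

# Report which aligner switches are turned on in this file.
switches <- c(star = as.logical(config$star),
              kallisto = as.logical(config$kallisto))
enabled <- names(switches)[switches]
print(enabled)
```

Reading every value as a character string keeps entries such as `TRUE` or thread counts unmodified until they are converted explicitly.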

Align batches of reads.

alignment_results <- implement_alignment(arguments_file = <<path/to/configuration file>>)

Test for differential gene expression.

Note: An edited sample comparison matrix must be supplied for differential gene expression testing; an example is shown below. The matrix needs no headers, only sample identifiers. Each row lists the samples to compare in one differential test.

Example
SL209924	SL209925	SL209926	SL209921	SL209922	SL209923
SL209927	SL209928	SL209937	SL209921	SL209922	SL209923
SL209938	SL209939	SL209940	SL209921	SL209922	SL209923
SL209944	SL209945	SL209946	SL209941	SL209942	SL209943
SL209947	SL209948	SL209949	SL209941	SL209942	SL209943
SL209950	SL209951	SL209952	SL209941	SL209942	SL209943

implement_differential_gene_expression(alignment_results)
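
The sample comparison matrix shown above can be checked in R before it is used. The sketch below reads a two-row copy of the example and splits each row into two groups. It rests on an assumption that the source does not state: that the first half of each row is one group and the second half is the group it is compared against. `split_row` is a hypothetical helper, not a NetSeekR function.

```r
# Write two rows of the example comparison matrix to a temporary file.
comparison_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "SL209924\tSL209925\tSL209926\tSL209921\tSL209922\tSL209923",
  "SL209927\tSL209928\tSL209937\tSL209921\tSL209922\tSL209923"
), comparison_path)

# The file has no header row, so every line is data.
comparisons <- read.delim(comparison_path, header = FALSE, sep = "\t",
                          colClasses = "character", stringsAsFactors = FALSE)

# Hypothetical helper.
# ASSUMPTION: the first half of a row is one group, the second half the other.
split_row <- function(row) {
  ids <- unlist(row, use.names = FALSE)
  half <- length(ids) %/% 2
  list(group_a = ids[seq_len(half)],
       group_b = ids[half + seq_len(length(ids) - half)])
}

groups <- lapply(seq_len(nrow(comparisons)),
                 function(i) split_row(comparisons[i, ]))
str(groups)
```

Listing the groups this way makes it easy to spot a misplaced or duplicated sample identifier.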

Network analysis is then conducted assuming sets of differentially expressed genes are available.

implement_network_analysis(alignment_tool = 'star', alignment_results = alignment_results)

The network analysis function above will produce an image of the network. The image can be saved from the ‘Export’ tab at the top of the window.
