UserGuide: Initialize and load data

S.Alves edited this page Apr 13, 2022 · 23 revisions

Now that we have checked our files, we can start the analysis. An example R script containing all the commands discussed here is available at "ScriptR/AskoR_analysis_script.R".

First we set our working directory and then load the AskoR tool:

# workspace
setwd("/path/to/your/workspace/")

# Load askoR library (if you have it installed).
library(askoR)
### OR 
# If you use AskoR.R file (available in the ScriptR directory)
source("/directory/where/you/downloaded/the/file/AskoR.R")

# Set default parameters
parameters<-Asko_start()

Don't forget to replace the path in "setwd" with your own working directory (the one containing the input folder) and, if you use the AskoR.R script, to specify its path as well.
By default, all generated files and images are written to a folder named "AskoRanalysis". You can change this name and give your analysis a project name (short and without spaces):

# output directory name (default AskoRanalysis)
parameters$analysis_name="DEG_test"
# project name (default DEprj)
parameters$projectName= "TestProject"

Once these steps have been completed, you can enter the names of the different files used in the analysis:

# matrix of different contrasts desired
parameters$contrast_file = "Contrasts.txt"    
# file containing the functional annotations for each gene
parameters$annotation = "Genes_annotations.txt"      
# GO annotation files
parameters$geneID2GO_file = "GO_annotations.txt"   
  • If you use count files:
# file describing all samples
parameters$sample_file = "Samples_CountsFiles.txt"
# column with the gene names (default 1)
parameters$col_genes = 1          
# column with the counts values (default 7)
parameters$col_counts = 7 
# field separator (default "\t")
parameters$sep = "\t" 
  • If you use a count matrix:
# count matrix for all samples/conditions
parameters$fileofcount = "CountsMatrix.txt"  
# file describing all samples
parameters$sample_file = "Samples_CountsMatrix.txt"  
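
To make the roles of col_genes, col_counts and sep more concrete, here is a minimal base R sketch of how such settings could be used to pull gene names and counts out of one count file. It is an illustration only: the file content below is hypothetical, the presence of a header line is an assumption, and AskoR's own reading code may differ.

```r
# Hypothetical count file with 7 tab-separated columns
# (illustrative data, not part of the AskoR example dataset).
count_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tCount",
  "Gene_A\tchr1\t100\t900\t+\t801\t12",
  "Gene_B\tchr1\t1500\t2300\t-\t801\t0",
  "Gene_C\tchr2\t50\t650\t+\t601\t157"
), count_file)

# Same meaning as parameters$col_genes, parameters$col_counts and parameters$sep.
col_genes  <- 1
col_counts <- 7
sep        <- "\t"

# Read the file (assuming a header line) and keep only the two columns of interest.
raw <- read.table(count_file, header = TRUE, sep = sep, stringsAsFactors = FALSE)
counts <- setNames(raw[[col_counts]], raw[[col_genes]])
print(counts)
```

If your count files put the gene names or the counts in other columns, or use another separator, these are the three values to adapt.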

If you don't want to work on all your samples, you do not need to edit your files to remove them. Instead, you can select or remove samples with one of the two following options:

  • parameters$rm_sample=c("sample1","sample2","sample3",...) to remove the samples you do not want to analyze.
  • parameters$select_sample=c("sample1","sample2","sample3",...) to select the samples you want to study.

You can also use regular expressions.
Let's take the following samples as an example: AC1R1, AC1R2, AC1R3, AC2R1, AC2R2, AC2R3, AC3R1, AC3R2, AC3R3, BC1R1, BC1R2, BC1R3, BC2R1, BC2R2, BC2R3, BC3R1, BC3R2, BC3R3.
To delete all AC1 samples:

parameters$rm_sample="AC1.*"
parameters$regex=TRUE
# You no longer have the AC1R1, R2 and R3 samples.
Samples:
 [1] "BC1R1" "BC1R2" "BC1R3" "AC2R1" "AC2R2" "AC2R3" "BC2R1" "BC2R2" "BC2R3" "AC3R1" "AC3R2" "AC3R3" "BC3R1" "BC3R2"
[15] "BC3R3"

Or you want to delete all C1 samples:

parameters$rm_sample=".*C1.*"
parameters$regex=TRUE
# You no longer have the AC1R1, R2, R3 and BC1R1, R2, R3 samples.
Samples:
 [1] "AC2R1" "AC2R2" "AC2R3" "BC2R1" "BC2R2" "BC2R3" "AC3R1" "AC3R2" "AC3R3" "BC3R1" "BC3R2" "BC3R3"

This also works for the "select_sample" parameter.
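
To preview which samples a pattern would match before running the analysis, a quick check with base R can help. The sketch below uses grepl() on the sample names of this guide; it assumes ordinary, unanchored R regular-expression matching, which may not be exactly how AskoR applies the pattern internally.

```r
# Sample names used in this guide.
samples <- c("AC1R1", "AC1R2", "AC1R3", "AC2R1", "AC2R2", "AC2R3",
             "AC3R1", "AC3R2", "AC3R3",
             "BC1R1", "BC1R2", "BC1R3", "BC2R1", "BC2R2", "BC2R3",
             "BC3R1", "BC3R2", "BC3R3")

# Samples matched by the pattern "AC1.*" (candidates for rm_sample).
removed_ac1 <- samples[grepl("AC1.*", samples)]

# Samples matched by the pattern ".*C1.*" (AC1 and BC1 replicates).
removed_c1 <- samples[grepl(".*C1.*", samples)]

# Samples that would remain after removing the AC1 replicates.
kept <- setdiff(samples, removed_ac1)

print(removed_ac1)
print(removed_c1)
print(kept)
```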

In this study, we are informed that two samples, "AC3R2" and "BC3R3", had problems during the experiments, so we are asked to exclude them from our analysis.

# delete samples AC3R2 and BC3R3 
parameters$rm_sample = c("AC3R2","BC3R3")

It's time to load your data:

data<-loadData(parameters)
##
## Created directories:
##    ./DEG_test/ 
##    ./DEG_test/DataExplore/ 
##    ./DEG_test/NormCountsTables/ 
##    ./DEG_test/DEanalysis/ 
##    ./DEG_test/DEanalysis/DEimages/ 
##    ./DEG_test/DEanalysis/DEtables/ 
##    ./DEG_test/DEanalysis/AskoTables/ 
## 
## Samples:
##  [1] "AC1R1" "AC1R2" "AC1R3" "BC1R1" "BC1R2" "BC1R3" "AC2R1" "AC2R2" "AC2R3"
## [10] "BC2R1" "BC2R2" "BC2R3" "AC3R1" "AC3R3" "BC3R1" "BC3R2"
## 
## Conditions :
##  [1] AC1 AC1 AC1 BC1 BC1 BC1 AC2 AC2 AC2 BC2 BC2 BC2 AC3 AC3 BC3 BC3
## Levels: AC1 AC2 AC3 BC1 BC2 BC3
## 
## Contrasts:
##     AC1vsAC2 AC1vsAC3 AC2vsAC3 BC1vsBC2 BC1vsBC3 BC2vsBC3 AC1vsBC1 AC2vsBC2
## AC1        +        +        0        0        0        0        +        0
## AC2        -        0        +        0        0        0        0        +
## AC3        0        -        -        0        0        0        0        0
## BC1        0        0        0        +        +        0        -        0
## BC2        0        0        0        -        0        +        0        -
## BC3        0        0        0        0        -        -        0        0
##     AC3vsBC3
## AC1        0
## AC2        0
## AC3        +
## BC1        0
## BC2        0
## BC3        -

You can see that several folders have been created in DEG_test, including:

  • DataExplore: will contain all images created by the filtering, normalization and correlation functions.
  • DEanalysis, with 3 sub-folders:
         1) DEtables: will contain the results of the differential analysis for each contrast.
         2) DEimages: will contain all VD, MDA and heatmap plots. (See DE analysis section)
         3) AskoTables: same as DEtables, but the files are in a format readable by Askomics.

The samples and conditions that have been loaded are then displayed. They are stored in a structure called "data". Here are some commands to display your data:

# Displays all samples recorded
data$samples  
##       condition genotype treatment       color
## AC1R1       AC1        A        C1 darkorchid2
## AC1R2       AC1        A        C1 darkorchid2
## AC1R3       AC1        A        C1 darkorchid2
## BC1R1       BC1        B        C1 saddlebrown
## BC1R2       BC1        B        C1 saddlebrown
## BC1R3       BC1        B        C1 saddlebrown
## AC2R1       AC2        A        C2  indianred3
## AC2R2       AC2        A        C2  indianred3
## AC2R3       AC2        A        C2  indianred3
## BC2R1       BC2        B        C2      khaki2
## BC2R2       BC2        B        C2      khaki2
## BC2R3       BC2        B        C2      khaki2
## AC3R1       AC3        A        C3  palegreen4
## AC3R3       AC3        A        C3  palegreen4
## BC3R1       BC3        B        C3  steelblue3
## BC3R2       BC3        B        C3  steelblue3

# Displays all recorded contrasts
data$contrast 
##     AC1vsAC2 AC1vsAC3 AC2vsAC3 BC1vsBC2 BC1vsBC3 BC2vsBC3 AC1vsBC1 AC2vsBC2
## AC1        1        1        0        0        0        0        1        0
## AC2       -1        0        1        0        0        0        0        1
## AC3        0       -1       -1        0        0        0        0        0
## BC1        0        0        0        1        1        0       -1        0
## BC2        0        0        0       -1        0        1        0       -1
## BC3        0        0        0        0       -1       -1        0        0
##     AC3vsBC3
## AC1        0
## AC2        0
## AC3        1
## BC1        0
## BC2        0
## BC3       -1
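
The "+", "-" and "0" symbols printed by loadData and the 1, -1 and 0 values stored in data$contrast describe the same thing: in a contrast such as AC1vsAC2, the first condition is coded 1 and the second -1. The base R sketch below shows one possible way to go from one coding to the other on two of the contrasts above. It is only an illustration, not AskoR's actual implementation.

```r
conditions <- c("AC1", "AC2", "AC3", "BC1", "BC2", "BC3")

# Two contrasts written with symbols, as printed by loadData.
symbols <- matrix(
  c("+", "-", "0", "0", "0", "0",   # AC1vsAC2
    "+", "0", "-", "0", "0", "0"),  # AC1vsAC3
  nrow = length(conditions),
  dimnames = list(conditions, c("AC1vsAC2", "AC1vsAC3"))
)

# Map each symbol to its numeric code and rebuild the matrix
# with the same row and column names.
codes <- c("+" = 1, "-" = -1, "0" = 0)
numeric_contrast <- matrix(
  unname(codes[as.vector(symbols)]),
  nrow = nrow(symbols),
  dimnames = dimnames(symbols)
)

print(numeric_contrast)
```

Each column sums to zero, which is what you expect from a comparison between two conditions.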

# Displays the experimental design
data$design
##       AC1 AC2 AC3 BC1 BC2 BC3
## AC1R1   1   0   0   0   0   0
## AC1R2   1   0   0   0   0   0
## AC1R3   1   0   0   0   0   0
## BC1R1   0   0   0   1   0   0
## BC1R2   0   0   0   1   0   0
## BC1R3   0   0   0   1   0   0
## AC2R1   0   1   0   0   0   0
## AC2R2   0   1   0   0   0   0
## AC2R3   0   1   0   0   0   0
## BC2R1   0   0   0   0   1   0
## BC2R2   0   0   0   0   1   0
## BC2R3   0   0   0   0   1   0
## AC3R1   0   0   1   0   0   0
## AC3R3   0   0   1   0   0   0
## BC3R1   0   0   0   0   0   1
## BC3R2   0   0   0   0   0   1
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$Group
## [1] "contr.treatment"
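
A design matrix of this shape, with one column per condition and a single 1 per sample, can be reproduced in base R with model.matrix() and a formula without intercept. The sketch below is consistent with the attributes shown above (a factor named Group, "contr.treatment"), but it is an assumption about how such a matrix is built, not a copy of AskoR's code.

```r
# The 16 samples kept in this analysis and their conditions.
sample_names <- c("AC1R1", "AC1R2", "AC1R3", "BC1R1", "BC1R2", "BC1R3",
                  "AC2R1", "AC2R2", "AC2R3", "BC2R1", "BC2R2", "BC2R3",
                  "AC3R1", "AC3R3", "BC3R1", "BC3R2")
condition <- c(rep("AC1", 3), rep("BC1", 3), rep("AC2", 3),
               rep("BC2", 3), rep("AC3", 2), rep("BC3", 2))

# One factor level per condition; "~ 0 + Group" removes the intercept
# so that every condition gets its own indicator column.
Group <- factor(condition)
design <- model.matrix(~ 0 + Group)

# Use plain condition names for columns and sample names for rows.
colnames(design) <- levels(Group)
rownames(design) <- sample_names

print(design)
```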

# Displays the first 5 lines and 8 columns of counts table.
data$dge$counts[1:5,1:8] 
##             AC1R1  AC1R2 AC1R3 BC1R1 BC1R2 BC1R3 AC2R1 AC2R2
## Gene_000001   0.0   1.00   0.0   0.0   0.0     1   1.0   0.0
## Gene_000002  43.5  25.33  31.5  27.5  29.5    29  34.5  32.5
## Gene_000003   8.0   4.00   5.0  30.0  16.0    13  20.0  26.0
## Gene_000004 781.0 412.00 626.0 558.0 538.0   346 519.0 370.0
## Gene_000005  16.0   7.00  13.0   9.0   8.0     6  10.0  10.0

# Total number of genes:
dim(data$dge$counts)[1]
## [1] 12811

# Total number of samples:
dim(data$dge$counts)[2]
## [1] 16

The next step is to generate the files describing your experiments for Askomics integration (context, contrast and condition).
IMPORTANT: Even if you don't plan to use Askomics, this command is mandatory because it generates a data structure, "asko_data", that is used in the rest of the analysis.

asko_data<-asko3c(data, parameters)