UserGuide: Files Descriptions
You'll find a test set in the inst/extdata/input folder. It'll be used for the rest of the documentation. Contents of input folder:
- Count matrix file: CountsMatrix.txt OR count files per sample: files in the "counts" directory (counts.zip)
- Samples file: Samples_CountsMatrix.txt OR Samples_CountsFiles.txt
- Contrasts file: Contrasts.txt
- Genes annotations file: Genes_annotations.txt (optional)
- GO annotations file: GO_annotations.txt (optional)
IMPORTANT: All input files must be in a folder named input (the name is case sensitive).
If you have one count file per sample, the files can be in text or CSV format. In this case, the Samples file must give the path and name of each sample's count file in a "file" column. A count file can contain several columns, depending on the counting tool used, so you will have to set the following parameters:
- parameters$col_genes: column number with GeneId (default 1)
- parameters$col_counts: column number with count (default 7)
- parameters$sep: column separator (default "\t")
Example of counts file content:
Geneid | Chr | Start | End | Strand | Length | Counts |
---|---|---|---|---|---|---|
Gene_000001 | Random_Chr_001 | 1692 | 1907 | - | 215 | 0 |
Gene_000002 | Random_Chr_001 | 6641 | 8705 | - | 2064 | 43.5 |
Gene_000003 | Random_Chr_001 | 9228 | 9569 | - | 341 | 8 |
Gene_000004 | Random_Chr_001 | 12009 | 13155 | - | 1146 | 781 |
Gene_000005 | Random_Chr_001 | 15242 | 15844 | + | 602 | 16 |
Gene_000006 | Random_Chr_001 | 16304 | 19834 | + | 3530 | 9 |
Gene_000007 | Random_Chr_001 | 20595 | 21625 | - | 1030 | 13.83 |
Gene_000008 | Random_Chr_001 | 22377 | 23461 | - | 1084 | 565.33 |
... |
The corresponding parameters:
parameters$col_genes=1
parameters$col_counts=7
parameters$sep="\t"
In this example, the gene identifiers are in the first column and the counts are in column 7. The file is tab-delimited, so the column separator is "\t".
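As an illustration of how these three parameters select data from a count file, here is a minimal sketch in base R. It is not askoR's own reading code: a plain list stands in for the parameters object, and the temporary file and the names count_lines, count_file, raw and counts are hypothetical.

```r
# Plain list standing in for askoR's parameters object (assumption);
# only the three fields described above are set.
parameters <- list(col_genes = 1, col_counts = 7, sep = "\t")

# Write the first two rows of the example count file to a temporary file.
count_lines <- c(
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tCounts",
  "Gene_000001\tRandom_Chr_001\t1692\t1907\t-\t215\t0",
  "Gene_000002\tRandom_Chr_001\t6641\t8705\t-\t2064\t43.5"
)
count_file <- tempfile(fileext = ".txt")
writeLines(count_lines, count_file)

# Read the file with the declared separator, then keep only the two
# columns designated by col_genes and col_counts.
raw <- read.table(count_file, header = TRUE, sep = parameters$sep,
                  stringsAsFactors = FALSE)
counts <- data.frame(
  gene  = raw[[parameters$col_genes]],
  count = raw[[parameters$col_counts]],
  stringsAsFactors = FALSE
)
print(counts)
```

If the counting tool writes a different layout, only the two column numbers and the separator should need to change.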
It is also possible to provide a single table grouping the counts for each gene in each sample, in tab-delimited text format. In this case, the Samples file should not contain a "file" column, and you will have to enter the name of your count matrix file with this parameter: parameters$fileofcount="CountsMatrix.txt".
Example of a counts matrix content:
Geneid | AC1R1 | AC1R2 | AC1R3 | BC1R1 | BC1R2 | BC1R3 | ... |
---|---|---|---|---|---|---|---|
Gene_000001 | 0 | 1 | 0 | 0 | 0 | 1 | ... |
Gene_000002 | 43.5 | 25.33 | 31.5 | 27.5 | 29.5 | 29 | ... |
Gene_000003 | 8 | 4 | 5 | 30 | 16 | 13 | ... |
Gene_000004 | 781 | 412 | 626 | 558 | 538 | 346 | ... |
Gene_000005 | 16 | 7 | 13 | 9 | 8 | 6 | ... |
Gene_000006 | 9 | 4 | 5 | 21 | 15 | 12 | ... |
... |
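Before running an analysis, it can help to check that every sample you intend to declare has a column in the count matrix. The sketch below does this in base R on a three-column excerpt of the example. The temporary input_dir folder, the samples vector and the other variable names are hypothetical, and the check itself is a suggestion rather than something askoR is documented to perform.

```r
# Plain list standing in for askoR's parameters object (assumption).
parameters <- list(fileofcount = "CountsMatrix.txt")

# Hypothetical temporary folder playing the role of the "input" folder.
input_dir <- tempfile("input")
dir.create(input_dir)
matrix_lines <- c(
  "Geneid\tAC1R1\tAC1R2\tAC1R3",
  "Gene_000001\t0\t1\t0",
  "Gene_000002\t43.5\t25.33\t31.5"
)
writeLines(matrix_lines, file.path(input_dir, parameters$fileofcount))

# Read the matrix: gene identifiers become row names, samples are columns.
count_matrix <- read.table(file.path(input_dir, parameters$fileofcount),
                           header = TRUE, sep = "\t", row.names = 1,
                           check.names = FALSE)

# Sample names as they would appear in the "sample" column of the Samples file.
samples <- c("AC1R1", "AC1R2", "AC1R3")
missing_samples <- setdiff(samples, colnames(count_matrix))
if (length(missing_samples) > 0) {
  stop("Samples without a column in the count matrix: ",
       paste(missing_samples, collapse = ", "))
}
```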
The Samples file is a tab-delimited file that describes the experimental design. The first and second columns are mandatory and must be named "sample" and "condition". You may add several other columns. The values in the condition column must match the conditions listed in the Contrasts file.
The "color" column is optional. It lets you predefine the color used for each sample in the graphs. If it is absent, askoR assigns colors itself.
The "file" column is mandatory if you have one count file per sample. In the example below, these files are grouped in a "counts" folder. You do not need to include the "input" folder in the path (for example, write counts/AC1R1_counts.txt rather than input/counts/AC1R1_counts.txt) since askoR looks in it by default.
Don't forget to enter the name of your Samples file with this parameter: parameters$sample_file="Samples.txt".
Example of a Samples.txt file content:
sample | condition | genotype | treatment | color | file |
---|---|---|---|---|---|
AC1R1 | AC1 | A | C1 | darkorchid2 | counts/AC1R1_counts.txt |
AC1R2 | AC1 | A | C1 | darkorchid2 | counts/AC1R2_counts.txt |
AC1R3 | AC1 | A | C1 | darkorchid2 | counts/AC1R3_counts.txt |
BC1R1 | BC1 | B | C1 | saddlebrown | counts/BC1R1_counts.txt |
BC1R2 | BC1 | B | C1 | saddlebrown | counts/BC1R2_counts.txt |
... |
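The layout rules above can be checked with a few lines of base R. This sketch builds three rows of the example table in memory, verifies the two mandatory column names, and shows how a path in the "file" column resolves relative to the input folder. The input_dir variable and the way the path is joined are assumptions for illustration, not askoR's documented internals.

```r
# Three rows of the example Samples file, built in memory.
samples <- data.frame(
  sample    = c("AC1R1", "AC1R2", "BC1R1"),
  condition = c("AC1", "AC1", "BC1"),
  color     = c("darkorchid2", "darkorchid2", "saddlebrown"),
  file      = c("counts/AC1R1_counts.txt",
                "counts/AC1R2_counts.txt",
                "counts/BC1R1_counts.txt"),
  stringsAsFactors = FALSE
)

# The first two columns must be named "sample" and "condition".
stopifnot(identical(colnames(samples)[1:2], c("sample", "condition")))

# Sample names should be unique.
stopifnot(!any(duplicated(samples$sample)))

# Paths in the "file" column are given relative to the input folder
# (input_dir is a hypothetical name for that folder).
input_dir <- "input"
full_paths <- file.path(input_dir, samples$file)
print(full_paths)
```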
The Contrasts file is a tab-delimited file that lists the contrasts you wish to make between your conditions.
The first column contains the conditions, as they appear in the condition column of the Samples.txt file. Each of the other columns defines one comparison.
The name of a comparison must have the form first condition, "vs", second condition, without spaces. For example, comparing AC1 with AC2 gives "AC1vsAC2". In that column, the row for "AC1" is marked "+", the row for "AC2" is marked "-", and all other rows are marked "0".
You will have to enter the name of your file with this parameter: parameters$contrast_file="Contrasts.txt"
Example of Contrasts.txt file content:
Condition | AC1vsAC2 | AC1vsAC3 | AC2vsAC3 | BC1vsBC2 | ... |
---|---|---|---|---|---|
AC1 | + | + | 0 | 0 | ... |
AC2 | - | 0 | + | 0 | ... |
AC3 | 0 | - | - | 0 | ... |
BC1 | 0 | 0 | 0 | + | ... |
BC2 | 0 | 0 | 0 | - | ... |
... |
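The naming rule ("+" for the first condition, "-" for the second, "0" elsewhere) can be verified mechanically. The following base R sketch reads a three-contrast excerpt of the example and checks each column against its name. The function check_contrast and the other variable names are hypothetical, and the check assumes that condition names do not themselves contain "vs".

```r
# Excerpt of the example Contrasts file, written to a temporary file.
contrast_lines <- c(
  "Condition\tAC1vsAC2\tAC1vsAC3\tAC2vsAC3",
  "AC1\t+\t+\t0",
  "AC2\t-\t0\t+",
  "AC3\t0\t-\t-"
)
contrast_file <- tempfile(fileext = ".txt")
writeLines(contrast_lines, contrast_file)

# Read every column as text so that "+", "-" and "0" are kept as written.
contrasts <- read.table(contrast_file, header = TRUE, sep = "\t",
                        row.names = 1, colClasses = "character",
                        check.names = FALSE)

# A column named "XvsY" is consistent when X is its only "+" row
# and Y is its only "-" row.
check_contrast <- function(name) {
  parts <- strsplit(name, "vs", fixed = TRUE)[[1]]
  plus  <- rownames(contrasts)[contrasts[[name]] == "+"]
  minus <- rownames(contrasts)[contrasts[[name]] == "-"]
  identical(plus, parts[1]) && identical(minus, parts[2])
}
ok <- vapply(colnames(contrasts), check_contrast, logical(1))
print(ok)
```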
The Genes annotations file is an optional tab-delimited file that contains the annotations of your genes. It can contain several columns, but the first one must hold the gene identifier and be named SeqName. You will have to enter the name of your file with this parameter: parameters$annotation="annotations.txt".
Example of annotation file content:
SeqName | Description |
---|---|
Gene_000001 | hypothetical protein pbra 009537 |
Gene_000002 | hypothetical protein pbra 009324 |
Gene_000003 | histone-lysine n-methyltransferase nsd2 |
Gene_000004 | hypothetical protein pbra 009496 |
... |
The GO annotations file is a tab-delimited file WITHOUT HEADER. The first column contains the gene identifier and the second column contains all the corresponding GO terms, separated by commas. This file is optional. You will have to enter the name of your file with this parameter: parameters$geneID2GO_file="GO_annotations.txt".
NOTE: This file is mandatory for GO enrichment analysis (cf. GO enrichment section). It is not mandatory for clustering analysis, but without it you will not get GO enrichment on your clusters (cf. ClustAndGO section).
Example of GOs annotation file:
Gene_000001 | GO:0003676,GO:0015074 |
Gene_000002 | GO:0003676,GO:0015074 |
Gene_000003 | GO:0005488,GO:0006807,GO:0016740,GO:0043170,GO:0044238 |
Gene_000005 | GO:0005525,GO:0005525,GO:0005525 |
... |
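To make the two-column layout concrete, this base R sketch turns two lines of the example into a named list with one vector of GO terms per gene. It illustrates the file format only and is not askoR's own parser. The variable names are hypothetical, and removing repeated terms with unique() is an added convenience, not documented behavior.

```r
# Two lines of the example GO annotation file (no header line).
go_lines <- c(
  "Gene_000001\tGO:0003676,GO:0015074",
  "Gene_000005\tGO:0005525,GO:0005525,GO:0005525"
)
go_file <- tempfile(fileext = ".txt")
writeLines(go_lines, go_file)

# Column names are supplied here because the file itself has none.
go_table <- read.table(go_file, header = FALSE, sep = "\t",
                       col.names = c("gene", "go_terms"),
                       stringsAsFactors = FALSE)

# Split the comma-separated terms and drop repeated terms within a gene.
gene2go <- lapply(strsplit(go_table$go_terms, ",", fixed = TRUE), unique)
names(gene2go) <- go_table$gene
print(gene2go)
```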