UserGuide: Files Descriptions

Input files description

You'll find a test set in the inst/extdata/input folder. It is used throughout the rest of the documentation. Contents of the input folder:

Count matrix file: CountsMatrix.txt OR count files per sample: files in the "counts" directory (counts.zip)
Samples file: Samples_CountsMatrix.txt OR Samples_CountsFiles.txt
Contrasts file: Contrasts.txt
Genes annotations file: Genes_annotations.txt (optional)
GO annotations file: GO_annotations.txt (optional)

IMPORTANT: All input files must be in a folder named input (case sensitive).

Count files

Sample count files

If you have one count file per sample, the files can be in text or CSV format. In this case, the Samples file must give the path and name of each sample's count file in a "file" column. A count file can contain several columns, depending on the counting tool used, so you must set the following parameters:

parameters$col_genes: number of the column containing the gene identifiers (default 1)
parameters$col_counts: number of the column containing the counts (default 7)
parameters$sep: column separator (default "\t")

Example of counts file content:

Geneid	Chr	Start	End	Strand	Length	Counts
Gene_000001	Random_Chr_001	1692	1907	-	215	0
Gene_000002	Random_Chr_001	6641	8705	-	2064	43.5
Gene_000003	Random_Chr_001	9228	9569	-	341	8
Gene_000004	Random_Chr_001	12009	13155	-	1146	781
Gene_000005	Random_Chr_001	15242	15844	+	602	16
Gene_000006	Random_Chr_001	16304	19834	+	3530	9
Gene_000007	Random_Chr_001	20595	21625	-	1030	13.83
Gene_000008	Random_Chr_001	22377	23461	-	1084	565.33
...

The corresponding parameters:

parameters$col_genes=1
parameters$col_counts=7
parameters$sep="\t"

In this example, the gene identifiers are in the first column and the counts are in column 7. The file is tab-delimited, so the column separator is written "\t".
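To illustrate how these three parameters map onto a count file, here is a minimal base R sketch that reads the first rows of the example above and keeps only the gene and count columns. The `parameters` list and the helper `read_sample_counts` are hypothetical stand-ins written for this illustration; they are not askoR's own loading code.

```r
# Hypothetical stand-in for the askoR parameters object.
parameters <- list(col_genes = 1, col_counts = 7, sep = "\t")

# First rows of the example count file, written to a temporary file
# so that the sketch is self-contained.
count_lines <- c(
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tCounts",
  "Gene_000001\tRandom_Chr_001\t1692\t1907\t-\t215\t0",
  "Gene_000002\tRandom_Chr_001\t6641\t8705\t-\t2064\t43.5",
  "Gene_000003\tRandom_Chr_001\t9228\t9569\t-\t341\t8"
)
count_file <- tempfile(fileext = ".txt")
writeLines(count_lines, count_file)

# Illustrative helper: read one count file and return the counts
# as a numeric vector named by gene identifier.
read_sample_counts <- function(file, parameters) {
  tab <- read.table(file, header = TRUE, sep = parameters$sep,
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
  counts <- tab[[parameters$col_counts]]
  names(counts) <- tab[[parameters$col_genes]]
  counts
}

counts <- read_sample_counts(count_file, parameters)
print(counts)
```

If your counting tool writes the counts in a different column, only `col_counts` needs to change.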

Counts matrix file

It is also possible to provide a single table grouping the counts for each gene in each sample, in tab-delimited text format. In this case, the Samples file should not contain a "file" column, and you must give the name of the matrix file with this parameter: parameters$fileofcount="CountsMatrix.txt".

Example of a counts matrix content:

Geneid	AC1R1	AC1R2	AC1R3	BC1R1	BC1R2	BC1R3	...
Gene_000001	0	1	0	0	0	1	...
Gene_000002	43.5	25.33	31.5	27.5	29.5	29	...
Gene_000003	8	4	5	30	16	13	...
Gene_000004	781	412	626	558	538	346	...
Gene_000005	16	7	13	9	8	6	...
Gene_000006	9	4	5	21	15	12	...
...
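As a quick sanity check before running an analysis, the following base R sketch reads an excerpt of the matrix above and verifies that every expected sample has a column. It assumes that the matrix column headers are the sample names of the Samples file, as they are in the examples of this guide; the `expected_samples` vector is a hypothetical stand-in for the "sample" column of your Samples file.

```r
# Excerpt of the example counts matrix (first three samples and genes only).
matrix_lines <- c(
  "Geneid\tAC1R1\tAC1R2\tAC1R3",
  "Gene_000001\t0\t1\t0",
  "Gene_000002\t43.5\t25.33\t31.5",
  "Gene_000003\t8\t4\t5"
)

# Read the table with gene identifiers as row names.
# check.names = FALSE keeps the sample names exactly as written.
counts <- read.table(text = matrix_lines, header = TRUE, sep = "\t",
                     row.names = 1, check.names = FALSE)

# Hypothetical: sample names as they would appear in the Samples file.
expected_samples <- c("AC1R1", "AC1R2", "AC1R3")

# Report any sample that has no column in the matrix.
missing_samples <- setdiff(expected_samples, colnames(counts))
if (length(missing_samples) > 0) {
  stop("Samples missing from the counts matrix: ",
       paste(missing_samples, collapse = ", "))
}

# Keep the columns in the order of the Samples file.
counts <- as.matrix(counts[, expected_samples, drop = FALSE])
print(dim(counts))
```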

Samples file

This tab-delimited file describes the experimental design. The first and second columns are mandatory and must be named "sample" and "condition". You may add several other columns. The values in the condition column must be the same as the condition names used in the Contrast file.

The column "color" is optional. It lets you predefine the color of each sample in the graphs. If it is absent, askoR assigns colors itself.

The column "file" is mandatory if you have per-sample count files. In the example below, these files are grouped in a "counts" folder. You do not need to include the "input" folder in the path (i.e. write counts/AC1R1_counts.txt rather than input/counts/AC1R1_counts.txt), since the files are searched for in that folder by default.

Don't forget to enter the name of your Samples file with this parameter: parameters$sample_file="Samples.txt".

Example of a Samples.txt file content:

sample	condition	genotype	treatment	color	file
AC1R1	AC1	A	C1	darkorchid2	counts/AC1R1_counts.txt
AC1R2	AC1	A	C1	darkorchid2	counts/AC1R2_counts.txt
AC1R3	AC1	A	C1	darkorchid2	counts/AC1R3_counts.txt
BC1R1	BC1	B	C1	saddlebrown	counts/BC1R1_counts.txt
BC1R2	BC1	B	C1	saddlebrown	counts/BC1R2_counts.txt
...
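The rules above can be checked with a few lines of base R before starting an analysis. The sketch below is only an illustration and not part of askoR: it reads the example rows, verifies the two mandatory column names, checks that any colors are names known to R, and shows how the "file" entries resolve relative to the input folder.

```r
# Example rows of the Samples file shown above.
samples_lines <- c(
  "sample\tcondition\tgenotype\ttreatment\tcolor\tfile",
  "AC1R1\tAC1\tA\tC1\tdarkorchid2\tcounts/AC1R1_counts.txt",
  "AC1R2\tAC1\tA\tC1\tdarkorchid2\tcounts/AC1R2_counts.txt",
  "AC1R3\tAC1\tA\tC1\tdarkorchid2\tcounts/AC1R3_counts.txt",
  "BC1R1\tBC1\tB\tC1\tsaddlebrown\tcounts/BC1R1_counts.txt",
  "BC1R2\tBC1\tB\tC1\tsaddlebrown\tcounts/BC1R2_counts.txt"
)
samples <- read.table(text = samples_lines, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)

# The first two columns must be named "sample" and "condition".
stopifnot(identical(colnames(samples)[1:2], c("sample", "condition")))

# Each sample name should appear only once.
stopifnot(!anyDuplicated(samples$sample))

# Optional "color" column: every value should be a color name known to R.
if ("color" %in% colnames(samples)) {
  stopifnot(all(samples$color %in% colors()))
}

# Optional "file" column: paths are given relative to the input folder.
if ("file" %in% colnames(samples)) {
  count_paths <- file.path("input", samples$file)
  print(count_paths)
}

# Number of replicates per condition.
print(table(samples$condition))
```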

Contrast file

This tab-delimited file lists the contrasts you wish to make between your conditions. The first column corresponds to the condition column of the Samples.txt file; each of the other columns is a comparison to be made.

The name of each comparison should take the form first condition (AC1) vs second condition (AC2), without spaces: "AC1vsAC2". In that column, "AC1" is marked "+", "AC2" is marked "-", and all other conditions are marked "0".

You will have to enter the name of your file with this parameter: parameters$contrast_file="Contrasts.txt".

Example of Contrasts.txt file content:

Condition	AC1vsAC2	AC1vsAC3	AC2vsAC3	BC1vsBC2	...
AC1	+	+	0	0	...
AC2	-	0	+	0	...
AC3	0	-	-	0	...
BC1	0	0	0	+	...
BC2	0	0	0	-	...
...
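To make the "+", "-" and "0" coding concrete, here is a minimal base R sketch that reads the rows and columns shown above and converts them to a numeric matrix (+1, -1, 0). It also checks that each column name agrees with its signs. This is an illustration only, not askoR's own parsing code, and it assumes simple pairwise comparisons whose condition names do not themselves contain "vs".

```r
# Rows and columns of the example Contrasts file shown above.
contrast_lines <- c(
  "Condition\tAC1vsAC2\tAC1vsAC3\tAC2vsAC3\tBC1vsBC2",
  "AC1\t+\t+\t0\t0",
  "AC2\t-\t0\t+\t0",
  "AC3\t0\t-\t-\t0",
  "BC1\t0\t0\t0\t+",
  "BC2\t0\t0\t0\t-"
)

# Read everything as text so that "+", "-" and "0" are kept as written.
contrasts <- read.table(text = contrast_lines, header = TRUE, sep = "\t",
                        row.names = 1, colClasses = "character",
                        quote = "", check.names = FALSE)
symbols <- as.matrix(contrasts)

# Translate the symbols to numeric contrast coefficients.
sign_map <- c("+" = 1, "-" = -1, "0" = 0)
stopifnot(all(symbols %in% names(sign_map)))
coef <- matrix(unname(sign_map[as.vector(symbols)]),
               nrow = nrow(symbols), dimnames = dimnames(symbols))

# Each column "XvsY" should have "+" on condition X and "-" on condition Y.
for (comparison in colnames(coef)) {
  parts <- strsplit(comparison, "vs", fixed = TRUE)[[1]]
  plus  <- rownames(coef)[coef[, comparison] == 1]
  minus <- rownames(coef)[coef[, comparison] == -1]
  stopifnot(identical(plus, parts[1]), identical(minus, parts[2]))
}

print(coef)
```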

Genes annotations file

This optional tab-delimited file contains the annotations of your genes. It can contain several columns, but the first one must hold the gene identifiers and be named SeqName. You will have to enter the name of your file with this parameter: parameters$annotation="annotations.txt".

Example of annotation file content:

SeqName	Description
Gene_000001	hypothetical protein pbra 009537
Gene_000002	hypothetical protein pbra 009324
Gene_000003	histone-lysine n-methyltransferase nsd2
Gene_000004	hypothetical protein pbra 009496
...
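Because descriptions contain spaces, the file has to be read with a tab separator only. The short base R sketch below, given as an illustration rather than as askoR's own code, reads the example rows, checks the name of the first column, and looks up one description by gene identifier.

```r
# Example rows of the annotation file shown above.
annot_lines <- c(
  "SeqName\tDescription",
  "Gene_000001\thypothetical protein pbra 009537",
  "Gene_000002\thypothetical protein pbra 009324",
  "Gene_000003\thistone-lysine n-methyltransferase nsd2",
  "Gene_000004\thypothetical protein pbra 009496"
)

# quote = "" and comment.char = "" keep free-text descriptions intact.
annot <- read.table(text = annot_lines, header = TRUE, sep = "\t",
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)

# The first column must be named SeqName.
stopifnot(colnames(annot)[1] == "SeqName")

# Look up a description by gene identifier.
descriptions <- setNames(annot$Description, annot$SeqName)
print(descriptions[["Gene_000003"]])
```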

GO annotations file

This tab-delimited file must have NO HEADER. The first column contains the gene identifier and the second column contains all the corresponding GO terms, separated by commas. This file is optional; you will have to enter its name with this parameter: parameters$geneID2GO_file="GO_annotations.txt".
NOTE: This file is mandatory for GO enrichment analysis (Cf. GO enrichment Section). For clustering analysis it is not mandatory, but without it you won't get GO enrichment on your clusters (Cf. ClustAndGO Section).

Example of GO annotations file content:

Gene_000001	GO:0003676,GO:0015074
Gene_000002	GO:0003676,GO:0015074
Gene_000003	GO:0005488,GO:0006807,GO:0016740,GO:0043170,GO:0044238
Gene_000005	GO:0005525,GO:0005525,GO:0005525
...
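As an illustration of this two-column layout, the base R sketch below turns the example rows into a named list mapping each gene to its GO terms. It is not askoR's own parser. Removing repeated GO terms (as for Gene_000005) is a choice made for this sketch, not a documented askoR behavior.

```r
# Example rows of the GO annotations file shown above (no header line).
go_lines <- c(
  "Gene_000001\tGO:0003676,GO:0015074",
  "Gene_000002\tGO:0003676,GO:0015074",
  "Gene_000003\tGO:0005488,GO:0006807,GO:0016740,GO:0043170,GO:0044238",
  "Gene_000005\tGO:0005525,GO:0005525,GO:0005525"
)

# header = FALSE because the file has no header; column names are ours.
go_table <- read.table(text = go_lines, header = FALSE, sep = "\t",
                       col.names = c("gene", "go"), quote = "",
                       stringsAsFactors = FALSE)

# Split the comma-separated GO terms and drop repeats within each gene.
gene2go <- lapply(strsplit(go_table$go, ",", fixed = TRUE), unique)
names(gene2go) <- go_table$gene

# Number of distinct GO terms per gene.
print(lengths(gene2go))
```

Genes that are absent from the file, such as Gene_000004 in this example, simply have no entry in the list.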
