
UserGuide: Files Descriptions

S.Alves edited this page Sep 2, 2021 · 2 revisions

Input files description

A test data set is provided in the inst/extdata/input folder and is used throughout the rest of this documentation. The input folder contains:

  • Counts matrix file: CountsMatrix.txt, OR one counts file per sample: the files in the "counts" directory (counts.zip)
  • Samples file: Samples_CountsMatrix.txt OR Samples_CountsFiles.txt
  • Contrasts file: Contrasts.txt
  • Genes annotations file: Genes_annotations.txt (optional)
  • GO annotations file: GO_annotations.txt (optional)

IMPORTANT: all input files must be in a folder named "input" (case-sensitive).
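If you want to check your folder before running askoR, the base-R sketch below lists which expected files are missing from the input folder. check_input_files() is a hypothetical helper, not an askoR function, and the file names are those of the test set. It compares names with list.files() rather than file.exists(), so a wrongly-cased "Input" folder is still reported on case-insensitive file systems (the default on Windows and macOS).

```r
# Hypothetical helper, not part of askoR: returns the required file names
# that are missing from the input folder.
check_input_files <- function(required, dir = "input") {
  # list.files() returns names exactly as stored on disk, so %in% keeps the
  # comparison case-sensitive even on case-insensitive file systems
  if (!basename(dir) %in% list.files(dirname(dir))) {
    stop("folder '", basename(dir), "' not found in ", dirname(dir))
  }
  required[!required %in% list.files(dir)]
}

# Demo in a temporary directory where only two of the three files exist
demo_dir <- file.path(tempdir(), "askor_demo", "input")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("Samples_CountsMatrix.txt", "Contrasts.txt"))))

absent <- check_input_files(
  c("CountsMatrix.txt", "Samples_CountsMatrix.txt", "Contrasts.txt"),
  dir = demo_dir
)
print(absent)
```

Run from your project directory, check_input_files(c(...)) uses the default dir = "input".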

Count files

Sample count files

If you have one counts file per sample, the files can be in text or CSV format. In this case, the Samples file must give the path and name of each sample's counts file in a "file" column. Because a counts file can contain several columns depending on the counting tool used, you must set the following parameters:

  • parameters$col_genes   number of the column containing the gene IDs (default 1)
  • parameters$col_counts   number of the column containing the counts (default 7)
  • parameters$sep   column separator (default "\t")

Example of counts file content:

Geneid Chr Start End Strand Length Counts
Gene_000001 Random_Chr_001 1692 1907 - 215 0
Gene_000002 Random_Chr_001 6641 8705 - 2064 43.5
Gene_000003 Random_Chr_001 9228 9569 - 341 8
Gene_000004 Random_Chr_001 12009 13155 - 1146 781
Gene_000005 Random_Chr_001 15242 15844 + 602 16
Gene_000006 Random_Chr_001 16304 19834 + 3530 9
Gene_000007 Random_Chr_001 20595 21625 - 1030 13.83
Gene_000008 Random_Chr_001 22377 23461 - 1084 565.33
...

The corresponding parameters:

parameters$col_genes=1
parameters$col_counts=7
parameters$sep="\t"

In this example, the gene identifiers are in the first column and the counts in column 7; the file is tab-delimited, so the column separator is "\t".
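To illustrate how the three parameters select the data, here is a minimal base-R sketch that reads one per-sample counts file and keeps only the gene IDs and counts. It is not askoR's own loading code: read_sample_counts() is a hypothetical helper, and the file content is inlined from the example above so the snippet runs on its own.

```r
# Same values as the parameters shown above
parameters <- list(col_genes = 1, col_counts = 7, sep = "\t")

# Inline stand-in for a file such as input/counts/AC1R1_counts.txt
counts_txt <- paste(
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tCounts",
  "Gene_000001\tRandom_Chr_001\t1692\t1907\t-\t215\t0",
  "Gene_000002\tRandom_Chr_001\t6641\t8705\t-\t2064\t43.5",
  "Gene_000003\tRandom_Chr_001\t9228\t9569\t-\t341\t8",
  sep = "\n"
)

# Hypothetical helper: returns a named numeric vector (names = gene IDs)
read_sample_counts <- function(file, params) {
  tab <- read.table(file, sep = params$sep, header = TRUE,
                    stringsAsFactors = FALSE, check.names = FALSE)
  counts <- tab[[params$col_counts]]       # keep only the counts column...
  names(counts) <- tab[[params$col_genes]] # ...labelled by the gene-ID column
  counts
}

ac1r1 <- read_sample_counts(textConnection(counts_txt), parameters)
print(ac1r1)
```

For a real file, pass its path instead, e.g. read_sample_counts(file.path("input", "counts/AC1R1_counts.txt"), parameters); for CSV files, set sep = ",".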

Counts matrix file

Alternatively, you can provide a single tab-delimited text table containing the counts of every gene in every sample. In this case the Samples file should not contain a "file" column, and you must give the name of your counts matrix file with this parameter: parameters$fileofcount="CountsMatrix.txt".

Example of counts matrix content:

Geneid AC1R1 AC1R2 AC1R3 BC1R1 BC1R2 BC1R3 ...
Gene_000001 0 1 0 0 0 1 ...
Gene_000002 43.5 25.33 31.5 27.5 29.5 29 ...
Gene_000003 8 4 5 30 16 13 ...
Gene_000004 781 412 626 558 538 346 ...
Gene_000005 16 7 13 9 8 6 ...
Gene_000006 9 4 5 21 15 12 ...
...
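If you want to inspect a counts matrix before running askoR, it can be loaded in base R with the first column used as row names, giving a numeric matrix with one row per gene and one column per sample. This is a sketch for checking your file, not askoR's import routine; the content below is a subset of the example above.

```r
# Inline stand-in for input/CountsMatrix.txt (tab-delimited, first column = gene IDs)
matrix_txt <- paste(
  "Geneid\tAC1R1\tAC1R2\tAC1R3",
  "Gene_000001\t0\t1\t0",
  "Gene_000002\t43.5\t25.33\t31.5",
  "Gene_000003\t8\t4\t5",
  sep = "\n"
)

# row.names = 1 turns the Geneid column into row names
counts <- read.table(textConnection(matrix_txt), sep = "\t", header = TRUE,
                     row.names = 1, check.names = FALSE)
counts <- as.matrix(counts)
print(counts)
```

Fractional values such as 25.33 are kept as-is here; whether and how askoR rounds them is not covered by this sketch.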

Samples file

This tab-delimited file describes the experimental design. The first two columns are mandatory and must be named "sample" and "condition"; you may add other columns. The values in the condition column must match the conditions used in the Contrasts file.

The optional "color" column lets you predefine the color of each sample in the graphs. If it is absent, askoR assigns colors automatically.

The "file" column is mandatory if you provide one counts file per sample. In the example below, these files are grouped in a "counts" folder. You do not need to include the "input" folder in the path (write counts/AC1R1_counts.txt, not input/counts/AC1R1_counts.txt), since askoR looks in the input folder by default.

Don't forget to enter the name of your Samples file with this parameter: parameters$sample_file="Samples.txt".

Example of a Samples.txt file content:

sample condition genotype treatment color file
AC1R1 AC1 A C1 darkorchid2 counts/AC1R1_counts.txt
AC1R2 AC1 A C1 darkorchid2 counts/AC1R2_counts.txt
AC1R3 AC1 A C1 darkorchid2 counts/AC1R3_counts.txt
BC1R1 BC1 B C1 saddlebrown counts/BC1R1_counts.txt
BC1R2 BC1 B C1 saddlebrown counts/BC1R2_counts.txt
...
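The rules above can be checked with a short base-R sketch: it verifies that the first two columns are "sample" and "condition", groups the replicates of each condition, builds the full path of each counts file under input/, and validates the optional colors with grDevices::col2rgb(). It is an illustration only, not the check askoR itself performs; the content is a subset of the example above.

```r
samples_txt <- paste(
  "sample\tcondition\tgenotype\ttreatment\tcolor\tfile",
  "AC1R1\tAC1\tA\tC1\tdarkorchid2\tcounts/AC1R1_counts.txt",
  "AC1R2\tAC1\tA\tC1\tdarkorchid2\tcounts/AC1R2_counts.txt",
  "BC1R1\tBC1\tB\tC1\tsaddlebrown\tcounts/BC1R1_counts.txt",
  sep = "\n"
)
samples <- read.table(textConnection(samples_txt), sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)

# The first two columns must be "sample" and "condition"
stopifnot(identical(colnames(samples)[1:2], c("sample", "condition")))

# Replicates of each condition
replicates <- split(samples$sample, samples$condition)
print(replicates)

# "file" paths are relative to the input folder, so prepend it
if ("file" %in% colnames(samples)) {
  samples$path <- file.path("input", samples$file)
}

# Optional "color" column: col2rgb() raises an error on an unknown color name
if ("color" %in% colnames(samples)) {
  invisible(grDevices::col2rgb(samples$color))
}
```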

Contrast file

This tab-delimited file lists the contrasts you want to make between your conditions. The first column corresponds to the condition column of the Samples file, and each of the other columns is one comparison to be made.

Each comparison must be named after its first condition (AC1) and second condition (AC2), joined by "vs" without spaces: "AC1vsAC2". In that column, AC1 is marked "+", AC2 is marked "-", and all other conditions "0".

You will have to enter the name of your file with this parameter: parameters$contrast_file="Contrasts.txt".

Example of Contrasts.txt file content:

Condition AC1vsAC2 AC1vsAC3 AC2vsAC3 BC1vsBC2 ...
AC1 + + 0 0 ...
AC2 - 0 + 0 ...
AC3 0 - - 0 ...
BC1 0 0 0 + ...
BC2 0 0 0 - ...
...
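The naming convention can be verified mechanically: for each comparison column, the condition marked "+", followed by "vs" and the condition marked "-", should rebuild the column name. The sketch below assumes exactly one "+" and one "-" per column, as in the example; check_contrast_names() is a hypothetical helper, not part of askoR.

```r
contrasts_txt <- paste(
  "Condition\tAC1vsAC2\tAC1vsAC3\tAC2vsAC3\tBC1vsBC2",
  "AC1\t+\t+\t0\t0",
  "AC2\t-\t0\t+\t0",
  "AC3\t0\t-\t-\t0",
  "BC1\t0\t0\t0\t+",
  "BC2\t0\t0\t0\t-",
  sep = "\n"
)
# Read every column as text so "+", "-" and "0" are kept verbatim
contrasts <- read.table(textConnection(contrasts_txt), sep = "\t", header = TRUE,
                        colClasses = "character", check.names = FALSE)

# Hypothetical helper: TRUE for each column whose marks match its name
check_contrast_names <- function(ct) {
  conds <- ct[[1]]
  vapply(colnames(ct)[-1], function(name) {
    marks <- ct[[name]]
    plus  <- conds[marks == "+"]
    minus <- conds[marks == "-"]
    all(marks %in% c("+", "-", "0")) &&
      length(plus) == 1 && length(minus) == 1 &&
      name == paste0(plus, "vs", minus)
  }, logical(1))
}

ok <- check_contrast_names(contrasts)
print(ok)
```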

Genes annotations file

This optional tab-delimited file contains the annotations of your genes. It can contain several columns, but the first one must hold the gene identifiers and be named "SeqName". You will have to enter the name of your file with this parameter: parameters$annotation="annotations.txt".

Example of annotation file content:

SeqName Description
Gene_000001 hypothetical protein pbra 009537
Gene_000002 hypothetical protein pbra 009324
Gene_000003 histone-lysine n-methyltransferase nsd2
Gene_000004 hypothetical protein pbra 009496
...
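As an illustration of how the SeqName column acts as the join key, the base-R sketch below reads an annotation file and attaches descriptions to a vector of gene IDs with match(); genes absent from the file get NA. quote = "" and comment.char = "" are set because free-text descriptions may contain apostrophes or "#" characters, which read.table() would otherwise interpret. This is not askoR's code, and the gene vector is a made-up example.

```r
annot_txt <- paste(
  "SeqName\tDescription",
  "Gene_000001\thypothetical protein pbra 009537",
  "Gene_000003\thistone-lysine n-methyltransferase nsd2",
  sep = "\n"
)
annot <- read.table(textConnection(annot_txt), sep = "\t", header = TRUE,
                    quote = "", comment.char = "", stringsAsFactors = FALSE)

# The first column must be named SeqName
stopifnot(colnames(annot)[1] == "SeqName")

# Look up descriptions for some gene IDs (e.g. a list of DE genes)
genes <- c("Gene_000001", "Gene_000002", "Gene_000003")
desc <- annot$Description[match(genes, annot$SeqName)]
print(data.frame(gene = genes, description = desc))
```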

GO annotations file

This optional tab-delimited file has NO HEADER: the first column contains the gene identifier and the second column contains all the corresponding GO terms, separated by commas. You will have to enter the name of your file with this parameter: parameters$geneID2GO_file="GO_annotations.txt".
NOTE: This file is mandatory for GO enrichment analysis (see the GO enrichment section). It is not required for clustering analysis, but without it you will not get GO enrichment of your clusters (see the ClustAndGO section).

Example of GO annotations file content:

Gene_000001     GO:0003676,GO:0015074
Gene_000002     GO:0003676,GO:0015074
Gene_000003     GO:0005488,GO:0006807,GO:0016740,GO:0043170,GO:0044238
Gene_000005     GO:0005525,GO:0005525,GO:0005525
...
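The header-less format maps naturally onto a named list with one vector of GO IDs per gene. This base-R sketch builds that list. Removing duplicated GO IDs (as for Gene_000005 above) with unique() is a choice made here, not documented askoR behavior. For comparison, the Bioconductor package topGO provides readMappings() for files in this same two-column layout.

```r
go_txt <- paste(
  "Gene_000001\tGO:0003676,GO:0015074",
  "Gene_000003\tGO:0005488,GO:0006807,GO:0016740,GO:0043170,GO:0044238",
  "Gene_000005\tGO:0005525,GO:0005525,GO:0005525",
  sep = "\n"
)

# No header: column 1 = gene ID, column 2 = comma-separated GO IDs
go_tab <- read.table(textConnection(go_txt), sep = "\t", header = FALSE,
                     quote = "", comment.char = "", stringsAsFactors = FALSE,
                     col.names = c("gene", "go"))

# Split each GO string on commas, trim stray spaces, drop repeated IDs
gene2go <- lapply(strsplit(go_tab$go, ",", fixed = TRUE),
                  function(ids) unique(trimws(ids)))
names(gene2go) <- go_tab$gene
str(gene2go)
```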