Generation type

The generation type used in the file configuration defines how a file is processed and how duplicated information is handled.

glygen_protein_data

The file is expected to be a CSV file with protein-centric information. The metadata information in the file configuration has to be provided as follows:

protein - mandatory
gene - mandatory
glycan - optional
disease - optional
anatomy - optional
species - mandatory

If one of the mandatory columns is missing, the program reports an error and stops. If multiple rows are present for the same protein, all metadata (e.g. glycan, disease, anatomy) is added to the same protein entry, which creates a single collection object in the CFDE data structure. Duplicate values for glycan, disease, or anatomy are ignored.
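The merging rule above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function name merge_protein_rows, the column names, and all identifiers in the sample data are made up for illustration, and the real column names come from the file configuration.

```python
import csv
import io

# Made-up sample data; column names and IDs are assumptions for illustration.
SAMPLE = """protein,gene,glycan,disease,anatomy,species
PROT_A,GENE_A,GLY_1,DIS_1,ANAT_1,SPEC_1
PROT_A,GENE_A,GLY_1,DIS_2,,SPEC_1
PROT_A,GENE_A,GLY_2,DIS_1,ANAT_1,SPEC_1
"""

def merge_protein_rows(rows):
    """Collect glycan, disease, and anatomy values per protein, skipping duplicates."""
    entries = {}
    for row in rows:
        # One entry per protein; later rows for the same protein reuse it.
        entry = entries.setdefault(
            row["protein"],
            {"gene": row["gene"], "species": row["species"],
             "glycan": [], "disease": [], "anatomy": []},
        )
        for key in ("glycan", "disease", "anatomy"):
            value = (row.get(key) or "").strip()
            # Empty cells and values already recorded are ignored.
            if value and value not in entry[key]:
                entry[key].append(value)
    return entries

entries = merge_protein_rows(csv.DictReader(io.StringIO(SAMPLE)))
```

In this sketch the three rows collapse into a single entry for PROT_A, which would correspond to one collection object.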

glygen_protein_no_gene_data

The file is expected to be a CSV file with protein-centric information. However, unlike glygen_protein_data, the proteins are expected to have no gene information (e.g., for viruses). The metadata information in the file configuration has to be provided as follows:

protein - mandatory
gene - not provided
glycan - optional
disease - optional
anatomy - optional
species - mandatory

If one of the mandatory columns is missing, the program reports an error and stops. If multiple rows are present for the same protein, all metadata (e.g. glycan, disease, anatomy) is added to the same protein entry, which creates a single collection object in the CFDE data structure. Duplicate values for glycan, disease, or anatomy are ignored.
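The difference between the two protein-centric generation types comes down to which columns are mandatory. The following sketch shows one way such a check could look. The names MANDATORY_COLUMNS and check_mandatory_columns are hypothetical, and the error message is not the program's actual output.

```python
# Mandatory columns per generation type, as listed on this page.
MANDATORY_COLUMNS = {
    "glygen_protein_data": {"protein", "gene", "species"},
    "glygen_protein_no_gene_data": {"protein", "species"},
}

def check_mandatory_columns(generation_type, configured_columns):
    """Raise ValueError if a mandatory column is missing from the configuration."""
    missing = MANDATORY_COLUMNS[generation_type] - set(configured_columns)
    if missing:
        raise ValueError(
            f"{generation_type}: missing mandatory column(s): {sorted(missing)}"
        )

# A configuration without a gene column is valid only for the no-gene type.
check_mandatory_columns("glygen_protein_no_gene_data", ["protein", "species", "glycan"])
```

Calling the same function with "glygen_protein_data" and the same columns would raise an error in this sketch, because gene is mandatory there.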

glygen_glycan_data

The file is expected to be a CSV file with glycan-centric information. The metadata information in the file configuration has to be provided as follows:

protein - optional (if provided, gene information has to be provided as well)
gene - optional (if provided, protein information has to be provided as well)
glycan - mandatory
disease - optional
anatomy - optional
species - optional

If the glycan column is missing, the program reports an error and stops. If multiple rows are present for the same glycan, all metadata (e.g. protein, gene, disease, species, anatomy) is added to the same glycan entry, which creates a single collection object in the CFDE data structure. Duplicate values for protein, species, disease, or anatomy are ignored.
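For the glycan-centric type, the configuration rules are that glycan is mandatory and that protein and gene must be provided together or not at all. A minimal sketch of that check follows. The function name check_glycan_config and the error messages are assumptions for illustration, not the program's actual behavior.

```python
def check_glycan_config(columns):
    """Validate the metadata columns configured for glygen_glycan_data."""
    configured = set(columns)
    if "glycan" not in configured:
        raise ValueError("glycan column is mandatory")
    # protein and gene are optional, but only as a pair.
    if ("protein" in configured) != ("gene" in configured):
        raise ValueError("protein and gene must be provided together")

# Valid: glycan only, or glycan with both protein and gene.
check_glycan_config(["glycan", "disease"])
check_glycan_config(["glycan", "protein", "gene", "species"])
```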

glygen_protein_glycan_mix_data

The file is expected to be a CSV file that can contain both protein-centric and glycan-centric information. The two cases are distinguished by whether a row contains a protein ID. The metadata information in the file configuration has to be provided as follows:

protein - mandatory
gene - mandatory
glycan - mandatory
disease - optional
anatomy - optional
species - mandatory

If one of the mandatory columns is missing, the program reports an error and stops.

For rows with a protein ID: if multiple rows are present for the same protein, all metadata (e.g. glycan, disease, anatomy) is added to the same protein entry, which creates a single collection object in the CFDE data structure. Duplicate values for glycan, disease, or anatomy are ignored.

For rows with no protein ID: if multiple rows are present for the same glycan, all metadata (e.g. protein, gene, disease, species, anatomy) is added to the same glycan entry, which creates a single collection object in the CFDE data structure. Duplicate values for species, disease, or anatomy are ignored.
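The row-level distinction can be sketched as a simple split on the protein cell. This is an illustration only: the function name split_mixed_rows and the sample identifiers are made up, and treating an empty or whitespace-only cell as "no protein ID" is an assumption about how absence is represented in the file.

```python
def split_mixed_rows(rows):
    """Separate rows into protein-centric and glycan-centric groups."""
    protein_rows, glycan_rows = [], []
    for row in rows:
        # Assumption: an empty or whitespace-only cell means no protein ID.
        if (row.get("protein") or "").strip():
            protein_rows.append(row)
        else:
            glycan_rows.append(row)
    return protein_rows, glycan_rows

# Made-up rows for illustration.
rows = [
    {"protein": "PROT_A", "gene": "GENE_A", "glycan": "GLY_1", "species": "SPEC_1"},
    {"protein": "", "gene": "", "glycan": "GLY_2", "species": "SPEC_1"},
    {"protein": "PROT_A", "gene": "GENE_A", "glycan": "GLY_3", "species": "SPEC_1"},
]
protein_rows, glycan_rows = split_mixed_rows(rows)
```

Each group would then be merged by its own key, protein for the first group and glycan for the second, following the rules described above.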
