Generation type
The generation type used in the file configuration defines how a file is processed and how duplicated information is handled.
The file is expected to be a CSV file with protein-centric information. The metadata information in the file configuration has to be provided as:
- protein - mandatory
- gene - mandatory
- glycan - optional
- disease - optional
- anatomy - optional
- species - mandatory
If any of the mandatory columns is missing, the program reports an error and stops. If multiple rows are present for the same protein, all metadata (e.g. glycan, disease, anatomy) is added to the same protein entry, which creates a single collection object in the CFDE data structure. Duplicated values for glycan, disease or anatomy are ignored.
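The merging of rows described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function name `aggregate_by_protein` is hypothetical, the CSV columns are assumed to be named directly after the metadata roles (in practice the file configuration maps columns to roles), and the identifiers in the sample are made up.

```python
import csv
import io


def aggregate_by_protein(csv_text):
    """Group rows by protein and collect unique metadata values per protein.

    Hypothetical helper: assumes columns are literally named
    protein, gene, species, glycan, disease, anatomy.
    """
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One entry per protein; later rows for the same protein reuse it.
        entry = entries.setdefault(
            row["protein"],
            {
                "gene": row["gene"],
                "species": row["species"],
                "glycan": [],
                "disease": [],
                "anatomy": [],
            },
        )
        for key in ("glycan", "disease", "anatomy"):
            value = (row.get(key) or "").strip()
            # Skip empty cells and values already recorded for this protein.
            if value and value not in entry[key]:
                entry[key].append(value)
    return entries


# Made-up sample data: two rows for the same protein, one repeated disease.
sample = """protein,gene,species,glycan,disease
P00001,GENE1,9606,G00001AA,DOID:1
P00001,GENE1,9606,G00002BB,DOID:1
"""
result = aggregate_by_protein(sample)
```

Here both rows end up in one entry for `P00001`, with two glycans and a single disease value, which mirrors the "single collection object per protein" behaviour described above.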
The file is expected to be a CSV file with protein-centric information. However, unlike glygen_protein_data, the proteins are expected to have no gene information (e.g., viruses). The metadata information in the file configuration has to be provided as:
- protein - mandatory
- gene - not provided
- glycan - optional
- disease - optional
- anatomy - optional
- species - mandatory
If any of the mandatory columns is missing, the program reports an error and stops. If multiple rows are present for the same protein, all metadata (e.g. glycan, disease, anatomy) is added to the same protein entry, which creates a single collection object in the CFDE data structure. Duplicated values for glycan, disease or anatomy are ignored.
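The mandatory-column check for this gene-less case might look roughly like the sketch below. The function `check_mandatory_columns` and the constant `NO_GENE_MANDATORY` are hypothetical names chosen for illustration, and raising `ValueError` is an assumption; the actual program may report the error differently.

```python
def check_mandatory_columns(configured, mandatory):
    """Raise an error if any mandatory metadata column is not configured."""
    missing = [name for name in mandatory if name not in configured]
    if missing:
        # Assumption: the real tool stops here with an error message.
        raise ValueError("missing mandatory column(s): " + ", ".join(missing))


# For proteins without gene information only protein and species are mandatory.
NO_GENE_MANDATORY = ("protein", "species")

# Passes silently: both mandatory columns are configured, glycan is optional.
check_mandatory_columns(["protein", "species", "glycan"], NO_GENE_MANDATORY)
```

Calling the function with a configuration that lacks `species`, for example `["protein", "glycan"]`, raises the error instead of returning.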
The file is expected to be a CSV file with glycan-centric information. The metadata information in the file configuration has to be provided as:
- protein - optional (if provided, gene information has to be provided as well)
- gene - optional (if provided, protein information has to be provided as well)
- glycan - mandatory
- disease - optional
- anatomy - optional
- species - optional
If the glycan column is missing, the program reports an error and stops. If multiple rows are present for the same glycan, all metadata (e.g. protein, gene, disease, species, anatomy) is added to the same glycan entry, which creates a single collection object in the CFDE data structure. Duplicated values for protein, species, disease or anatomy are ignored.
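The rule that protein and gene must be configured together can be expressed as a small consistency check. The sketch below is only an illustration under assumptions: `check_glycan_config` is a hypothetical name, the configuration is represented as a plain set of metadata roles, and `ValueError` stands in for whatever error the program actually raises.

```python
def check_glycan_config(configured):
    """Validate the metadata roles configured for a glycan-centric file.

    configured: set of role names present in the file configuration.
    """
    if "glycan" not in configured:
        raise ValueError("glycan column is mandatory")
    # protein and gene are optional, but only as a pair.
    if ("protein" in configured) != ("gene" in configured):
        raise ValueError("protein and gene must be provided together")


# Both configurations are valid: glycan alone, or glycan with the full pair.
check_glycan_config({"glycan", "disease"})
check_glycan_config({"glycan", "protein", "gene", "species"})
```

A configuration such as `{"glycan", "protein"}` would be rejected, because the protein role is present without the matching gene role.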
The file is expected to be a CSV file that can contain both protein-centric and glycan-centric information. The two cases are distinguished row by row, based on whether a protein is present. The metadata information in the file configuration has to be provided as:
- protein - mandatory
- gene - mandatory
- glycan - mandatory
- disease - optional
- anatomy - optional
- species - mandatory
If any of the mandatory columns is missing, the program reports an error and stops. For rows with a protein ID, if multiple rows are present for the same protein, all metadata (e.g. glycan, disease, anatomy) is added to the same protein entry, which creates a single collection object in the CFDE data structure. Duplicated values for glycan, disease or anatomy are ignored. For rows with no protein ID, if multiple rows are present for the same glycan, all metadata (e.g. protein, gene, disease, species, anatomy) is added to the same glycan entry, which creates a single collection object in the CFDE data structure. Duplicated values for species, disease or anatomy are ignored.
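The row-level distinction between the two cases can be sketched as follows. This is a hedged illustration rather than the program's real code: `split_rows` is a hypothetical helper, rows are assumed to be dictionaries keyed by metadata role, an empty or whitespace-only protein cell is assumed to mean "no protein ID", and the sample identifiers are made up.

```python
def split_rows(rows):
    """Separate rows into protein-centric and glycan-centric groups.

    A row counts as protein-centric when its protein cell is non-empty.
    """
    protein_rows, glycan_rows = [], []
    for row in rows:
        if (row.get("protein") or "").strip():
            protein_rows.append(row)
        else:
            glycan_rows.append(row)
    return protein_rows, glycan_rows


# Made-up rows: the first has a protein ID, the second does not.
rows = [
    {"protein": "P00001", "gene": "GENE1", "glycan": "G00001AA", "species": "9606"},
    {"protein": "", "gene": "", "glycan": "G00002BB", "species": "9606"},
]
protein_rows, glycan_rows = split_rows(rows)
```

Each group would then be merged by its own key (protein ID or glycan ID) using the duplicate handling described above.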