Home
This wiki is a companion to the detailed technical documentation for participating in the Common Fund Data Ecosystem Portal. Use the sidebar to search for keywords, error messages, and more, or get started with our QuickStart below.
The Crosscut Metadata Model (C2M2) is a flexible metadata standard for describing experimental resources in biomedicine and related fields. At the Common Fund Data Ecosystem (CFDE), we use the C2M2 as our centralized model of participating datasets in a rich relational database hosted at https://app.nih-cfde.org/. This portal supports faceted search of metadata concepts such as anatomical location, species, and assay type across a wide variety of datasets using a controlled vocabulary. Researchers can therefore find, in one place, data that would otherwise have to be searched for dataset by dataset using varying nomenclatures. Currently, the portal only accepts C2M2 datapackages from Common Fund Programs. If you represent the Data Coordination Center of a Common Fund Program and would like to know more about joining the Common Fund Data Ecosystem, please contact us by emailing [email protected]. Funding is available for Common Fund Programs that wish to participate; see Engagement Opportunities for Common Fund Programs for more information.
The Data Coordination Center (DCC) for each participating Common Fund Program needs to onboard with the CFDE-CC before we can accept submissions. You do not need to be funded by a CFDE award to participate; however, awards are available (see Engagement Opportunities for Common Fund Programs for more information). To begin your onboarding, please email the helpdesk.
A datapackage consists of 22 tab-separated value (.tsv) files populated with interrelated metadata about the data assets owned by your DCC. Assuming you fill all of the tables, a datapackage submission will make your data searchable by concepts such as anatomical location, species, assay type, and other similar terms that are useful to researchers who are looking for new datasets. A datapackage can be built at varying levels of complexity: many of the columns, and several entire tables, can be left empty and still produce a valid package. However, searchability in the CFDE portal is highly correlated with model completeness, so the Coordination Center recommends making your datapackage as complete as possible. The full specification for all tables is available in the technical documentation. See the C2M2-Table-Summary for an abbreviated description of just the tables.
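Because searchability depends on how completely the tables are filled, it can help to check which tables in a draft datapackage are still empty. The following is a minimal sketch, not part of cfde-submit or any CFDE tooling. The function names (`count_rows`, `summarize_datapackage`, `empty_tables`) are hypothetical, and it assumes every .tsv file sits in one directory and begins with a single header row.

```python
import csv
from pathlib import Path


def count_rows(tsv_path):
    """Return the number of data rows (excluding the header) in a TSV file."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # assumption: the first row is the column header
        # Count only rows that contain at least one non-empty field.
        return sum(1 for row in reader if any(field.strip() for field in row))


def summarize_datapackage(directory):
    """Map each .tsv file name found in `directory` to its data-row count."""
    return {
        path.name: count_rows(path)
        for path in sorted(Path(directory).glob("*.tsv"))
    }


def empty_tables(directory):
    """List the tables that have no data rows (header only, or blank)."""
    summary = summarize_datapackage(directory)
    return [name for name, rows in summary.items() if rows == 0]
```

Calling `empty_tables("data")`, where `data` is an assumed directory holding your .tsv files, returns the names of tables that contain no records. An empty table is still valid, so treat the result as a to-do list for improving completeness rather than as a list of errors.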
To submit your data, you will need to install the cfde-submit tool.
To avoid potential conflicts, we recommend installing cfde-submit from within a Python 3 virtual environment (more info).
To install the tool:
pip3 install cfde-submit
Full documentation is available here: https://github.com/nih-cfde/cfde-submit/blob/main/docs/index.md
To validate a datapackage locally with the Frictionless command-line tool:
pip install frictionless
frictionless validate data/datapackage.json
Table Summary
- analysis_type.tsv
- anatomy.tsv
- assay_type.tsv
- biosample.tsv
- biosample_disease.tsv
- biosample_from_subject.tsv
- biosample_gene.tsv
- biosample_in_collection.tsv
- biosample_substance.tsv
- collection.tsv
- collection_anatomy.tsv
- collection_compound.tsv
- collection_defined_by_project.tsv
- collection_disease.tsv
- collection_gene.tsv
- collection_in_collection.tsv
- collection_phenotype.tsv
- collection_protein.tsv
- collection_substance.tsv
- collection_taxonomy.tsv
- compound.tsv
- data_type.tsv
- dcc.tsv (formerly primary_dcc_contact.tsv)
- disease.tsv
- file.tsv
- file_describes_biosample.tsv
- file_describes_collection.tsv
- file_describes_subject.tsv
- file_format.tsv
- file_in_collection.tsv
- gene.tsv
- id_namespace.tsv
- ncbi_taxonomy.tsv
- phenotype.tsv
- phenotype_disease.tsv
- phenotype_gene.tsv
- project.tsv
- project_in_project.tsv
- protein.tsv
- protein_gene.tsv
- subject.tsv
- subject_disease.tsv
- subject_in_collection.tsv
- subject_phenotype.tsv
- subject_race.tsv
- subject_role_taxonomy.tsv
- subject_substance.tsv
- substance.tsv