Home
This wiki is a companion to the detailed technical documentation for participating in the Common Fund Data Ecosystem Portal.
- Use the sidebar to navigate wiki pages
- Ask questions or search common errors in Discussions
- Get started on your submission with our QuickStart
The Crosscut Metadata Model (C2M2) is a flexible metadata standard for describing experimental resources in biomedicine and related fields. At the Common Fund Data Ecosystem (CFDE), we use the C2M2 as our centralized model of participating datasets in a rich relational database hosted at https://app.nih-cfde.org/. This portal supports faceted search of metadata concepts, such as anatomical location, species, and assay type, across a wide variety of datasets using a controlled vocabulary. This allows researchers to find data that would otherwise need to be searched for individually, using varying nomenclatures. Currently, the portal only accepts C2M2 datapackages from Common Fund Programs. If you represent the Data Coordination Center (DCC) of a Common Fund Program and would like to know more about joining the Common Fund Data Ecosystem, please contact us by emailing [email protected]. Funding is available for Common Fund Programs that wish to participate; see Engagement Opportunities for Common Fund Programs for more information.
Graphic overview. White boxes are user steps, blue boxes are automated:
DCCs submit data packages to the CFDE Data Submission System using the `cfde-submit` tool. This tool takes a directory as input, performs some initial validation, then builds the directory into a bdbag and submits the data to an authenticated Globus endpoint. This process should take less than 30 seconds on your local computer, and the tool will report `Your dataset has been submitted`.
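Before submitting, it can help to compare the contents of your directory against the table names listed in the C2M2 Table Guide. The following is a minimal sketch of such a check, not part of `cfde-submit`: the function name `precheck` is hypothetical, and the sketch makes no claim about which tables are required or about what the tool's own validation covers.

```python
from pathlib import Path

# Table names copied from the C2M2 Table Guide listing in this wiki.
C2M2_TABLES = frozenset("""
analysis_type anatomy assay_type biosample biosample_disease
biosample_from_subject biosample_gene biosample_in_collection
biosample_substance collection collection_anatomy collection_compound
collection_defined_by_project collection_disease collection_gene
collection_in_collection collection_phenotype collection_protein
collection_substance collection_taxonomy compound data_type dcc disease
file file_describes_biosample file_describes_collection
file_describes_subject file_format file_in_collection gene id_namespace
ncbi_taxonomy phenotype phenotype_disease phenotype_gene project
project_in_project protein protein_gene subject subject_disease
subject_in_collection subject_phenotype subject_race
subject_role_taxonomy subject_substance substance
""".split())


def precheck(directory):
    """Compare the TSV files in `directory` against the table names above.

    Hypothetical helper: returns (unrecognized, absent), where
    `unrecognized` lists TSV files whose names are not in the table list
    and `absent` lists tables from the list that have no file.
    Both are sorted lists of file names.
    """
    present = {path.stem for path in Path(directory).glob("*.tsv")}
    unrecognized = sorted(f"{name}.tsv" for name in present - C2M2_TABLES)
    absent = sorted(f"{name}.tsv" for name in C2M2_TABLES - present)
    return unrecognized, absent
```

A file reported as unrecognized is often just a typo in the file name (for example `biosamples.tsv` instead of `biosample.tsv`), which is cheaper to catch locally than after a failed ingest.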
Once your data is in the Globus endpoint, our database (Deriva) will automatically begin ingesting the datapackage and performing further validation. This process will take several minutes but runs entirely on our servers, so you don't need to stay connected. You can check the status of your ingest at any time using `cfde-submit status`. When Deriva finishes your ingest, you will receive an email that contains information about the datapackage, including a link to view the data in the CFDE data portal. You can also navigate directly to the submission system by logging in at https://app.nih-cfde.org/ and clicking 'Data Review'.
DCCs can have any number of submitted datapackages in the system, and can use the portal to view each submission in multiple ways and ensure it is structured as intended. No DCC submissions will be viewable or searchable by the public until they are approved for inclusion in the public release. Although a DCC can have any number of Reviewable Submissions, it can have only a single Approved Submission for public release. If a new submission is approved before the next public release, the newly approved submission will completely replace the previously approved submission in the pending Release catalog.
At each public release date, all approved datapackages will be rolled into the public catalog and will become searchable in the portal. If a DCC does not submit a new datapackage between releases, its current public datapackage will stay in the portal. If a DCC has submitted a new datapackage and it has been approved, it will completely replace any previous data that was available. We do not have the ability to accept updates to existing submissions at this time.
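The approval and release rules described above can be summarized in a small model. This is an illustrative sketch only, not how the submission system is implemented: the class `Catalog`, its methods, and the DCC and submission identifiers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy model of the review, approval, and release rules (hypothetical)."""

    # DCC name -> every submission that has been ingested and can be reviewed
    reviewable: dict = field(default_factory=dict)
    # DCC name -> the single submission approved for the next public release
    approved: dict = field(default_factory=dict)
    # DCC name -> the submission currently visible in the public portal
    public: dict = field(default_factory=dict)

    def submit(self, dcc, submission_id):
        # A DCC may hold any number of reviewable submissions.
        self.reviewable.setdefault(dcc, []).append(submission_id)

    def approve(self, dcc, submission_id):
        if submission_id not in self.reviewable.get(dcc, []):
            raise ValueError(f"{submission_id} is not a submission from {dcc}")
        # Only one approved submission per DCC: a new approval replaces
        # whatever was previously approved for the pending release.
        self.approved[dcc] = submission_id

    def release(self):
        # Approved submissions replace that DCC's public data wholesale.
        # DCCs with no newly approved submission keep their public data.
        self.public.update(self.approved)
```

For example, if a DCC submits `s1` and `s2` and approves `s1` and then `s2` before a release, only `s2` becomes public at that release, and it stays public through later releases until another submission from that DCC is approved.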
- Tutorials
- C2M2 Table Guide
- Table Summary
- analysis_type.tsv
- anatomy.tsv
- assay_type.tsv
- biosample.tsv
- biosample_disease.tsv
- biosample_from_subject.tsv
- biosample_gene.tsv
- biosample_in_collection.tsv
- biosample_substance.tsv
- collection.tsv
- collection_anatomy.tsv
- collection_compound.tsv
- collection_defined_by_project.tsv
- collection_disease.tsv
- collection_gene.tsv
- collection_in_collection.tsv
- collection_phenotype.tsv
- collection_protein.tsv
- collection_substance.tsv
- collection_taxonomy.tsv
- compound.tsv
- data_type.tsv
- dcc.tsv (formerly primary_dcc_contact.tsv)
- disease.tsv
- file.tsv
- file_describes_biosample.tsv
- file_describes_collection.tsv
- file_describes_subject.tsv
- file_format.tsv
- file_in_collection.tsv
- gene.tsv
- id_namespace.tsv
- ncbi_taxonomy.tsv
- phenotype.tsv
- phenotype_disease.tsv
- phenotype_gene.tsv
- project.tsv
- project_in_project.tsv
- protein.tsv
- protein_gene.tsv
- subject.tsv
- subject_disease.tsv
- subject_in_collection.tsv
- subject_phenotype.tsv
- subject_race.tsv
- subject_role_taxonomy.tsv
- subject_substance.tsv
- substance.tsv
- Reference Tables
- Table Summary