Amanda Charbonneau edited this page Feb 27, 2021 · 41 revisions

This wiki is a companion to the detailed technical documentation for participating in the Common Fund Data Ecosystem Portal. Use the sidebar to search for key words, error messages and more, or get started with our QuickStart below.

What is the C2M2?

The Crosscut Metadata Model (C2M2) is a flexible metadata standard for describing experimental resources in biomedicine and related fields. At the Common Fund Data Ecosystem (CFDE), we use the C2M2 as our centralized model of participating datasets in a rich relational database hosted at https://app.nih-cfde.org/. This portal supports faceted search of metadata concepts such as anatomical location, species, and assay type across many datasets using a controlled vocabulary. Researchers can therefore find, in one place, data that would otherwise need to be searched for individually under varying nomenclatures. Currently, the portal only accepts C2M2 datapackages from Common Fund Programs. If you represent the Data Coordination Center of a Common Fund Program and would like to know more about joining the Common Fund Data Ecosystem, please contact us by emailing [email protected]. Funding is available for Common Fund Programs that wish to participate; see Engagement Opportunities for Common Fund Programs for more information.

QuickStart: Participating in the CFDE Portal

Joining the CFDE

The Data Coordination Center (DCC) for each participating Common Fund Program needs to onboard with the CFDE-CC before we can accept submissions. You do not need to be funded by a CFDE award to participate; however, awards are available (see Engagement Opportunities for Common Fund Programs for more information). To begin your onboarding, please email the helpdesk.

Onboarding to the CFDE portal

Creating your datapackage

A datapackage consists of 22 tab-separated value (.tsv) files populated with interrelated metadata about the data assets owned by your DCC. If you fill all of the tables, a datapackage submission makes your data searchable by concepts such as anatomical location, species, and assay type, along with similar terms that are useful to researchers looking for new datasets. A datapackage can be created at varying levels of complexity: many of the columns, and several entire tables, can be left empty and still produce a valid package. However, searchability in the CFDE portal is highly correlated with model completeness, so the Coordination Center recommends making your datapackage as complete as possible. The full specification for all tables is available in the technical documentation. See the C2M2-Table-Summary for an abbreviated description of just the tables.
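Because searchability depends on how complete the package is, it can help to see at a glance which tables are still empty. The following is a minimal sketch, not part of the CFDE tooling: `count_tsv_rows` is a hypothetical helper name, and it assumes all of your .tsv files sit in one directory and that each file has exactly one header line.

```shell
# Hypothetical helper (not part of cfde-submit): for every .tsv file in a
# directory, print the file name and its number of data rows, tab-separated.
# Assumption: each table has exactly one header line, so a table with
# 0 data rows is one that was left empty.
count_tsv_rows() {
    dir="${1:-.}"                       # directory to scan; defaults to the current one
    for f in "$dir"/*.tsv; do
        [ -e "$f" ] || continue         # nothing to do if the glob matched no files
        # Count non-blank lines after the header line.
        rows=$(awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$f")
        printf '%s\t%s\n' "$(basename "$f")" "$rows"
    done
}
```

Calling `count_tsv_rows data` (where `data` stands for whatever directory holds your datapackage) lists each table with its row count; tables reporting 0 are candidates to fill in before submitting.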

Installing the tools

Helper script

cfde-submit

To submit your data, you will need to install the cfde-submit tool.

To avoid potential conflicts, we recommend installing cfde-submit from within a Python 3 virtual environment (more info).
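One way to set up such an environment is with Python's built-in `venv` module. This is a sketch under assumptions: the directory name `cfde-env` is only an illustration, and the commands assume a Unix-like shell with `python3` and its `venv` module available.

```shell
# Create a virtual environment in a directory of your choosing.
# "cfde-env" is a hypothetical name used only for illustration.
VENV_DIR="$PWD/cfde-env"
python3 -m venv "$VENV_DIR"

# Activate it in the current shell; afterwards, python and pip
# resolve to the copies inside the environment.
. "$VENV_DIR/bin/activate"

# Confirm which interpreter is now in use.
command -v python
```

With the environment active, packages installed with pip go into `cfde-env` rather than your system Python. Run `deactivate` to leave the environment.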

To install the tool:

pip3 install cfde-submit

Full documentation is available here: https://github.com/nih-cfde/cfde-submit/blob/main/docs/index.md

OPTIONAL: frictionless

pip install frictionless-py
frictionless validate data/datapackage.json

Checking and approving your submission
