File a New Issue with any questions, suggestions, comments, or feedback.
The Common Fund Data Ecosystem (CFDE) is an effort to bring together knowledge across Common Fund programs into a cohesive resource. The Integration and Coordination Center (ICC) is a center within the CFDE responsible for, among other things, reporting on the impact and influence of the CFDE.
This repository is a place to:
- View a high-level dashboard of Common Fund and CFDE activities (e.g. projects over time, dollars awarded, # of publications produced, etc.).
- Ensure your project is fully included in the dashboard by aligning with our ingest process.
- Maintain code used to coordinate the above.
You can view the information as a live dashboard webapp or as separate PDF reports.
Current maintainers and team members:
- Casey Greene - Evaluation Core Lead
- Sean Davis - Evaluation Core Lead
- Vincent Rubinetti (@vincerubinetti) - Software developer
We gather most details about Common Fund projects automatically from NIH systems, but there are some pieces of info that require manual actions to be integrated. If you would like your project to be included in the dashboard to the fullest extent, please follow the instructions in the sections below as applicable. We've tried to make this process as easy and automated as possible.
Once you've made a submission (and once your project is in NIH systems), your project should appear here the next time our ingest process runs. We try to run the ingest process regularly and frequently, but if you'd like your project to show up faster or are otherwise having issues, please contact us.
Repositories ("repos") are places for storing, tracking changes to, and collaborating on software.
Currently, we only take submissions of software kept in public GitHub.com repos. Private repos and other platforms such as GitLab aren't supported yet.
- Find all GitHub repos that are associated with your project.
- If you're unsure where to find these, ask members of your project about any software that was written in support of it, and where the code for the software resides.
- "Tag" each repo with the project.
- See GitHub's instructions for tagging repos for reference.
- On the main page of the repo, click on the gear ⚙ next to "About".
- Under "Topics", type in your core project number†, e.g. U54OD036472 (case-insensitive).
This repository itself has been tagged with its core project number, so use it as a reference.
† Do not confuse this with a (sub) "project" number, which is longer, e.g. 1U54OD036472-01.
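To illustrate the difference between the two numbers, here is a minimal sketch that reduces a full project number to its core project number and normalizes it for use as a GitHub topic (topics are stored in lowercase). The function names and the regular expression are illustrative assumptions, not code from this repo; the pattern assumes the usual NIH layout of an optional leading type digit, a three-character activity code, a two-letter institute code, a six-digit serial number, and an optional suffix after a hyphen.

```typescript
// Assumed layout: [type digit] + activity code + institute code + serial [+ "-" suffix].
const PROJECT_NUMBER_PATTERN = /^\d?([A-Z][A-Z0-9]{2}[A-Z]{2}\d{6})(?:-.*)?$/i;

/** Reduce a full project number to its core project number, or null if unrecognized. */
function toCoreProjectNumber(input: string): string | null {
  const match = PROJECT_NUMBER_PATTERN.exec(input.trim());
  return match?.[1]?.toUpperCase() ?? null;
}

/** GitHub topics are lowercase, so normalize the number before using it as a topic. */
function toRepoTopic(coreProjectNumber: string): string {
  return coreProjectNumber.trim().toLowerCase();
}

console.log(toCoreProjectNumber("1U54OD036472-01")); // "U54OD036472"
console.log(toRepoTopic("U54OD036472")); // "u54od036472"
```

Because the match is case-insensitive, a number typed in lowercase resolves to the same core project number.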
Analytics are services that monitor traffic (number of visits over time, number of unique visitors, visitor demographics, etc.) on publicly accessible webpages.
Currently, we only take submissions of webpages using Google Analytics. Other analytics services may be supported in the future.
- Find all Google Analytics properties that are associated with your project.
- If you're unsure where to find these, ask members of your project about any webpages related to it, and if anyone set up analytics for them.
- Allow us access to each property.
- Go to the Google Analytics dashboard and make sure you are on the right property.
- Find "Property Access Management" from the main search bar (or the "Admin" side menu).
- Add a new user with the email [email protected], uncheck "Notify by email", and select the "Viewer" role.
- "Tag" each property so we can associate it with a particular project.
- Find "Key Events" from the "Admin" side menu, under "Data Display".
- Create a new key event with the name cfde_XXX (case-insensitive), where XXX is the core project number, e.g. cfde_R03OD034502.
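As a rough illustration of how a key event name of this form can be mapped back to a project, the sketch below builds and parses the cfde_XXX convention. The helper names are hypothetical and this is not necessarily how the ingest process does it; the sketch only assumes the prefix and case-insensitivity described above.

```typescript
const KEY_EVENT_PREFIX = "cfde_";

/** Build the key event name for a core project number. */
function toKeyEventName(coreProjectNumber: string): string {
  return `${KEY_EVENT_PREFIX}${coreProjectNumber.trim()}`;
}

/** Recover the core project number from a key event name, or null if it is not a CFDE tag. */
function fromKeyEventName(eventName: string): string | null {
  const name = eventName.trim();
  // The prefix is matched case-insensitively, as the naming rule allows.
  if (!name.toLowerCase().startsWith(KEY_EVENT_PREFIX)) return null;
  const rest = name.slice(KEY_EVENT_PREFIX.length);
  return rest ? rest.toUpperCase() : null;
}

console.log(toKeyEventName("R03OD034502")); // "cfde_R03OD034502"
console.log(fromKeyEventName("CFDE_r03od034502")); // "R03OD034502"
```

Any other key events on the property (ones without the prefix) are simply ignored by returning null.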
- Linux or macOS system
- Node v22+
- Bun (for package management only, as a faster/smaller replacement for Yarn)
The automated steps in this repo are roughly as follows:
- Ingest
- Get raw data from an external resource, either by scraping an HTML page, downloading and parsing a PDF or CSV, or making a request to an API.
- Save raw data exactly as-is for provenance and caching.
- Collate the most important information from the raw data into a common, high-level output data format suited to making the desired dashboard pages and PDF reports.
- Repeat previous steps in order of dependency (e.g. opportunity number -> grant numbers) until all needed info is gathered.
- Print
- Run dashboard webapp.
- Import output data from ingest, and do some minimal final processing (e.g. combine journal info with each publication listing).
- Render select dashboard pages (e.g. /core-project/abc123) to PDF reports.
- Deploy dashboard and PDFs to live, public web addresses.
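The ingest steps above can be sketched as a small chain of stages, each of which fetches raw data, stores it untouched, and collates it for the next stage. Everything here is a hypothetical stand-in: the type and function names, the in-memory Map used in place of saved raw files, and the stubbed sources with placeholder data are assumptions for illustration, not the repo's actual implementation.

```typescript
// One ingest stage: fetch raw data, keep it untouched, then collate it.
interface IngestStage<Input, RawData, Output> {
  name: string;
  fetchRaw: (input: Input) => Promise<RawData>;
  collate: (raw: RawData) => Output;
}

async function runIngestStage<Input, RawData, Output>(
  stage: IngestStage<Input, RawData, Output>,
  input: Input,
  rawStore: Map<string, unknown>,
): Promise<Output> {
  const raw = await stage.fetchRaw(input);
  rawStore.set(stage.name, raw); // saved exactly as received, for provenance
  return stage.collate(raw);
}

// Stubbed sources with placeholder data; real stages would scrape, download, or call an API.
const grantsStage: IngestStage<string, { opportunity: string; rows: { grant: string }[] }, string[]> = {
  name: "grants",
  fetchRaw: async (opportunity) => ({
    opportunity,
    rows: [{ grant: "U54OD036472" }, { grant: "R03OD034502" }],
  }),
  collate: (raw) => raw.rows.map((row) => row.grant),
};

const projectsStage: IngestStage<string[], { id: string; unusedDetail: string }[], { coreProject: string }[]> = {
  name: "projects",
  fetchRaw: async (grants) => grants.map((id) => ({ id, unusedDetail: "kept only in raw data" })),
  collate: (raw) => raw.map((entry) => ({ coreProject: entry.id })),
};

async function ingestSketch(): Promise<{ projects: { coreProject: string }[]; rawKeys: string[] }> {
  const rawStore = new Map<string, unknown>();
  // Later stages depend on earlier output: opportunity number -> grant numbers -> projects.
  const grants = await runIngestStage(grantsStage, "EXAMPLE-OPPORTUNITY", rawStore);
  const projects = await runIngestStage(projectsStage, grants, rawStore);
  return { projects, rawKeys: Array.from(rawStore.keys()) };
}
```

Keeping the raw data separate from the collated output means a collation bug can be fixed and re-run without contacting the external source again.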
- /app - Dashboard webapp made with Vue. Also used for generating PDF reports.
  - /public/pdfs - Outputted PDF reports.
- /data - All other functionality involving data (e.g. ingesting/collating/etc).
  - /api - Types and functions for getting raw data from external APIs.
  - /raw - Raw data gathered from external sources. Primarily for provenance, but also acts as ingest cache (delete files to re-fetch from external providers).
  - /ingest - Functions for scraping webpages and calling APIs, and collating that data into a common format.
  - /output - Collated data in format for making desired reports.
  - /print - Functions specific to making printed reports.
  - /util - Small-scope general purpose functions.
- TypeScript - Language used to provide type-safety from beginning to end of pipeline.
- Playwright - Tool used for scraping public web pages and rendering dashboard pages to PDF reports.
- Netlify - Service used for hosting dashboard webapp (and PR previews).
- Octokit - Library used for conveniently interacting with GitHub APIs.
The pipeline is optimized wherever possible and appropriate. Things like network requests and rendering are parallelized (e.g. PDF reports are printed simultaneously in separate tabs of the same Playwright browser instance). External resources are cached in their raw format to speed up subsequent runs, and to avoid being rate-limited or blocked by those providers.
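A minimal sketch of the two ideas in the paragraph above, caching and parallel requests, might look like the following. It uses an in-memory Map as a stand-in for raw files saved on disk, and the function names and fake fetcher are assumptions made for illustration rather than the repo's own code.

```typescript
/** Return the cached request for a key, starting a new one only on a cache miss. */
function fetchCached<T>(
  key: string,
  cache: Map<string, Promise<T>>,
  fetcher: (key: string) => Promise<T>,
): Promise<T> {
  let pending = cache.get(key);
  if (!pending) {
    // Cache the in-flight promise so duplicate keys share one request.
    pending = fetcher(key).catch((error: unknown) => {
      cache.delete(key); // do not keep failed requests in the cache
      throw error;
    });
    cache.set(key, pending);
  }
  return pending;
}

/** Start all requests at once and wait for every result, preserving input order. */
function fetchAllCached<T>(
  keys: string[],
  cache: Map<string, Promise<T>>,
  fetcher: (key: string) => Promise<T>,
): Promise<T[]> {
  return Promise.all(keys.map((key) => fetchCached(key, cache, fetcher)));
}

async function cacheDemo(): Promise<{ results: string[]; calls: number }> {
  let calls = 0;
  const cache = new Map<string, Promise<string>>();
  // Fake external provider that counts how often it is actually contacted.
  const fakeFetcher = async (key: string): Promise<string> => {
    calls += 1;
    return `raw:${key}`;
  };
  await fetchAllCached(["a", "b", "a"], cache, fakeFetcher); // two real requests
  const results = await fetchAllCached(["a", "b"], cache, fakeFetcher); // served from cache
  return { results, calls };
}
```

In this demo the provider is contacted only twice even though five lookups are made, which is the property that helps avoid rate limits.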
Use ./run.sh with a --flag to conveniently run a script of the same name in /data/package.json and /app/package.json (if it exists) from the root of this repo.
Most important scripts:
Flag | Description |
---|---|
--install | Install packages and dependencies |
--install-playwright | Install Playwright |
no flag | Run main pipeline steps in order |
--test | Run all tests (type-checking, linting/formatting checks, etc.) |
--lint | Auto-fix linting/formatting |
--dev | Run dashboard webapp in dev mode |
See readmes in sub folders for all commands.
- CACHE - Whether to use cached files in /raw to skip time-consuming network requests. Set to true (or any string) for true. Leave blank/unset for false. false by default.
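Read literally, the rule above treats any non-empty string as true. A minimal sketch of that interpretation follows; the function name is hypothetical, and in practice the value would presumably come from the environment (e.g. process.env.CACHE), which is an assumption here.

```typescript
/** Interpret a CACHE-style flag: any non-empty string enables it, blank or unset disables it. */
function isCacheFlagEnabled(value: string | undefined): boolean {
  return value !== undefined && value !== "";
}

console.log(isCacheFlagEnabled("true")); // true
console.log(isCacheFlagEnabled("")); // false
console.log(isCacheFlagEnabled(undefined)); // false
console.log(isCacheFlagEnabled("false")); // true, because "false" is still a non-empty string
```

Under this reading, disabling the cache means leaving the variable blank or unset rather than setting it to the word false.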
- Integration of events (collaboration with Training and Outreach Center)
- Notebook-based reporting
- Data asset catalogs (collaboration with DRC)
- User engagement metrics (collaboration with Cloud Center)
- Integrate with Common Fund Program metrics (integration with SPECS/NIH)
- Evaluation and Impact Working Group
- Bibliometrics (including PMC mining for CFDE assets)