
Revisions to resurrect and improve metadata-report (metadata-report v2) #70

anngvu opened this issue Jul 14, 2023 · 0 comments
anngvu commented Jul 14, 2023

Update

TL;DR: the new report will focus on quality per dataset per project. However, a summary per-project status can still be distilled and tracked in the studies table via the NTAP-requested annotationStatus. This could be a translation of:

  • 0/3 datasets validated = "Needs attention"
  • 2/3 datasets validated = "?"
  • 3/3 datasets validated = "All good"

(It might be more informative to just keep the ratios, but on the other hand labels are quicker to understand at a glance.)
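
The translation above could be sketched as a small function. This is only illustrative: the "Needs attention" and "All good" labels come from the proposal, but the label for the intermediate case is left open there ("?"), so "Partially validated" below is a placeholder, not a decided value.

```python
def annotation_status(n_validated: int, n_total: int) -> str:
    """Translate a per-project dataset validation ratio into a status label.

    "Needs attention" and "All good" are from the proposal; the intermediate
    label is undecided in the proposal, so "Partially validated" is a
    placeholder that also keeps the ratio for context.
    """
    if n_total == 0 or n_validated == 0:
        # No datasets validated (or no datasets at all)
        return "Needs attention"
    if n_validated == n_total:
        return "All good"
    return f"Partially validated ({n_validated}/{n_total})"  # placeholder label


# Examples matching the proposal:
print(annotation_status(0, 3))  # Needs attention
print(annotation_status(3, 3))  # All good
print(annotation_status(2, 3))  # Partially validated (2/3)
```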

Open for comments. @jaybee84 @allaway @cconrad8


Follow-up on #64. The old codebase has been consolidated into this repo, and in the process reviewed to determine what renovations are needed to improve its utility.

Screenshot of old report:

Ideas/use cases covered in old report:

  • Examine which annotation properties were in use, to refine/formalize them in the data model at the time
  • Examine the values across these properties to refine the data model
  • Identify projects with questionable keys/values (ones that shouldn't be there or should be revised) to help clean up those projects

Some limitations:

  • Results could be better presented/summarized
  • Data were pulled from fileviews specific to each project; however, fileviews don't necessarily surface all metadata

Changes to implement:

  • Don't rely only on local fileviews; pull all annotations on files directly
  • The new crawler will be more intensive, so consider implementing it with something that has better parallelization than R
  • Assess annotations in the context of a template/data type rather than only as key/value pairs
  • Report with a more project-centric view, vs. the old report, which was more data-model-centric
  • Provide a better summary of annotation completeness and correctness (related to nfportalutils#87, Add script to provide basic scoring of annotations)
  • Write a Dockerfile so this can run as scheduled jobs or anywhere containers can be used, instead of (as formerly) being based on GH infra
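
As one sketch of the "pull all annotations directly, with better parallelization" idea: annotation fetches are I/O-bound, so even a simple thread pool goes a long way. `fetch_annotations` below is a stub standing in for the real Synapse API call; the function names and worker count are illustrative assumptions, not the actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_annotations(file_id: str) -> dict:
    # Stub: a real crawler would call the Synapse API here to retrieve
    # the annotations for this entity (network access assumed).
    return {"id": file_id, "annotations": {}}


def crawl(file_ids, max_workers: int = 8) -> list:
    # I/O-bound fetches parallelize well with a thread pool; results come
    # back in the same order as file_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_annotations, file_ids))


results = crawl([f"syn{i}" for i in range(100)])
print(len(results))  # 100
```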

Mockup for a project with 2 datasets:

| Project ID | Dataset | Matched template | Validation results for template | Extra annotations outside of template |
|---|---|---|---|---|
| syn123 | RNA-seq dataset | GenomicsAssayTemplate | missing x | |
| syn123 | IHC dataset | ImagingAssayTemplate | OK | experimenterName |
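
The last two columns of the mockup could be produced by a check like the one below. The template key sets and function name here are illustrative assumptions, not the real NF-OSI schemas or validation logic.

```python
def validate_against_template(annotations: dict, required: set) -> tuple:
    """Check a dataset's annotations against a template's required keys.

    Returns a validation message ("OK" or "missing <keys>") and the sorted
    list of extra annotation keys outside the template, mirroring the last
    two columns of the mockup.
    """
    keys = set(annotations)
    missing = sorted(required - keys)
    extra = sorted(keys - required)
    result = "OK" if not missing else "missing " + ", ".join(missing)
    return result, extra


# Illustrative check shaped like the mockup's second row:
result, extra = validate_against_template(
    {"assay": "IHC", "experimenterName": "someone"},
    required={"assay"},
)
print(result, extra)  # OK ['experimenterName']
```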
anngvu self-assigned this Jul 18, 2023
anngvu moved this to In Progress in NF-OSI Sprints Jul 18, 2023
anngvu changed the title from "Revisions to improve metadata-report" to "Revisions to resurrect and improve metadata-report (metadata-report v2)" Sep 27, 2023
anngvu moved this from In Progress to Todo in NF-OSI Sprints Sep 27, 2023