
Revisions to resurrect and improve metadata-report (metadata-report v2) #70

anngvu opened this issue Jul 14, 2023 · 0 comments
anngvu commented Jul 14, 2023

Update

TL;DR: the new report will focus on quality per dataset per project. However, a summary per-project status can still be distilled and tracked in the studies table via the NTAP-requested annotationStatus. This could be a translation of:

  • 0/3 datasets validated = "Needs attention"
  • 2/3 datasets validated = "?"
  • 3/3 datasets validated = "All good"

(It might be more informative to just keep the ratios, but on the other hand labels are quicker to understand at a glance.)
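
The translation above could be sketched as a small function. This is only illustrative: the "Needs attention" and "All good" labels come from the proposal, but the label for the intermediate case is left open there ("?"), so "Partially validated" below is a placeholder, not a decided value.

```python
def annotation_status(n_validated: int, n_total: int) -> str:
    """Translate a per-project dataset validation ratio into a status label.

    "Needs attention" and "All good" are from the proposal; the intermediate
    label is undecided in the proposal, so "Partially validated" is a
    placeholder that also keeps the ratio for context.
    """
    if n_total == 0 or n_validated == 0:
        # No datasets validated (or no datasets at all)
        return "Needs attention"
    if n_validated == n_total:
        return "All good"
    return f"Partially validated ({n_validated}/{n_total})"  # placeholder label


# Examples matching the proposal:
print(annotation_status(0, 3))  # Needs attention
print(annotation_status(3, 3))  # All good
print(annotation_status(2, 3))  # Partially validated (2/3)
```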

Open for comments. @jaybee84 @allaway @cconrad8


Follow-up on #64. The old codebase has been consolidated into this repo, and in the process reviewed to determine what renovations are needed to improve its utility.

Screenshot of old report:

Ideas/use cases covered in old report:

  • Examine which annotation properties were in use, to refine/formalize them in the data model at the time
  • Examine the values across these properties to refine the data model
  • Identify projects with questionable keys/values (ones that shouldn't be there or should be revised) to help clean up those projects

Some limitations:

  • Results could be better presented/summarized
  • Data were pulled from fileviews specific to each project; however, fileviews don't necessarily surface all metadata

Changes to implement:

  • Don't rely only on local fileviews; pull all annotations on files directly
  • The new crawler will be more intensive, so consider implementing it with something that has better parallelization than R
  • Assess annotations in the context of a template/data type rather than only as key/value pairs
  • Report with a more project-centric view, vs. the old report, which was more data-model-centric
  • Provide a better summary of annotation completeness and correctness (related to nfportalutils#87, Add script to provide basic scoring of annotations)
  • Write a Dockerfile so this can run as scheduled jobs or anywhere containers can be used, instead of (as formerly) being based on GH infra
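
As one sketch of the "pull all annotations directly, with better parallelization" idea: annotation fetches are I/O-bound, so even a simple thread pool goes a long way. `fetch_annotations` below is a stub standing in for the real Synapse API call; the function names and worker count are illustrative assumptions, not the actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_annotations(file_id: str) -> dict:
    # Stub: a real crawler would call the Synapse API here to retrieve
    # the annotations for this entity (network access assumed).
    return {"id": file_id, "annotations": {}}


def crawl(file_ids, max_workers: int = 8) -> list:
    # I/O-bound fetches parallelize well with a thread pool; results come
    # back in the same order as file_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_annotations, file_ids))


results = crawl([f"syn{i}" for i in range(100)])
print(len(results))  # 100
```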

Mockup for a project with 2 datasets:

| Project ID | Dataset | Matched template | Validation results for template | Extra annotations outside of template |
|---|---|---|---|---|
| syn123 | RNA-seq dataset | GenomicsAssayTemplate | missing x | |
| syn123 | IHC dataset | ImagingAssayTemplate | OK | experimenterName |
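
The last two columns of the mockup could be produced by a check like the one below. The template key sets and function name here are illustrative assumptions, not the real NF-OSI schemas or validation logic.

```python
def validate_against_template(annotations: dict, required: set) -> tuple:
    """Check a dataset's annotations against a template's required keys.

    Returns a validation message ("OK" or "missing <keys>") and the sorted
    list of extra annotation keys outside the template, mirroring the last
    two columns of the mockup.
    """
    keys = set(annotations)
    missing = sorted(required - keys)
    extra = sorted(keys - required)
    result = "OK" if not missing else "missing " + ", ".join(missing)
    return result, extra


# Illustrative check shaped like the mockup's second row:
result, extra = validate_against_template(
    {"assay": "IHC", "experimenterName": "someone"},
    required={"assay"},
)
print(result, extra)  # OK ['experimenterName']
```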
anngvu self-assigned this Jul 18, 2023
anngvu moved this to In Progress in NF-OSI Sprints Jul 18, 2023
anngvu changed the title from "Revisions to improve metadata-report" to "Revisions to resurrect and improve metadata-report (metadata-report v2)" Sep 27, 2023
anngvu moved this from In Progress to Todo in NF-OSI Sprints Sep 27, 2023