Add function to validate target data #197

Open
annakrystalli opened this issue Nov 20, 2024 · 1 comment

Comments

@annakrystalli
Member

No description provided.

@zkamvar
Member

zkamvar commented Nov 20, 2024

My opinion (which is widely considered trash, but without trash we would not appreciate the beauty of the natural world) is that we should first come up with a standard for the target data before writing any functionality. Importantly, it should follow these guiding principles:

  1. well-defined: the time series data should have clear mappings to the task IDs in the hub (this is implicit for oracle data, since it is derived from the time series and should match the model output)
  2. clear: these data should be easy for a hub administrator to write and store
  3. general: someone without access to hubverse tools should still be able to read in these data (both time series and oracle outputs) and operate on them without having to write code specific to one particular hub
  4. format-agnostic: in line with how model output data are handled
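As a rough illustration of what "well-defined" could mean in practice, here is a minimal Python sketch that checks time series rows against a hub's task IDs. The task ID names and values, the `observation` column, and the function `validate_time_series` are all hypothetical; a real check would read the valid values from the hub's `tasks.json` rather than hard-coding them.

```python
import csv
import io

# Hypothetical task ID definitions; in a real hub these would come from
# the hub's tasks.json config rather than being hard-coded here.
TASK_IDS = {
    "location": {"US", "01", "02"},
    "target": {"wk inc flu hosp"},
}


def validate_time_series(rows, task_ids, value_col="observation"):
    """Return a list of problems found in time series target data rows.

    Each row is a dict of column name -> string value. A row counts as
    well-defined if it has every task ID column plus the value column,
    and each task ID value is one the hub accepts.
    """
    problems = []
    required = set(task_ids) | {value_col}
    for i, row in enumerate(rows, start=1):
        missing = required - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col, allowed in task_ids.items():
            if row[col] not in allowed:
                problems.append(
                    f"row {i}: {col}={row[col]!r} is not a valid task ID value"
                )
    return problems


# Columns that are not task IDs (here, "date") are simply ignored.
sample = """date,location,target,observation
2024-11-16,US,wk inc flu hosp,1234
2024-11-16,99,wk inc flu hosp,10
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(validate_time_series(rows, TASK_IDS))
```

Because the check needs nothing beyond a column-to-values mapping, the same logic would apply whether the data are stored as CSV or another format, which is the point of the format-agnostic principle.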

For item 1, I'm specifically thinking about how we frame guidelines for hub administrators who are not very comfortable working with GitHub or code.

For item 2, I'm thinking about tooling that would need to be written in something like https://github.com/hubverse-org/hub-dashboard-predtimechart to generate data for predtimechart or another visualization.

I discussed potential solutions for this briefly in hubverse-org/hubDocs#208 (comment):

There still needs to be a bit more structure before we have something we can use to consistently read in the target data. I can see a few ways of addressing this:

  1. mandate specific column and file names
  2. add configurations to the admin schema that define the paths to the time series and oracle output data and map column names to targets.
  3. the same as 2, but with a dedicated targets.json spec.
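To make option 3 (or option 2, with the same fields moved into the admin config) more concrete, here is a minimal Python sketch of a reader driven entirely by a config file. The `targets.json` layout and its keys (`time_series`, `path`, `observation_col`, `column_map`) are invented for illustration and are not an existing hubverse schema; `read_time_series` is likewise hypothetical.

```python
import csv
import io
import json

# A hypothetical targets.json. Every key name here is an illustrative
# assumption, not an agreed hubverse spec.
TARGETS_JSON = """
{
  "time_series": {
    "path": "target-data/time-series.csv",
    "observation_col": "value",
    "column_map": {"loc": "location", "date": "target_end_date"}
  }
}
"""


def read_time_series(config, fh):
    """Read time series target data and rename columns to task ID names.

    `config` is the parsed targets.json; `fh` is an open text file for the
    path given in config["time_series"]["path"]. Columns listed in
    column_map are renamed to their task ID, and the observation column
    is renamed to "observation" so downstream code can rely on one name.
    Columns not mentioned in the config are passed through unchanged.
    """
    spec = config["time_series"]
    rename = dict(spec.get("column_map", {}))
    rename[spec["observation_col"]] = "observation"
    return [
        {rename.get(col, col): val for col, val in row.items()}
        for row in csv.DictReader(fh)
    ]


config = json.loads(TARGETS_JSON)
# In practice this would be open(config["time_series"]["path"], newline="").
raw = io.StringIO("loc,date,target,value\nUS,2024-11-16,wk inc flu hosp,1234\n")
print(read_time_series(config, raw))
```

The trade-off versus option 1 is that mandated column and file names would make `column_map` unnecessary, while a config lets existing hubs keep their current files at the cost of one more file for administrators to write.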
