-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding schema for Source Catalog #374
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #374 +/- ##
=======================================
Coverage 95.38% 95.38%
=======================================
Files 4 4
Lines 195 195
=======================================
Hits 186 186
Misses 9 9 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments
datamodel_name: SourceCatalogModel | ||
archive_meta: None | ||
type: object | ||
properties: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General questions: Since this is a data model should it have a meta
keyword? If so what metadata should be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know yet – I'll follow up when I get directions from RTB.
Re possibly including metadata, when I make fits catalogs for other surveys, what I do is include the image header as the FITS header for the catalog. You could also imagine including the step arguments so as to record some information about how the catalog was created. For FITS the headers are usually pretty simple, but the asdf meta area can be more extensive so it's less clear to me that this makes as much sense there. It's also nice if the files are easy to open; I'm not sure what number of dependencies one needs to bring in to open a file that contains one of our image headers. |
In our last meeting, @schlafly checked if there was any downside to copying (parts of?) metadata from the L3 image used to produce the source catalog. @WilliamJamieson confirmed that ASDF would not complain if for example, GWCS was not installed, when loading a source catalog containing a copy of the metadata from its L3 image. So it sounds like we're likely to include some kind of metadata, though we don't know which yet. |
446d7f8
to
0df5c06
Compare
|
||
title: Source catalog generated by the Source Catalog Step. | ||
datamodel_name: SourceCatalogModel | ||
archive_meta: None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
archive_meta is used to indicate this is the entry point for storing data in the db (e.g. wfi_image-1.0.0.yaml, guidewindow-1.0.0.yaml, etc.) Is this such a file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The honest answer is: I don't know. I'm waiting on instructions from RTB on these specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kdupriestsci The source catalogs and the segmentation maps are officially a data product that needs to be stored in the archive. They are new products we are developing now but "at the level of wfi_image". Does this answer your question? What would be the value of this keyword then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nden I'm not sure what you mean by "at the level of wfi_image". If they are their own files and not part of the wfi_image file then having the archive_meta: None is correct (we are not yet assigning a value to this keyword yet, we're just checking for its existence.
Please note that we don't have filetypes for Segmentation Maps or Source Catalogs in the archive (unless they're going under a different name), so this will require coordination with the db (either Azure or Oak, depending on who's doing it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kdupriestsci Sorry for the bad terminology. By "at the same level as wfi_image" I meant that these will be separate files. And thanks for clarifying the correct value in this case is None
.
@ddavis-stsci @schlafly Could you coordinate with the rest of the team the work on the filetypes in the DB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should bring this up in the RAD group but I think the correct filetype will be "Science WFI Level 4", FileTypeID=64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote what I think the initial metadata for these two schemas should be.
We can change/improve the metadata when we have more information.
exact_datatype: true | ||
meta: | ||
anyOf: | ||
- $ref: common-1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need all of common
here. In the absence of direction what to do I will list what I think we need to add to the metadata.
- remove
common
- type: object | ||
properties: | ||
filename: | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under properties add
basic
coordinates
wfi_optical_element
program
visit
(if a visit level catalog)
remove filename
as it is included in basic
|
||
title: Source catalog generated by the Source Catalog Step. | ||
datamodel_name: SourceCatalogModel | ||
archive_meta: None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kdupriestsci The source catalogs and the segmentation maps are officially a data product that needs to be stored in the archive. They are new products we are developing now but "at the level of wfi_image". Does this answer your question? What would be the value of this keyword then?
- type: object | ||
properties: | ||
segmentation_map: | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should follow what the segmentation_map
schema includes.
I think the easiest approach here is to literally duplicate the corresponding image metadata into the catalog metadata. i.e., catalog.meta = image.meta. This is what we do, e.g., for unWISE and DECaPS (albeit with FITS files). I think the only danger here is when the metadata gets complicated. e.g., right now in the romancal L2 files we store tweakreg_catalog in the metadata. That obviously gets circular and bad; I think we should instead store that outside the metadata. This also means that there will be things like stnode.Filename and gWCS objects in the metadata. If that makes the asdf files less accessible to general users that would be a problem. But conceptually this is duplicating metadata that we have decided is important to understand the image and so it applies equally well to understanding the catalog. Especially for the L3s where the amount of metadata is presently modest this makes a lot of sense to me. For the L2s where the metadata currently contains a lot of cruft, we'll be copying that cruft over, but I'd propose we think about pruning that cruft from the L2s rather than pruning it when selecting a subset of metadata to copy from the L2s. |
Of course this is the schema for the files so you can't really assign catalog.meta = image.meta but need to define them. Might as well figure out which parts make sense to be included. |
58034a8
to
a3771e1
Compare
I'm not good at reading rad schemas---I think we should try to follow the image metadata structure if we do this, and I think I read this schema as saying e.g. that the engineering_quality will be accessible as meta.engineering_quality. Meanwhile for the images it'sat meta.visit.engineering_quality. For L2 catalogs, the most important metadata to bring over is the photometry table that does the unit conversions; let's bring that as well. But for L3 catalogs we don't need that. Optional? The next most important thing is the WCS in my opinion, so let's copy that too. Then it becomes deeply unclear to me what we need. 'instrument' for the detector and 'exposure' for the times would cover most things. Those are both L2s that we would make optional? Another approach would just be to do something like one metadata keyword "image_filename" which is the filename of the image that the catalog was run on, and another metadata keyword "image_metadata" that is just given as an "object" and is identically the L2 / L3 image schema copied over, trying to avoid making a new set of metadata we need to separately track and copy... |
description: | | ||
Photometry and astrometry computed in the Source Catalog Step. | ||
tag: tag:astropy.org:astropy/table/table-1.* | ||
meta: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please put meta above the binary contents for both this and segmentation_map-1.0.0.yaml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes, order matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't matter code-wise (nor in the end file, you.. you set that order below). More about readability of schemas.
Co-authored-by: William Jamieson <[email protected]>
Co-authored-by: Nadia Dencheva <[email protected]>
Co-authored-by: Nadia Dencheva <[email protected]>
Superseded by #393 |
(work in progress)
This PR implements a basic schema for the output from the Source Catalog Step being developed in spacetelescope/romancal#1102, with data models implemented in spacetelescope/roman_datamodels#317.
Checklist
CHANGES.rst
under the corresponding subsection