-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/thumbnails #73
Conversation
Also add a basic serializer
Namely, we'd like to check that the thumbnail data coming in is valid using the frame serializer, and just use the filestore key and metadata to generate the URL. Then we create a minimal frame object and associate that with the thumbnail.
Also don't try to generate a url or filename for frames that don't yet have a version_set
Make the size choices for thumbnails configurable, clean up the frame creation on thumbnail create, also change up the fields on thumbnail slightly to just store basename/extension so we can derive the filename and minimize changes to the ingester.
…rame Also update a few tests
@@ -87,7 +88,7 @@ class Meta: | |||
# } | |||
|
|||
def create(self, validated_data): | |||
version_data = validated_data.pop('version_set') | |||
version_data = validated_data.pop('version_set') if 'version_set' in validated_data else {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle the case that we may create a Frame that doesn't yet have a version set - this is when we create a thumbnail whose frame has not yet been created.
logger.info('Got request to process thumbnail', extra=logger_tags) | ||
# Make sure we have the minimum information to make a frame object associated with the thumbnail if one doesn't already exist | ||
frame_serializer = FrameSerializer(data=request.data) | ||
if frame_serializer.is_valid(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This view is probably the most complicated part of this. I've designed this so that we always expect a valid frame payload from the ingester so we're able to create a basic Frame object. I've done this to keep the changes in the ingester minimal - the ingester still uploads to s3 and gets the version information, but rather than storing the key and extension on the frame within the version_set, we grab it and store it on the thumbnail instead, and store the Frame without any version info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably add a ThumbnailFile class to the ocs_archive that only requires what it needs and plumb that through. Should make things a little more efficient and explicit.
archive/frames/models.py
Outdated
@cached_property | ||
def url(self): | ||
metadata = self.frame.get_header_dict() | ||
# include frame basename and size so that this passes metadata validation in the DataFile class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proper approach would be to add a thumbnail type to the ocs_archive library - we might want to store thumbnails in their own /thumbnails directory in the filestore, not in raw or processed like it would now. This could also be done in the ocs_archive file type we create for thumbnails. It should also not need to use the related frames basename then, just its own basename and the frame fields (to get site/inst/dayobs/ directory) is probably enough. Also shouldn't need to get the header, which would make another DB call or table join - it just needs the site, inst, dayobs to determine its directory if we set that up in the ocs_archive thumbnail file class, and it can get those directly from the frame model.
archive/frames/views.py
Outdated
if thumbnail_serializer.is_valid(): | ||
# Remove the version set as this version does not correspond to the frame object, but rather the thumbnail. | ||
del frame_serializer.validated_data['version_set'] | ||
frame = frame_serializer.save(basename=request.data['frame_basename']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to just get the existing frame if it exists, or save/create it if it doesn't?
Also have you thought about what effect this might have on things that use the archive and make frames queries. I.e. the frontend or other apps we have? What will happen to those if they query frames that have a thumbnail but not the actual frame yet? |
Good point. The easiest way I can think of would be to filter out any Frames from the list view that don't have any version information associated with them. That way users only see the data that's been completely ingested. |
We always want a frame associated with the thumbnail. Re-make migration and rename it.
Make sure we check for a frame in the thumbnail serializer before we attempt to create one. Also update the url property on the thumbnail model to use a simplified get_filestore_path method so that we don't need to pull all of the header information.
archive/frames/models.py
Outdated
|
||
@cached_property | ||
def url(self): | ||
path = ThumbnailFile.get_filestore_path_from_frame_metadata(self.frame.site_id, self.frame.instrument_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnation3406 you left a comment about the previous version of this function needing the header to construct the filestore path using the DataFile class. Problem was, we needed to construct the DataFile class with all valid metadata so it could determine the filestore path using only part of that metadata, which as you mentioned, would lead to unnecessary table joins and accesses. So what I've done in this commit was add a static method to the ThumbnailFile class to take in only the metadata needed to construct the filestore path. To preserve the DataFile interface (subclasses of DataFile need to implement get_filestore_path
for the filestore to work), I just call the static method and use the data stored in the ThumbnailFile's metadata.
This allows the science archive to provide a minimal set of metadata without having to do too much DB work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is okay I guess, but if you want to avoid importing the ThumbnailFile directly, you could keep using the get_file_store_path util function but instead of passing the header, just pass a metadata dict of the header keys you need, i.e.
get_file_store_path(self.filename, {'site_id': self.frame.site_id, 'instrument_id': self.frame.instrument_id, 'observation_day': self.frame.observation_day.strftime('%Y%m%d')})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one comment
archive/frames/models.py
Outdated
|
||
@cached_property | ||
def url(self): | ||
path = ThumbnailFile.get_filestore_path_from_frame_metadata(self.frame.site_id, self.frame.instrument_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is okay I guess, but if you want to avoid importing the ThumbnailFile directly, you could keep using the get_file_store_path util function but instead of passing the header, just pass a metadata dict of the header keys you need, i.e.
get_file_store_path(self.filename, {'site_id': self.frame.site_id, 'instrument_id': self.frame.instrument_id, 'observation_day': self.frame.observation_day.strftime('%Y%m%d')})
This was causing an error trying to render a filter on a ForeignKey field
Getting this PR out just to make sure that things mostly look okay. I still need to make sure this is documented correctly in the API docs and write an additional article in the OCS doc about how to ingest thumbnails. The tests are currently failing because this branch relies on changes made to the ocs_archive library (see that PR: observatorycontrolsystem/ocs_archive#11).
This PR adds a Thumbnail model to science archive. It has a foreign key relationship to frame, so a Frame may have many thumbnails. Frames can be filtered in the usual way, with thumbnails optionally included in the response by using the query param
include_thumbnails=true
. We also handle the case that a thumbnail arrives before its associated frame by serializing the payload into a Frame object, creating it, then creating the thumbnail (see ingester PR - we send the same frame payload along with a couple of extra bits of metadata to create the thumbnail and associate it).I've tested this on dev (http://archive-api-dev.lco.gtn) by pulling down 500 frames, generating a small and large thumbnail for each, and ingesting each frame/small+large thumbnail in a random order to ensure that we handle the case where frames arrive first, or thumbnails arrive first.
I also tested the migration against the staging DB and it took less than 10 seconds, so we should have little downtime when deploying this.