Feature/thumbnails #73

mgdaily · 2024-05-16T22:49:56Z

Getting this PR out just to make sure that things mostly look okay. I still need to make sure this is documented correctly in the API docs and write an additional article in the OCS doc about how to ingest thumbnails. The tests are currently failing because this branch relies on changes made to the ocs_archive library (see that PR: observatorycontrolsystem/ocs_archive#11).

This PR adds a Thumbnail model to the science archive. It has a foreign key relationship to Frame, so a Frame may have many thumbnails. Frames can be filtered in the usual way, and thumbnails are optionally included in the response via the query param include_thumbnails=true. We also handle the case where a thumbnail arrives before its associated frame: we deserialize the payload into a Frame object, create it, then create the thumbnail (see the ingester PR - we send the same frame payload along with a couple of extra bits of metadata to create the thumbnail and associate it).
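As a rough illustration of the optional inclusion described above, here is a framework-free sketch. The `serialize_frame` helper and the dict layout are hypothetical; only the `include_thumbnails=true` query param comes from this PR.

```python
def serialize_frame(frame, query_params):
    """Serialize a frame dict, adding thumbnails only when requested.

    `frame` and `query_params` are plain dicts standing in for the model
    instance and the request's query parameters (both are assumptions).
    """
    data = {'id': frame['id'], 'basename': frame['basename']}
    # Query params arrive as strings, so compare against the literal 'true'.
    if query_params.get('include_thumbnails', '').lower() == 'true':
        data['thumbnails'] = list(frame.get('thumbnails', []))
    return data
```

Leaving the thumbnails out by default keeps the existing response shape unchanged for clients that never send the parameter.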

I've tested this on dev (http://archive-api-dev.lco.gtn) by pulling down 500 frames, generating a small and a large thumbnail for each, and ingesting each frame and its two thumbnails in a random order to ensure that we handle both cases: frames arriving first and thumbnails arriving first.
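The arrival-order test described above can be illustrated with a small in-memory sketch. Everything here (the `Archive` class, its methods, and the dict layout) is hypothetical and only mirrors the behavior described in this PR; it is not the archive's actual code.

```python
import random


class Archive:
    """Toy in-memory stand-in for the archive (hypothetical, not the real API)."""

    def __init__(self):
        # basename -> {"versions": [...], "thumbnails": [...]}
        self.frames = {}

    def _get_or_create_frame(self, basename):
        # Create a minimal frame with no version info if none exists yet.
        return self.frames.setdefault(basename, {"versions": [], "thumbnails": []})

    def ingest_frame(self, basename, version_key):
        self._get_or_create_frame(basename)["versions"].append(version_key)

    def ingest_thumbnail(self, frame_basename, size):
        self._get_or_create_frame(frame_basename)["thumbnails"].append(size)


def ingest_in_random_order(basenames, seed):
    """Ingest one frame plus a small and a large thumbnail per basename, shuffled."""
    events = []
    for name in basenames:
        events.append(("frame", name, "v1"))
        events.append(("thumbnail", name, "small"))
        events.append(("thumbnail", name, "large"))
    random.Random(seed).shuffle(events)
    archive = Archive()
    for kind, name, value in events:
        if kind == "frame":
            archive.ingest_frame(name, value)
        else:
            archive.ingest_thumbnail(name, value)
    return archive
```

Whatever the shuffle order, each basename should end up with exactly one frame entry carrying its version and both thumbnails.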

I also tested the migration against the staging DB and it took less than 10 seconds, so we should have little downtime when deploying this.

Namely, we'd like to check that the thumbnail data coming in is valid using the frame serializer, and just use the filestore key and metadata to generate the URL. Then we create a minimal frame object and associate that with the thumbnail.

archive/frames/models.py

mgdaily · 2024-05-16T22:52:20Z

archive/frames/serializers.py

@@ -87,7 +88,7 @@ class Meta:
        # }

    def create(self, validated_data):
-        version_data = validated_data.pop('version_set')
+        version_data = validated_data.pop('version_set') if 'version_set' in validated_data else {}


We need to handle the case that we may create a Frame that doesn't yet have a version set - this is when we create a thumbnail whose frame has not yet been created.
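For reference, `dict.pop` with a default expresses the same conditional more compactly. The helper below is a hypothetical sketch, not the serializer's actual `create` method; it uses `{}` as the default only to match the diff above.

```python
def split_version_data(validated_data):
    """Return (frame_fields, version_data) without mutating the input.

    A frame created on behalf of a thumbnail has no 'version_set' yet, so a
    missing key yields the default instead of raising KeyError.
    """
    frame_fields = dict(validated_data)
    version_data = frame_fields.pop('version_set', {})
    return frame_fields, version_data
```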

mgdaily · 2024-05-16T22:55:57Z

archive/frames/views.py

+        logger.info('Got request to process thumbnail', extra=logger_tags)
+        # Make sure we have the minimum information to make a frame object associated with the thumbnail if one doesn't already exist
+        frame_serializer = FrameSerializer(data=request.data)
+        if frame_serializer.is_valid():


This view is probably the most complicated part of this. I've designed it so that we always expect a valid frame payload from the ingester, which lets us create a basic Frame object. I've done this to keep the changes in the ingester minimal - the ingester still uploads to S3 and gets the version information, but rather than storing the key and extension on the frame within the version_set, we grab them and store them on the thumbnail instead, and store the Frame without any version info.
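A minimal sketch of the split described above, assuming the payload is a plain dict. The field names `size`, `key`, and `extension` and the helper itself are assumptions for illustration; only `version_set` and `frame_basename` appear in this PR's diffs.

```python
def split_thumbnail_payload(payload):
    """Split an ingester payload into frame fields and thumbnail fields."""
    # Everything except the thumbnail-specific keys describes the frame.
    frame_fields = {k: v for k, v in payload.items()
                    if k not in ('version_set', 'size', 'frame_basename')}
    # The frame takes the basename of the image the thumbnail was made from.
    frame_fields['basename'] = payload['frame_basename']
    # The uploaded version describes the thumbnail file, not the frame.
    version = payload['version_set'][0]
    thumbnail_fields = {
        'basename': payload['basename'],
        'size': payload['size'],
        'key': version['key'],
        'extension': version['extension'],
    }
    return frame_fields, thumbnail_fields
```

The frame side deliberately carries no version information, matching the note that the Frame is stored without any version info.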

jnation3406

I think we should probably add a ThumbnailFile class to the ocs_archive that only requires what it needs and plumb that through. Should make things a little more efficient and explicit.

archive/frames/models.py

jnation3406 · 2024-05-17T21:15:15Z

archive/frames/models.py

+    @cached_property
+    def url(self):
+        metadata = self.frame.get_header_dict()
+        # include frame basename and size so that this passes metadata validation in the DataFile class


I think the proper approach would be to add a thumbnail type to the ocs_archive library - we might want to store thumbnails in their own /thumbnails directory in the filestore, not in raw or processed as they would be now. This could also be done in the ocs_archive file type we create for thumbnails. It then shouldn't need the related frame's basename either; its own basename plus the frame fields (to get the site/inst/dayobs/ directory) is probably enough. It also shouldn't need to get the header, which would mean another DB call or table join - it just needs the site, inst, and dayobs to determine its directory if we set that up in the ocs_archive thumbnail file class, and it can get those directly from the frame model.

jnation3406 · 2024-05-17T21:25:16Z

archive/frames/views.py

+        if thumbnail_serializer.is_valid():
+            # Remove the version set as this version does not correspond to the frame object, but rather the thumbnail.
+            del frame_serializer.validated_data['version_set']
+            frame = frame_serializer.save(basename=request.data['frame_basename'])


Don't we want to just get the existing frame if it exists, or save/create it if it doesn't?

jnation3406 · 2024-05-17T21:34:26Z

Also, have you thought about what effect this might have on things that use the archive and make frames queries, i.e. the frontend or other apps we have? What will happen to those if they query frames that have a thumbnail but not the actual frame yet?

mgdaily · 2024-05-17T21:46:24Z

> Also, have you thought about what effect this might have on things that use the archive and make frames queries, i.e. the frontend or other apps we have? What will happen to those if they query frames that have a thumbnail but not the actual frame yet?

Good point. The easiest way I can think of would be to filter out any Frames from the list view that don't have any version information associated with them. That way users only see the data that's been completely ingested.
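The filtering idea can be sketched on plain dicts as follows. The `fully_ingested` helper is hypothetical; in the real list view this would presumably be a queryset filter rather than a list comprehension.

```python
def fully_ingested(frames):
    """Keep only frames that have at least one version attached.

    Frames created as placeholders for early-arriving thumbnails have an
    empty (or missing) 'version_set' and are dropped from the listing.
    """
    return [frame for frame in frames if frame.get('version_set')]
```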

mgdaily · 2024-05-29T00:45:57Z

archive/frames/models.py

+
+    @cached_property
+    def url(self):
+        path = ThumbnailFile.get_filestore_path_from_frame_metadata(self.frame.site_id, self.frame.instrument_id, 


@jnation3406 you left a comment about the previous version of this function needing the header to construct the filestore path using the DataFile class. Problem was, we needed to construct the DataFile class with all valid metadata so it could determine the filestore path using only part of that metadata, which as you mentioned, would lead to unnecessary table joins and accesses. So what I've done in this commit was add a static method to the ThumbnailFile class to take in only the metadata needed to construct the filestore path. To preserve the DataFile interface (subclasses of DataFile need to implement get_filestore_path for the filestore to work), I just call the static method and use the data stored in the ThumbnailFile's metadata.

This allows the science archive to provide a minimal set of metadata without having to do too much DB work.
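The pattern described here, a static method that needs only a few fields plus an instance method that delegates to it, might look roughly like the sketch below. The class name, method names, and the `site/instrument/YYYYMMDD/thumbnails/filename` layout are all assumptions for illustration, not the ocs_archive API.

```python
from datetime import date


class ThumbnailPathSketch:
    """Hypothetical stand-in for a thumbnail file class (not ocs_archive's API)."""

    def __init__(self, filename, metadata):
        self.filename = filename
        self.metadata = metadata

    @staticmethod
    def path_from_frame_metadata(site_id, instrument_id, observation_day, filename):
        # Assumed layout: site/instrument/YYYYMMDD/thumbnails/filename
        day = observation_day.strftime('%Y%m%d')
        return '/'.join((site_id, instrument_id, day, 'thumbnails', filename))

    def get_filestore_path(self):
        # The instance method keeps the existing interface by delegating to
        # the static method with values taken from its own metadata.
        return self.path_from_frame_metadata(
            self.metadata['site_id'],
            self.metadata['instrument_id'],
            self.metadata['observation_day'],
            self.filename,
        )
```

A caller that already holds a frame can use the static method directly with three frame fields, so no header lookup is needed.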

jnation3406

Looks good, just one comment

jnation3406 · 2024-05-30T22:18:16Z

archive/frames/models.py

+
+    @cached_property
+    def url(self):
+        path = ThumbnailFile.get_filestore_path_from_frame_metadata(self.frame.site_id, self.frame.instrument_id, 


This is okay I guess, but if you want to avoid importing the ThumbnailFile directly, you could keep using the get_file_store_path util function but instead of passing the header, just pass a metadata dict of the header keys you need, i.e.
get_file_store_path(self.filename, {'site_id': self.frame.site_id, 'instrument_id': self.frame.instrument_id, 'observation_day': self.frame.observation_day.strftime('%Y%m%d')})

mgdaily added 12 commits April 22, 2024 19:48

Add thumbnail model to represent thumbnail images

76135ee

Also add a basic serializer

Intermediate check-in. Not stable yet.

7378a00

WIP on archive changes. Update thumbnails model, update serializer, add filters

7e07fc6

Add test for thumbnail filtering by frame

b8d1169

Also don't try to generate a url or filename for frames that don't yet have a version_set

Refactor a bit.

439b120

Make the size choices for thumbnails configurable, clean up the frame creation on thumbnail create, also change up the fields on thumbnail slightly to just store basename/extension so we can derive the filename and minimize changes to the ingester.

Misc updates to make sure the basename gets properly plumbed to the frame

8097087

Also update a few tests

Final fixes to serializers and additional tests.

6a6b737

Small tweaks to comments, formatting, etc...

40fe4ed

Filter out thumbnails from frame list by default

8dff313

Make version 32 char max for thumbnails. Update migration

786eb5d

Define the post_delete handler for the thumbnails

95a3400

mgdaily requested a review from jnation3406 May 16, 2024 22:49

jnation3406 requested changes May 17, 2024

mgdaily added 3 commits May 20, 2024 16:30

Remove nullable property from thumbnail FK

9b1aad7

We always want a frame associated with the thumbnail. Re-make migration and rename it.

Fixes based on review comments.

b682b9b

Make sure we check for a frame in the thumbnail serializer before we attempt to create one. Also update the url property on the thumbnail model to use a simplified get_filestore_path method so that we don't need to pull all of the header information.

Update delete_data to use updated filestore path method

ba3133a

mgdaily requested a review from jnation3406 May 29, 2024 00:48

jnation3406 approved these changes May 30, 2024

mgdaily added 5 commits May 31, 2024 14:37

Remove erroneous frame field from ThumbnailFilter

67479b2

This was causing an error trying to render a filter on a ForeignKey field

Remove Thumbnail import, use get_file_store_path util

79c78e3

Update ocs-archive version, update changelog, bump version

def4474

Fix codacy issues

ad646f9

Fix up trailing whitespace

9c4ec42

mgdaily merged commit c7f29d9 into main Jun 3, 2024
11 of 12 checks passed

mgdaily deleted the feature/thumbnails branch June 3, 2024 20:29
