Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set separate 'thumbnail_url' for SMK #1450

Closed
1 task
stacimc opened this issue Aug 24, 2022 · 3 comments · Fixed by WordPress/openverse-catalog#903
Closed
1 task

Set separate 'thumbnail_url' for SMK #1450

stacimc opened this issue Aug 24, 2022 · 3 comments · Fixed by WordPress/openverse-catalog#903
Assignees
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon

Comments

@stacimc
Copy link
Collaborator

stacimc commented Aug 24, 2022

Problem

The SMK provider script ingests records with ~2MB images. These files are overloading our thumbnail service and result in an unacceptable 6-10s load time. If these requests to the thumbnail server time out, the frontend will default to using the full image url (so a page with many results from SMK could end up downloading ~20 2MB images).

We would like to keep the high quality image for image_url, but supply a separate thumbnail_url with a smaller size.

Description

This was actually originally implemented in this way in WordPress/openverse-api#506, so much of this code can simply be reintroduced. For a given record, it will generate:

  "image_url": "https://iip.smk.dk/iiif/jp2/KKSgb5100_34.TIF.jp2/full/!2048,/0/default.jpg",         # Note filesize given as 2048
  "thumbnail_url": "https://iip.smk.dk/iiif/jp2/KKSgb5100_34.TIF.jp2/full/!400,/0/default.jpg".  # Note filesize is 400

Currently, thumbnails are hard-coded to None for images by the ImageStore. This was done in WordPress/openverse-api#519, because at the time the thumbnail_url was always redundant (since thumbnails are generated by the server). Now that we have a use case for needing separate image and thumbnail urls, we should remove this hard-coding.

Additional context

This issue handles adding the data to the records in the Catalog. There will also need to be a change to the API to actually use the thumbnail_url in thumbnail generation. API issue: #675

Implementation

  • 🙋 I would be interested in implementing this feature.
@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Aug 24, 2022
@stacimc stacimc changed the title Set separate source url for thumbnail generation for SMK Set separate 'thumbnail_url' for SMK Aug 24, 2022
@obulat
Copy link
Contributor

obulat commented Aug 29, 2022

I wonder if it makes sense to have a separate column for thumbnail that is used only by a handful (currently, only SMK) of providers, or would it be better to save thumbnail to the metadata column?
I feel like it's better not to have a separate column if only one (or several) providers use it. If we decide to save thumbnail to the metadata object, we could either save the list of such providers and check if the provider is in it when generating thumbnails (I think this would be faster) or check if there's a thumbnail field in the metadata column, and use it, or fallback on the image url otherwise.

@zackkrida
Copy link
Member

Anecdotally, I've noticed that the 512px endpoint seems to have quite fast responses: https://iip.smk.dk/iiif/jp2/KKSgb5100_34.TIF.jp2/full/!512,/0/default.jpg

@AetherUnbound
Copy link
Collaborator

+1 to having this in the meta_data column! In Elasticsearch we're already accessing meta_data to pull out some other information that could be indexed:

https://github.com/WordPress/openverse-api/blob/1f32bb8b30bfd0c0bcc0454808a24b5a3e1ed53d/ingestion_server/ingestion_server/elasticsearch_models.py#L240

Perhaps we could have Elasticsearch look there first before serving a thumbnail link from the API! That way the image doesn't even have to be processed on the thumbnail server if a thumbnail URL already exists, we can just return the thumbnail URL in the API search response 😮

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants