Make it possible to Import IIIF collections #90

Open
Abbe98 opened this issue Jul 6, 2022 · 13 comments
Labels
enhancement It would be nice to have this feature! wikicommons

Comments

@Abbe98
Member

Abbe98 commented Jul 6, 2022

IIIF and the IIIF Presentation API are used by many GLAM institutions, and the ability to import records from IIIF Collections would greatly benefit reusers who wish to clean GLAM data, as well as users of the Commons extension.

Proposed solution

Given the collection root URL, an importer would traverse its content and fetch data from the various IIIF manifests in it.

Additional context

@Abbe98 Abbe98 added enhancement It would be nice to have this feature! wikicommons labels Jul 6, 2022
@ostephens
Member

Thanks @Abbe98, this sounds like a really interesting use case. I can definitely see how we might have some specific tools/functions to support IIIF, but I'm less sure what these might look like in reality.

Working from the start, could you say a bit more about how you'd see this working? For example, the example collection you posted contains a set of further collections, which in turn contain a mixture of more collections and manifests. What would the resulting OpenRefine project look like? What might be a typical data cleaning task within the resulting project?

@wetneb
Member

wetneb commented Jul 8, 2022

I would intuitively keep this issue in the CommonsExtension repository unless there are things on OpenRefine's side that need to be changed for such an importer to be implemented there.

@Abbe98
Member Author

Abbe98 commented Jul 13, 2022

For example the example collection you have posted, contains a set of further collections, which contain a mixture of more collections and manifests.

@ostephens in my opinion, it would only fetch the data in manifests and each manifest would become a record, as one manifest is generally representing a single media file.

would intuitively keep this issue in the CommonsExtension repository unless there are things OpenRefine's side that need to be changed for such an importer to be implemented there.

Yeah, this probably shouldn't be in core. I'm not sure the CommonsExtension is the right place either, since the two features don't depend on each other.

@ostephens
Member

@ostephens in my opinion, it would only fetch the data in manifests and each manifest would become a record, as one manifest is generally representing a single media file.

Checking my understanding, if the user were to specify the root URL "https://lbiiif.riksarkivet.se/collection/kartor-och-ritningar" the importer would be required to retrieve the content of the "items" array found at that URL and then:

  • If the item has "type"=="Manifest", store its information in a row in the project
  • If the item has "type"=="Collection", use the URL in the collection's "id" property as the new root URL and keep going

Through this process the importer would work through all Collections and Manifests that are discoverable from the original root URL and eventually end up with a project that contains all the Manifests that were found?
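The traversal described above can be sketched in Python. This is a minimal sketch, not an agreed design: it assumes IIIF Presentation API 3.0 documents (an "items" array whose entries carry "id" and "type", as in the manifest example quoted later in this thread), and `walk_collection` and its `fetch` parameter are hypothetical, with `fetch` standing in for an HTTP GET that returns parsed JSON. The `seen` set is an addition not mentioned in the steps above; it guards against collections that link back to an ancestor.

```python
from typing import Callable, Iterator, Optional, Set


def walk_collection(
    url: str,
    fetch: Callable[[str], dict],
    _seen: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Yield every Manifest reference reachable from the collection at `url`."""
    seen = set() if _seen is None else _seen
    if url in seen:
        return  # already visited: avoids looping forever on cyclic collections
    seen.add(url)
    document = fetch(url)  # hypothetical helper: URL -> parsed JSON dict
    for item in document.get("items", []):
        kind = item.get("type")
        if kind == "Manifest":
            # Each manifest would become one row in the resulting project.
            yield item
        elif kind == "Collection":
            # A nested collection: use its "id" as the next root URL and keep going.
            yield from walk_collection(item["id"], fetch, seen)
```

Passing `fetch` in keeps the walk testable without network access; for instance, a dict of URL-to-JSON documents can be supplied as `docs.__getitem__`.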

Have I understood the intention correctly?

@Abbe98
Member Author

Abbe98 commented Jul 13, 2022

Through this process the importer would work through all Collections and Manifests that are discoverable from the original root URL and eventually end up with a project that contains all the Manifests that were found?

Have I understood the intention correctly?

@ostephens yes. I guess one might want to implement some optional limits (a maximum number of levels, a maximum number of records, etc.).
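Such optional limits could be enforced by tracking each collection's nesting level and the number of manifests collected so far. Below is a minimal breadth-first sketch, assuming IIIF Presentation 3.0 "items"/"type"/"id" fields. `walk_with_limits`, `max_depth`, `max_records` and the injected `fetch` callable (URL to parsed JSON) are hypothetical names, and `None` means "no limit".

```python
from collections import deque
from typing import Callable, List, Optional


def walk_with_limits(
    root: str,
    fetch: Callable[[str], dict],
    max_depth: Optional[int] = None,
    max_records: Optional[int] = None,
) -> List[dict]:
    """Collect manifest references breadth-first, stopping at the given limits."""
    manifests: List[dict] = []
    if max_records is not None and max_records <= 0:
        return manifests
    queue = deque([(root, 0)])  # (collection URL, nesting level; root is level 0)
    seen = {root}  # never queue the same collection twice
    while queue:
        url, depth = queue.popleft()
        for item in fetch(url).get("items", []):
            kind = item.get("type")
            if kind == "Manifest":
                manifests.append(item)
                if max_records is not None and len(manifests) >= max_records:
                    return manifests  # record limit reached
            elif kind == "Collection":
                child = item["id"]
                too_deep = max_depth is not None and depth + 1 > max_depth
                if not too_deep and child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return manifests
```

Breadth-first order is a design choice here: under a record limit, it returns manifests from shallower collections before descending further, which may give a more predictable sample than depth-first order.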

@ostephens
Member

ostephens commented Jul 13, 2022

Thanks @Abbe98. I'm not a IIIF expert, but I think collections are allowed to include items hosted anywhere online? So we could end up doing some extremely large-scale crawling here? (This could of course be limited in some way, for example by allowing the user to specify a domain as well as a number of levels.)
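A domain restriction like the one suggested could be a simple host check applied to each "id" before it is fetched. This is a minimal sketch, assuming the user supplies a set of allowed host names; the `is_allowed` function and its "host or any subdomain" rule are hypothetical, not part of any existing OpenRefine API.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_allowed(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True if `url` is served from one of `allowed_hosts` or a subdomain of one."""
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    for allowed in allowed_hosts:
        allowed = allowed.lower()
        # Compare whole labels so "evilriksarkivet.se" does not match "riksarkivet.se".
        if host == allowed or host.endswith("." + allowed):
            return True
    return False
```

For example, with `{"riksarkivet.se"}` as the allowed set, https://lbiiif.riksarkivet.se/collection/kartor-och-ritningar passes while a collection hosted on another domain would not be followed.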

@ostephens
Member

@Abbe98 in the case of finding a manifest, how would you want the information in the manifest stored in an OpenRefine row? To take an example from your root, the collection with ID https://lbiiif.riksarkivet.se/collection/arkiv/pZdxhTy01Y7BRBFEIaUwL4 contains the manifest:

{
  "id": "https://lbiiif.riksarkivet.se/arkis!R0002353/manifest",
  "type": "Manifest",
  "label": {
    "sv": [
      "1:1 [Det långa parlamentets bortdrivande av Cromwell 1653 20/4. Samtida illustration (på engelska) och en tillhörande holländsk text.]"
    ]
  }
}

What would the row/record stored in OpenRefine look like in this case?
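One possible answer, sketched in Python: keep "id" and "type" as columns and spread the "label" language map (in IIIF Presentation 3.0 a label maps a language code to a list of strings) into one column per language. The `manifest_to_row` function, the `label@<lang>` column names and the " | " separator for multi-valued labels are hypothetical choices, not something IIIF or OpenRefine defines; a real importer would presumably also dereference the manifest's "id" to pull richer fields.

```python
from typing import Dict


def manifest_to_row(manifest: dict) -> Dict[str, str]:
    """Flatten a IIIF 3.0 manifest reference into a flat row (one possible layout)."""
    row = {"id": manifest.get("id", ""), "type": manifest.get("type", "")}
    # A IIIF 3.0 label is a language map: {language code: [string, ...]}.
    for lang, values in manifest.get("label", {}).items():
        # Hypothetical convention: one column per language, values joined by " | ".
        row[f"label@{lang}"] = " | ".join(values)
    return row
```

For the manifest above, this would give a row with three columns: id, type and label@sv.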

@trnstlntk
Contributor

I think it would be great to discuss with the (very active) IIIF community how they'd like this to be built, and maintained over the longer term.

@tfmorris
Member

tfmorris commented Feb 9, 2023

I agree with @wetneb that this should be moved to a more appropriate repository.

The example collection manifest looks like JSON-LD, so it's already supported by OpenRefine, but with the limitations inherent in mapping tree-shaped (JSON & XML) formats to a rectangular grid.

The universe of JSON applications is obviously way too big to be building specific support into OpenRefine for each of them.

@wetneb wetneb transferred this issue from OpenRefine/OpenRefine Feb 9, 2023
@wetneb
Member

wetneb commented Feb 9, 2023

So I have transferred it to the CommonsExtension repo, where it does indeed seem to duplicate #19. Not sure which one people want to keep?

@trnstlntk
Contributor

trnstlntk commented Feb 10, 2023

I'm not sure if this is the right place after all. It may very well be that the IIIF community would mainly prefer to use IIIF integration in OpenRefine for generic data cleaning (not for Wikimedia Commons import)! IMO they are the ones to say/decide.

I would strongly suggest a bit of user research, asking potential users about their primary predicted use cases.

@Abbe98
Member Author

Abbe98 commented Feb 10, 2023

My intent when I created this issue had nothing to do with Wikimedia Commons. I too agree that it should be a separate extension (I believe half of the core extensions should be moved out of core...), but I thought high-level extension requests lived in core's issue tracker.

@wetneb
Member

wetneb commented Feb 24, 2024

I have created a wiki page to list some extension requests and listed IIIF there:
https://github.com/OpenRefine/OpenRefine/wiki/Extension-ideas#iiif-import
Pages on the wiki aren't super visible, so it's not clear to me that's the best place to put it. The main issue tracker is also an option, but not ideal either since they are not meant to be implemented in that repository. Maybe it could also be on the openrefine.org website, but then it's probably harder to edit?

Projects
Status: 🛣 SDC support: Larger feature requests (future grants?)
Development

No branches or pull requests

5 participants