Maninpasta minutes: BuildingKG#1 #6
It would be useful to start collecting a list of resources that the project members would like to include in the KG.
The Dutch Network for Digital Heritage has developed a registry for datasets: https://github.com/netwerk-digitaal-erfgoed/register. This could provide an infrastructural component for Polifonia datasets as well. Once datasets are registered this way, their metadata can be queried and their various endpoints can be found.
A catalogue of musical resources on the Web, musoW, was developed a few years ago: https://musow.kmi.open.ac.uk/
For the project it will be relevant to create an overview of relevant datasets and their status (the degree to which they are published as semantic or linked data), as well as their legal status.
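A minimal sketch of what such an overview could track, assuming a hypothetical four-level publication scale and a free-text license field (neither is a project decision): each entry records how far a dataset is published as linked data and whether its legal status is known, and a helper lists the entries that still need work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PublicationStatus(Enum):
    """Hypothetical scale of how far a dataset is published as semantic/linked data."""
    NOT_ONLINE = 0
    ONLINE_NON_RDF = 1    # e.g. HTML pages, CSV or XML dumps
    RDF_AVAILABLE = 2     # RDF can be downloaded, but is not linked to other sources
    LINKED_OPEN_DATA = 3  # RDF with links to external sources (e.g. Wikidata, VIAF)


@dataclass
class DatasetEntry:
    name: str
    url: str
    status: PublicationStatus
    license: Optional[str] = None  # None means the legal status is still unknown


def needs_attention(entries: List[DatasetEntry]) -> List[DatasetEntry]:
    """Entries that are not yet LOD or have an unknown license, least published first."""
    flagged = [
        e for e in entries
        if e.status is not PublicationStatus.LINKED_OPEN_DATA or e.license is None
    ]
    return sorted(flagged, key=lambda e: e.status.value)
```

Keeping the legal status as a separate field means a dataset can be technically "done" (fully linked) and still be flagged because its license is unclear.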
Examples of sources about Giacomo Antonio Perti for the Carolina#1 CS:
Also of interest is REPIM ("Repertorio Poesia in Musica"), which includes secular music from the 15th to 17th centuries.
Obviously, there is a page on Wikipedia: https://en.wikipedia.org/wiki/Giacomo_Antonio_Perti
We should also take into account the VIAF entry: http://viaf.org/viaf/19946155
Takeaway message 1: we need a registry!
Another example: RISM (Répertoire International des Sources Musicales): https://opac.rism.info/metaopac/refineSearch.do;jsessionid=3DA5C69F3B0EFAE7E062CB0F2635B567.touch01?id=author_facet&methodToCall=filterSearch&subval=Perti%2CGiacomoAntonio
Takeaway message 2: there is a huge diversity of formats, availability statuses, and quality levels.
On the diversity of formats: the Open University (OU) is working on a new tool for querying non-RDF resources with SPARQL: https://github.com/SPARQL-Anything/sparql.anything
RISM is also "available" in RDF: https://opac.rism.info/id/rismid/454006820?format=rdf
Anyone can contribute to filling the registry. However, we would also like to bring datasets a step further towards full Linked Open Data publication (by setting up pipelines, settling on a data format, linking with various sources, etc.), and there we need to focus our efforts. So we need to decide which datasets to prioritise: are there specific use cases within Polifonia that need particular datasets?
One challenge is that we cannot ask all data providers to commit to a single ontological representation, which makes developing an exploratory system for the KG an interesting problem.
We also have to formulate performance requirements for datasets: what kinds of queries must they be able to handle, and how long may it take before results are returned? We also need to decide to what degree the data can be cached or aggregated to boost performance.
Maybe we need to pursue a mixed strategy: leave the data with the provider, but cache it as soon as access is requested. A search facility will necessarily need to index all the data, though.
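The mixed strategy can be sketched as a cache that stays empty until a resource is first requested. This is a minimal illustration, not a design decision: `fetch` is a hypothetical stand-in for whatever call retrieves data from a provider, and the time-to-live (`ttl`) is an assumed way of picking up provider updates.

```python
import time
from typing import Callable, Dict, Tuple


class LazyCache:
    """Leave data at the provider; keep a local copy once it has been requested.

    `fetch` is a hypothetical callable that retrieves a resource from its provider
    (e.g. an HTTP GET on the provider's endpoint). Copies older than `ttl` seconds
    are fetched again, so changes at the provider are eventually picked up.
    """

    def __init__(self, fetch: Callable[[str], str], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # uri -> (fetched_at, data)

    def get(self, uri: str) -> str:
        now = self._clock()
        hit = self._store.get(uri)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh local copy: the provider is not contacted
        data = self._fetch(uri)      # first request or stale copy: ask the provider
        self._store[uri] = (now, data)
        return data
```

Note that such a cache only ever holds what has been requested, which is why a search facility, needing every record, cannot rely on it alone.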
Datasets to be registered also include thesauri/vocabularies used in the music domain. It would also be interesting to know which vocabularies have been linked/aligned with other (public) data sources (e.g. Wikidata, Discogs). These vocabularies can act as linking layers in the Knowledge Graph.
Maybe we can use GitHub as a repository for the registry and include a JSON(-LD) file for each resource. The musoW web application could then simply expose the data from there.
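A minimal sketch of the one-file-per-resource idea, assuming a hypothetical layout where each resource is a `<slug>.jsonld` file describing a schema.org `Dataset`; the property set is illustrative only. Any application (musoW included) would then only need to read the directory.

```python
import json
from pathlib import Path
from typing import List


def write_entry(registry_dir: Path, slug: str, name: str, url: str,
                description: str) -> Path:
    """Write one resource as a schema.org Dataset in JSON-LD (one file per resource)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": url,
        "name": name,
        "url": url,
        "description": description,
    }
    path = registry_dir / f"{slug}.jsonld"
    path.write_text(json.dumps(doc, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def load_registry(registry_dir: Path) -> List[dict]:
    """Read every *.jsonld file in the registry, in file-name order."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(registry_dir.glob("*.jsonld"))]
```

One file per resource keeps each pull request that adds or edits an entry small and easy to review.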
We should also consider using GitHub to host the actual datasets/linked data.
We should define a basic process for requesting the inclusion of a new source in the KG.
There is a set of key questions about integrating resources into the knowledge graph, starting with: what do we integrate?
Are we expecting to copy the original resource and transform it with our vocabulary? Or are we instead asking data providers to commit to our representation? Or something in between the two extremes?
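The "copy and transform" option can be sketched as a per-provider field mapping. The mapping below is hypothetical (schema.org properties used as placeholders for "our vocabulary"). Fields with no mapping are returned rather than dropped, which shows where the in-between option would need input from the provider.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from one provider's field names to shared vocabulary
# properties; the target IRIs are placeholders, not a project decision.
FIELD_MAP: Dict[str, str] = {
    "title": "http://schema.org/name",
    "composer": "http://schema.org/composer",
    "year": "http://schema.org/dateCreated",
}


def to_triples(subject: str, record: Dict[str, str],
               field_map: Dict[str, str]) -> Tuple[List[Triple], List[str]]:
    """Copy a provider record and re-express it with the shared vocabulary.

    Returns the mapped triples and the list of fields that had no mapping.
    """
    triples: List[Triple] = []
    unmapped: List[str] = []
    for field, value in record.items():
        prop = field_map.get(field)
        if prop is None:
            unmapped.append(field)
        else:
            triples.append((subject, prop, value))
    return triples, unmapped
```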
Another possible piece of the puzzle for vocabularies is the 'Network of Terms', an application that allows you to search multiple vocabularies via a single API: https://github.com/netwerk-digitaal-erfgoed/network-of-terms-api
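The idea of searching several vocabularies through one entry point can be sketched as a parallel fan-out. This does not use or reproduce the Network of Terms API. Each "source" here is a hypothetical callable standing in for a query against one vocabulary's endpoint, and a failing source returns no results instead of breaking the whole search.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A hypothetical source: searches one vocabulary, returns (label, IRI) pairs.
Source = Callable[[str], List[Tuple[str, str]]]


def search_all(query: str, sources: Dict[str, Source]) -> Dict[str, List[Tuple[str, str]]]:
    """Send one query to every vocabulary in parallel; group results by source."""
    def run(item: Tuple[str, Source]):
        name, search = item
        try:
            return name, search(query)
        except Exception:
            return name, []  # an unavailable vocabulary yields no results
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        return dict(pool.map(run, sources.items()))
```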
Work-in-progress query to generate schema.org descriptions from the current musoW catalogue:
Shall we rename this issue to RegistryActivity?
Shall we move this issue to the Registry repository?
Meeting 14-05-2021 on Sethus: @enridaga suggested adding MEI document support to SPARQL Anything to enable KG access/ingestion/creation. AP (action point): presenting the idea (and maybe a prototype, if time allows) at the MEI WG meeting on 28-05-2021 would be great.
To collect notes on the discussion in this group