Maninpasta minutes: BuildingKG#1 #6

enridaga · 2021-04-06T09:07:22Z

To collect notes on the discussion in this group

enridaga · 2021-04-06T09:08:39Z

It would be useful to start collecting a list of resources that project members would like to include in the KG.

85jesse · 2021-04-06T09:12:08Z

The Dutch Network for Digital Heritage has developed a registry for datasets: https://github.com/netwerk-digitaal-erfgoed/register. This could provide an infrastructural component for Polifonia datasets as well. Once datasets are registered in this fashion, their metadata can be queried and their various endpoints can be found.

enridaga · 2021-04-06T09:14:00Z

A catalogue of musical resources on the Web was developed a few years ago: https://musow.kmi.open.ac.uk/

85jesse · 2021-04-06T09:14:23Z

For the project it will be useful to create an overview of relevant datasets and their status (the degree to which they are published as semantic or linked data), as well as their legal status.
This could be anything from data that needs to be scraped from HTML pages to full RDF datasets.
Within Polifonia, pipelines can then be created (or existing pipelines reused) to transform the data so that it can be used within the project.
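
As a rough illustration of such an overview, here is a minimal Python sketch. The three-level publication scale, the field names, and the example entries are all assumptions made for illustration; none of them was agreed in the meeting.

```python
from dataclasses import dataclass
from enum import IntEnum


class PublicationLevel(IntEnum):
    """Hypothetical scale: a higher value means closer to linked data."""
    HTML_ONLY = 1   # data has to be scraped from web pages
    STRUCTURED = 2  # e.g. CSV, JSON, or a relational database dump
    RDF = 3         # already published as RDF / linked data


@dataclass
class DatasetEntry:
    name: str
    url: str
    level: PublicationLevel
    legal_status: str  # free text, e.g. a licence name or "unknown"


def needs_pipeline(entry: DatasetEntry) -> bool:
    """Return True when the data must be transformed before use in the KG."""
    return entry.level < PublicationLevel.RDF


# Placeholder entries, not real project datasets.
overview = [
    DatasetEntry("Example catalogue", "https://example.org/catalogue",
                 PublicationLevel.HTML_ONLY, "unknown"),
    DatasetEntry("Example LOD set", "https://example.org/lod",
                 PublicationLevel.RDF, "CC0"),
]

to_transform = [entry.name for entry in overview if needs_pipeline(entry)]
print(to_transform)
```

Keeping the legal status next to the publication level makes it possible to filter on both before investing in a pipeline.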

paolobonora · 2021-04-06T09:15:20Z

Examples of sources for the Carolina#1 CS about Perti, Giacomo Antonio:

Catalogue of Museo della Musica of Bologna: http://www.bibliotecamusica.it/cmbm/scripts/gaspari/libri.asp?ms=%27E%27&ms=%27M%27&ID=3589
Corago LOD: http://coragolod2.ing.unibo.it:8080/corago/resource/RESPONS/APCN00004400
Data from REPIM: 13 works from relational DB (we own)
Grove Music Online: https://doi.org/10.1093/gmo/9781561592630.article.21394

enridaga · 2021-04-06T09:17:25Z

Interested in the REPIM, "Repertorio Poesia in Musica", including secular music from 15th-17th centuries

enridaga · 2021-04-06T09:18:39Z

Obviously, there is a page on Wikipedia: https://en.wikipedia.org/wiki/Giacomo_Antonio_Perti

paolobonora · 2021-04-06T09:22:01Z

And we should also take into account the VIAF entry: http://viaf.org/viaf/19946155

enridaga · 2021-04-06T09:25:47Z

Take away message 1: we need a registry!

paolobonora · 2021-04-06T09:26:15Z

Another example: RISM (Répertoire International des Sources Musicales): https://opac.rism.info/metaopac/refineSearch.do;jsessionid=3DA5C69F3B0EFAE7E062CB0F2635B567.touch01?id=author_facet&methodToCall=filterSearch&subval=Perti%2CGiacomoAntonio

enridaga · 2021-04-06T09:26:23Z

Take away message 2: there is a huge diversity of formats / availability status / quality

enridaga · 2021-04-06T09:27:15Z

On the diversity of format, the OU is working on a new tool to SPARQL non-RDF resources: https://github.com/SPARQL-Anything/sparql.anything

paolobonora · 2021-04-06T09:28:43Z

RISM is "available" also in RDF: https://opac.rism.info/id/rismid/454006820?format=rdf

85jesse · 2021-04-06T09:32:00Z

The registry needs to be filled, and that is open to everyone. We would also like to bring datasets a step further towards full Linked Open Data publication (by setting up pipelines, settling on a data format, linking with various sources, etc.), but there we need to focus our efforts. So we need to decide which datasets to prioritise: are there specific use cases within Polifonia that need specific datasets?

enridaga · 2021-04-06T09:34:41Z

One challenge is that we cannot ask all data providers to commit to one ontological representation, which makes developing an exploratory system for the KG an interesting problem.

85jesse · 2021-04-06T09:35:52Z

We also have to formulate performance requirements for datasets (what kinds of queries do they need to handle? How long may it take before results are returned?). We also need to decide to what degree the data can be cached/aggregated to boost performance.

enridaga · 2021-04-06T09:40:32Z

Maybe we need to pursue a mixed strategy, by leaving data to the provider but caching as soon as access is requested. A search facility will necessarily need to index all the data, though.
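
A minimal Python sketch of the "cache as soon as access is requested" idea, assuming a hypothetical `fetch` callable that retrieves data from the provider. The class name and the stand-in fetcher are illustrative only; a real setup would also need expiry and invalidation, which are left out here.

```python
from typing import Callable, Dict


class OnDemandCache:
    """Leave data with the provider; keep a local copy after the first access."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch               # hypothetical call to the provider
        self._store: Dict[str, str] = {}  # local copies keyed by source URL
        self.remote_calls = 0             # how often the provider was contacted

    def get(self, source_url: str) -> str:
        if source_url not in self._store:
            # First request for this source: go to the provider and cache.
            self.remote_calls += 1
            self._store[source_url] = self._fetch(source_url)
        return self._store[source_url]


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP or SPARQL request to the data provider."""
    return f"data from {url}"


cache = OnDemandCache(fake_fetch)
first = cache.get("https://example.org/dataset")
second = cache.get("https://example.org/dataset")  # served from the cache
print(cache.remote_calls)
```

This covers the access path only; a search facility would still have to index all sources up front, as noted above.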

85jesse · 2021-04-06T09:48:46Z

Datasets to be registered also include thesauri/vocabularies used within the music domain. It would also be interesting to know which vocabularies have been linked/aligned with other (public) data sources (e.g. Wikidata, Discogs, etc.). These vocabularies can act as linking layers in the Knowledge Graph.

enridaga · 2021-04-06T09:58:39Z

Maybe we can use GitHub as a repository for the registry, and include a JSON-(LD) file for each of the resources. The musoW web application can just expose data from there
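
To make the one-file-per-resource idea concrete, here is a minimal Python sketch that builds a JSON-LD description using schema.org terms. The choice of `schema:Dataset`, the selected properties, the file-naming rule, and the example values are assumptions for illustration, not an agreed registry format.

```python
import json
from typing import Dict


def describe_resource(identifier: str, name: str, url: str,
                      license_url: str) -> Dict[str, str]:
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "identifier": identifier,
        "name": name,
        "url": url,
        "license": license_url,
    }


def registry_filename(identifier: str) -> str:
    """Hypothetical naming rule: one JSON-LD file per resource identifier."""
    return f"{identifier}.jsonld"


# Placeholder values, not a real registry entry.
entry = describe_resource(
    identifier="example-resource",
    name="Example music resource",
    url="https://example.org/resource",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
document = json.dumps(entry, indent=2, sort_keys=True)
print(registry_filename(entry["identifier"]))
print(document)
```

Plain files like this can be reviewed through pull requests, and a web application could read them directly from the repository.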

enridaga · 2021-04-06T10:03:28Z

We should also consider using GitHub to host the actual datasets / linked data

paolobonora · 2021-04-06T10:09:47Z

We should define a basic process for a request of a new source within the KG.
Something like:

proposed by user
under analysis/triage
accepted and being aligned
added (and what has been included)
rejected
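
The states listed above could be tracked with a small state machine. This Python sketch is only an illustration: the allowed transitions between states are an assumption, since the comment lists the states but not how a request moves between them.

```python
from enum import Enum


class SourceStatus(Enum):
    PROPOSED = "proposed by user"
    TRIAGE = "under analysis/triage"
    ALIGNING = "accepted and being aligned"
    ADDED = "added"
    REJECTED = "rejected"


# Assumed transitions (not stated in the minutes): a request is triaged,
# then either accepted for alignment or rejected, and finally added.
ALLOWED = {
    SourceStatus.PROPOSED: {SourceStatus.TRIAGE},
    SourceStatus.TRIAGE: {SourceStatus.ALIGNING, SourceStatus.REJECTED},
    SourceStatus.ALIGNING: {SourceStatus.ADDED},
    SourceStatus.ADDED: set(),
    SourceStatus.REJECTED: set(),
}


def advance(current: SourceStatus, target: SourceStatus) -> SourceStatus:
    """Move a source request to the target state if the step is allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


status = SourceStatus.PROPOSED
status = advance(status, SourceStatus.TRIAGE)
status = advance(status, SourceStatus.ALIGNING)
status = advance(status, SourceStatus.ADDED)
print(status.value)
```

If the process ends up living in GitHub issues, each state could map onto an issue label.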

enridaga · 2021-04-06T10:13:50Z

There is a set of key questions about integrating resources into the knowledge graph: what do we integrate?

Metadata about the resources (easy, mandatory)
Schema elements / vocabularies
Entity linking

enridaga · 2021-04-06T10:16:20Z

Are we expecting to copy the original resource and transform it with our vocabulary? Or are we asking data providers to commit to our representation? Or something in between the two extremes?

85jesse · 2021-04-06T10:21:26Z

Another possible piece of the puzzle for vocabularies is the 'Network of Terms', an application that allows you to search multiple vocabularies via a single API: https://github.com/netwerk-digitaal-erfgoed/network-of-terms-api
This gives a good overview of the vocabularies that are already out there and can be used. It is up to the collection holders to decide which vocabulary or vocabularies they want to use.

enridaga · 2021-04-06T11:00:47Z

Work-in-progress query to generate schema.org descriptions from the current musoW catalogue:

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Build a schema.org description of a musoW catalogue item.
CONSTRUCT {
  ?item schema:identifier ?identifier ;
        schema:license ?license ;
        schema:featureList ?feature .
}
FROM <http://data.open.ac.uk/context/musow>
WHERE {
  # Restrict the query to a single catalogue item.
  VALUES (?item) { (<http://data.open.ac.uk/musow/79e8954f4b0bc004e3ed8e5ea91bf7b4>) }
  # ?charged, ?task, ?category and ?accessRights are bound here
  # but not yet used in the CONSTRUCT template.
  ?item dct:identifier ?identifier ;
        <http://data.open.ac.uk/musow/ontology/access/type> ?charged ;
        <http://data.open.ac.uk/musow/ontology/situation/task> ?task ;
        <http://dbpedia.org/ontology/category> ?category ;
        <http://purl.org/dc/terms/accessRights> ?accessRights ;
        <http://purl.org/dc/terms/license> ?license ;
        <http://schema.org/featureList> ?feature .
}

enridaga · 2021-04-26T08:56:30Z

Shall we rename this issue to RegistryActivity?

enridaga · 2021-04-26T08:56:49Z

Shall we move this issue to the Registry repository?

albertmeronyo · 2021-05-14T14:01:02Z

Meeting 14-05-2021 on Sethus: @enridaga suggested adding MEI document support in SPARQL Anything to enable KG access/ingestion/creation

AP: presenting the idea (and maybe a prototype if time allows?) at the MEI WG meeting on 28-05-2021 would be great

enridaga added this to the Maninpasta (6/04/2021) milestone Apr 6, 2021

enridaga self-assigned this Apr 14, 2021

enridaga assigned 85jesse, marilenadaquino and paolobonora Apr 26, 2021

enridaga mentioned this issue Jun 22, 2021

We should define a basic process for a request of a new source within the KG. polifonia-project/registry-data#2

Open
