MARC for Authority Records to RDF

XSL transformation for converting data represented in MARC 21 for Authority Records and serialized as MARC XML to RDF. The resulting RDF primarily uses the Simple Knowledge Organization System (SKOS) and MADS/RDF. The XSLT is accompanied by scripted tasks for driving the transformation, loading data into an RDF store, and executing SPARQL queries that enrich the data.

The transformation was developed for an LOD2 Publink project with the National Library of Israel.

Steps

The steps of the transformation are implemented as Rake tasks. Use rake -T to list all available tasks. Before running any of the tasks, edit the configuration file etc/config.xml.

  1. rake xslt[path/to/marc-21-a.xml] to execute the XSL transformation from MARC XML to RDF/XML (written to tmp/output.rdf).
  2. rake fuseki:load to load the created RDF in Jena TDB.
  3. rake fuseki:start to start a SPARQL endpoint for the loaded data.
  4. rake sparql:enrich to issue several SPARQL Update requests that will enrich the processed data.
  5. rake sparql:metadata to compute dataset statistics and generate corresponding metadata in a separate named graph.
  6. rake fuseki:stop to stop the SPARQL endpoint.
  7. rake fuseki:dump to export the transformed dataset into N-Quads files located in the tmp directory.
  8. rake fuseki:purge to clear all Jena TDB files.
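
The steps above can be chained in a small wrapper script. The following is a minimal sketch, not part of the repository: the function names run_step and run_pipeline and the DRY_RUN variable are hypothetical, and it assumes that each Rake task returns once it has finished (in particular, that rake fuseki:start does not block the shell). Only the task names are taken from the list above.

```shell
#!/bin/sh
# Hypothetical wrapper around the Rake tasks listed above.
# Set DRY_RUN=1 to print the commands instead of executing them.

run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "rake $1"
  else
    # Stop at the first failing task and report which one it was.
    rake "$1" || { echo "step failed: rake $1" >&2; return 1; }
  fi
}

run_pipeline() {
  input="$1"   # path to the MARC XML input file
  # fuseki:purge is deliberately left out: it clears all Jena TDB files.
  for task in "xslt[$input]" fuseki:load fuseki:start sparql:enrich \
              sparql:metadata fuseki:stop fuseki:dump; do
    run_step "$task" || return 1
  done
}
```

After sourcing the script, calling run_pipeline path/to/marc-21-a.xml runs steps 1 to 7 in order. If a step fails after the endpoint has been started, the server may still be running, so rake fuseki:stop may need to be run by hand.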

Dependencies

  • Fuseki
  • Jena: uses Jena TDB as database
  • Rake: works with Ruby version 1.8.7 or newer
  • Saxon: version 9.x, can be replaced by any XSLT 2.0 processor

Known caveats

If some of the enrichment or metadata generation SPARQL queries fail with timeout errors (Timeout::Error), increase the timeout limit in the Fuseki server configuration in etc/fuseki.ttl (the ja:cxtValue property for ja:cxtName "arq:queryTimeout"), then run the failed Rake task again.
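
For orientation, a timeout setting in a Fuseki configuration file generally has the shape below. This is an illustrative fragment, not a copy of etc/fuseki.ttl: the value 120000 is an arbitrary example, and the surrounding server description in the actual file may look different. Fuseki interprets the value in milliseconds.

```turtle
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Fragment only: a real server description also declares its services.
[] a fuseki:Server ;
   # Query timeout in milliseconds; 120000 (2 minutes) is an illustrative value.
   ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "120000" ] .
```

The server has to be restarted (rake fuseki:stop, then rake fuseki:start) before a changed value takes effect.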