An XSL transformation for converting data represented with MARC 21 for Authority Records and serialized in MARC XML to RDF. The resulting RDF primarily uses the Simple Knowledge Organization System (SKOS) and MADS/RDF vocabularies. The XSLT is accompanied by scripted tasks for driving the transformation, loading data into an RDF store, and executing SPARQL queries that enrich the data.
The transformation was developed for an LOD2 Publink project with the National Library of Israel.
Steps of the transformation are implemented as Rake tasks. Use `rake -T` to list all available tasks. Before running any of the tasks, edit the configuration file in `etc/config.xml`.
1. `rake xslt[path/to/marc-21-a.xml]` to execute the XSL transformation from MARC XML to RDF/XML (file `tmp/output.rdf`).
2. `rake fuseki:load` to load the created RDF into Jena TDB.
3. `rake fuseki:start` to start a SPARQL endpoint for the loaded data.
4. `rake sparql:enrich` to issue several SPARQL Update requests that enrich the processed data.
5. `rake sparql:metadata` to compute dataset statistics and generate corresponding metadata in a separate named graph.
6. `rake fuseki:stop` to stop the SPARQL endpoint.
7. `rake fuseki:dump` to export the transformed dataset into N-Quads files located in the `tmp` directory.
8. `rake fuseki:purge` to clear all Jena TDB files.
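The tasks above are meant to be run in sequence, so they can be chained in a small wrapper script. The following is a minimal sketch, not part of the project: the `run_pipeline` function and the `RAKE` override variable are hypothetical names introduced here, and the script assumes that `rake fuseki:start` returns once the endpoint is running rather than blocking the shell.

```shell
#!/bin/sh
# Sketch only: run the documented Rake tasks in order.
# RAKE is a hypothetical override (not defined by the project) that
# allows a dry run, e.g. RAKE="echo rake".
RAKE="${RAKE:-rake}"

# run_pipeline is a hypothetical helper; $1 is the MARC XML input file.
run_pipeline() {
  input="$1"

  # Transform, load, and start the endpoint; stop at the first failure.
  $RAKE "xslt[$input]" || return 1
  $RAKE fuseki:load || return 1
  $RAKE fuseki:start || return 1

  # Run enrichment and metadata generation, remembering any failure
  # so that the endpoint is still stopped afterwards.
  status=0
  $RAKE sparql:enrich && $RAKE sparql:metadata || status=$?

  $RAKE fuseki:stop || return 1
  [ "$status" -eq 0 ] || return "$status"

  # Export the dataset only if every earlier step succeeded.
  $RAKE fuseki:dump
}
```

After sourcing the script, `run_pipeline path/to/marc-21-a.xml` runs the whole sequence, and setting `RAKE="echo rake"` beforehand prints the commands without executing them. The `fuseki:purge` task is deliberately left out, since it clears the Jena TDB files.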
The project depends on the following software:

- Fuseki
- Jena: uses Jena TDB as database
- Rake: works with Ruby version 1.8.7 or newer
- Saxon: version 9.x, can be replaced by any XSLT 2.0 processor
If you get timeout errors (`Timeout::Error`) for some of the enrichment or metadata generation SPARQL queries, try increasing the timeout limit (the `ja:cxtValue` property for `ja:cxtName "arq:queryTimeout"`) in the Fuseki server configuration in `etc/fuseki.ttl`, and then run the enrichment Rake task again.
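For orientation, a timeout setting of this kind usually appears in a Fuseki configuration file as a context entry like the one below. This is a sketch based on the general Fuseki configuration layout, not a copy of the project's `etc/fuseki.ttl`: the surrounding server block and the value `60000` are assumptions, and the actual file may attach the context to a different resource.

```turtle
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Assumed server block; a real configuration also declares services and datasets.
[] a fuseki:Server ;
   # Query timeout in milliseconds; "60000" (60 seconds) is an illustrative value.
   ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "60000" ] .
```

The server typically needs to be restarted (`rake fuseki:stop`, then `rake fuseki:start`) for a changed configuration to take effect.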