
Working files for the Community Group

Component Files

scripts This area contains scripts for processing source Bibframe data to obtain data.

  • This is an open source general purpose python script for processing RDF data, which has been tweaked a little to make it particularly useful to

    Its operation is controlled by passing command-line parameters:

    • --input -i Source input RDF file, or URL of input RDF file, or directory containing one or more RDF files. Acceptable formats (RDF/XML, json-ld, turtle, nt).
    • --output -o Output file (for single file) or directory for multiple files.
    • --outfile -O Overriding output file name
    • --batchload -b Load all input files then output combination into single output file
    • --format -f Serialisation format for output files (xml|rdf|n3|turtle|nt|nquads|json-ld) - influences output file name extension. Default format turtle.
    • --query -q File, or directory of files, containing SPARQL query scripts to process imported RDF data to produce output RDF data.
    • --bindings -B Key-value pairs for SPARQL variable bindings to be passed to SPARQL query processing
    • --querycount -c Number of times to process query scripts before outputting resultant data. Default count 1.
    • --schemaonly -s Only output triples that contain a URI from the vocabulary as a subject or predicate.
    • --preprocess -p Source preprocess function (eg. LoCSRUResponse). Function to process input data before attempting to load RDF from it
    • --postprocess -P Graph postprocess function. Function to process RDF graph prior to output serialization.
    • -v Run in verbose mode.
    • --version -V Output version
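As a rough illustration of how options like these could be declared, the sketch below uses Python's standard argparse module. It is not the script's actual source: the function name build_parser, the destination names, treating --version as a plain flag, and letting --bindings accept several KEY=VALUE values are all assumptions drawn only from the option list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Process RDF data with SPARQL query scripts")
    parser.add_argument("-i", "--input", help="RDF file, URL, or directory of RDF files")
    parser.add_argument("-o", "--output", help="output file or output directory")
    parser.add_argument("-O", "--outfile", help="overriding output file name")
    parser.add_argument("-b", "--batchload", action="store_true",
                        help="load all inputs, then write one combined output")
    parser.add_argument("-f", "--format", default="turtle",
                        choices=["xml", "rdf", "n3", "turtle", "nt", "nquads", "json-ld"])
    parser.add_argument("-q", "--query", help="SPARQL query file or directory of files")
    # Assumption: several KEY=VALUE pairs may follow a single -B.
    parser.add_argument("-B", "--bindings", nargs="*", default=[])
    parser.add_argument("-c", "--querycount", type=int, default=1)
    parser.add_argument("-s", "--schemaonly", action="store_true")
    parser.add_argument("-p", "--preprocess", help="name of source preprocess function")
    parser.add_argument("-P", "--postprocess", help="name of graph postprocess function")
    parser.add_argument("-v", dest="verbose", action="store_true")
    parser.add_argument("-V", "--version", action="store_true")
    return parser


# Parse an explicit argument list rather than sys.argv, so the sketch runs anywhere.
args = build_parser().parse_args(["-i", "tests/source", "-o", "tests/out", "-f", "json-ld", "-s", "-v"])
print(args.format, args.querycount, args.schemaonly)
```

Declaring the defaults (turtle, query count of 1) in the parser keeps them in one place alongside the help text.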

    Principle of operation:

    • Load each source file, or URL, in turn into an auto-generated RDF triple-store. RDF syntax used is auto sensed - see rdflib documentation for details.
    • Process each query, in sorted order, against each triple-store. Note: Bindings passed via command line, plus two in-built bindings (?NOW and ?TODAY) are passed to the query.
    • Repeat querycount times
    • If schemaonly is selected, all triples except those containing URIs from vocabulary are deleted from triple-store
    • Contents of the triple-store are serialised in the chosen format to the output file, or to a file whose name is calculated from the input filename and the output format
    • In batchload mode, all source files are loaded before processing is commenced. This results in a single combined output file.
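Some of the steps above can be sketched in plain Python without any RDF library. The following is a minimal illustration under stated assumptions, not the script's implementation: triples are modelled as tuples of strings, the helper names (parse_bindings, schema_only, output_path) are hypothetical, the ISO 8601 form of the NOW and TODAY values is a guess, the extension table is invented for the example, and http://example.org/vocab/ stands in for the vocabulary prefix.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumed mapping from output format to file extension (illustrative only).
EXTENSIONS = {"xml": "xml", "rdf": "rdf", "n3": "n3", "turtle": "ttl",
              "nt": "nt", "nquads": "nq", "json-ld": "jsonld"}


def parse_bindings(pairs, now=None):
    """Turn KEY=VALUE strings into a dict and add the built-in NOW and TODAY bindings."""
    now = now or datetime.now()
    bindings = {}
    for pair in pairs:
        key, _, value = pair.partition("=")  # split on the first "=" only
        bindings[key] = value
    bindings["NOW"] = now.isoformat(timespec="seconds")
    bindings["TODAY"] = now.date().isoformat()
    return bindings


def schema_only(triples, vocab_prefix):
    """Keep only triples whose subject or predicate is a URI from the vocabulary."""
    return {t for t in triples
            if t[0].startswith(vocab_prefix) or t[1].startswith(vocab_prefix)}


def output_path(input_name, out_dir, fmt):
    """Derive an output file name from the input file name and the output format."""
    stem = PurePosixPath(input_name).stem
    return str(PurePosixPath(out_dir) / f"{stem}.{EXTENSIONS[fmt]}")


triples = {
    ("http://example.org/book/1", "http://example.org/vocab/name", "A title"),
    ("http://example.org/book/1", "http://example.org/other/title", "A title"),
}
print(schema_only(triples, "http://example.org/vocab/"))
print(output_path("tests/source/LCCN-98033893.xml", "tests/out", "json-ld"))
```

In rdflib itself, loading, SPARQL update, and serialisation correspond to Graph.parse, Graph.update, and Graph.serialize.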

    Example operation from source directory:

    scripts/ -i tests/source -o tests/out -q query/bibframe2schema.sparql -f json-ld -s -v -B "SDPUBLISHER=<>" "SDLICENSE=<>"

    Example operation for individual source file:

    scripts/ -i tests/source/LCCN-98033893.xml -o tests/out -q query/bibframe2schema.sparql -t -f json-ld -s -v

    Example operation for individual URL file:

    scripts/ -i -o tests/out -q query/bibframe2schema.sparql -f json-ld -s -v

Operational Environment and Dependencies: A Python script tested with Python version 3.10 on Linux-like operating systems (including macOS). It depends on some Python libraries that may need installing, using the pip install -r requirements.txt command.


This area contains SPARQL scripts for processing source Bibframe data to obtain data.

  • bibframe2schema.sparql

    SPARQL Script, using the INSERT verb, to add triples to existing Bibframe (2.0) description.

Intended for use as an input query script for the script. However, it is equally applicable to other SPARQL scripting environments.


Area for running test conversions. Includes example source file(s) and resultant output files.

Source Preprocessing:


Preprocessor (invoked using -p LoCSRUResponse) designed to handle XML output from a Library of Congress search request, which wraps the RDF output within a searchRetrieveResponse element structure.
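
To illustrate the kind of work such a preprocessor has to do, here is a minimal sketch using only the standard library. It is not the LoCSRUResponse code: the function name extract_rdf is hypothetical, and the sketch simply assumes that each result is carried as an rdf:RDF element somewhere inside the response, without relying on the exact nesting of the surrounding SRU elements.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_rdf(sru_xml: str) -> list[str]:
    """Return every embedded rdf:RDF element as a standalone RDF/XML string."""
    root = ET.fromstring(sru_xml)
    # iter() walks the whole tree, so the wrapper structure does not matter here.
    return [ET.tostring(element, encoding="unicode").strip()
            for element in root.iter(f"{{{RDF_NS}}}RDF")]
```

Each returned string could then be handed to an RDF parser as if it had come from an ordinary RDF/XML file.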