Architecture
Idoia edited this page Apr 27, 2015
The following figure presents the architecture of the second prototype of the ALIADA tool.
As the diagram shows, the ALIADA tool is composed of several web application components installed in an Apache Tomcat web application server. These components are configured by data stored in a MySQL database. The RDF triples generated by ALIADA are stored in a Virtuoso RDF store.
The ALIADA web application components are the following:
- User Interface. The multilingual user interface (Spanish and English at this moment) interacts with the rest of the ALIADA components: it sends user commands to them, supports their workflow and shows ALIADA's feedback to the user.
- RDFizer. It processes the different formats of the input data and converts them to RDF triples according to the ALIADA ontology. The RDFizer converts LIDO XML, MARC XML and Dublin Core. The generated triples are stored in the corresponding graph of a Virtuoso RDF store. It also provides a dedicated component for performing NLP (Natural Language Processing) analysis on terms extracted from the records. The NER (Named Entity Recognition) results are stored as RDF triples that enrich the owning records. Finally, it validates the consistency of the RDF triples against the ALIADA ontology.
- Links Discovery. It discovers relationships between the generated RDF triples and data items in external linked data sources: sources in the Linked Open Data Cloud that provide a SPARQL endpoint, and some other sources that do not provide such an endpoint but do provide an ad hoc API. The discovered relationships are inserted as owl:sameAs triples into the corresponding graph of the Virtuoso RDF store. For the data sources that provide a SPARQL endpoint, the discovery task is carried out with the Silk Link Discovery Framework. The following datasets are currently integrated via Silk: DBpedia, GeoNames, FreeBase, British National Bibliography, Spanish National Library, Europeana, NSZL and MARC Code List. The remaining datasets - Library of Congress Subject Headings, lobid, VIAF and Open Library - are searched for links by dedicated code that uses the API each of them provides.
- Linked Data Server. Its objective is to make the URIs of an ALIADA dataset dereferenceable: when the URI of a resource is accessed through the Linked Data Server, it returns the resource's RDF description instead of a 404 Not Found error. This also makes it possible to connect the dataset with others in the LOD cloud. Dereferencing is implemented by configuring several URL rewrite rules in Virtuoso. If an end user accesses a URI of an ALIADA dataset via a web browser, a dedicated web page for the specific resource is shown. If the publishing institution runs an OPAC (Online Public Access Catalogue) - as ALIADA's partners ARTIUM and MFAB do - the user is taken to the OPAC page for the specific resource.
- CKAN Datahub Page Creation. This component creates a page for the generated dataset on the CKAN Datahub, using the API provided by CKAN.
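The RDFizer's conversion step can be illustrated with a minimal sketch. The example below maps a small Dublin Core record to N-Triples; the record, the base URI and the direct dc:* property mapping are illustrative assumptions, not ALIADA's actual mapping rules, which target the ALIADA ontology.

```python
# Minimal sketch of a Dublin Core XML -> RDF (N-Triples) conversion.
# The record, base URI and direct dc:* mapping are illustrative; the real
# RDFizer maps input records to the ALIADA ontology.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Guernica</dc:title>
  <dc:creator>Pablo Picasso</dc:creator>
</record>"""

def dc_to_ntriples(xml_text, subject_uri):
    """Emit one N-Triples line per Dublin Core element in the record."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root:
        # ElementTree exposes namespaced tags as "{namespace}localname"
        if elem.tag.startswith("{%s}" % DC) and elem.text:
            prop = elem.tag[len(DC) + 2:]   # strip the "{...}" wrapper
            triples.append('<%s> <%s%s> "%s" .' % (subject_uri, DC, prop, elem.text))
    return triples

triples = dc_to_ntriples(RECORD, "http://data.example.org/resource/1")
for t in triples:
    print(t)
```

In the real pipeline the resulting triples would then be loaded into the corresponding graph of the Virtuoso RDF store.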
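The essence of the Links Discovery step can be sketched as follows. In ALIADA the matching is done by the Silk framework (for SPARQL endpoints) or by dedicated API clients; the toy datasets and exact-label comparison below are invented for illustration only.

```python
# Minimal sketch of link discovery by exact label matching, producing
# owl:sameAs triples. The two small datasets are invented examples; ALIADA
# uses Silk (or dedicated API clients) with richer comparison rules.
LOCAL = {
    "http://data.example.org/resource/picasso": "Pablo Picasso",
    "http://data.example.org/resource/bilbao": "Bilbao",
}
REMOTE = {
    "http://dbpedia.org/resource/Pablo_Picasso": "Pablo Picasso",
    "http://dbpedia.org/resource/Madrid": "Madrid",
}

def discover_links(local, remote):
    """Return owl:sameAs triples for resources whose labels match exactly."""
    links = []
    for luri, llabel in local.items():
        for ruri, rlabel in remote.items():
            if llabel.casefold() == rlabel.casefold():
                links.append(
                    "<%s> <http://www.w3.org/2002/07/owl#sameAs> <%s> ."
                    % (luri, ruri))
    return links

links = discover_links(LOCAL, REMOTE)
for link in links:
    print(link)
```

The discovered owl:sameAs triples would then be inserted into the dataset's graph in the Virtuoso RDF store.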
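The dereferencing decision a Linked Data Server makes can be sketched as simple content negotiation: RDF for machine clients, an HTML resource page for browsers. In ALIADA this logic lives in Virtuoso's URL rewrite rules, not in application code; the function below is only an illustration of the decision.

```python
# Minimal sketch of content negotiation for a dereferenced ALIADA URI.
# In ALIADA this decision is expressed as URL rewrite rules in Virtuoso.
def negotiate(accept_header):
    """Pick a representation for a dereferenced URI from the Accept header."""
    accept = (accept_header or "").lower()
    if "text/html" in accept:
        return "html"   # browser: show the resource's web (or OPAC) page
    if "application/rdf+xml" in accept or "text/turtle" in accept:
        return "rdf"    # semantic web client: return the RDF description
    return "rdf"        # default to RDF rather than a 404

print(negotiate("text/html,application/xhtml+xml"))  # a web browser
print(negotiate("text/turtle"))                      # an RDF client
```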
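Creating the Datahub page goes through the CKAN Action API (the `package_create` action). The sketch below builds such a request without sending it; the endpoint URL, API key and dataset fields are placeholder values.

```python
# Minimal sketch of creating a dataset page through the CKAN Action API
# (package_create). The URL, API key and dataset fields are placeholders;
# the request is built but not sent here.
import json
import urllib.request

def build_package_create(api_url, api_key, name, title):
    """Build a POST request for CKAN's package_create action."""
    body = json.dumps({"name": name, "title": title}).encode("utf-8")
    return urllib.request.Request(
        api_url + "/api/3/action/package_create",
        data=body,
        headers={"Authorization": api_key,       # CKAN API key
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_package_create("https://datahub.io", "MY-API-KEY",
                           "aliada-example-dataset", "ALIADA example dataset")
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) would return a JSON response whose `result` field describes the created dataset page.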
Installation Guide
- Prerequisites
- Step 1: Install and configure Open Link Virtuoso
- Step 2: Create "linking" user in the machine
- Step 3: Create CKAN Datahub User and Organization
- Step 4: Create database tables
- Step 5: Modules installation
- [Step 6: Configure organization and datasets from admin area of UI of ALIADA tool](Installation_Guide#step-6-configure-organization-and-datasets-from-admin-area-of-ui-of-aliada-tool)
- Download links
- User Manual
- [User interface](User_Interface)
- [RDFizer](RDFizer)
- [Links Discovery](Links_Discovery)
- [Linked Data Server](Linked_Data_server)
- [CKAN Datahub Page Creation](CKAN_Datahub_Page_Creation)
- [Release Notes 1.0](Release_Notes1)
- [Release Notes 2.0](Release_Notes2)
- [Release Notes 2.1](Release_Notes_2.1)