# SANSA OWL


## Description

SANSA OWL is a library for reading OWL files into Apache Spark or Apache Flink. Input files can reside in HDFS or in a local file system, and their contents are distributed across Spark RDDs/Datasets or Flink DataSets.

## Package Structure

The package contains three modules:

- `sansa-owl-common`, which contains platform-independent, mostly parsing-specific functionality
- `sansa-owl-spark`, which contains Spark-specific code
- `sansa-owl-flink`, which contains Flink-specific code

### SANSA OWL Spark

*(Figure: SANSA OWL Spark package structure)*

SANSA OWL Spark mainly contains builder objects to read OWL files in different formats. Currently, we support Functional Syntax, Manchester Syntax, and OWL XML. Besides this, we are also working on building OWL axioms from other RDF formats such as Turtle or N-Triples.

We support distributed representations of OWL files based on RDDs or Spark Datasets. These can either contain string-based representations of single entities of the given format, e.g. single functional-style axiom descriptions like `DisjointDataProperties(bar:dataProp1 bar:dataProp2)`, or whole Manchester Syntax frames like

```
ObjectProperty: bar:prop

    Characteristics:
        Asymmetric
```

or parsed OWL API axiom objects. We call these intermediate string-based entities 'expressions' and the corresponding distributed data structures 'expressions RDDs' or 'expressions datasets'. The final data structures holding OWL API axiom objects are called 'axiom RDDs' and 'axiom datasets', respectively.
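
To make the distinction concrete, the following is a minimal sketch built on the reader call shown in the Usage section below. It assumes only that the returned axiom RDD holds OWL API axiom objects, as described above, and that `spark` is an existing `SparkSession`; `OWLDisjointDataPropertiesAxiom` is a plain OWL API interface.

```scala
import net.sansa_stack.owl.spark.owl._
import org.semanticweb.owlapi.model.OWLDisjointDataPropertiesAxiom

val syntax = Syntax.FUNCTIONAL
val input = "path/to/functional/syntax/file.owl"

// Axiom RDD: each element is a parsed OWL API axiom object
val axiomRdd = spark.owl(syntax)(input)

// Downstream code can work directly with OWL API types, e.g. keeping only
// DisjointDataProperties axioms like the example above
val disjointDataProps = axiomRdd.filter(_.isInstanceOf[OWLDisjointDataPropertiesAxiom])
println(disjointDataProps.count())
```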

### SANSA OWL Flink

*(Figure: SANSA OWL Flink package structure)*

SANSA OWL Flink mainly contains builder objects to read OWL files in different formats. Currently, we support Functional Syntax and Manchester Syntax; parsing support for OWL XML is planned for future releases. Besides this, we are also working on building OWL axioms from other RDF formats such as Turtle or N-Triples.

Distributed representations can either contain string-based representations of single entities of the given format, e.g. single functional-style axiom descriptions like `DisjointDataProperties(bar:dataProp1 bar:dataProp2)`, or whole Manchester Syntax frames like

```
ObjectProperty: bar:prop

    Characteristics:
        Asymmetric
```

or parsed OWL API axiom objects. We call these intermediate string-based entities 'expressions' and the corresponding distributed data structure 'expressions dataset'. The final data structure holding OWL API axiom objects is called 'axiom dataset'.
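
As a rough orientation, the sketch below assumes that the Flink module mirrors the implicit reader API shown for Spark in the Usage section; the exact package, `Syntax` constant, and method names may differ between releases, so treat it as a starting point rather than a reference.

```scala
import org.apache.flink.api.scala._
import net.sansa_stack.owl.flink.owl._

val env = ExecutionEnvironment.getExecutionEnvironment

val syntax = Syntax.FUNCTIONAL
val input = "path/to/functional/syntax/file.owl"

// Assumed to mirror spark.owl(syntax)(input): builds an axiom DataSet
// holding parsed OWL API axiom objects
val axiomDataSet = env.owl(syntax)(input)

// count() is a regular Flink DataSet action
println(axiomDataSet.count())
```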

## Usage

The following Scala code shows how to read an OWL file in Functional Syntax (be it a local file or a file residing in HDFS) into a Spark RDD:

```scala
import net.sansa_stack.owl.spark.owl._

val syntax = Syntax.FUNCTIONAL
val input = "path/to/functional/syntax/file.owl"

val rdd = spark.owl(syntax)(input)
```

We also provide builder objects for the other described OWL formats and data structures. The same holds for the Flink implementations. An overview is given in the FAQ section of the SANSA project page. Further documentation about the builder objects can also be found on the ScalaDoc page.
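
For illustration, here is a hedged sketch of reading a Manchester Syntax file and counting axioms per type. The `Syntax.MANCHESTER` constant is assumed by analogy with `Syntax.FUNCTIONAL` above (check the `Syntax` object of your release); the counting itself uses only standard Spark operations and the OWL API's `getAxiomType` accessor.

```scala
import net.sansa_stack.owl.spark.owl._

// Assumed constant name, by analogy with Syntax.FUNCTIONAL
val syntax = Syntax.MANCHESTER
val input = "path/to/manchester/syntax/file.owl"

val axiomRdd = spark.owl(syntax)(input)

// Count axioms per OWL API axiom type
val countsByType = axiomRdd
  .map(axiom => (axiom.getAxiomType.getName, 1L))
  .reduceByKey(_ + _)

countsByType.collect().foreach(println)
```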

## How to Contribute

We always welcome new contributors to the project! Please see our contribution guide for more details on how to get started contributing to SANSA.