Serializing and parsing data

m0mo edited this page Dec 13, 2011 · 3 revisions

Supported formats of the ERP API

One of our objectives was to support a variety of import and export formats. For the prototype of the API, we included parsers and serializers that allow users to import or export RDF-based documents in the following formats:

  • RDF/XML
  • N-Triple
  • Turtle
  • RDF/JSON

The ERP API supports all four formats for both parsing and serializing. Each parser implements the IParser interface and each serializer implements the ISerializer interface, so the packages can easily be extended or an implementation exchanged.
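To illustrate why a shared interface makes formats exchangeable, here is a minimal sketch. The interface names ISerializerSketch and IParserSketch, their method signatures and the toy tab-separated format are assumptions made for illustration; the actual ISerializer and IParser definitions in the ERP API may look different.

```php
<?php
// Hypothetical sketch: names and signatures are assumptions,
// not the real ISerializer/IParser interfaces of the ERP API.
interface ISerializerSketch
{
    /** Turn a list of [subject, predicate, object] triples into a string. */
    public function serialize(array $triples): string;
}

interface IParserSketch
{
    /** Turn a document string back into a list of triples. */
    public function parse(string $document): array;
}

// A toy format (one tab-separated triple per line) implementing both roles.
final class TabSeparatedFormat implements ISerializerSketch, IParserSketch
{
    public function serialize(array $triples): string
    {
        $lines = [];
        foreach ($triples as $triple) {
            $lines[] = implode("\t", $triple);
        }
        return implode("\n", $lines);
    }

    public function parse(string $document): array
    {
        $triples = [];
        foreach (explode("\n", $document) as $line) {
            if ($line === '') {
                continue; // skip empty lines
            }
            $triples[] = explode("\t", $line);
        }
        return $triples;
    }
}

// Formats are looked up by a type key, so swapping an implementation
// only means registering a different object under the same key.
$formats = ['tsv' => new TabSeparatedFormat()];

$triples = [
    ['http://example.org/e0625287', 'http://example.org/firstName', 'Alexander'],
];
$text = $formats['tsv']->serialize($triples);
$roundTrip = $formats['tsv']->parse($text);
```

Because the calling code only depends on the two interfaces, a new format can be added without touching the code that loads or saves a model.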

As mentioned, these formats don’t cover every available format, but they provide a usable foundation for the ERP API. They are also the formats most commonly supported by other APIs (see the comparison of APIs in chapter 4), so RDF documents created with other APIs can be imported into the ERP API. Parsing or serializing a model takes only one line of code, as presented in code 5.5.

<?php
    require_once 'path/to/API.php';

    $model = ERP::getModel(); // create a new model using the ERP function

    // Parsing a file
    // $type is one of: rdf, ntriple, turtle or json
    $model->load($filename, $type);

    // process model ...

    // Serializing to a file
    // $type is one of: rdf, ntriple, turtle or json
    $model->save($filename, $type);
?>

Both $model->load($filename, $type); and $model->save($filename, $type); take two parameters. The first, $filename, is the name of the file to load the model from or save it to. The second, $type, selects the format. It defaults to rdf, which stands for the RDF/XML format.
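The default value of $type can be pictured with a small stand-alone function. This is only a sketch: resolveFormat() is a hypothetical helper that is not part of the ERP API, and how the API reacts to an unknown type is not documented here, so the exception is an assumption.

```php
<?php
// Hypothetical helper, not part of the ERP API: maps a $type value
// to the format it stands for, defaulting to "rdf" (RDF/XML).
function resolveFormat(string $type = 'rdf'): string
{
    $formats = [
        'rdf'     => 'RDF/XML',
        'ntriple' => 'N-Triple',
        'turtle'  => 'Turtle',
        'json'    => 'RDF/JSON',
    ];

    if (!isset($formats[$type])) {
        // Assumption: unknown types are rejected.
        throw new InvalidArgumentException("Unsupported type: $type");
    }

    return $formats[$type];
}

$default = resolveFormat();         // no $type given, falls back to rdf
$turtle  = resolveFormat('turtle');
```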

To further illustrate the usage of the parsers and serializers, we present the output of our serializers using the same example as for comparing the APIs. In summary, we created a model with one student identified by a matriculation number and added information about the birthday, name and enrolled studies. For a specific study we added the study code as well as an English and a German title. The creation of this model (using the resource-centric approach) is shown in code 5.6.

<?php
    require_once 'path/to/API.php';

    $model = ERP::getModel();
    $model->addBaseNamespace("ex", "http://example.org/");

    $res = $model->newResource("e0625287")
        ->addProperty($model->newResource("firstName"),
            new LiteralNode("Alexander", STRING))
        ->addProperty($model->newResource("lastName"),
            new LiteralNode("Aigner", STRING))
        ->addProperty($model->newResource("birthday"),
            new LiteralNode("1986-04-28", DATE))
        ->addProperty($model->newResource("studies"),
            $model->newResource("businessInf")
                ->addProperty($model->newResource("titleEN"),
                    new LiteralNode("Business Informatics", STRING, "en"))
                ->addProperty($model->newResource("titleDE"),
                    new LiteralNode("Wirtschaftsinformatik", STRING, "de"))
                ->addProperty($model->newResource("studyCode"),
                    new LiteralNode("E 066 925"))
        );

    $model->add($res);
?>

RDF/XML format

The first format we discuss is RDF/XML, probably the most important format for serializing an RDF graph. We implemented it with the XML functions that PHP already provides, which gives us increased speed and a solid foundation for easy extendability. The ERP API is the only PHP API that implements its XML parser and serializer this way. To improve the parser’s speed, we took advantage of the XPath query engine. XPath is a query language for XML-based documents. With XPath it is not necessary to process every line of the document, since we can jump directly to the nodes that matter to us. More information on XPath can be found at [55].
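The idea of jumping directly to the relevant nodes can be sketched with PHP’s DOM extension. This is an illustration only: it assumes DOMDocument and DOMXPath, whereas the ERP parser may rely on other PHP XML functions, and the embedded document is a shortened stand-in for a real RDF/XML file.

```php
<?php
// Shortened RDF/XML input used only for this sketch.
$xml = <<<'XML'
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/e0625287">
    <ex:firstName>Alexander</ex:firstName>
    <ex:studies rdf:resource="http://example.org/businessInf"/>
  </rdf:Description>
</rdf:RDF>
XML;

$rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('rdf', $rdfNs);

$triples = [];

// Select the rdf:Description elements directly instead of scanning line by line.
foreach ($xpath->query('/rdf:RDF/rdf:Description') as $description) {
    $subject = $description->getAttributeNS($rdfNs, 'about');

    // Every child element of a description is one property of the subject.
    foreach ($xpath->query('*', $description) as $property) {
        $predicate = $property->namespaceURI . $property->localName;
        $object = $property->hasAttributeNS($rdfNs, 'resource')
            ? $property->getAttributeNS($rdfNs, 'resource') // object is a resource
            : trim($property->textContent);                 // object is a literal
        $triples[] = [$subject, $predicate, $object];
    }
}
```

The two XPath expressions replace an explicit walk over the whole document tree, which is the kind of shortcut the paragraph above describes.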

Using the command $model->save($filename); we serialize (save) the model into the file identified by the variable $filename. Since RDF/XML is the default format, we don’t have to pass the $type variable to the save function. All serializers create a new file, or overwrite the file if it already exists. This means that the ERP API (like any other API that writes files) needs permission to create files on the local system. The content of the file created for our example model is shown in code 5.7.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
		xmlns:ex="http://example.org/">		
	<rdf:Description rdf:about="http://example.org/businessInf">
		<ex:titleEN rdf:datatype="xmlns:string" xml:lang="en">
			Business Informatics
		</ex:titleEN>
		<ex:titleDE rdf:datatype="xmlns:string" xml:lang="de">
			Wirtschaftsinformatik
		</ex:titleDE>
		<ex:studyCode rdf:datatype="xmlns:string">
			E 066 925
		</ex:studyCode>
	</rdf:Description>
	<rdf:Description rdf:about="http://example.org/e0625287">
		<ex:firstName rdf:datatype="xmlns:string">
			Alexander
		</ex:firstName>
		<ex:lastName rdf:datatype="xmlns:string">
			Aigner
		</ex:lastName>
		<ex:birthday rdf:datatype="xmlns:date">	
			1986-04-28
		</ex:birthday>
		<ex:studies rdf:resource="http://example.org/businessInf"/>
	</rdf:Description>
</rdf:RDF>

N-Triple format

N-Triple and Turtle are related formats, as Turtle is a superset of N-Triple. The parsers and serializers for both are implemented using PHP’s file reader and writer functions.

The N-Triple format can be seen as a list of statements. Every line contains a string with three parts: the subject, the predicate and the object. These three sub-strings are separated by whitespace. A dot at the end of the line indicates that the statement is complete.

URIs are represented by simply enclosing them in angle brackets, for example, <http://example.org/e0625287>. Literals are enclosed in quotation marks (for example, "Alexander"). The datatype of a literal is appended to the literal value after two circumflexes, as the string representation of the datatype enclosed in angle brackets (for example, "Alexander"^^<string>). The language is indicated by an "at sign" (@) followed by a two-character language tag (for example, "Business Informatics"@en) [56, 57].
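The notation rules above can be sketched as three small functions. The function names are hypothetical and the code is not taken from the ERP serializer; it only applies the rules as described, writing the language tag before the datatype.

```php
<?php
// Hypothetical helpers illustrating the notation; not the ERP serializer itself.

/** A URI is enclosed in angle brackets. */
function formatUri(string $uri): string
{
    return '<' . $uri . '>';
}

/** A literal is quoted, optionally followed by @language and ^^<datatype>. */
function formatLiteral(string $value, ?string $datatype = null, ?string $language = null): string
{
    // Escape backslashes and quotation marks inside the value.
    $out = '"' . strtr($value, ['\\' => '\\\\', '"' => '\\"']) . '"';
    if ($language !== null) {
        $out .= '@' . $language;
    }
    if ($datatype !== null) {
        $out .= '^^<' . $datatype . '>';
    }
    return $out;
}

/** Subject, predicate and object separated by spaces, closed by a dot. */
function formatStatement(string $subject, string $predicate, string $object): string
{
    return $subject . ' ' . $predicate . ' ' . $object . ' .';
}

$line = formatStatement(
    formatUri('http://example.org/e0625287'),
    formatUri('http://example.org/firstName'),
    formatLiteral('Alexander', 'string')
);
```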

To include the output of the N-Triple serializer within this work, we had to re-format the code by adding a line break before the object string. Unfortunately, this was necessary to be able to include the code in this document. By default, the strings representing the subject, predicate and object are printed on one line. The (modified) content of the produced output file, using the command $model->save($filename, "nt");, is presented in code 5.8.

<http://example.org/e0625287> <http://example.org/firstName> "Alexander"^^<string> .
<http://example.org/e0625287> <http://example.org/lastName> "Aigner"^^<string> .
<http://example.org/e0625287> <http://example.org/birthday> "1986-04-28"^^<date> .
<http://example.org/businessInf> <http://example.org/titleEN> "Business Informatics"@en^^<string> .
<http://example.org/businessInf> <http://example.org/titleDE> "Wirtschaftsinformatik"@de^^<string> .
<http://example.org/businessInf> <http://example.org/studyCode> "E 066 925"^^<string> .
<http://example.org/e0625287> <http://example.org/studies> <http://example.org/businessInf> .

Turtle format

As mentioned earlier, Turtle is a superset of N-Triple. It extends N-Triple by adding support for prefixes. Prefixes are defined at the beginning of each document using the term @prefix, for example, @prefix ex:<http://example.org/>. A defined prefix can be used to abbreviate the full namespace (as in XML) and, therefore, shorten the output. Generally, Turtle has the same syntax as N-Triple. It also represents an RDF model as a list of statements, separating subject, predicate and object by whitespace and ending each line with a dot [62].

Since we can use prefixes, URIs are represented differently. As mentioned, they are abbreviated using the pre-defined prefixes, and a URI written with a prefix does not need to be enclosed in angle brackets. If a URI does not fit any pre-defined prefix, it is saved using the N-Triple notation (full URI within angle brackets) [62]. To continue our example, the command $model->save($filename, "turtle"); creates a Turtle document, whose content is presented in code 5.9.

@prefix ex:<http://example.org/> .
ex:e0625287 ex:firstName "Alexander"^^<string> .
ex:e0625287 ex:lastName "Aigner"^^<string> .
ex:e0625287 ex:birthday "1986-04-28"^^<date> .
ex:businessInf ex:titleEN "Business Informatics"@en^^<string> .
ex:businessInf ex:titleDE "Wirtschaftsinformatik"@de^^<string> .
ex:businessInf ex:studyCode "E 066 925"^^<string> .
ex:e0625287 ex:studies ex:businessInf .
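The abbreviation rule described for Turtle can be sketched as follows. The function abbreviate() is a hypothetical helper, not the ERP serializer’s code, and the check that restricts abbreviation to simple local names is an extra safety assumption.

```php
<?php
// Hypothetical helper: abbreviate a URI with a known prefix,
// or fall back to the N-Triple notation (full URI in angle brackets).
function abbreviate(string $uri, array $prefixes): string
{
    foreach ($prefixes as $prefix => $namespace) {
        if (strncmp($uri, $namespace, strlen($namespace)) === 0) {
            $local = (string) substr($uri, strlen($namespace));
            // Assumption: only abbreviate when the remainder is a simple name.
            if (preg_match('/^[A-Za-z_][A-Za-z0-9_-]*$/', $local) === 1) {
                return $prefix . ':' . $local;
            }
        }
    }
    return '<' . $uri . '>';
}

$prefixes = ['ex' => 'http://example.org/'];

$short = abbreviate('http://example.org/e0625287', $prefixes);
$full  = abbreviate('http://other.example/thing', $prefixes);
```

Here $short uses the ex: prefix, while $full keeps the complete URI because no registered namespace matches it.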

RDF/JSON format

The JavaScript Object Notation (JSON) is another important format. JSON’s primary use is to transmit data between a server and a Web application, serving as an alternative to XML [12, 14]. JSON is built on two structures [12, 14]:

  1. A collection of name/value pairs. This is often realized as, for example, an object.
  2. An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

While the ERP serializer is implemented as string serialization, the ERP parser uses PHP’s built-in JSON decoder, which gives us increased speed and further reliability. Because the output is very long, we had to shorten the output of our JSON serializer; the concept of JSON should nevertheless still be understandable. The shortened RDF/JSON output for our example is presented in code 5.10.

As we can see in the JSON output, the ERP serializer also provides character escaping for the URIs and literals.

{	"http:\/\/example.org\/businessInf":
	{	
		"http:\/\/example.org\/studyCode":[
			{	
				"value":"E 066 925",
				"type":"literal",
				"datatype":"string"
			}],
		"http:\/\/example.org\/titleEN":[
			{	
				"value":"Business Informatics",
				"type":"literal",
				"datatype":"string",
				"language":"en"
			}],
		...
	},
	"http:\/\/example.org\/e0625287":
	{	
		"http:\/\/example.org\/birthday":[
			{	
				"value":"1986-04-28",
				"type":"literal",
				"datatype":"date"
			}],
		"http:\/\/example.org\/firstName":[
			{	
				"value":"Alexander",
				"type":"literal",
				"datatype":"string"
			}]
		...,
		"http:\/\/example.org\/studies":[
			{	
				"value":"http:\/\/example.org\/businessInf",
				"type":"uri"
			}]
	}
}
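Reading such a document with PHP’s built-in decoder can be sketched as below. The input is a reduced, complete stand-in for the shortened listing, and the loop that flattens the structure into triples is an assumption about how a parser might walk it, not the ERP parser’s actual code. The final json_encode() call is included only to show that PHP’s own encoder escapes slashes in the same way; the ERP serializer itself builds its output as strings.

```php
<?php
// Reduced RDF/JSON input for this sketch (subject -> predicate -> list of objects).
$json = <<<'JSON'
{
    "http:\/\/example.org\/e0625287": {
        "http:\/\/example.org\/birthday": [
            {"value": "1986-04-28", "type": "literal", "datatype": "date"}
        ],
        "http:\/\/example.org\/studies": [
            {"value": "http:\/\/example.org\/businessInf", "type": "uri"}
        ]
    }
}
JSON;

// Decode into nested associative arrays.
$data = json_decode($json, true);
if (!is_array($data)) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Flatten the nested structure into [subject, predicate, value, type] rows.
$triples = [];
foreach ($data as $subject => $predicates) {
    foreach ($predicates as $predicate => $objects) {
        foreach ($objects as $object) {
            $triples[] = [$subject, $predicate, $object['value'], $object['type']];
        }
    }
}

// json_encode() escapes "/" as "\/" by default, matching the listing above.
$encoded = json_encode('http://example.org/');
```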