SKOS importer doesn't like special characters #346

mgbeyer · 2015-07-08T07:53:42Z

If the subject part of an N-Triples line contains characters like slash (/) or hash (#), the importer rejects the triple (example: "WARN -- : SkosImporter: Invalid origin. Skipping :concept/#Abbreviations rdf:type skos:concept").
But characters like / or # are normal parts of a URI. For example, one of the thesauri we'd like to import into iQvoc uses several path levels below the default namespace to distinguish between actual concepts and personal classes and properties (among others). If you strip the leading default namespace from a subject string (as the importer does), the remaining part of the URI still contains slashes and is rejected by the importer.

In general, a URI should be allowed to contain UTF-8 special characters so that regional character sets can be used.
So I wonder why the importer actively rejects characters beyond the minimal set of " a-zA-Z0-9_.-"? Was it a deliberate design decision with a sound purpose, and am I missing a point here? If you could elaborate on that a little, I would greatly appreciate it.

mjansing · 2015-07-16T09:01:49Z

I can't reproduce the problem. Please provide more information about the imported triples. The fragment identifier should be the last part of a URI, after the filename; the leading slash in your example looks a bit odd.

mgbeyer · 2015-07-16T10:01:06Z

Thanks for the reply!

I don't know what you mean by "after filename"...what filename?
Anyway, here's more detailed information about what we're trying to import (sorry this is a bit lengthy :))

The (stripped-down) N-Triples file:

<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#inScheme> <http://lod.gesis.org/thesoz/> .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#prefLabel> "Grundlagen der Sozialwissenschaften\u00A00"@de .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#prefLabel> "Fundamentals of the Social Sciences\u00A00"@en .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#prefLabel> "'fondements des sciences sociales\u00A00"@fr .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#notation> "0"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#inScheme> <http://lod.gesis.org/thesoz/> .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Grundlagen der Sozialwissenschaften\u00A00"@de .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Fundamentals of the Social Sciences\u00A00"@en .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "'fondements des sciences sociales\u00A00"@fr .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#notation> "0"^^<http://www.w3.org/2001/XMLSchema#string> .

What seems to be the problem

We're using NAMESPACE='http://lod.gesis.org/thesoz/' as the default, so after the namespace is stripped the remaining subjects still contain a slash (like "classification/0").
I'm aware that if we expand the namespace to "http://lod.gesis.org/thesoz/classification/" we're facing subjects, starting with a number, which is also not approved by the importer for reasons unclear (see the validator method in the Origin class (/app/aides/origin.rb)). So basically we're talking about this code fragment in that validator method:

    # should not start with a number
    valid = false if initial_value.match(/^\d.*/)

    # should not contain special chars
    valid = false if CGI.escape(initial_value) != initial_value
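
To make the effect of these two checks concrete, here is a minimal sketch that can be run outside iQvoc. The helper names `strip_namespace` and `origin_valid?` are hypothetical and only mirror the behavior described in this thread (namespace stripping followed by the two checks quoted above); they are not the importer's actual methods.

```ruby
require "cgi"

# Hypothetical helper: removes the default namespace prefix from a subject URI,
# as the importer is described as doing in this thread.
def strip_namespace(uri, namespace)
  uri.start_with?(namespace) ? uri[namespace.length..-1] : uri
end

# Hypothetical re-creation of the two checks quoted above.
def origin_valid?(initial_value)
  # should not start with a number
  return false if initial_value.match(/^\d.*/)
  # should not contain special chars (anything CGI.escape would rewrite)
  return false if CGI.escape(initial_value) != initial_value
  true
end

namespace = "http://lod.gesis.org/thesoz/"
subject   = "http://lod.gesis.org/thesoz/classification/0"

origin = strip_namespace(subject, namespace)
puts origin                  # classification/0
puts CGI.escape(origin)      # classification%2F0
puts origin_valid?(origin)   # false, because the slash gets percent-encoded
puts origin_valid?("0")      # false, because it starts with a digit
```

Under these assumptions, both namespace choices discussed here fail: the shorter namespace leaves a slash in the origin, and the longer one leaves an origin that starts with a digit.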

Ok, now here's the output:

I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : Known namespaces:
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    1: skos: => http://www.w3.org/2004/02/skos/core#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    2: skos: => http://www.w3.org/2008/05/skos#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    3: rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    4: : => http://lod.gesis.org/thesoz/
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    5: rdfs: => http://www.w3.org/2000/01/rdf-schema#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    6: owl: => http://www.w3.org/2002/07/owl#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    7: dct: => http://purl.org/dc/terms/
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    8: foaf: => http://xmlns.com/foaf/spec/
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    9: void: => http://rdfs.org/ns/void#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    10: iqvoc: => http://try.iqvoc.net/schema#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : Known first level classes:
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    1: skos:Concept => Concept::SKOS::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    2: skos:Collection => Collection::SKOS::Unordered
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : Known second level classes:
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    1: skos:prefLabel => Labeling::SKOS::PrefLabel
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    2: skos:altLabel => Labeling::SKOS::AltLabel
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    3: skos:changeNote => Note::SKOS::ChangeNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    4: skos:definition => Note::SKOS::Definition
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    5: skos:editorialNote => Note::SKOS::EditorialNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    6: skos:example => Note::SKOS::Example
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    7: skos:historyNote => Note::SKOS::HistoryNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    8: skos:scopeNote => Note::SKOS::ScopeNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    9: skos:related => Concept::Relation::SKOS::Related
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    10: skos:broader => Concept::Relation::SKOS::Broader::Mono
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    11: skos:narrower => Concept::Relation::SKOS::Narrower::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    12: skos:closeMatch => Match::SKOS::CloseMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    13: skos:exactMatch => Match::SKOS::ExactMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    14: skos:relatedMatch => Match::SKOS::RelatedMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    15: skos:broadMatch => Match::SKOS::BroadMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    16: skos:narrowMatch => Match::SKOS::NarrowMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    17: skos:notation => Notation::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    18: skos:topConceptOf => Concept::SKOS::Scheme
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    19: skos:member => Collection::Member::SKOS::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : default namespace: 'http://lod.gesis.org/thesoz/'
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : publish: 'true'
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : SkosImporter: Importing triples...
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 rdf:type skos:Concept
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:inScheme :
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:prefLabel "Grundlagen der Sozialwissenschaften\u00A00"@de
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:prefLabel "Fundamentals of the Social Sciences\u00A00"@en
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:prefLabel "'fondements des sciences sociales\u00A00"@fr
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:notation "0"^^<http://www.w3.org/2001/XMLSchema#string>
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 rdf:type skos:Concept
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:inScheme :
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:prefLabel "Grundlagen der Sozialwissenschaften\u00A00"@de
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:prefLabel "Fundamentals of the Social Sciences\u00A00"@en
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:prefLabel "'fondements des sciences sociales\u00A00"@fr
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:notation "0"^^<http://www.w3.org/2001/XMLSchema#string>
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Computing 'forward' defined triples...
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Basic import done (took 0 seconds).
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Publishing 0 new subjects...
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Publishing of 0 subjects done (took 0 seconds). 0 are in draft state.
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Imported 0 published and 0 draft subjects in 0 seconds.
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : First step took 0 seconds, publishing took 0 seconds.

As I said: lengthy as hell, sorry :-) But I guess it'll help to clarify the problem...

mjansing · 2015-07-16T14:13:48Z

Thanks. I updated your comment with some formatting options. I'll check that.

mjansing · 2015-07-16T15:11:51Z

BTW

> ...we're facing subjects, starting with a number, which is also not approved by the importer for reasons unclear...

Origins should not start with a number so that iQvoc is able to generate a valid RDF/XML serialization. See the RDF/XML syntax grammar for details.
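
The underlying XML rule is that a name may not begin with a digit. As a rough illustration, assuming an origin ends up as (part of) an XML name in the serialization, a check could look like the sketch below. The method name `xml_name_safe?` is hypothetical, and the pattern is an ASCII-only simplification of the XML name grammar, which also admits many non-ASCII characters.

```ruby
# Hypothetical check: ASCII-only approximation of an XML NCName.
# First character: letter or underscore.
# Following characters: letters, digits, underscore, dot, or hyphen.
def xml_name_safe?(origin)
  origin.match?(/\A[A-Za-z_][A-Za-z0-9_.\-]*\z/)
end

puts xml_name_safe?("0")          # false
puts xml_name_safe?("_0")         # true
puts xml_name_safe?("concept-0")  # true
```

A prefix such as an underscore would satisfy this particular rule; whether iQvoc applies one during import is not stated in this thread.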
