SKOS importer doesn't like special characters #346
Comments
I can't reproduce the problem. Please provide more information about the imported triples. The fragment identifier should be the last part of a URI (after the filename; your leading slash looks a bit curious).
Thanks for the reply! I don't know what you mean by "after filename"... what filename? The (stripped-down) N-Triples file:
What seems to be the problem: we're using NAMESPACE='http://lod.gesis.org/thesoz/' as the default, so after the namespace is stripped the remaining subjects will still contain a slash (like "classification/0").
Ok, now here's the output:
As I said: lengthy as hell, sorry :-) But I guess it'll help to clarify the problem...
Thanks. I updated your comment with some formatting options. I'll check that.
BTW
Origins should not start with a number, so that iQvoc is able to generate a valid RDF/XML serialization. See the RDF syntax grammar for details.
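To illustrate the point about leading digits: XML names (and therefore the identifiers used in an RDF/XML serialization) may not begin with a digit. The following is only a rough sketch of such a check. The helper name `is_ascii_ncname` is hypothetical, the pattern is an ASCII-only simplification of the XML NCName production, and none of this is taken from iQvoc's code.

```python
import re

# Assumption: ASCII-only simplification of the XML NCName production.
# The real production also admits many non-ASCII letters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_ascii_ncname(origin: str) -> bool:
    """Return True if origin is usable as an (ASCII) XML name.

    Hypothetical helper for illustration; not part of iQvoc.
    """
    return NCNAME_ASCII.fullmatch(origin) is not None


# A leading digit fails, a leading underscore or letter passes.
for candidate in ["0", "_0", "concept10034"]:
    print(candidate, is_ascii_ncname(candidate))
```

Under this simplified rule, prefixing a numeric origin with an underscore or a letter would be enough to make it a valid name.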
If the subject part of an N-Triples line contains characters like slash (/) or hash (#), the importer rejects it (example: "WARN -- : SkosImporter: Invalid origin. Skipping :concept/#Abbreviations rdf:type skos:concept").
But characters like / or # are normal parts of a URI. For example, one of the thesauri we'd like to import into iQvoc uses multiple path levels beyond the context path set by the default namespace, to distinguish between actual concepts and our own classes and properties (among others). If you then strip the leading default namespace from a subject string (as the importer does), the remaining part of the URI still contains slashes and is rejected by the importer.
In general, a URI should be allowed to contain UTF-8-conformant special characters to accommodate regional character sets.
So I wonder why the importer actively rejects characters beyond the minimal set of " a-zA-Z0-9_.-"? Was it a deliberate design decision with a sound purpose, and am I missing a point here? If you could elaborate on that a little, I would greatly appreciate it.
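The behavior described above can be sketched roughly as follows. This is an illustration only, not the importer's actual code: the function names `strip_namespace` and `is_accepted` are hypothetical, and the pattern merely approximates the minimal character set quoted above.

```python
import re

# The default namespace used in this thread.
NAMESPACE = "http://lod.gesis.org/thesoz/"

# Assumption: approximates the minimal character set quoted in the issue.
# This is not iQvoc's actual validation pattern.
ALLOWED_ORIGIN = re.compile(r"[A-Za-z0-9_.-]+")


def strip_namespace(uri: str, namespace: str = NAMESPACE) -> str:
    """Return the part of the URI that follows the default namespace."""
    return uri[len(namespace):] if uri.startswith(namespace) else uri


def is_accepted(origin: str) -> bool:
    """Return True if origin uses only the minimal character set."""
    return ALLOWED_ORIGIN.fullmatch(origin) is not None


origin = strip_namespace("http://lod.gesis.org/thesoz/classification/0")
print(origin, is_accepted(origin))  # classification/0 False
```

Any subject with further path segments below the namespace keeps its slash after stripping, so a check of this kind rejects it even though the full URI is perfectly valid.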