csvwlib is a Python implementation of the W3C CSV on the Web recommendations.
It converts tabular data, and optionally its associated metadata, into a semantic graph in RDF or JSON-LD format.
Tabular data includes CSV and TSV files, which may originate upstream from spreadsheets, RDBMS exports, and similar sources.
Requires Python 3.6 or later.
pip install csvwlib
The library exposes one class, CSVWConverter, which has two methods: to_json() and to_rdf().
Both methods have a similar API and share three parameters, each with a default:
csv_url - URL of a CSV file; default None
metadata_url - optional URL of a metadata file; default None
mode - conversion mode, either standard or minimal; default standard
There are three ways to start the conversion process:
- pass only csv_url - the corresponding metadata is looked up based on csv_url, as described in Locating Metadata
- pass both csv_url and metadata_url - the metadata supplied by the user is used. If the url field is set in the metadata, the CSV file is retrieved from that location, in which case the passed csv_url may be ignored
- pass only metadata_url - the associated CSV files are retrieved based on the url field in the metadata
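To make the csv_url-only case more concrete, here is a minimal sketch of the default locations that the W3C "Locating Metadata" rules derive from a CSV URL. The helper name default_metadata_candidates and the example.org URL are hypothetical, and the sketch covers only the two default URI templates from the W3C recommendation (it skips Link headers and site-wide configuration). Whether csvwlib performs the lookup exactly this way is an assumption.

```python
from urllib.parse import urldefrag, urljoin


def default_metadata_candidates(csv_url):
    """Return the default metadata locations for a CSV URL, in lookup order.

    Hypothetical helper, not part of csvwlib. It applies the two default
    URI templates from the W3C recommendation:
    "{+url}-metadata.json" and "csv-metadata.json".
    """
    # The templates are expanded against the CSV URL without its fragment.
    base, _fragment = urldefrag(csv_url)
    return [
        base + "-metadata.json",             # file-specific metadata
        urljoin(base, "csv-metadata.json"),  # directory-level metadata
    ]


for candidate in default_metadata_candidates("http://example.org/south-west/devon.csv"):
    print(candidate)
```

A processor would try each candidate in order and use the first metadata document that actually describes the CSV file.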
You can also specify the conversion mode, standard or minimal; the default is standard.
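The parameter combinations and modes described above can be checked up front before calling the converter. The sketch below is a hypothetical helper (build_conversion_args is not part of csvwlib); it only assumes the keyword names csv_url, metadata_url and mode listed in this README, and its error handling is illustrative rather than a description of what the library itself does.

```python
VALID_MODES = ("standard", "minimal")


def build_conversion_args(csv_url=None, metadata_url=None, mode="standard"):
    """Validate the arguments and return them as keyword arguments.

    Hypothetical helper: csvwlib may validate its inputs differently.
    """
    # At least one of the two URLs is needed to start a conversion.
    if csv_url is None and metadata_url is None:
        raise ValueError("pass csv_url, metadata_url, or both")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"csv_url": csv_url, "metadata_url": metadata_url, "mode": mode}


# The three ways of starting a conversion, expressed as keyword arguments.
csv_only = build_conversion_args(csv_url="http://example.org/data.csv")
both = build_conversion_args(
    csv_url="http://example.org/data.csv",
    metadata_url="http://example.org/data.csv-metadata.json",
)
metadata_only = build_conversion_args(
    metadata_url="http://example.org/data.csv-metadata.json", mode="minimal"
)
```

The resulting dictionaries could then be unpacked into a call such as CSVWConverter.to_json(**csv_only).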
From the W3C documentation:
Standard mode conversion frames the information gleaned from the cells of the tabular data with details of the rows, tables, and a group of tables within which that information is provided.
Minimal mode conversion includes only the information gleaned from the cells of the tabular data.
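As a rough illustration of the difference between the two modes for JSON output, the sketch below takes a hand-written dictionary shaped like the W3C standard-mode JSON (tables, then rows, then the objects each row describes) and keeps only the cell-derived objects, which is approximately what minimal mode returns. The sample data and the function name minimal_from_standard are assumptions made for illustration; this is not how csvwlib implements the modes internally.

```python
def minimal_from_standard(standard_output):
    """Collect the cell-derived objects from a standard-mode style dict.

    Illustrative only: drops the table and row framing and keeps the
    objects listed under each row's "describes" key.
    """
    subjects = []
    for table in standard_output.get("tables", []):
        for row in table.get("row", []):
            subjects.extend(row.get("describes", []))
    return subjects


# Hand-written sample in the shape of W3C standard-mode JSON output.
standard_example = {
    "tables": [
        {
            "url": "http://example.org/countries.csv",
            "row": [
                {
                    "url": "http://example.org/countries.csv#row=2",
                    "rownum": 1,
                    "describes": [{"countryCode": "AD", "name": "Andorra"}],
                },
                {
                    "url": "http://example.org/countries.csv#row=3",
                    "rownum": 2,
                    "describes": [{"countryCode": "AE", "name": "United Arab Emirates"}],
                },
            ],
        }
    ]
}

minimal_example = minimal_from_standard(standard_example)
```

Here minimal_example is a plain list of two objects, with the table URL, row URLs and row numbers left out.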
Converting to JSON returns a dict object. Converting to RDF is more involved, because the return type depends on the format parameter.
If you pass the format parameter, the graph is serialized to that format and returned as a string.
From the rdflib docs:
Format support can be extended with plugins, but "xml", "n3", "turtle", "nt", "pretty-xml", "trix", "trig" and "nquads" are built in.
If you don't specify the format, you receive an rdflib.Graph object.
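Because to_rdf() can return either a string or a graph object, calling code may want to normalize the result. The following is a minimal sketch of one way to do that; the helper name rdf_result_to_text is hypothetical. It relies only on rdflib's Graph.serialize(format=...) method and allows for older rdflib releases returning bytes rather than str.

```python
def rdf_result_to_text(result, fallback_format="turtle"):
    """Return RDF text whether `result` is a string or a graph object.

    Hypothetical helper: `result` is assumed to be what to_rdf() returned,
    either an already serialized string or an rdflib.Graph.
    """
    if isinstance(result, str):
        # A format was passed to to_rdf(), so the graph is already serialized.
        return result
    serialized = result.serialize(format=fallback_format)
    # Older rdflib versions return bytes from serialize(); newer ones return str.
    if isinstance(serialized, bytes):
        return serialized.decode("utf-8")
    return serialized
```

Keeping the rdflib.Graph (by omitting format) is the better choice when you want to query or merge the data before writing it out.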
Example data+metadata files can be found at http://w3c.github.io/csvw/tests/
Converting to RDF, starting with CSV:
from csvwlib import CSVWConverter
CSVWConverter.to_rdf("http://w3c.github.io/csvw/tests/test001.csv", format="ttl")
Minimal mode:
from csvwlib import CSVWConverter
CSVWConverter.to_rdf("http://w3c.github.io/csvw/tests/tree-ops.csv", mode="minimal", format="ttl")
Starting with metadata only:
from csvwlib import CSVWConverter
CSVWConverter.to_rdf(metadata_url="http://w3c.github.io/csvw/tests/test188-metadata.json", format="ttl")
Both CSV and metadata URL specified:
from csvwlib import CSVWConverter
CSVWConverter.to_rdf("http://w3c.github.io/csvw/tests/tree-ops.csv", "http://w3c.github.io/csvw/tests/test188-metadata.json", format="ttl")
Converting to JSON, starting with metadata:
from csvwlib import CSVWConverter
CSVWConverter.to_json(metadata_url="http://w3c.github.io/csvw/tests/countries.json")
Converting to JSON, starting with CSV:
from csvwlib import CSVWConverter
CSVWConverter.to_json("http://w3c.github.io/csvw/tests/test001.csv")
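Since to_json() returns a plain dict, the result can be handled with the standard json module. The sketch below uses a hand-written stand-in dictionary rather than real converter output, and the helper name pretty_json is hypothetical.

```python
import json


def pretty_json(result):
    """Render a conversion result (a dict) as indented JSON text."""
    # ensure_ascii=False keeps non-ASCII cell values readable in the output.
    return json.dumps(result, indent=2, ensure_ascii=False, sort_keys=True)


# Stand-in for the dict that to_json() would return.
sample_result = {"tables": [{"url": "http://example.org/test001.csv", "row": []}]}
print(pretty_json(sample_result))
```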
Authored by @Aleksander-Drozd
Maintained by @DerwenAI