New data frontend strategy [map]: lightweight data layers for every entry point with location component (focus on non-binary static files) #48

fititnt · 2022-07-26T08:03:56Z

Some data formats related to GIS
- https://gdal.org/index.html
GeoJSON Related
- https://datatracker.ietf.org/doc/html/rfc7946
- https://geojson.org/
  - https://geojson.org/geojson-ld/
    - Very relevant for our use cases (we're already mapping things to RDF)
A Uniform Resource Identifier for Geographic Locations ('geo' URI)
- https://datatracker.ietf.org/doc/html/rfc5870
- Frontends
  - http://osm.codes/ (example: http://osm.codes/geo:-23.550385,-46.633956)
Indirectly related:
- New data warehouse strategy [tabular]: SQL database populated with dictionaries data (experimental feature) #37
- New data warehouse strategy (graph): RDF/SPARQL graph database populated with dictionaries data (experimental feature) #41
Other links related to organizing entire libraries of datasets by location
- https://github.com/SEMICeu/GeoDCAT-AP
- WFS https://en.wikipedia.org/wiki/Web_Feature_Service
- WCS https://en.wikipedia.org/wiki/Web_Coverage_Service
- WCPS https://en.wikipedia.org/wiki/Web_Coverage_Processing_Service
- (the list actually is very long)

Fact: exchanged data often has some location component. It might not be easy to relate or prepare the data, and location may not be the main focus of what the user wants, but it is possible to key data not only by the numerical taxonomy in use (Numerordĭnātĭo), but also by location.

The idea here is that, in addition to the tabular formats, which work as plain CSVs (and, via frictionless, can also be loaded into databases, as per #37), we provide something that could be loaded by tools that typically work with maps. I think some tools that work with graphs (maybe plugins for Protege) already do something like this, but that belongs in #41.

Challenges

(Likely a major issue at the standards level) Interlink data related to location without replicating geometry in every file

I might be wrong, but from all I'm seeing (maybe because most GIS tools are strongly focused on desktop use and high numerical precision), they tend to allow attaching data to administrative locations, BUT... the way most of them do it is by duplicating the geometries when related data comes from several sources!

Interoperability: easily change location geometry references and data for the same topic

Assuming we solve the issue of allowing clients (likely these desktop programs or, with some documentation, web interfaces) to optionally not duplicate geometries every time a static file contains related data, we come to the second point: allowing end users to change both.

There are reasons (changes in the precision of geometries, or new data with small variations) why users might not be using the exact same geometry, yet it is still relevant to interlink the data.

Potential first approach

Unless we resort to XML files (or use GeoPackage, but that is a binary format, not really what we want for things where the user needs to be able to change parts), one close alternative would be... GeoJSON.

GeoJSON, GeoJSON-LD, ...

GeoJSON is by far the best-supported non-XML, non-tabular format. The main complaint (and the reason TopoJSON was created) is that it tends to have a larger file size and take more memory from the user than binary formats.

However, while we still need to test whether this works in practice, I think we can at least start generating GeoJSON and mark the properties associated with each feature in a way that can be understood as stricter RDF (not mere text). We have already been doing this, so it would just mean doing it in a JSON-like format.

Maybe create "dummy" points and mark the real geometries with extensions to GeoJSON

GeoJSON itself does not allow reusing geometries from other places (though we might use something based on JSON Schema or RDF to signal this), but for clients that would not be able to understand such a signal we could at least create dummy points, like the centroid of an administrative area.

One advantage of this approach is that GeoJSON with only single points adds very little extra weight, since most of the file size would actually be what the user really wants: the metadata. This extra weight would also be so small that we would likely get no relevant benefit from using TopoJSON (which I think mostly uses arcs to simplify things, while no metadata is changed).
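A minimal sketch of the dummy-point idea, assuming a hypothetical foreign member `x-geometry-ref` to say where the real geometry lives (this name is made up here, it is not part of GeoJSON or of any draft in this issue). The stand-in point is just the average of the ring's vertices, which is only a rough approximation of a centroid, good enough for placing a marker.

```python
import json


def dummy_point_feature(feature_id, ring, properties, geometry_ref):
    """Build a Feature whose geometry is a single stand-in point.

    ring: [lon, lat] positions of the real polygon's outer ring, closed
          (first position repeated as the last one).
    geometry_ref: pointer to the real geometry. Stored under the
          hypothetical key "x-geometry-ref" (an assumption, not a standard).
    """
    vertices = ring[:-1]  # drop the repeated closing position
    lon = sum(p[0] for p in vertices) / len(vertices)
    lat = sum(p[1] for p in vertices) / len(vertices)
    return {
        "type": "Feature",
        "id": feature_id,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
        "x-geometry-ref": geometry_ref,
    }


# Invented rectangle and identifiers, only for illustration.
square = [[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0], [0.0, 0.0]]
feature = dummy_point_feature(
    "urn:example:area:1",
    square,
    {"population": 1000},
    "boundaries.geojson#urn:example:area:1",
)
print(json.dumps(feature["geometry"]))
```

Clients that do not know about the extra key still get a valid point to render, while the thematic properties stay the bulk of the file.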

Strategies on how the "dummy" points could be replaced:

Either command-line instructions (which could be generated as part of user documentation) or online tools could concatenate the data with the real geometries (most likely geometries that would be wasteful to repeat in every dataset, such as administrative regions).
- This approach also has the benefit that users learn how to merge several related datasets from different subjects into a "final" file.
On the client side, tools are aware of the exchanged data
- For web interfaces this would mean implementing it (with JavaScript or something)
  - One obvious advantage (even for tools that would allow importing several GeoJSON layers) is much lower memory usage
- Desktop tools learn how to interlink the files
  - Not something short term (but could make sense in the long run) if there is already a considerable amount of data
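The first strategy could look roughly like the sketch below: a small function that swaps dummy points for real geometries when the feature `id` matches. The function name and the matching-by-`id` rule are assumptions for illustration, and the polygon is an invented box, not a real boundary.

```python
import copy


def attach_real_geometries(thematic, boundaries):
    """Return a copy of `thematic` with dummy points replaced by the
    geometries of `boundaries` features that share the same id.
    Features without a match keep their dummy point."""
    by_id = {
        f["id"]: f["geometry"] for f in boundaries["features"] if "id" in f
    }
    merged = copy.deepcopy(thematic)
    for feature in merged["features"]:
        real = by_id.get(feature.get("id"))
        if real is not None:
            feature["geometry"] = copy.deepcopy(real)
    return merged


thematic = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:24:0",
            "geometry": {"type": "Point", "coordinates": [17.35, -12.35]},
            "properties": {"x-iso3166p1a2": "AO"},
        },
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:76:0",
            "geometry": {"type": "Point", "coordinates": [-53, -14]},
            "properties": {"x-iso3166p1a2": "BR"},
        },
    ],
}

# Invented box standing in for a real boundary file.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:24:0",
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[11.0, -18.0], [24.0, -18.0], [24.0, -4.0],
                     [11.0, -4.0], [11.0, -18.0]]
                ],
            },
            "properties": {},
        }
    ],
}

final = attach_real_geometries(thematic, boundaries)
print([f["geometry"]["type"] for f in final["features"]])
```

The same geometry file can then be reused for any number of thematic files, which is the point of not repeating it in each one.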

Not the focus here

At a quick look, there seem to be very complex and detailed all-in-one servers, like the open source GeoNode https://geonode.org/ or MapServer https://www.mapserver.org/, which would allow deploying pretty much everything. The analogy would be a CKAN, but strongly focused on maps. They do use documented protocols, but unless we find ways to do very simplistic automated generation of static files that emulate them, it is out of scope for us to try to create a production-level server just as a frontend for the data.

We can, however, automate or document how to ingest the data. But at this point the starting point is just to make things work on the client side, with the server simply having static files laid out in predictable ways. This is why we cannot rely too much on a public data warehouse for everyone.

fititnt · 2022-07-26T13:04:18Z

Hmm... it might be feasible. Since GeoJSON does not explicitly forbid unknown keywords, we can describe the GeoJSON as if it were JSON-LD (which means allowing RDF and going 5-star all the way).

This example does not yet include thematic information that is not directly related to administrative boundaries (such as population). However, we're already adding a dummy point just to allow rendering. I'm not sure for now how to semantically signal that the point in this particular file is just a dummy (but there must be some way).

On not expecting map front ends to really understand full-blown RDF

Even if we manage to make the GeoJSON versions of the data explain themselves using RDF, this is less because we expect simple implementations to really make use of it, and much more to ensure strict validation, eventually, at data generation.
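As a rough illustration of what "strict validation at data generation" could mean, the sketch below lists property keys that have no entry in `@context`. This matters because a JSON-LD processor drops keys it cannot map to an IRI (unless a default vocabulary is set), so an undeclared key would silently vanish from the RDF. The function name is hypothetical, and it assumes `@context` is a single object, as in the draft in this issue.

```python
def undeclared_properties(document):
    """Return the sorted property keys used by any feature but missing
    from the document's @context (assumed to be a plain dict)."""
    context = document.get("@context", {})
    missing = set()
    for feature in document.get("features", []):
        for key in feature.get("properties") or {}:
            if key not in context:
                missing.add(key)
    return sorted(missing)


document = {
    "@context": {
        "wdata": "http://www.wikidata.org/wiki/Special:EntityData/",
        "x-iso3166p1a2": "wdata:P297",
    },
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:24:0",
            "geometry": {"type": "Point", "coordinates": [17.35, -12.35]},
            # "population" is deliberately not declared in @context.
            "properties": {"x-iso3166p1a2": "AO", "population": 1},
        }
    ],
}

problems = undeclared_properties(document)
print(problems)
```

A generator could refuse to write the file when this list is not empty, while map front ends keep ignoring `@context` entirely.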

Also, the second most obvious advantage of allowing machines to understand these GeoJSONs is... we can automate documentation generation in every natural language.

On the idea of avoiding saving names of places in the GeoJSONs with thematic data (for performance reasons)

For the sake of file size, even if it could make sense (and I'm not sure about this special case) to export the names of places in several languages for geometries that are directly about the places, I think that by default most features exported as GeoJSON should focus on computational attributes (things people would want to plot as graphs, like with https://github.com/ghtmtt/DataPlotly) rather than linguistic metadata.

Even if users want localized place names, interfaces could fetch the languages from separate files.

People loading a huge number of layers could easily overload front ends (especially if they are not desktop applications).
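The separate-files idea above could work like this sketch: the thematic layer carries only ids and computational attributes, and a per-language table (here plain dicts, which in practice might be loaded from one small file per language) supplies names at display time. The table layout and function name are assumptions.

```python
def label_features(collection, labels):
    """Pair each feature with a display name taken from a separately
    loaded label table, falling back to the feature id."""
    return [
        (labels.get(f.get("id"), f.get("id")), f.get("properties") or {})
        for f in collection["features"]
    ]


layer = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:24:0",
            "geometry": {"type": "Point", "coordinates": [17.35, -12.35]},
            "properties": {"x-iso3166p1a3": "AGO"},
        },
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:76:0",
            "geometry": {"type": "Point", "coordinates": [-53, -14]},
            "properties": {"x-iso3166p1a3": "BRA"},
        },
    ],
}

# Hypothetical Portuguese label table; only one entry on purpose.
labels_por = {"urn:mdciii:1603:16:76:0": "Brasil"}

labelled = label_features(layer, labels_por)
print([name for name, _ in labelled])
```

Only the languages a user actually asks for need to be fetched, so the thematic file does not grow with every added language.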

On the idea of allowing merging of data layers (but not trying to simplify incompatible data)

The "incompatible data" part refers to this discussion: #47

While we can think of ways to allow users to load a lot of layers, similar to what happens with the tabular format (though in that case errors would occur at loading time, when generating the entire database), in the case of data layers I don't think we should try to solve the case where the user loads incompatible layers that reuse the same variables.

We could use RDF to give more hints about different packages (like different sources affirming the same fact), but if some tool tries to "merge" all data into one in the easiest way to implement, the end behavior could be undefined.

Granted, most GIS tools already allow selecting layer by layer (so the user can deal with incompatible data), but the very advantage of what we're proposing here is allowing different meta information to be concatenated as if it were a single layer. It is likely not possible to do both things at the same time while keeping it easy, because a human would need to decide in detail how to resolve conflicts.
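One way to avoid undefined behavior without trying to resolve conflicts is to merge properties by feature id and simply stop when two layers disagree, leaving the decision to a human. The sketch below assumes that policy; `LayerConflict` and `merge_layers` are hypothetical names, and the population numbers are made up.

```python
class LayerConflict(ValueError):
    """Raised when two layers give different values for the same variable."""


def merge_layers(layers):
    """Merge feature properties from several layers, keyed by feature id.

    Compatible layers are concatenated; the first conflicting value
    raises LayerConflict instead of being silently overwritten."""
    merged = {}
    for layer in layers:
        for feature in layer["features"]:
            target = merged.setdefault(feature["id"], {})
            for key, value in (feature.get("properties") or {}).items():
                if key in target and target[key] != value:
                    raise LayerConflict(
                        f"{feature['id']}: {key!r} is both "
                        f"{target[key]!r} and {value!r}"
                    )
                target[key] = value
    return merged


def layer_with(properties):
    """Tiny helper: a one-feature layer with the given properties."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "id": "urn:mdciii:1603:16:76:0",
                "geometry": {"type": "Point", "coordinates": [-53, -14]},
                "properties": properties,
            }
        ],
    }


codes = layer_with({"x-iso3166p1a2": "BR"})
source_a = layer_with({"population": 100})  # made-up value
source_b = layer_with({"population": 101})  # made-up, conflicting value

combined = merge_layers([codes, source_a])
print(combined)

try:
    merge_layers([source_a, source_b])
except LayerConflict as error:
    print("conflict:", error)
```

Tools that want to show both conflicting sources can still fall back to the usual layer-by-layer selection.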

Early draft

Content based on the example from https://geojson.org/geojson-ld/, except we use FeatureCollection and only add to @context the terms that will appear in the data.

`$ ./999999999/0/999999999_54872.py --methodus=geojson --rdf-sine-spatia-nominalibus=devnull --rdf-trivio=5000 1603/16/1/0/1603_16_1_0.no1.tm.hxl.csv | jq`

```json
{
  "$schema": "https://geojson.org/schema/GeoJSON.json",
  "@context": {
    "@version": 1.1,
    "geojson": "https://purl.org/geojson/vocab#",
    "wdata": "http://www.wikidata.org/wiki/Special:EntityData/",
    "Feature": "geojson:Feature",
    "FeatureCollection": "geojson:FeatureCollection",
    "Point": "geojson:Point",
    "Polygon": "geojson:Polygon",
    "coordinates": {
      "@container": "@list",
      "@id": "geojson:coordinates"
    },
    "features": {
      "@container": "@set",
      "@id": "geojson:features"
    },
    "geometry": "geojson:geometry",
    "id": "@id",
    "properties": "geojson:properties",
    "type": "@type",
    "x-iso3166p1a2": "wdata:P297",
    "x-iso3166p1a3": "wdata:P298"
  },
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "urn:mdciii:1603:16:24:0",
      "geometry": {
        "type": "Point",
        "coordinates": [
          17.35,
          -12.35
        ]
      },
      "properties": {
        "x-iso3166p1a2": "AO",
        "x-iso3166p1a3": "AGO"
      }
    },
    {
      "type": "Feature",
      "id": "urn:mdciii:1603:16:76:0",
      "geometry": {
        "type": "Point",
        "coordinates": [
          -53,
          -14
        ]
      },
      "properties": {
        "x-iso3166p1a2": "BR",
        "x-iso3166p1a3": "BRA"
      }
    }
  ]
}
```
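The "only add to @context what appears in the data" rule from the draft above could be sketched as follows. This is not how `999999999_54872.py` does it (that code is not shown here); `with_context` and `known_terms` are hypothetical names, and the base terms are copied from the draft.

```python
# Structural terms copied from the draft; no new vocabulary is introduced.
BASE_CONTEXT = {
    "@version": 1.1,
    "geojson": "https://purl.org/geojson/vocab#",
    "wdata": "http://www.wikidata.org/wiki/Special:EntityData/",
    "Feature": "geojson:Feature",
    "FeatureCollection": "geojson:FeatureCollection",
    "Point": "geojson:Point",
    "Polygon": "geojson:Polygon",
    "coordinates": {"@container": "@list", "@id": "geojson:coordinates"},
    "features": {"@container": "@set", "@id": "geojson:features"},
    "geometry": "geojson:geometry",
    "id": "@id",
    "properties": "geojson:properties",
    "type": "@type",
}


def with_context(collection, known_terms):
    """Return `collection` with an @context that declares the structural
    terms plus only those property terms that occur in the features."""
    used = set()
    for feature in collection["features"]:
        used.update(feature.get("properties") or {})
    context = dict(BASE_CONTEXT)
    for term in sorted(used):
        if term in known_terms:
            context[term] = known_terms[term]
    return {"@context": context, **collection}


known_terms = {
    "x-iso3166p1a2": "wdata:P297",
    "x-iso3166p1a3": "wdata:P298",
}

plain = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "urn:mdciii:1603:16:24:0",
            "geometry": {"type": "Point", "coordinates": [17.35, -12.35]},
            "properties": {"x-iso3166p1a2": "AO"},
        }
    ],
}

linked = with_context(plain, known_terms)
print("x-iso3166p1a2" in linked["@context"], "x-iso3166p1a3" in linked["@context"])
```

Plain GeoJSON readers ignore the extra `@context` member, so the same file serves both kinds of client.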

Rendered result at https://geojson.io/

Output at JSON-LD playground https://json-ld.org/playground/

RDF/Turtle final result

Used https://www.easyrdf.org/converter

```turtle
@prefix ns0: <https://purl.org/geojson/vocab#> .
@prefix ns1: <http://www.wikidata.org/wiki/Special:EntityData/> .

<urn:mdciii:1603:16:24:0>
  a <https://purl.org/geojson/vocab#Feature> ;
  ns0:geometry [
    a ns0:Point ;
    ns0:coordinates (
     1.735000e+1
     -1.235000e+1
   )
  ] ;
  ns0:properties [
    ns1:P297 "AO" ;
    ns1:P298 "AGO"
  ] .

<urn:mdciii:1603:16:76:0>
  a ns0:Feature ;
  ns0:geometry [
    a ns0:Point ;
    ns0:coordinates (
     -53
     -14
   )
  ] ;
  ns0:properties [
    ns1:P297 "BR" ;
    ns1:P298 "BRA"
  ] .

[]
  a ns0:FeatureCollection ;
  ns0:features <urn:mdciii:1603:16:24:0>, <urn:mdciii:1603:16:76:0> .
```


fititnt added the librarium-formato librārium fōrmātō; /library format/@eng-Latn; Related to storage of entire referential data label Jul 26, 2022

fititnt added a commit that referenced this issue Jul 27, 2022

999999999_54872.py (#48): --methodus=geojson bugfix; added to lsf-cac…

eb73414

…he (for test previews)
