New data frontend strategy [map]: lightweight data layers for every entry point with location component (focus on non-binary static files) #48

Open
fititnt opened this issue Jul 26, 2022 · 1 comment
Labels
librarium-formato librārium fōrmātō; /library format/@eng-Latn; Related to storage of entire referential data

Comments

@fititnt
Member

fititnt commented Jul 26, 2022


Fact: data exchange often has some location component. It might not be easy to prepare the related data, and it may not be the main focus of what the user wants, but it is possible to key data not only by the numerical taxonomy in use (Numerordĭnātĭo), but also by location.

The idea here is that, in addition to the tabular formats, which can work as plain CSVs (and, via frictionless, can also be loaded into databases, as per #37), we would provide something that could be loaded into tools that typically use maps. I think some tools that work with graphs (maybe plugins for Protege) already do something like this, but that would belong to #41.

Challenges

(Likely a major issue at standards level) Interlink data related to location without replicating geometry in every file

I might be wrong, but from all I'm seeing (maybe because most GIS tools are strongly focused on desktop use and high numerical precision), they tend to allow attaching data to administrative locations, BUT... the way most of them do it is by duplicating the geometries when related data comes from several sources!

Interoperability to easily change both the location geometry references and the data for the same topic

Assuming we solve the issue of allowing clients (likely these desktop programs or, with some documentation, web interfaces) to optionally not duplicate geometries every time a static file contains related data, we come to the second point: allowing end users to change both.

There are reasons (changes in the precision of geometries, or new data with small variations) why users might not use the exact same geometry, yet it is still relevant to interlink the data.

Potential first approach

Unless we resort to XML files (or use GeoPackage, but that is a binary format, not really what we want for things that need to let the user change parts), one close alternative would be... GeoJSON.

GeoJSON, GeoJSON-LD, ...

GeoJSON is by far the best supported non-XML, non-tabular format. The main complaint (and the reason TopoJSON was created) is that it tends to have a larger file size and take more memory from the user than binary formats.

However, while we still need to test whether it works in practice, I think we can at least start generating GeoJSON and mark the properties associated with each feature in some way that lets them be understood as stricter RDF (not mere text). We have already been doing this, so this would bring it to a JSON-like format.

Maybe create "dummy" points and mark the real geometries with extensions to GeoJSON

GeoJSON itself does not allow reusing geometries from other places (but we might use something based on JSON Schema or RDF to signal this). We could at least, for clients that would not be able to understand such a signal, create dummy points, like the centroid of an administrative area.

One advantage of this approach is that GeoJSON with only single points would add very little extra weight, since most of the file size would actually be what the user really wants as metadata. This extra weight would also be small enough that we would likely get no relevant benefit from using TopoJSON (which I think mostly uses arcs to simplify things, while no metadata is changed).
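As a rough illustration of the dummy point idea, here is a minimal sketch that derives a centroid from the outer ring of a polygon and wraps it as a Point feature. The function names are hypothetical (not part of any existing tool here), and the math is a plain planar shoelace formula over lon/lat that ignores holes and projection, so the point can land outside a concave area. It is only meant to be good enough for a placeholder.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring [[lon, lat], ...] (planar shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring: zero area")
    return [cx / (3 * area2), cy / (3 * area2)]


def dummy_point_feature(feature_id, polygon_coordinates, properties):
    """Build a GeoJSON Feature whose geometry is only the centroid of the outer ring."""
    return {
        "type": "Feature",
        "id": feature_id,
        "geometry": {
            "type": "Point",
            "coordinates": ring_centroid(polygon_coordinates[0]),
        },
        "properties": properties,
    }
```

The real polygon never enters the thematic file; only the identifier and the single point do, which is what keeps the extra weight low.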

Strategies on how the "dummy" points could be replaced:

  • Either command-line instructions (which could be generated as part of user documentation) or online tools could concatenate the thematic data with the real geometries (most likely for geometries which would be wasteful to repeat in every dataset, such as administrative regions).
    • This approach also has the benefit that users learn how to merge several related datasets from different subjects into a "final" file.
  • At client-side, tools become aware of the exchanged data
    • For web interfaces this would mean implementing it (like with JavaScript or something)
      • One obvious advantage (even for tools that would allow importing several GeoJSON layers) is much lower memory usage
    • Desktop tools learn how to interlink the files
      • Not something for the short term (but it could make sense in the long run) if there is already a considerable amount of data
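The first strategy could be sketched roughly as below: a thematic file with dummy points is joined, by feature `id`, against a separate file holding the real geometries. This is only an assumption of how such a tool might work; the function name `replace_dummy_points` is hypothetical, and it assumes both files are FeatureCollections that share identifiers such as `urn:mdciii:1603:16:24:0`.

```python
import copy


def replace_dummy_points(thematic, geometries):
    """Return a copy of `thematic` where each feature takes the geometry of the
    feature with the same id in `geometries`. Features without a match keep
    their dummy point, so the result still renders."""
    real = {
        feature["id"]: feature["geometry"]
        for feature in geometries["features"]
        if "id" in feature
    }
    merged = copy.deepcopy(thematic)  # do not mutate the caller's data
    for feature in merged["features"]:
        geometry = real.get(feature.get("id"))
        if geometry is not None:
            feature["geometry"] = copy.deepcopy(geometry)
    return merged
```

The same join could run at client-side instead, which is where the lower memory usage would come from: the geometry file is loaded once and reused for every thematic layer.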

Not the focus here

On a quick look, it seems there are very complex and detailed all-in-one servers, like the open source GeoNode https://geonode.org/ or MapServer https://www.mapserver.org/, which would allow deploying pretty much everything. The analogy would be a CKAN, but strongly focused on maps. They do use documented protocols, but unless we find ways to do very simplistic automated generation of static files that emulate them, it is out of scope for us to try to create a production-level server just as a frontend for the data.

We can, however, automate or document how to ingest data. But at this point the goal is just to make things work at client-side, with the server simply having static files in predictable ways. This is why we cannot rely too much on a public data warehouse for everyone.

@fititnt fititnt added the librarium-formato librārium fōrmātō; /library format/@eng-Latn; Related to storage of entire referential data label Jul 26, 2022
@fititnt
Member Author

fititnt commented Jul 26, 2022

Humm... it might be feasible. Since GeoJSON does not explicitly forbid unknown keywords, we can explain the GeoJSON as if it were JSON-LD (which means allowing RDF and going 5-star all the way).

In this example, there is not yet thematic information beyond what is directly related to administrative boundaries (such as population or something). However, we're already adding a dummy point just to allow rendering. I'm not sure for now how to semantically signal that the point in this particular file is just a dummy (but there must be some way).

On not expecting map front ends to really need to understand full-blown RDF

Even if we manage to make the GeoJSON versions of the data explain themselves using RDF, this is less because we expect simple implementations to really make use of it, and much more to eventually ensure strict validation at data generation.
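One very small piece of such validation could look like the sketch below: check that every key used under `properties` is declared in the file's `@context`, so nothing ends up as "mere text" without an RDF meaning. The function name is hypothetical, and this assumes an inline `@context` object like the one in the early draft (a remote or array context would need real JSON-LD processing instead).

```python
def undeclared_properties(collection):
    """List property keys used by features but missing from the inline @context."""
    context = collection.get("@context", {})
    # Only an inline object context is handled in this sketch.
    declared = set(context) if isinstance(context, dict) else set()
    missing = set()
    for feature in collection.get("features", []):
        for key in (feature.get("properties") or {}):
            if key not in declared:
                missing.add(key)
    return sorted(missing)
```

A generator could refuse to write the file when the returned list is not empty, which keeps the check at data generation and asks nothing from map front ends.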

Also, the second most obvious advantage of allowing machines to understand these GeoJSON files is... we can automate documentation generation in every natural language.

On the idea of avoiding saving names of places in the GeoJSON files with thematic data (for performance reasons)

For the sake of file size, even if it could make sense (and I'm not sure about this special case) to export the names of places in several languages for geometries that are directly about the places, I think that by default most features exported as GeoJSON should focus on computational attributes (things that people would want to plot in graphs, like with https://github.com/ghtmtt/DataPlotly) rather than linguistic metadata.

Even if users want to localize places, interfaces could fetch the languages from separate files.

People are likely to load a huge number of layers, which could easily overload front ends (especially those that are not desktop applications).

On the idea of allowing merging of data layers (but not trying to simplify incompatible data)

The discussion about the "incompatible data" is at #47.

While we can think of ways to allow users to load a lot of layers, similar to what would happen with the tabular format (though in that case errors would occur at loading time, when generating the entire database), in the case of data layers I don't think we should try to solve what happens if the user loads incompatible layers that reuse the same variables.

We could use RDF to give more hints about different packages (like different sources affirming the same fact), but if some tool tries to "merge" all data into one in the easiest way to implement, the end behavior could be undefined.

It's true that, in general, most GIS tools already allow selecting layer by layer (so the user could deal with incompatible data), but the very advantage of what we're proposing here would be to allow concatenating different meta information as if it were a single layer. It is likely not possible to do both things at the same time while keeping it easy, because a human would need to decide in detail how to solve conflicts.
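To make the "undefined behavior" concern a bit more concrete, here is a minimal sketch of a merge that does not try to be clever: properties from several layers are combined per feature `id`, and when two layers disagree on the same variable the conflict is reported instead of silently resolved. The function name and the `x-population` key in the usage note are hypothetical, only for illustration.

```python
def merge_layers(layers):
    """Merge properties of several FeatureCollections by feature id.

    Returns (merged, conflicts): `merged` maps id -> properties, keeping the
    first value seen; `conflicts` lists (id, key, kept_value, rejected_value)
    for every disagreement, so a human can decide what to do."""
    merged = {}
    conflicts = []
    for layer in layers:
        for feature in layer["features"]:
            target = merged.setdefault(feature["id"], {})
            for key, value in (feature.get("properties") or {}).items():
                if key in target and target[key] != value:
                    conflicts.append((feature["id"], key, target[key], value))
                else:
                    target[key] = value
    return merged, conflicts
```

For example, if two sources both publish `x-population` for the same id with different values, the first value is kept and the second appears in `conflicts`. A tool could then stop, or fall back to showing the layers separately.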


Early draft

Content based on the example from https://geojson.org/geojson-ld/, except that we use FeatureCollection and only add to @context the terms that will appear in the data.

$ ./999999999/0/999999999_54872.py --methodus=geojson --rdf-sine-spatia-nominalibus=devnull --rdf-trivio=5000 1603/16/1/0/1603_16_1_0.no1.tm.hxl.csv | jq

{
  "$schema": "https://geojson.org/schema/GeoJSON.json",
  "@context": {
    "@version": 1.1,
    "geojson": "https://purl.org/geojson/vocab#",
    "wdata": "http://www.wikidata.org/wiki/Special:EntityData/",
    "Feature": "geojson:Feature",
    "FeatureCollection": "geojson:FeatureCollection",
    "Point": "geojson:Point",
    "Polygon": "geojson:Polygon",
    "coordinates": {
      "@container": "@list",
      "@id": "geojson:coordinates"
    },
    "features": {
      "@container": "@set",
      "@id": "geojson:features"
    },
    "geometry": "geojson:geometry",
    "id": "@id",
    "properties": "geojson:properties",
    "type": "@type",
    "x-iso3166p1a2": "wdata:P297",
    "x-iso3166p1a3": "wdata:P298"
  },
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "urn:mdciii:1603:16:24:0",
      "geometry": {
        "type": "Point",
        "coordinates": [
          17.35,
          -12.35
        ]
      },
      "properties": {
        "x-iso3166p1a2": "AO",
        "x-iso3166p1a3": "AGO"
      }
    },
    {
      "type": "Feature",
      "id": "urn:mdciii:1603:16:76:0",
      "geometry": {
        "type": "Point",
        "coordinates": [
          -53,
          -14
        ]
      },
      "properties": {
        "x-iso3166p1a2": "BR",
        "x-iso3166p1a3": "BRA"
      }
    }
  ]
}

Rendered result at https://geojson.io/

[Screenshot from 2022-07-26 09-28-47]

Output at JSON-LD playground https://json-ld.org/playground/

[Screenshot from 2022-07-26 09-32-59]

RDF/Turtle final result

Used https://www.easyrdf.org/converter

@prefix ns0: <https://purl.org/geojson/vocab#> .
@prefix ns1: <http://www.wikidata.org/wiki/Special:EntityData/> .

<urn:mdciii:1603:16:24:0>
  a <https://purl.org/geojson/vocab#Feature> ;
  ns0:geometry [
    a ns0:Point ;
    ns0:coordinates (
     1.735000e+1
     -1.235000e+1
   )
  ] ;
  ns0:properties [
    ns1:P297 "AO" ;
    ns1:P298 "AGO"
  ] .

<urn:mdciii:1603:16:76:0>
  a ns0:Feature ;
  ns0:geometry [
    a ns0:Point ;
    ns0:coordinates (
     -53
     -14
   )
  ] ;
  ns0:properties [
    ns1:P297 "BR" ;
    ns1:P298 "BRA"
  ] .

[]
  a ns0:FeatureCollection ;
  ns0:features <urn:mdciii:1603:16:24:0>, <urn:mdciii:1603:16:76:0> .

fititnt added a commit that referenced this issue Jul 27, 2022