Merge pull request #420 from OpenDataServices/419-wkt
flatten and unflatten: WKT <-> geojson conversion
Bjwebb authored Jun 23, 2023
2 parents b13fc8f + ca48417 commit a3627ae
Showing 30 changed files with 505 additions and 16 deletions.
5 changes: 2 additions & 3 deletions .github/workflows/test.yml
@@ -3,11 +3,10 @@ on: [push, pull_request]

 jobs:
 build:
-# Need to use an older Ubuntu so Python 3.6 is available
-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04
 strategy:
 matrix:
-python-version: [ '3.6', '3.7', '3.8', '3.9', '3.10', '3.11']
+python-version: [ '3.7', '3.8', '3.9', '3.10', '3.11']
 jsonref-version: ["==0.3", ">1"]
 steps:
 - uses: actions/checkout@v2
2 changes: 2 additions & 0 deletions .gitignore
@@ -60,4 +60,6 @@ examples/titles-ref/flatten/actual
 examples/titles-ref/flatten/actual.*
 examples/titles-ref/unflatten/actual
 examples/titles-ref/unflatten/actual.*
+examples/wkt/*/actual
+examples/wkt/*/actual.*
 .~lock*
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,16 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 ## [Unreleased]

+## [0.21.0] - 2023-06-23
+
+### Added
+
+- WKT <-> geojson conversion for flattening and unflattening, behind an optional flag https://github.com/OpenDataServices/flatten-tool/issues/419
+
+### Removed
+
+- We no longer support Python 3.6
+
 ## [0.20.1] - 2023-01-11

 ### Fixed
2 changes: 2 additions & 0 deletions examples/help/flatten/expected.txt
@@ -10,6 +10,7 @@ usage: flatten-tool flatten [-h] [-s SCHEMA] [-f {csv,ods,xlsx,all}] [--xml]
 [--disable-local-refs]
 [--remove-empty-schema-columns]
 [--line-terminator LINE_TERMINATOR]
+[--convert-wkt]
 input_name

 positional arguments:
@@ -65,3 +66,4 @@ optional arguments:
 --line-terminator LINE_TERMINATOR
 The line terminator to use when writing CSV files:
 CRLF or LF
+--convert-wkt Enable conversion of geojson to WKT
3 changes: 2 additions & 1 deletion examples/help/unflatten/expected.txt
@@ -13,7 +13,7 @@ usage: flatten-tool unflatten [-h] -f {csv,ods,xlsx} [--xml]
 [--xml-schema [XML_SCHEMA [XML_SCHEMA ...]]]
 [--default-configuration DEFAULT_CONFIGURATION]
 [--root-is-list] [--disable-local-refs]
-[--xml-comment XML_COMMENT]
+[--xml-comment XML_COMMENT] [--convert-wkt]
 input_name

 positional arguments:
@@ -75,3 +75,4 @@ optional arguments:
 --disable-local-refs Disable local refs when parsing JSON Schema.
 --xml-comment XML_COMMENT
 String comment of what generates the xml file
+--convert-wkt Enable conversion of WKT to geojson
1 change: 1 addition & 0 deletions examples/wkt/flatten/cmd.txt
@@ -0,0 +1 @@
$ flatten-tool flatten -f=csv examples/wkt/flatten/data.json -o examples/wkt/flatten/actual --convert-wkt
10 changes: 10 additions & 0 deletions examples/wkt/flatten/data.json
@@ -0,0 +1,10 @@
{
    "main": [
        {
            "geo": {
                "type": "Point",
                "coordinates": [53.486434, -2.239353]
            }
        }
    ]
}
2 changes: 2 additions & 0 deletions examples/wkt/flatten/expected/main.csv
@@ -0,0 +1,2 @@
geo
POINT (53.486434 -2.239353)
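With `--convert-wkt`, the nested `geo` object is written out as a single WKT cell. The following is a minimal, dependency-free sketch of that mapping for 2D `Point` geometries only. `point_to_wkt` is a hypothetical helper and is not flatten-tool's implementation; the commit itself relies on the `shapely` and `geojson` packages, as the `input.py` imports show.

```python
def point_to_wkt(geometry: dict) -> str:
    """Render a 2D GeoJSON Point as WKT (hypothetical helper, Point only)."""
    if geometry.get("type") != "Point":
        raise ValueError(f"unsupported geometry type: {geometry.get('type')!r}")
    # Unpacking exactly two values rejects 3D points instead of silently dropping z.
    x, y = geometry["coordinates"]
    # WKT separates ordinates with a space; repr() gives the shortest
    # string that round-trips the float, e.g. "53.486434".
    return f"POINT ({x!r} {y!r})"


geo = {"type": "Point", "coordinates": [53.486434, -2.239353]}
print(point_to_wkt(geo))  # POINT (53.486434 -2.239353)
```

Both WKT and GeoJSON put the x ordinate first, so the coordinate order carries over unchanged.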
1 change: 1 addition & 0 deletions examples/wkt/flatten_wkt_disabled/cmd.txt
@@ -0,0 +1 @@
$ flatten-tool flatten -f=csv examples/wkt/flatten_wkt_disabled/data.json -o examples/wkt/flatten_wkt_disabled/actual
10 changes: 10 additions & 0 deletions examples/wkt/flatten_wkt_disabled/data.json
@@ -0,0 +1,10 @@
{
    "main": [
        {
            "geo": {
                "type": "Point",
                "coordinates": [53.486434, -2.239353]
            }
        }
    ]
}
2 changes: 2 additions & 0 deletions examples/wkt/flatten_wkt_disabled/expected/main.csv
@@ -0,0 +1,2 @@
geo/type,geo/coordinates
Point,53.486434;-2.239353
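Without the flag, the object is flattened column by column. Nested keys become `/`-separated headings, and the coordinate list is joined with `;`, as the expected `main.csv` shows. The sketch below reproduces only that output under simplifying assumptions. `flatten_row` is a hypothetical helper that handles nested objects and lists of scalars only. It leaves out arrays of objects, which flatten-tool handles differently.

```python
import csv
import io


def flatten_row(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into '/'-separated headings (hypothetical helper)."""
    row = {}
    for key, value in obj.items():
        heading = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse, extending the heading path with this key.
            row.update(flatten_row(value, prefix=heading + "/"))
        elif isinstance(value, list):
            # A list of scalars shares one cell, joined with ';'.
            row[heading] = ";".join(str(item) for item in value)
        else:
            row[heading] = value
    return row


record = {"geo": {"type": "Point", "coordinates": [53.486434, -2.239353]}}
row = flatten_row(record)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
# geo/type,geo/coordinates
# Point,53.486434;-2.239353
```

`csv.DictWriter` uses CRLF line endings by default, which happens to match the `line_terminator="CRLF"` default in `flatten()`.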
1 change: 1 addition & 0 deletions examples/wkt/unflatten/cmd.txt
@@ -0,0 +1 @@
$ flatten-tool unflatten -f=csv examples/wkt/unflatten/ --schema examples/wkt/unflatten/schema.json --convert-wkt
13 changes: 13 additions & 0 deletions examples/wkt/unflatten/expected.json
@@ -0,0 +1,13 @@
{
    "main": [
        {
            "geo": {
                "type": "Point",
                "coordinates": [
                    53.486434,
                    -2.239353
                ]
            }
        }
    ]
}
2 changes: 2 additions & 0 deletions examples/wkt/unflatten/main.csv
@@ -0,0 +1,2 @@
geo
POINT (53.486434 -2.239353)
11 changes: 11 additions & 0 deletions examples/wkt/unflatten/schema.json
@@ -0,0 +1,11 @@
{
    "properties": {
        "geo": {
            "type": "object",
            "properties": {
                "type": {},
                "coordinates": {}
            }
        }
    }
}
1 change: 1 addition & 0 deletions examples/wkt/unflatten_wkt_disabled/cmd.txt
@@ -0,0 +1 @@
$ flatten-tool unflatten -f=csv examples/wkt/unflatten_wkt_disabled/ --schema examples/wkt/unflatten_wkt_disabled/schema.json
14 changes: 14 additions & 0 deletions examples/wkt/unflatten_wkt_disabled/expected.json
@@ -0,0 +1,14 @@
{
    "main": [
        {
            "geo_string": "POINT (53.486434 -2.239353)",
            "geo": {
                "type": "Point",
                "coordinates": [
                    53.486434,
                    -2.239353
                ]
            }
        }
    ]
}
2 changes: 2 additions & 0 deletions examples/wkt/unflatten_wkt_disabled/main.csv
@@ -0,0 +1,2 @@
geo_string,geo/type,geo/coordinates
POINT (53.486434 -2.239353),Point,53.486434;-2.239353
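Without `--convert-wkt`, unflattening leaves the `geo_string` column as a plain string. `geo/coordinates` is split on `;` and typed using the schema, which declares an array of numbers. The following is a rough sketch of that schema-guided step. It assumes only nested objects and arrays of numbers need handling. `unflatten_row` and `convert_cell` are hypothetical names, and numbers are parsed as `float` for brevity.

```python
import csv
import io


def convert_cell(cell: str, leaf_schema: dict):
    """Type one cell from its schema entry; only number arrays are special-cased."""
    items = leaf_schema.get("items", {})
    if leaf_schema.get("type") == "array" and items.get("type") == "number":
        return [float(part) for part in cell.split(";")]
    return cell


def unflatten_row(row: dict, schema: dict) -> dict:
    """Rebuild nested objects from '/'-separated headings (hypothetical helper)."""
    result = {}
    for heading, cell in row.items():
        *parents, leaf = heading.split("/")
        node_schema, target = schema, result
        # Walk the schema and the output object in step along the heading path.
        for part in parents:
            node_schema = node_schema.get("properties", {}).get(part, {})
            target = target.setdefault(part, {})
        leaf_schema = node_schema.get("properties", {}).get(leaf, {})
        target[leaf] = convert_cell(cell, leaf_schema)
    return result


schema = {
    "properties": {
        "geo": {
            "type": "object",
            "properties": {
                "type": {},
                "coordinates": {"type": "array", "items": {"type": "number"}},
            },
        },
        "geo_string": {"type": "string"},
    }
}
csv_text = (
    "geo_string,geo/type,geo/coordinates\n"
    "POINT (53.486434 -2.239353),Point,53.486434;-2.239353\n"
)
row = next(csv.DictReader(io.StringIO(csv_text)))
print(unflatten_row(row, schema))
# {'geo_string': 'POINT (53.486434 -2.239353)', 'geo': {'type': 'Point', 'coordinates': [53.486434, -2.239353]}}
```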
19 changes: 19 additions & 0 deletions examples/wkt/unflatten_wkt_disabled/schema.json
@@ -0,0 +1,19 @@
{
    "properties": {
        "geo": {
            "type": "object",
            "properties": {
                "type": {},
                "coordinates": {
                    "type": "array",
                    "items": {
                        "type": "number"
                    }
                }
            }
        },
        "geo_string": {
            "type": "string"
        }
    }
}
11 changes: 11 additions & 0 deletions flattentool/__init__.py
@@ -95,6 +95,7 @@ def flatten(
 remove_empty_schema_columns=False,
 truncation_length=3,
 line_terminator="CRLF",
+convert_wkt=False,
 **_,
 ):
 """
@@ -110,6 +111,8 @@ def flatten(
 if line_terminator not in LINE_TERMINATORS.keys():
 raise Exception(f"{line_terminator} is not a valid line terminator")

+convert_flags = {"wkt": convert_wkt}
+
 if schema:
 schema_parser = SchemaParser(
 schema_filename=schema,
@@ -139,6 +142,7 @@ def flatten(
 remove_empty_schema_columns=remove_empty_schema_columns,
 truncation_length=truncation_length,
 persist=True,
+convert_flags=convert_flags,
 ) as parser:

 def spreadsheet_output(spreadsheet_output_class, name):
@@ -218,19 +222,23 @@ def unflatten(
 disable_local_refs=False,
 xml_comment=None,
 truncation_length=3,
+convert_wkt=False,
 **_,
 ):
 """
 Unflatten a flat structure (spreadsheet - csv or xlsx) into a nested structure (JSON).
 """

 if input_format is None:
 raise Exception("You must specify an input format (may autodetect in future")
 elif input_format not in INPUT_FORMATS:
 raise Exception("The requested format is not available")
 if metatab_name and base_json:
 raise Exception("Not allowed to use base_json with metatab")

+convert_flags = {"wkt": convert_wkt}
+
 if root_is_list:
 base = None
 elif base_json:
@@ -258,6 +266,7 @@ def unflatten(
 id_name=id_name,
 xml=xml,
 use_configuration=False,
+convert_flags=convert_flags,
 )
 if metatab_schema:
 parser = SchemaParser(
@@ -309,6 +318,7 @@ def unflatten(
 id_name=id_name,
 xml=xml,
 base_configuration=base_configuration,
+convert_flags=convert_flags,
 )
 if schema:
 parser = SchemaParser(
@@ -317,6 +327,7 @@ def unflatten(
 root_id=root_id,
 disable_local_refs=disable_local_refs,
 truncation_length=truncation_length,
+convert_flags=convert_flags,
 )
 parser.parse()
 spreadsheet_input.parser = parser
10 changes: 10 additions & 0 deletions flattentool/cli.py
@@ -185,6 +185,11 @@ def create_parser():
 "--line-terminator",
 help="The line terminator to use when writing CSV files: CRLF or LF",
 )
+parser_flatten.add_argument(
+"--convert-wkt",
+action="store_true",
+help="Enable conversion of geojson to WKT",
+)
 parser_unflatten = subparsers.add_parser(
 "unflatten", help="Unflatten a spreadsheet"
 )
@@ -294,6 +299,11 @@ def create_parser():
 default="XML generated by flatten-tool",
 help="String comment of what generates the xml file",
 )
+parser_unflatten.add_argument(
+"--convert-wkt",
+action="store_true",
+help="Enable conversion of WKT to geojson",
+)

 return parser

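Both subcommands register `--convert-wkt` with `action="store_true"`, so the option defaults to `False`. `flatten()` and `unflatten()` then wrap the resulting keyword argument in a `convert_flags` dict before passing it down to the parser classes. The following is a cut-down sketch of that path using only `argparse`. `build_parser` is a hypothetical stand-in for `create_parser()` and registers nothing but this one option.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for create_parser(), reduced to --convert-wkt."""
    parser = argparse.ArgumentParser(prog="flatten-tool")
    subparsers = parser.add_subparsers(dest="command")
    for name, help_text in (
        ("flatten", "Enable conversion of geojson to WKT"),
        ("unflatten", "Enable conversion of WKT to geojson"),
    ):
        sub = subparsers.add_parser(name)
        # store_true: the attribute is False unless the option is given.
        sub.add_argument("--convert-wkt", action="store_true", help=help_text)
    return parser


def convert_flags_from(args: argparse.Namespace) -> dict:
    # argparse maps "--convert-wkt" to the attribute name convert_wkt;
    # the dict shape matches convert_flags = {"wkt": convert_wkt} in the diff.
    return {"wkt": args.convert_wkt}


parser = build_parser()
print(convert_flags_from(parser.parse_args(["flatten", "--convert-wkt"])))  # {'wkt': True}
print(convert_flags_from(parser.parse_args(["unflatten"])))  # {'wkt': False}
```

Passing a dict rather than a bare boolean presumably leaves room for further optional conversions without changing every constructor signature again.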
34 changes: 29 additions & 5 deletions flattentool/input.py
@@ -13,8 +13,10 @@
 from decimal import Decimal, InvalidOperation
 from warnings import warn

+import geojson
 import openpyxl
 import pytz
+import shapely.wkt
 from openpyxl.utils.cell import _get_column_letter

 from flattentool.exceptions import DataErrorWarning
@@ -35,7 +37,7 @@ def __init__(self, cell_value, cell_location):
 self.sub_cells = []


-def convert_type(type_string, value, timezone=pytz.timezone("UTC")):
+def convert_type(type_string, value, timezone=pytz.timezone("UTC"), convert_flags={}):
 if value == "" or value is None:
 return None
 if type_string == "number":
@@ -103,6 +105,19 @@ def convert_type(type_string, value, timezone=pytz.timezone("UTC")):
 if type(value) == datetime.datetime:
 return value.date().isoformat()
 return str(value)
+elif convert_flags.get("wkt") and type_string == "geojson":
+try:
+geom = shapely.wkt.loads(value)
+except shapely.errors.GEOSException as e:
+warn(
+_(
+'An invalid WKT string was supplied "{value}", the message from the parser was: {parser_msg}'
+).format(value=value, parser_msg=str(e)),
+DataErrorWarning,
+)
+return
+feature = geojson.Feature(geometry=geom, properties={})
+return feature.geometry
 elif type_string == "":
 if type(value) == datetime.datetime:
 return timezone.localize(value).isoformat()
@@ -258,6 +273,7 @@ def __init__(
 xml=False,
 base_configuration={},
 use_configuration=True,
+convert_flags={},
 ):
 self.input_name = input_name
 self.root_list_path = root_list_path
@@ -275,6 +291,7 @@ def __init__(
 self.base_configuration = base_configuration or {}
 self.sheet_configuration = {}
 self.use_configuration = use_configuration
+self.convert_flags = convert_flags

 def get_sub_sheets_lines(self):
 for sub_sheet_name in self.sub_sheet_names:
@@ -405,7 +422,12 @@ def do_unflatten(self):
 (sheet_name, _get_column_letter(k + 1), j + 2, heading),
 )
 unflattened = unflatten_main_with_parser(
-self.parser, cells, self.timezone, self.xml, self.id_name
+self.parser,
+cells,
+self.timezone,
+self.xml,
+self.id_name,
+self.convert_flags,
 )
 if root_id_or_none not in main_sheet_by_ocid:
 main_sheet_by_ocid[root_id_or_none] = TemporaryDict(
@@ -922,7 +944,7 @@ def list_as_dicts_to_temporary_dicts(unflattened, id_name, xml):
 return unflattened


-def unflatten_main_with_parser(parser, line, timezone, xml, id_name):
+def unflatten_main_with_parser(parser, line, timezone, xml, id_name, convert_flags={}):
 unflattened = OrderedDict()
 for path, cell in line.items():
 # Skip blank cells
@@ -1041,9 +1063,11 @@ def unflatten_main_with_parser(parser, line, timezone, xml, id_name):
 # However the type of the text value itself should not be "array",
 # as that would split the text on commas, which we don't want.
 # https://github.com/OpenDataServices/cove/issues/1030
-converted_value = convert_type("", value, timezone)
+converted_value = convert_type("", value, timezone, convert_flags)
 else:
-converted_value = convert_type(current_type or "", value, timezone)
+converted_value = convert_type(
+current_type or "", value, timezone, convert_flags
+)
 cell.cell_value = converted_value
 if converted_value is not None and converted_value != "":
 if xml:
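The new `convert_type` branch runs only when `convert_flags["wkt"]` is set and the column's type string is `"geojson"`. If the string fails to parse, the branch emits a `DataErrorWarning` and returns `None` instead of raising. The sketch below mirrors that control flow without third-party packages. It replaces `shapely.wkt.loads` with a regex that accepts only 2D `POINT` strings and defines a local `DataErrorWarning` stand-in; `convert_geojson_cell` is a hypothetical name. This sketch passes other type strings through untouched, whereas the real `convert_type` converts them as well.

```python
import re
import warnings


class DataErrorWarning(UserWarning):
    """Local stand-in for flattentool.exceptions.DataErrorWarning."""


# Accepts only 2D points such as "POINT (53.486434 -2.239353)".
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([-+\d.eE]+)\s+([-+\d.eE]+)\s*\)\s*$", re.IGNORECASE
)


def convert_geojson_cell(type_string, value, convert_flags=None):
    """Sketch of the WKT branch of convert_type, limited to POINT."""
    convert_flags = convert_flags or {}
    if value == "" or value is None:
        return None
    if not (convert_flags.get("wkt") and type_string == "geojson"):
        return value  # flag off or another type: leave the cell as-is
    try:
        match = _POINT_RE.match(value)
        if match is None:
            raise ValueError("expected POINT (x y)")
        x, y = float(match.group(1)), float(match.group(2))
    except ValueError as e:
        warnings.warn(
            f'An invalid WKT string was supplied "{value}", '
            f"the message from the parser was: {e}",
            DataErrorWarning,
        )
        return None
    return {"type": "Point", "coordinates": [x, y]}


print(convert_geojson_cell("geojson", "POINT (53.486434 -2.239353)", {"wkt": True}))
# {'type': 'Point', 'coordinates': [53.486434, -2.239353]}
print(convert_geojson_cell("geojson", "POINT (53.486434 -2.239353)"))
# POINT (53.486434 -2.239353)
```

One deliberate difference from the diff: the sketch defaults `convert_flags` to `None` rather than `{}`. The `{}` default in the commit is harmless as long as the shared dict is never mutated, but `None` sidesteps Python's shared-mutable-default pitfall.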