Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spatial unit classes #928

Merged
merged 144 commits into from
Jul 1, 2019
Merged

Add spatial unit classes #928

merged 144 commits into from
Jul 1, 2019

Conversation

jc-harrison
Copy link
Member

Closes #926

I have:

  • Formatted any Python files with black
  • Brought the branch up to date with master
  • Added any relevant Github labels
  • Added tests for any new additions
  • Added or updated any relevant documentation
  • Added an Architectural Decision Record (ADR), if appropriate
  • Added an MPLv2 License Header if appropriate
  • Updated the Changelog

Description

The aim of this PR is to encapsulate all parameters and logic handling related to spatial levels / aggregation units in a set of new classes.

In flowmachine/core/spatial_unit.py I have defined the following new classes:

  • SpatialUnitMixin: Provides a location_id_columns property (which replaces get_columns_for_level) and a location_subset_clause method which returns a SQL WHERE clause to subset a query to one or more locations (used by FirstLocation), along with methods for checking whether a spatial unit has certain criteria (e.g. represents a join to geographic data, or has longitude/latitude columns).
  • CellSpatialUnit: Represents "cell" level (i.e. where no location join is required).
  • GeomSpatialUnit: Inherits from Query. This is an abstract base class which represents a query that maps cell location IDs to geographic locations. Has a get_geom_query method which returns a SQL query to select the location_id_columns along with a geom column containing the geometry (this is used by GeoDataMixin and DistanceMatrix, among others).
  • LonLatSpatialUnit: Derived from GeomSpatialUnit - represents the point-like levels ("lat-lon", "versioned-site", "versioned-cell").
  • PolygonSpatialUnit: Derived from GeomSpatialUnit - represents the polygon levels ("polygon", "admin*", "grid").

spatial_unit.py also contains functions versioned_cell_spatial_unit, versioned_site_spatial_unit, admin_spatial_unit, grid_spatial_unit and make_spatial_unit, to help in creating spatial unit objects.

All classes which previously took a level parameter now accept a spatial_unit parameter instead, which should be a spatial unit object.

I have removed cell_mappings.py, as the CellTo* classes are superseded by PolygonSpatialUnit.

I've added a Geography class, which wraps the spatial unit get_geom_query method as a Query object with GeoDataMixin. This class is used by the FlowMachine server "get_geography" action.

Grid has moved from flowmachine/features/spatial to flowmachine/core so that it can be imported in spatial_unit.py.

Now that spatial units are themselves Query objects, these have replaced the CustomQuery/CellTo* objects which were created internally in JoinToLocation. I've also added a location_joined_query helper function which returns a JoinToLocation when a location join is required, or the original query object when no join is required (i.e. cell spatial unit), to avoid replicating this if/else logic in several other places.

I've removed the _get_location_join method of GeoDataMixin, which did a recursive search through the dependency tree to find a JoinToLocation object. Instead, any query which uses GeoDataMixin without redefining _geo_augmented_query must now have a spatial_unit attribute, whose get_geom_query method is used to add a "geom" column.

Previously, the _SubscriberCells class represented cell-level subscriber locations, and the subscriber_locations function returned either a _SubscriberCells or JoinToLocation object depending on the level. Now that I've added location_joined_query to handle this if/else case, I've combined _SubscriberCells and subscriber_locations into a SubscriberLocations class. Similarly, I've combined _TotalCellEvents into TotalLocationEvents.

I've moved the marshmallow AggregationUnit custom field out of query_schemas/custom_fields.py into query_schemas/aggregation_unit.py, and added a function get_spatial_unit_obj for getting a spatial unit object from an aggregation_unit string.

A couple of related thoughts which I think fall outside the scope of this PR:

  • We should decide whether we want to extend DistanceMatrix to non-point locations. If so, this will hopefully be an easier task after this refactoring. If not, and we want DistanceMatrix to continue to support only versioned-cell and versioned-site spatial units, it may make more sense to calculate and store a distance matrix permanently in FlowDB.
  • It would be useful in some cases to be able to create spatial unit objects without a DB connection. This links in with Should be able to instantiate Query without a connection #390
  • Geography is currently exposed as a query kind through the FlowAPI run/poll/get route, as well as geography. This simplifies adding permissions for the geography route, but doesn't make a lot of sense for users because the get route won't format the response as GeoJSON bere returning it.

…lasses for versioned-cell and versioned-site
…sponding changes to AdminSpatialUnit and GridSpatialUnit
@jc-harrison jc-harrison added FlowMachine Issues related to FlowMachine refactoring labels Jun 12, 2019

Parameters
----------
criterion : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it an enum?

@greenape
Copy link
Member

Man, and I thought I had a tendency towards giant PRs ;)

I think the overall thrust of this is good (and it is a marathon effort), more once I've looked in detail.

from marshmallow.validate import OneOf


class AggregationUnit(String):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe not now, but) this is a prime target for shifting to a JSON object which maps to the spatial objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I haven't attempted to implement this shift, partly because I've attempted to avoid changing any behaviour in this PR and partly because I think this change would be best as part of a wider consideration of how we want to expose aggregation units (and the associated permissions). But anticipation of this change was one of the reasons for moving this definition out of custom_fields.py.

@greenape
Copy link
Member

With the GeoDataMixin change - can we just straightforwardly mandate, e.g. abstract method or similar, that the spatial_unit needs to exist?

@greenape
Copy link
Member

Ok, I think what I'd like to do is get the conflicts resolved, then merge this and open a couple of issues for any post-hoc cleanup.

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

Merging #928 into master will increase coverage by 0.49%.
The diff coverage is 93.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #928      +/-   ##
==========================================
+ Coverage   93.38%   93.88%   +0.49%     
==========================================
  Files         138      140       +2     
  Lines        6911     6902       -9     
  Branches      699      693       -6     
==========================================
+ Hits         6454     6480      +26     
+ Misses        332      311      -21     
+ Partials      125      111      -14
Flag Coverage Δ
#flowapi_unit_tests 77.27% <ø> (ø) ⬆️
#flowauth_unit_tests 94.86% <ø> (ø) ⬆️
#flowclient_unit_tests 81.99% <ø> (ø) ⬆️
#flowetl_unit_tests 100% <ø> (ø) ⬆️
#flowkit_jwt_generator_unit_tests 100% <ø> (ø) ⬆️
#flowmachine_unit_tests 91.48% <93.65%> (+0.5%) ⬆️
#integration_tests 61.54% <60.84%> (+0.89%) ⬆️
Impacted Files Coverage Δ
...e/server/query_schemas/joined_spatial_aggregate.py 100% <ø> (ø) ⬆️
...ne/core/server/query_schemas/radius_of_gyration.py 100% <ø> (ø) ⬆️
...owmachine/core/server/query_schemas/dummy_query.py 100% <ø> (ø) ⬆️
...hine/flowmachine/features/spatial/location_area.py 83.18% <ø> (ø) ⬆️
...lowmachine/flowmachine/features/spatial/circles.py 100% <ø> (ø) ⬆️
...ine/core/server/query_schemas/spatial_aggregate.py 100% <ø> (ø) ⬆️
...machine/core/server/query_schemas/custom_fields.py 75.6% <ø> (-2.17%) ⬇️
...owmachine/features/subscriber/label_event_score.py 100% <ø> (ø) ⬆️
...ine/flowmachine/features/subscriber/metaclasses.py 100% <ø> (ø) ⬆️
flowapi/flowapi/geography.py 65.38% <0%> (-11.89%) ⬇️
... and 58 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c5fda2...d6bd9cf. Read the comment docs.

@jc-harrison
Copy link
Member Author

With the GeoDataMixin change - can we just straightforwardly mandate, e.g. abstract method or similar, that the spatial_unit needs to exist?

Hmm. The spatial_unit attribute is only needed in _geo_augmented_query, and there are several classes (e.g. Grid) which redefine this method, so self.spatial_unit is then not required. We could access spatial_unit in _geo_augmented_query through an abstract method (e.g. _get_spatial_unit), but then classes like Grid would also need to define this method but never use it.

I suppose we could make _geo_augmented_query an abstract method, and derive a new class (something like GeoDataUsingSpatialUnitMixin, but with a better name) which implements _geo_augmented_query and adds a new abstract method _get_spatial_unit?

Copy link
Member

@greenape greenape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I'm happy to merge here and do any cleanup work post-hoc.

@greenape greenape added the ready-to-merge Label indicating a PR is OK to automerge label Jul 1, 2019
@mergify mergify bot merged commit 6bcbe11 into master Jul 1, 2019
@mergify mergify bot deleted the spatial-unit branch July 1, 2019 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FlowMachine Issues related to FlowMachine ready-to-merge Label indicating a PR is OK to automerge refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor spatial levels/aggregation
2 participants