Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute Centroid for polygon areas #7418

Merged
merged 2 commits into from
Jan 24, 2025

Conversation

scruti
Copy link
Collaborator

@scruti scruti commented Jan 15, 2025

Trello ticket

https://trello.com/c/cf3f17rU/1528-use-precomputed-centroid-of-locationpolygons-areas

Description

The centroid is used to calculate the distances on the vacancy searches.

The centroid for a polygon area doesn't change unless the area changes too.

As we dynamically calculate it for every location polygon area against every vacancy during the searches, if we pre-compute it we save the DB to execute again the centroid calculation on every search.

This is the first step in pre-computing and storing it in DB.
Further work will follow:

  • Use the pre-computed values on the location searches.
  • Fix possible issues when searching over polygons without area/centroid.
  • Legacy Polygons maintenance.

@scruti scruti force-pushed the precompute-location-polygons-area-centroid branch from ed50a8b to 1fec859 Compare January 16, 2025 10:24
config/brakeman.ignore Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Jan 16, 2025

Review app https://teaching-vacancies-review-pr-7418.test.teacherservices.cloud was successfully deleted

@@ -18,18 +18,27 @@ def call
geometry = feature["geometry"].to_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to run this by hand? I couldn't find anywhere where we're running this script. I'm also worried about the 99 polygons in current set vs 247 ish in production - it looks like old ones are never removed - although of course some are needed as they are linked to by landing pages

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also 4326 is everywhere (and scary and wrong) - is it worth at least creating a constant and documenting our usage of it as we currently understand it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API name is out of date (for Cities) - it should be Major_Towns_and_Cities_Dec_2015_Boundaries_V2_2022 IMHO

Copy link
Collaborator Author

@scruti scruti Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@starswan good points.

On the when is this running? It is scheduled to run every Sunday with the ImportPolygonDataJob.

Also noticed the current set is way smaller and reached the same conclusion: Old ones are never removed (as the whole script only "find or creates".
I am thinking of adding a rake task to compute the centroid for those with an area and missing centroid, what should cover the legacy existing polygons.

Seems the 4326 gets added automatically by Rails when adding the migration into the schema. Will try to see if I find any option where this can be tweaked: add_column :location_polygons, :centroid, :st_point, geographic: true

I'm going to extend a bit the scope of this ticket and clarify:

  • Need to add a task to compute the legacy location polygons area centroid.
    • What do we want to do with legacy location polygons? (This may need product team input)
  • Over the new 98 polygons or so on the API calls, there are over 20 without area data (what do we do with those? do we want them at all?).
  • Can we tweak the srid in the DB fields migrations? Found that it defines geographical data. We will explore migrating from geographical to geometrical data in a separate ticket.
  • Update the out of date polygon API data sources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird the composite polygons have neither area nor centroid, so I'm not sure how that is going to work...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description: This file contains the digital vector boundaries for the Major Towns and Cities in England and Wales, as at December 2015. Version 2 includes centroid data in the attributes table. [yummy!]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@starswan I've been studying migrating all the endpoints to the ones that provide LAT/LONG data for their centroid.

While this is accurate (compared them with our computed ones) for the Major Towns and Cities endpoint, for counties and regions endpoints the LAT/LONG is rounded up to the closes non-fractional number, turning the provided centroid as heavily inaccurate (multiple counties get the same centroid info, for example).

So I'm going to discard the attempt to pull this info from the APIs and keep computing it each time we update the location polygons, as it happens in the background weekly and turned out to be more precise than the available data on the ONS.

@scruti scruti marked this pull request as draft January 17, 2025 10:31
@scruti scruti force-pushed the precompute-location-polygons-area-centroid branch from 1fec859 to 7d50e00 Compare January 21, 2025 12:37
The centroid is used for calculating the distances on the vacancy
searches.

The centroid for a polygon area doesn't change unless the area changes
too.

As we dinamically calculate it for every location polygon area
against every vacancy during the searches, if we pre-compute it we
save the DB to execute again the centroid calculation on every search.
The task will calculate and populate the centroid of the location
polygons with an area but not stored centroid info.
@scruti scruti force-pushed the precompute-location-polygons-area-centroid branch from 7d50e00 to 2f6b796 Compare January 23, 2025 17:29
@scruti scruti marked this pull request as ready for review January 23, 2025 17:36
@scruti scruti merged commit db378ab into main Jan 24, 2025
13 checks passed
@scruti scruti deleted the precompute-location-polygons-area-centroid branch January 24, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants