How should the service handle datasets with multiple endpoints and resources? #619

CharliePatterson · 2024-11-05T10:26:09Z

Context:

The problem:

LPAs can submit multiple endpoints for a single dataset, and each endpoint can contain multiple resources.
This is a problem for the service, because we have to decide which endpoints and resources to use when playing data and data quality issues back to LPAs.

Scenarios:

The majority of the instances where an LPA has multiple active endpoints for a single dataset are for the brownfield land and developer agreement datasets.
Each new instance of an endpoint for these datasets can be considered the latest version of the dataset provided by the LPA: i.e. each one should contain the same entities, but with (potentially) updated data.
However, there's a very small number of edge cases (3, currently):
- The LPA was formerly several separate authorities that have since merged into a single LPA, so the data comes from the endpoints of each former authority (e.g. Buckinghamshire: Article 4 direction areas & TPO).
- The LPA has split the dataset into different layers - e.g. Doncaster's Article 4 direction areas.
In these 3 instances the data provided in each endpoint is unique, containing distinct / separate entities in each endpoint.

Potential implementation:

What are the different ways the service might handle this?

Dashboard:

TODO: how do we handle statuses on the dashboard page? e.g. if one endpoint is broken, showing issue count, etc.

Dataset details page:

This page lists all endpoints submitted by an LPA for a specific dataset. There are two options:

Show only active endpoints: List only endpoints that are currently active (i.e., without an end date).
Show all endpoints, categorised: Display all endpoints, with separate sections for active and inactive/ended ones.
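
As a rough illustration of the two options above, the sketch below splits a list of endpoints into active and ended groups using the end date. The `Endpoint` class, its field names and the example URLs are assumptions made for illustration, not the service's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Endpoint:
    # Hypothetical shape; field names are illustrative, not the service's schema.
    url: str
    entry_date: date
    end_date: Optional[date] = None  # None means the endpoint is still active


def split_endpoints(endpoints):
    """Return (active, ended) lists, each sorted newest first by entry_date."""
    newest_first = sorted(endpoints, key=lambda e: e.entry_date, reverse=True)
    active = [e for e in newest_first if e.end_date is None]
    ended = [e for e in newest_first if e.end_date is not None]
    return active, ended


endpoints = [
    Endpoint("https://example.com/a.csv", date(2023, 1, 10), end_date=date(2024, 2, 1)),
    Endpoint("https://example.com/b.csv", date(2024, 2, 1)),
    Endpoint("https://example.com/c.csv", date(2024, 6, 15)),
]
active, ended = split_endpoints(endpoints)
```

Option 1 would render only `active`; option 2 would render both lists under separate headings.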

Tasklist:

This page displays a list of tasks for the LPA to address in order to improve data quality. These tasks are derived from the issues table. Here are the possible approaches:

Latest active endpoint only: Show issues from the latest resource of the most recent active endpoint.
All active endpoints, separated by endpoint: List issues from the latest resource of all active endpoints, clearly indicating the source endpoint for each issue.
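
To make the difference between the two approaches concrete, here is a minimal sketch under assumed data shapes. The tuple layouts, identifiers and the `latest_endpoint_only` flag are hypothetical and do not reflect the real issues table.

```python
from collections import defaultdict

# Hypothetical rows; tuple layouts are illustrative only.
# (endpoint, entry_date, end_date)
endpoints = [
    ("endpoint-1", "2023-01-10", "2024-02-01"),
    ("endpoint-2", "2024-02-01", None),
    ("endpoint-3", "2024-06-15", None),
]
# (resource, endpoint, start_date)
resources = [
    ("res-a", "endpoint-2", "2024-02-01"),
    ("res-b", "endpoint-2", "2024-05-01"),
    ("res-c", "endpoint-3", "2024-06-15"),
    ("res-d", "endpoint-1", "2023-01-10"),
]
# (resource, issue_type)
issues = [
    ("res-a", "invalid date"),
    ("res-b", "missing value"),
    ("res-c", "invalid geometry"),
    ("res-d", "unknown entity"),
]


def latest_resource_per_endpoint(resources):
    """Map each endpoint to its most recent resource (ISO dates sort as text)."""
    latest = {}
    for resource, endpoint, start_date in resources:
        if endpoint not in latest or start_date > latest[endpoint][1]:
            latest[endpoint] = (resource, start_date)
    return {endpoint: resource for endpoint, (resource, _) in latest.items()}


def tasklist_issues(endpoints, resources, issues, latest_endpoint_only=False):
    """Group issues from each active endpoint's latest resource by endpoint."""
    active = sorted(
        (e for e in endpoints if e[2] is None), key=lambda e: e[1], reverse=True
    )
    if latest_endpoint_only:
        active = active[:1]  # keep only the most recent active endpoint
    latest = latest_resource_per_endpoint(resources)
    grouped = defaultdict(list)
    for endpoint, _, _ in active:
        for resource, issue_type in issues:
            if resource == latest.get(endpoint):
                grouped[endpoint].append(issue_type)
    return dict(grouped)
```

With this sample data, the first approach (`latest_endpoint_only=True`) would surface only the issue from `endpoint-3`, while the second would also surface the issue from the latest resource of `endpoint-2`. Issues from the ended endpoint and from superseded resources are dropped in both cases.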

Considerations:

Single endpoint per dataset: Ideally, each LPA would either provide a single, unique endpoint for each dataset, or use new endpoints as updated versions of previously submitted data. Any historical endpoints that contain the same set of entities would then be considered not active.
Latest endpoint when new one counts as an update: When an endpoint represents an update to a previously submitted file, we should prompt the LPA only to address issues within the latest active endpoint.
Differentiate between updates and additional data: The service needs a mechanism to distinguish between instances where an endpoint represents a unique set of entities in addition to those already submitted, and updates to existing/previously submitted data.
Making endpoints inactive: Should we be making these endpoints inactive? i.e. those that represent an update to previously provided data.
Distinct entities: For cases where endpoints contain distinct entities (rather than updated versions), the system should display issues from all endpoints but only from the latest resource in each.
Always show from latest resource: Data from the latest resource should always be prioritised for display.
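
One possible way to distinguish updates from additional data, sketched here purely as an assumption rather than an agreed approach, is to compare the entity references in a new endpoint with those already held for the dataset. The function name, the labels and the 0.5 threshold are all hypothetical.

```python
def classify_endpoint(new_refs, existing_refs, threshold=0.5):
    """Label a new endpoint as an 'update' or as 'additional' data.

    Compares entity references using the overlap coefficient:
    shared references divided by the size of the smaller set.
    The 0.5 threshold is an arbitrary placeholder, not a tested value.
    """
    new_refs, existing_refs = set(new_refs), set(existing_refs)
    if not new_refs or not existing_refs:
        return "additional"  # nothing to compare against
    shared = len(new_refs & existing_refs)
    overlap = shared / min(len(new_refs), len(existing_refs))
    return "update" if overlap >= threshold else "additional"
```

For example, an endpoint whose references mostly match an earlier submission would be labelled an update, whereas one sharing no references with existing endpoints (as in the merged-authority and split-layer cases) would be labelled additional. A heuristic like this would miss updates where references have changed, so it may need a manual override.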

Other questions:

The pipeline can assemble facts for each entity based on a number of different resources or endpoints. How does this impact what we show LPAs?

CharliePatterson · 2024-11-08T12:05:14Z

From the discussion yesterday:

We should only show active endpoints, and the latest resource for each endpoint.
For any endpoints that are essentially an update to previously submitted data (i.e. they share the same entities between them), primarily BfL & developer agreements, we should use the latest fact / resource.
Where a dataset is split across multiple endpoints / each one has unique entities, we should show each endpoint on the tasklist.
There's an open question on how we handle deleted entities / records, which requires a separate conversation.
We should look at updating the guidance:
- Potentially to introduce a section around endpoint management, encouraging people to use a single endpoint rather than multiple.
- Clarifying the desired approach around deleted records.
Potential research around this:
- How do LPAs currently manage BfL data? Can they follow the same pattern that they're using for other datasets? The guidance says to use a persistent URL; are they doing so?

CharliePatterson · 2024-11-08T12:28:26Z

Thinking about the work that's required off the back of this:

Design:

Dataset details page:

How do we show multiple endpoints?

Tasklist page:

How do we show issues from multiple endpoints?

Dashboard page:

How do we handle instances where one endpoint is broken and the others are live?

Engineering:

Dataset details page:

Only show active endpoints.
Update UI based on design outputs.

Tasklist page:

Show issues linked to any and all entities (latest facts) and group them by endpoint.
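
A minimal sketch of what this could look like, assuming each fact and issue row records the endpoint it came from. The row layouts and function names are hypothetical and are not taken from the pipeline.

```python
from collections import defaultdict

# Hypothetical fact rows: (entity, field, value, endpoint, resource_date)
facts = [
    (1, "name", "Old site", "endpoint-1", "2023-01-10"),
    (1, "name", "New site", "endpoint-2", "2024-05-01"),
    (2, "name", "Depot", "endpoint-3", "2024-06-15"),
]
# Hypothetical issue rows: (entity, field, endpoint, issue_type)
issues = [
    (1, "name", "endpoint-1", "invalid value"),
    (1, "name", "endpoint-2", "missing value"),
    (2, "name", "endpoint-3", "invalid value"),
]


def latest_fact_sources(facts):
    """For each (entity, field), find the endpoint that supplied the latest fact."""
    latest = {}
    for entity, field, _value, endpoint, resource_date in facts:
        key = (entity, field)
        # ISO date strings compare correctly as plain text
        if key not in latest or resource_date > latest[key][1]:
            latest[key] = (endpoint, resource_date)
    return {key: endpoint for key, (endpoint, _) in latest.items()}


def issues_by_endpoint(facts, issues):
    """Keep only issues raised against the latest fact, grouped by endpoint."""
    sources = latest_fact_sources(facts)
    grouped = defaultdict(list)
    for entity, field, endpoint, issue_type in issues:
        if sources.get((entity, field)) == endpoint:
            grouped[endpoint].append((entity, field, issue_type))
    return dict(grouped)
```

In this sample, the issue raised against the superseded fact from `endpoint-1` is dropped, and the remaining issues are grouped under `endpoint-2` and `endpoint-3`.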

Dashboard page:

Ensure issue count is correct.
Ensure the way we handle broken endpoints is correct.

cc/ @GeorgeGoodall-GovUk @maddie-broxup @alextea - is there anything else we need to consider about the implementation here?

CharliePatterson · 2024-11-08T16:44:47Z

@GeorgeGoodall-GovUk has already started testing out some approaches here: #600

alextea · 2024-11-14T16:22:40Z

First round of designs

alextea · 2024-11-14T16:23:05Z

Design Crit mural: https://app.mural.co/t/mhclg2837/m/mhclg2837/1729841459350/3c903499841bd18b161b2a1c8a7c778ac461c85b?wid=38-1731580139904

CharliePatterson added this to Submit and update planning data service Nov 5, 2024

CharliePatterson converted this from a draft issue Nov 5, 2024

CharliePatterson mentioned this issue Nov 5, 2024

Review of the endpoint overview section of the dataset details page #583

Closed

3 tasks

CharliePatterson moved this from Needs refinement to Design, analysis, research in Submit and update planning data service Nov 7, 2024

maddie-broxup mentioned this issue Nov 7, 2024

Explore how the service should handle multiple endpoints #624

Closed

3 tasks

maddie-broxup assigned maddie-broxup and alextea Nov 8, 2024
