How should the service handle datasets with multiple endpoints and resources? #619

Open
CharliePatterson opened this issue Nov 5, 2024 · 5 comments

@CharliePatterson
Contributor

CharliePatterson commented Nov 5, 2024

Context:

The problem:

  • LPAs can submit multiple endpoints for a single dataset, and each endpoint can contain multiple resources.
  • This represents a problem for how the service works, as we have to consider which endpoints and resources to use when playing data and data quality issues back to LPAs.

Scenarios:

  • The majority of the instances where an LPA has multiple active endpoints for a single dataset are for the brownfield land and developer agreement datasets.
  • Each new instance of an endpoint for these datasets can be considered the latest version of the dataset provided by the LPA: i.e. each one should contain the same entities, but with (potentially) updated data.
  • However, there's a very small number of edge cases (3, currently):
    • The data comes from what were previously multiple authorities that have since merged into a single LPA (e.g. Buckinghamshire: Article 4 direction areas & TPO).
    • The LPA has split the dataset into different layers (e.g. Doncaster's Article 4 direction areas).
  • In these 3 instances, each endpoint contains its own distinct set of entities.

Potential implementation:

What are the different ways the service might handle this?

Dashboard:

TODO: how do we handle statuses on the dashboard page? e.g. if one endpoint is broken, showing issue count, etc.

Dataset details page:

This page lists all endpoints submitted by an LPA for a specific dataset. There are two options:

  1. Show only active endpoints: List only endpoints that are currently active (i.e., without an end date).
  2. Show all endpoints, categorised: Display all endpoints, with separate sections for active and inactive/ended ones.
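
A minimal sketch of the two options, assuming each endpoint record carries an optional end date. The `Endpoint` class, its field names, and the example URLs are hypothetical and not the service's actual schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Endpoint:
    # Hypothetical record shape; field names are illustrative only.
    url: str
    entry_date: date
    end_date: Optional[date] = None


def is_active(endpoint: Endpoint) -> bool:
    # "Active" means the endpoint has no end date, as described in option 1.
    return endpoint.end_date is None


def categorise(endpoints: list[Endpoint]) -> dict[str, list[Endpoint]]:
    # Option 2: keep every endpoint, split into active and ended sections.
    sections: dict[str, list[Endpoint]] = {"active": [], "ended": []}
    for endpoint in endpoints:
        key = "active" if is_active(endpoint) else "ended"
        sections[key].append(endpoint)
    return sections


endpoints = [
    Endpoint("https://example.org/a.csv", date(2023, 1, 10), date(2024, 2, 1)),
    Endpoint("https://example.org/b.csv", date(2024, 2, 1)),
]
sections = categorise(endpoints)
active_only = sections["active"]  # Option 1 is just the "active" section.
```

Under this framing, option 1 is a subset of option 2, so the choice is mostly about what the page displays rather than what gets queried.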

Tasklist:

This page displays a list of tasks for the LPA to address in order to improve data quality. These tasks are derived from the issues table. Here are the possible approaches:

  1. Latest active endpoint only: Show issues from the latest resource of the most recent active endpoint.
  2. All active endpoints, separated by endpoint: List issues from the latest resource of all active endpoints, clearly indicating the source endpoint for each issue.
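
To make the difference between the two approaches concrete, here is a rough sketch. The row shapes, identifiers, and function names are assumptions made for illustration, not the actual issues table:

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: active endpoints with their entry dates,
# resources as (endpoint, resource, start_date), issues as (resource, issue_type).
endpoint_dates = {"endpoint-a": date(2023, 5, 1), "endpoint-b": date(2024, 3, 1)}
resources = [
    ("endpoint-a", "res-1", date(2024, 1, 1)),
    ("endpoint-a", "res-2", date(2024, 6, 1)),
    ("endpoint-b", "res-3", date(2024, 3, 1)),
]
issues = [
    ("res-1", "invalid geometry"),
    ("res-2", "missing value"),
    ("res-3", "invalid date"),
]


def latest_resource_by_endpoint(resources):
    # Keep only the most recently started resource for each endpoint.
    latest = {}
    for endpoint, resource, start in resources:
        if endpoint not in latest or start > latest[endpoint][1]:
            latest[endpoint] = (resource, start)
    return {endpoint: resource for endpoint, (resource, _) in latest.items()}


def tasklist_all_active(endpoint_dates, resources, issues):
    # Approach 2: issues from the latest resource of every active endpoint,
    # grouped by the endpoint they came from.
    latest = latest_resource_by_endpoint(resources)
    grouped = defaultdict(list)
    for resource, issue_type in issues:
        for endpoint in endpoint_dates:
            if latest.get(endpoint) == resource:
                grouped[endpoint].append(issue_type)
    return dict(grouped)


def tasklist_latest_endpoint(endpoint_dates, resources, issues):
    # Approach 1: only the most recently added active endpoint.
    newest = max(endpoint_dates, key=endpoint_dates.get)
    return tasklist_all_active(endpoint_dates, resources, issues).get(newest, [])


by_endpoint = tasklist_all_active(endpoint_dates, resources, issues)
latest_only = tasklist_latest_endpoint(endpoint_dates, resources, issues)
```

In this example the issue from `res-1` never appears in either approach, because `res-2` has superseded it within the same endpoint.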

Considerations:

  1. Single endpoint per dataset: Ideally, each LPA would either provide a single endpoint for each dataset, or treat each new endpoint as an updated version of previously submitted data. Any historical endpoints that contain the same set of entities would then be considered inactive.
  2. Latest endpoint when new one counts as an update: When an endpoint represents an update to a previously submitted file, we should prompt the LPA only to address issues within the latest active endpoint.
  3. Differentiate between updates and additional data: The service needs a mechanism to distinguish between instances where an endpoint represents a unique set of entities in addition to those already submitted, and updates to existing/previously submitted data.
  4. Making endpoints inactive: Should we be making these endpoints inactive? i.e. those that represent an update to previously provided data.
  5. Distinct entities: For cases where endpoints contain distinct entities (rather than updated versions), the system should display issues from all endpoints but only from the latest resource in each.
  6. Always show from latest resource: Data from the latest resource should always be prioritised for display.
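
For consideration 3, one possible heuristic (an assumption for discussion, not something the service or pipeline does today) is to compare the entity references in a new endpoint against those already submitted. The function name and the 0.5 threshold are hypothetical:

```python
def classify_endpoint(
    existing_refs: set[str], new_refs: set[str], threshold: float = 0.5
) -> str:
    """Guess whether a new endpoint updates existing data or adds to it.

    Uses the overlap coefficient: shared references divided by the size of
    the smaller set. The threshold is an arbitrary illustrative value.
    """
    if not existing_refs or not new_refs:
        # Nothing to compare against, so treat the data as additional.
        return "additional"
    shared = len(existing_refs & new_refs)
    overlap = shared / min(len(existing_refs), len(new_refs))
    return "update" if overlap >= threshold else "additional"


# A brownfield-land style resubmission: same sites plus one new one.
resubmission = classify_endpoint({"A1", "A2", "A3"}, {"A1", "A2", "A3", "A4"})

# A split dataset: each endpoint holds entirely different entities.
split_layer = classify_endpoint({"A1", "A2"}, {"B1", "B2"})
```

A heuristic like this would only work where LPAs keep references stable between submissions, so it may need a manual override for the edge cases.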

Other questions:

  • The pipeline can assemble facts for each entity based on a number of different resources or endpoints. How does this impact what we show LPAs?
@CharliePatterson
Contributor Author

From the discussion yesterday:

  • We should only show active endpoints, and the latest resource for each endpoint.
  • For any endpoints that are essentially an update to previously submitted data (i.e. they share the same entities between them), primarily BfL & developer agreements, we should use the latest fact / resource.
  • Where a dataset is split across multiple endpoints / each one has unique entities, we should show each endpoint on the tasklist.
  • There's an open question on how we handle deleted entities / records, which requires a separate conversation.
  • We should look at updating the guidance:
    • Potentially to introduce a section around endpoint management, encouraging people to use a single endpoint rather than multiple.
    • Clarifying the desired approach around deleted records.
  • Potential research around this:
    • How do LPAs currently manage BfL data? Can they follow the same pattern that they're using for other datasets? The guidance says to use a persistent URL; are they doing so?
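
As an illustration of "use the latest fact" for endpoints that share entities, a sketch under assumed data shapes. The fact tuples, entity references, and endpoint names below are invented for the example:

```python
from datetime import date

# Hypothetical fact rows: (entity, field, value, resource_date, endpoint).
facts = [
    ("BFL-001", "hectares", "1.2", date(2023, 12, 1), "endpoint-2023"),
    ("BFL-001", "hectares", "1.5", date(2024, 12, 1), "endpoint-2024"),
    ("BFL-002", "hectares", "0.4", date(2023, 12, 1), "endpoint-2023"),
]


def latest_facts(facts):
    # For each (entity, field) pair, keep the fact from the newest resource,
    # regardless of which endpoint it was collected from.
    latest = {}
    for entity, field, value, seen, endpoint in facts:
        key = (entity, field)
        if key not in latest or seen > latest[key][1]:
            latest[key] = (value, seen, endpoint)
    return latest


current = latest_facts(facts)
```

Note that `BFL-002` only appears in the older endpoint and is still retained here, which is exactly where the open question about deleted entities / records comes in.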

@CharliePatterson
Contributor Author

CharliePatterson commented Nov 8, 2024

Thinking about the work that's required off the back of this:

Design:

Dataset details page:

  • How do we show multiple endpoints?

Tasklist page:

  • How do we show issues from multiple endpoints?

Dashboard page:

  • How do we handle instances where one endpoint is broken and the others are live?

Engineering:

Dataset details page:

  • Only show active endpoints.
  • Update UI based on design outputs.

Tasklist page:

  • Show issues linked to any and all entities (latest facts) and group them by endpoint.

Dashboard page:

  • Ensure issue count is correct.
  • Ensure the way we handle broken endpoints is correct.
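
One way to frame the dashboard question is to compute the per-dataset numbers first and leave the presentation to design. This is a sketch only: `EndpointSummary`, its fields, and the rule of counting issues from live endpoints alone are assumptions, not agreed behaviour:

```python
from dataclasses import dataclass


@dataclass
class EndpointSummary:
    # Hypothetical per-endpoint summary for one dataset.
    url: str
    reachable: bool   # False when the latest fetch of the endpoint failed
    issue_count: int  # issues found in the endpoint's latest resource


def dashboard_summary(summaries: list[EndpointSummary]) -> dict[str, int]:
    live = [s for s in summaries if s.reachable]
    broken = [s for s in summaries if not s.reachable]
    return {
        "live_endpoints": len(live),
        "broken_endpoints": len(broken),
        # Assumption: only live endpoints contribute to the issue count.
        "issue_count": sum(s.issue_count for s in live),
    }


summary = dashboard_summary(
    [
        EndpointSummary("https://example.org/a.csv", True, 3),
        EndpointSummary("https://example.org/b.csv", False, 0),
        EndpointSummary("https://example.org/c.csv", True, 2),
    ]
)
```

Whether a dataset with one broken and two live endpoints should show as an error, a warning, or simply an issue count is still a design decision.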

cc/ @GeorgeGoodall-GovUk @maddie-broxup @alextea - is there anything else we need to consider about the implementation here?

@CharliePatterson
Contributor Author

@GeorgeGoodall-GovUk has already started testing out some approaches here: #600

@alextea
Contributor

alextea commented Nov 14, 2024

First round of designs

(Six images attached.)

@alextea
Contributor

alextea commented Nov 14, 2024

Projects
Status: Design, analysis, research