Added a few more important notes for the solution
cpcundill committed Oct 16, 2024
1 parent 0209bfd commit 4a62f9b

It is no secret to the team that our use of Datasette stretches beyond the purpose for which it was originally designed. Recent advances with the Submit tool have further proven the difficulty of relying upon Datasette for OLAP style queries. We have effectively been using Datasette as an API for data collection pipeline metadata - a task which it doesn't naturally suit. Performance and stability are just two of the problems presented by our attempts to employ Datasette as a drop-in replacement for an API.

Another problem with using Datasette as an API is the lack of versioning for endpoints. Consumers become tightly coupled to the underlying database schema, which makes the integration brittle; the only workaround is to create a shadow database.

An internal API, separate from our existing public platform API is proposed to provide access to the data consumed and produced by the data collection pipelines. This metadata includes:

* Logs

* Configuration

#### API Spec

An [example of the potential shape of the API](https://app.swaggerhub.com/apis/CHRISCUNDILL_1/data-collection-pipelines/1.0.0-oas3.1) has been provided in OpenAPI specification format.

#### Public access

> The API will be _publicly_ internal, meaning it will be coded in the open and generally accessible. The demarcation of "internal"
> is important since it communicates the intent that the API is primarily for satisfying the needs of internal software tools.
> Should it become apparent that certain endpoints have wider appeal, they could be promoted to the public platform API.

#### Versioning

For versioning of endpoints and the associated request & response schemas, the root of the API path will contain a version, i.e. `/v1`. For example, the path to the logs endpoint would be as follows:

* https://pipeline-api.planning.data.gov.uk/v1/logs

Ideally, a maximum of two versions of the same resource would be maintained at any one time, e.g.

* https://pipeline-api.planning.data.gov.uk/v1/logs
* https://pipeline-api.planning.data.gov.uk/v2/logs

Importantly, a deprecation date should be agreed for the older version, and all known API consumers should be notified of that date.
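The versioning scheme above could be sketched as a small path parser plus a deprecation notice for the older version. This is a minimal illustration, not the proposed implementation: the `SUPPORTED_VERSIONS` set, the deprecation date, and the function names are all hypothetical, and the `Sunset` response header (RFC 8594) is one possible way to signal the agreed deprecation date to consumers.

```python
from urllib.parse import urlsplit

# Hypothetical policy data: at most two versions maintained at any one time,
# with an agreed deprecation date recorded for the older version.
SUPPORTED_VERSIONS = {"v1", "v2"}
DEPRECATED_VERSIONS = {"v1": "2025-04-01"}  # placeholder date, for illustration only


def parse_versioned_path(url: str) -> tuple[str, str]:
    """Split a pipeline API URL into (version, resource), e.g. ('v1', 'logs')."""
    path = urlsplit(url).path.strip("/")
    version, _, resource = path.partition("/")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported API version: {version!r}")
    return version, resource


def deprecation_headers(version: str) -> dict[str, str]:
    """Extra response headers warning consumers that a version is deprecated."""
    sunset = DEPRECATED_VERSIONS.get(version)
    return {"Sunset": sunset} if sunset else {}
```

Under this sketch, a request to `/v1/logs` would resolve to the `logs` resource and carry a `Sunset` header, while `/v2/logs` would resolve cleanly with no deprecation warning.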

### Container diagram

The following container diagram illustrates how the Pipeline API will be able to communicate across a number of different data sources and formats to provide a single view of pipeline metadata.
* Service should be able to migrate/manage its own database schemas
* A CloudFront distribution is not absolutely necessary
* Public load balancer will be required
* Use version number at root of API paths, e.g. v1

* New AWS resources will need to be provisioned for:

