This repository has been archived by the owner on Jun 19, 2023. It is now read-only.

Enforcing schemas to improve data quality #416

Open
rossjones opened this issue May 6, 2016 · 0 comments
rossjones commented May 6, 2016

Data quality is a problem. Asking nicely doesn’t improve the quality of the data that is published, claims that data meets a schema are neither checked nor enforced, and we’re collecting more of it all the time. So I’d like to suggest that, for a specific subset of the datasets we hold, we try to enforce a check so that only quality data arrives on data.gov.uk. My hope is that if this is successful for spend data, we might try the same approach with other data as well.

Currently, for core departments, there is HMT guidance for spend data ( https://www.gov.uk/government/publications/guidance-for-publishing-spend-over-25000 ), which isn’t being followed. Assuming we know that a resource is spend data, it is technically feasible for DGU to attempt to validate it against this schema.

So I’d like to propose:

  1. We determine early whether the publisher is publishing spend data.
  2. For core departments, we disallow publishing spend data if it does not validate against the schema.
  3. For non-core departments/organisations, we show a warning close to the uploaded resource, and every time it is viewed, explaining that it does not validate.
  4. We consider whether it is feasible to retroactively ‘clean’ existing resources to bring them closer to the schema.
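
To make points 2 and 3 a bit more concrete, here is a rough sketch in Python of what the check might look like. It is only an illustration: the column names, the date format and the `validate_spend_csv` / `publishing_decision` helpers are all assumptions made up for this example, not the actual HMT schema or anything that exists in DGU today.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal, InvalidOperation

# ASSUMPTION: illustrative column names only. The real list must come
# from the HMT guidance, not from this sketch.
REQUIRED_COLUMNS = [
    "Department family", "Entity", "Date", "Expense type",
    "Expense area", "Supplier", "Transaction number", "Amount",
]


def validate_spend_csv(text):
    """Return a list of readable errors; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in headers]
    if missing:
        # Without the expected headers, row-level checks are meaningless.
        return ["missing columns: " + ", ".join(missing)]

    errors = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        date_text = (row["Date"] or "").strip()
        amount_text = (row["Amount"] or "").replace(",", "").strip()
        try:
            # ASSUMPTION: dates are written as DD/MM/YYYY.
            datetime.strptime(date_text, "%d/%m/%Y")
        except ValueError:
            errors.append(f"row {row_number}: bad date {date_text!r}")
        try:
            Decimal(amount_text)
        except InvalidOperation:
            errors.append(f"row {row_number}: bad amount {amount_text!r}")
    return errors


def publishing_decision(errors, is_core_department):
    """Map validation errors to the proposed policy: accept, reject or warn."""
    if not errors:
        return "accept"
    return "reject" if is_core_department else "warn"
```

A "reject" result would block the upload for a core department, while "warn" would let it through and attach the error list to the resource so it can be shown wherever the resource is viewed.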

Happy to discuss between now and next sprint planning, but I think discussing it in person would also be good.

There is also useful discussion about Spend Schema at co-cddo/open-standards#21

@pigspamster pigspamster changed the title Enforcing schemas to improve data quality Enforcing schemas to improve data quality May 17, 2016
@davidread davidread removed the ready label Jul 18, 2016
3 participants