Process of running an AMENDMENT #196

Open
ArthurChapman opened this issue Mar 22, 2022 · 4 comments

@ArthurChapman
Collaborator

I am creating a new Issue here rather than lose the discussion under other issues.

@chicoreus, discussing descriptions etc. under Issue #112, said: "however we will want to be careful that we use language that asserts that a change is proposed rather than language that asserts that data has been changed:" I AGREE!

However, that raises an interesting question. If in practice we run Validation-Amendment-Validation, how can that work if the data aren't altered? What is the process? Does the AMENDMENT change the value in, for example, dwc:taxonRank and create a dwc:taxonRankOld or equivalent, so that the post-AMENDMENT validation works on the new dwc:taxonRank, or ....?

@chicoreus
Collaborator

The tests are intended to be agnostic about how they are composed. One (effective) way of composing them is to run the validations and measures in a pre-amendment phase, then run the amendments, then run the validations and measures again in a post-amendment phase on the data with the amendments applied. When run in this way, the effect of accepting the amendments on data quality can be measured by comparing the pre-amendment and post-amendment results.
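One way to picture the pre-amendment / amendment / post-amendment composition described above is the minimal Python sketch below. It is only an illustration under stated assumptions: the rank vocabulary, the function names, and the record layout (a dict keyed by Darwin Core term) are hypothetical and are not the TG2 test definitions. The point it tries to show is that the amendment returns a proposal, and the post-amendment validation runs on a copy with the proposal applied, so the input record is never changed.

```python
# Hypothetical, deliberately tiny vocabulary for dwc:taxonRank (an assumption).
RANKS = {"kingdom", "family", "genus", "species", "subspecies"}


def validate_taxonrank(record):
    """Hypothetical validation: is dwc:taxonRank in the vocabulary?"""
    if record.get("dwc:taxonRank") in RANKS:
        return "COMPLIANT"
    return "NOT_COMPLIANT"


def amend_taxonrank(record):
    """Hypothetical amendment: return a *proposed* change, never mutate."""
    value = record.get("dwc:taxonRank")
    if value is None:
        return {}
    cleaned = value.strip().lower()
    if cleaned != value and cleaned in RANKS:
        return {"dwc:taxonRank": cleaned}
    return {}


def run_phases(records):
    """Validate, propose amendments, then validate amended copies."""
    pre = [validate_taxonrank(r) for r in records]
    proposals = [amend_taxonrank(r) for r in records]
    # Apply each proposal to a copy; the input records stay untouched.
    amended = [{**r, **p} for r, p in zip(records, proposals)]
    post = [validate_taxonrank(r) for r in amended]
    return pre, proposals, post


records = [
    {"dwc:taxonRank": " Species "},
    {"dwc:taxonRank": "genus"},
    {"dwc:taxonRank": "xyz"},
]
pre, proposals, post = run_phases(records)
improved = sum(a != b for a, b in zip(pre, post))
print(f"records whose validation result changed: {improved}")
```

In this sketch no dwc:taxonRankOld field is needed: the original value stays in the input record, and the proposed value lives only in the proposal and in the amended copy.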

It would probably not be a good idea to blindly apply the proposed amendments directly to a database of record. When the tests are run for quality control purposes, an examination of the proposed amendments and the non-compliant validations is likely to suggest multiple focused data cleanup projects to the data curator, some of which might involve directly applying the changes proposed by the amendments to the database of record.

In a quality assurance setting, a researcher could consider using the input data excluding all records for which any core test was not compliant, and the researcher could also consider using the amended data, likely giving a larger pool of records fit for their purpose (with the comparison between pre- and post- amendment validations and measures giving a clear measure of how much the quality of the data was improved for core purposes).
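The two record pools a researcher might compare in the quality assurance setting can be sketched as follows. This is a hedged illustration only: `is_compliant` is a hypothetical stand-in for "every core validation is compliant", and the proposals are assumed to have been produced already by some amendment step.

```python
def is_compliant(record):
    """Hypothetical stand-in for 'all core validations are compliant'."""
    return record.get("dwc:taxonRank") in {"genus", "species"}


def fit_for_use_pools(records, proposals):
    """Return (strict, with_amendments) pools of usable records."""
    # Pool 1: input data, excluding any record that is not compliant.
    strict = [r for r in records if is_compliant(r)]
    # Pool 2: amended copies of the data, filtered the same way.
    amended = [{**r, **p} for r, p in zip(records, proposals)]
    with_amendments = [r for r in amended if is_compliant(r)]
    return strict, with_amendments


records = [
    {"dwc:taxonRank": "Species"},
    {"dwc:taxonRank": "genus"},
    {"dwc:taxonRank": "??"},
]
# Assumed output of an earlier amendment step, one proposal per record.
proposals = [{"dwc:taxonRank": "species"}, {}, {}]

strict, with_amendments = fit_for_use_pools(records, proposals)
print(len(strict), len(with_amendments))
```

The difference in size between the two pools is one simple measure of how much accepting the amendments enlarges the set of records fit for the purpose.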

Again, the tests are agnostic as to how they are composed. In what we've talked about as upstream uses (that is, upstream of a database of record), amendments could be implemented within data pipelines, in association with data capture or OCR tools or in other places where they might stand alone.

What we probably need to spell out are the assumptions around the tests, rather than the processes of running them. The test suite as a whole is intended to be run on Darwin Core occurrence data where the data are being assessed for their fitness for use for CORE purposes (as identified by TG3): what taxon occurred where and when. The data may be flat or structured, and the tests and their composition are agnostic to that; e.g. an occurrence with multiple identifications in a structured identification history could have multiple independent runs of the taxon validation tests.
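The flat-versus-structured point can be illustrated with a short sketch. It assumes one reading of the paragraph above, namely one independent run of a taxon validation per identification; the `identifications` key and the not-empty check are hypothetical and chosen only for illustration.

```python
def validate_scientific_name(identification):
    """Hypothetical check: a non-empty dwc:scientificName is compliant."""
    name = identification.get("dwc:scientificName") or ""
    return "COMPLIANT" if name.strip() else "NOT_COMPLIANT"


def validate_identifications(occurrence):
    """Run the same validation on flat or structured occurrence data."""
    if "identifications" in occurrence:
        # Structured: one independent run per identification (assumption).
        targets = occurrence["identifications"]
    else:
        # Flat: the occurrence record itself carries the taxon terms.
        targets = [occurrence]
    return [validate_scientific_name(t) for t in targets]


flat = {"dwc:scientificName": "Quercus alba"}
structured = {
    "identifications": [
        {"dwc:scientificName": "Quercus alba"},
        {"dwc:scientificName": ""},
    ]
}
flat_results = validate_identifications(flat)
structured_results = validate_identifications(structured)
print(flat_results, structured_results)
```

The validation function itself is the same in both cases; only the composition around it changes.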

@Tasilee
Collaborator

Tasilee commented Mar 23, 2022

The utility of the TG2 work isn't dependent on annotations attached to records, but would be greatly enhanced by them. This applies even more to AMENDMENTs than VALIDATIONs. Annotations provide an effective mechanism of tracking status and changes to a record.

Please note @debpaul :)

@ArthurChapman
Collaborator Author

What is the current state of the TDWG Annotations IG? (@debpaul) It was suggested previously (by @chicoreus at the Denver SPNHC meeting) that we use the W3C Annotations. For the TG2 tests and Assertions, we are dependent on the work of the Annotations Group.

@debpaul

debpaul commented Mar 23, 2022

I think @chicoreus knows the status of the Annotations IG better than I do.

4 participants