Update timestamp check to identify all duplicates, not just those where data are discrepant #182

Open
rjhd2 opened this issue Nov 7, 2024 · 1 comment · May be fixed by #184
Labels
enhancement New feature or request

Comments

rjhd2 (Collaborator) commented Nov 7, 2024

The timestamp check currently sets flags only when two observations share a timestamp but have different values, so that there is a conflict as to which value is correct.

The merge in Rel6.1 produced some duplicated timestamps where the observation values were the same but the metadata differed (e.g. licence policy). Although these should be fixed in Rel8, the check should set a flag to indicate that such duplicates have been noted, and retain the most open data version of each entry.
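
A minimal sketch of the requested behaviour, not the QC suite's actual code. The record layout (dicts with `time`, `value` and `policy` keys), the function name `flag_duplicate_timestamps` and the `openness_rank` callable are all hypothetical. The example ranks policy 0 as more open than policy 1, which is an assumption to check against the real data policy definitions. Every record sharing a timestamp is flagged; where the values agree, only the most open copy is retained, and where they disagree all copies are kept, since there is no basis for choosing between them.

```python
from collections import defaultdict


def flag_duplicate_timestamps(records, openness_rank):
    """Flag all records sharing a timestamp and pick which ones to retain.

    records: list of dicts with hypothetical keys "time", "value", "policy".
    openness_rank: callable mapping a policy to a sortable rank,
        lowest rank = most open (an assumption of this sketch).
    Returns (flags, keep): flags[i] is True if record i shares its timestamp
    with another record; keep is the sorted list of indices to retain.
    """
    # Group record indices by timestamp.
    by_time = defaultdict(list)
    for i, rec in enumerate(records):
        by_time[rec["time"]].append(i)

    flags = [False] * len(records)
    keep = []
    for indices in by_time.values():
        if len(indices) > 1:
            # Flag every duplicate, whether or not the values differ.
            for i in indices:
                flags[i] = True
        values = {records[i]["value"] for i in indices}
        if len(values) == 1:
            # Identical values: retain only the most open copy
            # (ties keep the first record seen).
            keep.append(
                min(indices, key=lambda i: openness_rank(records[i]["policy"]))
            )
        else:
            # Discrepant values: retain everything, flags mark the conflict.
            keep.extend(indices)
    return flags, sorted(keep)


# Illustrative data only; times, values and policies are made up.
records = [
    {"time": "2020-01-01T00:00", "value": 25.0, "policy": 1},
    {"time": "2020-01-01T00:00", "value": 25.0, "policy": 0},
    {"time": "2020-01-01T01:00", "value": 26.0, "policy": 0},
    {"time": "2020-01-01T02:00", "value": 27.0, "policy": 0},
    {"time": "2020-01-01T02:00", "value": 27.5, "policy": 0},
]
flags, keep = flag_duplicate_timestamps(records, openness_rank=lambda p: p)
```

Passing the ranking in as a callable keeps the "most open" decision out of the duplicate detection itself, so the ordering of policies can be changed without touching the check.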

Image from Paul Poli: (image not reproduced)

Details from Simon Noone:

The same station has been mingled into two different primary stations. See the table below as an example: the first station, AAI0000TNCA, is in Aruba and has data policy (1); the second station, USW00000411, is identified as being in the US and hence has data policy (0). The CDM conversion code uses the secondary station ID (merged station id + source id = primary_station_id_2), which should be unique, to join the data policy information from the record_ID look-up .csv to the cdm_lite file. The record_ID.csv is created manually from the mingle list.

I think the code is seeing the two primary_station_id_2 entries, copying two sets of all observations into the output cdm-lite file, and allocating the two different data policy flags and record_numbers. I would say that station USW00000411 has the same issue.
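
The suspected mechanism can be illustrated with a toy join. This is a sketch of the general effect only, not the CDM conversion code: the function names, the key value `"STATION-SOURCE"`, the record numbers and the observation values are hypothetical, and only the two station IDs and their data policies come from the description above. When the look-up holds two rows with the same `primary_station_id_2`, a plain join emits every observation twice, once per look-up row.

```python
from collections import Counter


def join_policy(observations, record_lookup):
    """Naive join: one output row per matching look-up row."""
    out = []
    for obs in observations:
        for row in record_lookup:
            if row["primary_station_id_2"] == obs["primary_station_id_2"]:
                out.append(
                    {
                        **obs,
                        "data_policy": row["data_policy"],
                        "record_number": row["record_number"],
                    }
                )
    return out


def non_unique_keys(record_lookup):
    """Return look-up keys that occur more than once (should be empty)."""
    counts = Counter(row["primary_station_id_2"] for row in record_lookup)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical look-up: the same secondary ID listed under two primary stations.
record_lookup = [
    {"primary_station_id": "AAI0000TNCA", "primary_station_id_2": "STATION-SOURCE",
     "data_policy": 1, "record_number": 1},
    {"primary_station_id": "USW00000411", "primary_station_id_2": "STATION-SOURCE",
     "data_policy": 0, "record_number": 2},
]
# Hypothetical observations for that secondary ID.
observations = [
    {"primary_station_id_2": "STATION-SOURCE", "time": "2020-01-01T00:00", "value": 25.0},
    {"primary_station_id_2": "STATION-SOURCE", "time": "2020-01-01T01:00", "value": 26.0},
]

joined = join_policy(observations, record_lookup)
duplicated_keys = non_unique_keys(record_lookup)
```

Here two observations become four output rows, each timestamp appearing once with each data policy. Checking the look-up for repeated keys before joining would surface the problem at its source.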

rjhd2 added the enhancement label Nov 7, 2024
rjhd2 self-assigned this Nov 7, 2024

rjhd2 (Collaborator, Author) commented Nov 12, 2024

It turns out this was an error in how the CDM conversion takes the QC'd files, with the record_ID and secondary station merging happening as part of that process.

This is still a useful check, given that the timestamp check otherwise does not highlight duplicated but identical observations. To be merged, just in case it's useful.
