Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KDVH migration package #30

Merged
merged 36 commits into from
Nov 20, 2024
Merged

Add KDVH migration package #30

merged 36 commits into from
Nov 20, 2024

Conversation

Lun4m
Copy link
Collaborator

@Lun4m Lun4m commented Sep 25, 2024

This PR tackles the first part of issue #14.
I adapted some scripts originally written by Ketil for ODA to dump tables from KDVH and import them into LARD. Almost happy with the code...

Things I'm still not 100% sure about:

  1. I'm using the KDVH proxy to dump the tables, but I couldn't find T_MONTH_INTERPOLATED and T_DIURNAL_INTERPOLATED, probably I need to connect to KDVH directly?
  2. Is blobData used for non-scalar parameters (KLOBS for example)?
  3. Do we need to dump observations that have both NULL data and flags? Related to ->
  4. in data_functions.go we do a lot of postprocessing that I haven't touched since I don't really understand it. My main concern is with the insertion of "null" values (i.e. -32767, etc.), are these used somewhere downstream? Otherwise I might simply drop them, since our schema can handle real NULLs.

@ketilt, do you have any insight regarding these points? Or comments about the code, if you have time to take a look at it?

Still left to do:

  • remove CorrKDVH field, since original and corrected have the same value
  • Double check that the derived controlinfo and useinfo are correct

@ketilt
Copy link
Collaborator

ketilt commented Sep 30, 2024

  1. It makes sense that they are not available in the proxy. Do not worry about dumping them. The tables are not in active use by anything. I can make a dump of them.
  2. Yes
  3. Related to what? There should be no reason to dump nulls. They likely exist only because of the table format of KDVH (empty cells in a row where something is not null)
  4. This is a kvalobs feature. The missing indicator exists in two versions: -32767 means expected value missing and -32766 means value was found but was censored in QC. These numbers will not feature in the KDVH data, but they are all over the queues and all over ODA. If you've found a lossless way to handle this differently, it's probably better to do that instead

@ketilt
Copy link
Collaborator

ketilt commented Sep 30, 2024

Flag definitions:
https://dokit.met.no/klima/bjornn/kvalobs/flagg_i_kvalobs_versjon_8.7.1

@Lun4m
Copy link
Collaborator Author

Lun4m commented Sep 30, 2024

  1. Related to what? There should be no reason to dump nulls.
    They likely exist only because of the table format of KDVH (empty cells in a row where something is not null)

Related to point 4, sorry for the confusion! I was specifically asking since in makeDataPage it's fine if both are null:

if kdvh.flagsAreInvalid() {
useinfo = []byte("9999900900000000")
} else {
useinfo = []byte(kdvh.Flags + "00900000000")
}
if !nullData {
controlinfo = []byte("0000000000000000")
} else {
controlinfo = []byte("0000003000000000")
floatval = -32767
}

But I'm with you, it doesn't make sense to dump them 👌

@ketilt
Copy link
Collaborator

ketilt commented Sep 30, 2024

  1. Related to what? There should be no reason to dump nulls.
    They likely exist only because of the table format of KDVH (empty cells in a row where something is not null)

Related to point 4, sorry for the confusion! I was specifically asking since in makeDataPage it's fine if both are null:

if kdvh.flagsAreInvalid() {
useinfo = []byte("9999900900000000")
} else {
useinfo = []byte(kdvh.Flags + "00900000000")
}
if !nullData {
controlinfo = []byte("0000000000000000")
} else {
controlinfo = []byte("0000003000000000")
floatval = -32767
}

But I'm with you, it doesn't make sense to dump them 👌

Hmm, could this be in the case of blob data?

@Lun4m Lun4m force-pushed the kdvh_importer branch 4 times, most recently from cc7a951 to 40b69e6 Compare November 11, 2024 08:38
@Lun4m Lun4m marked this pull request as ready for review November 11, 2024 08:51
@Lun4m Lun4m requested review from intarga and ketilt November 11, 2024 08:51
db/flags.sql Show resolved Hide resolved
migrations/kdvh/dump.go Outdated Show resolved Hide resolved
migrations/kdvh/import.go Outdated Show resolved Hide resolved
migrations/kdvh/import_functions.go Outdated Show resolved Hide resolved
migrations/kdvh/import_functions.go Outdated Show resolved Hide resolved
migrations/kdvh/main.go Outdated Show resolved Hide resolved
migrations/kdvh/main.go Outdated Show resolved Hide resolved
migrations/kdvh/main.go Outdated Show resolved Hide resolved
migrations/kdvh/main.go Outdated Show resolved Hide resolved
migrations/kdvh/table.go Outdated Show resolved Hide resolved
migrations/kdvh/table.go Outdated Show resolved Hide resolved
justfile Outdated Show resolved Hide resolved
migrations/README.md Outdated Show resolved Hide resolved
migrations/kdvh/main.go Outdated Show resolved Hide resolved
migrations/kdvh/dump.go Outdated Show resolved Hide resolved
migrations/kdvh/dump.go Outdated Show resolved Hide resolved
migrations/kdvh/import.go Outdated Show resolved Hide resolved
migrations/kdvh/table.go Outdated Show resolved Hide resolved
migrations/lard/import.go Outdated Show resolved Hide resolved
migrations/kdvh/product_offsets.csv Outdated Show resolved Hide resolved
migrations/kdvh_test.go Outdated Show resolved Hide resolved
@intarga
Copy link
Member

intarga commented Nov 13, 2024

@Lun4m Do you think there would be a point in removing the indices while importing?

I would think it's maybe redundant with COPY, but this doc page suggests it: https://www.postgresql.org/docs/current/populate.html#POPULATE-RM-INDEXES

In any case we should probably run ANALYZE after importing

@ketilt
Copy link
Collaborator

ketilt commented Nov 14, 2024

@Lun4m Are you filtering open and closed data? I don't remember seeing anything about that when reading the code, but perhaps I missed it

I can provide a list of which stnr / tables are defined as closed

@intarga
Copy link
Member

intarga commented Nov 14, 2024

I can provide a list of which stnr / tables are defined as closed

Thanks, but I don't think that's necessary, we can fetch it from stinfosys

@Lun4m Lun4m requested a review from intarga November 18, 2024 10:46
@Lun4m
Copy link
Collaborator Author

Lun4m commented Nov 18, 2024

Ah, I forgot to drop the indices. Edit: Done!

migrations/kdvh/import/main.go Outdated Show resolved Hide resolved
migrations/kdvh/import/main.go Show resolved Hide resolved
db/flags.sql Outdated Show resolved Hide resolved
migrations/utils/email.go Outdated Show resolved Hide resolved
@Lun4m Lun4m merged commit 740d0a6 into trunk Nov 20, 2024
1 check passed
@Lun4m Lun4m deleted the kdvh_importer branch November 20, 2024 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants