`measurements.to_xarray()` is possible, but this will give you a huge dataset and that is a bit inconvenient. `ddlpy.simplify_dataframe(measurements).to_xarray()` is better, but it will still give you all codes/omschrijvingen (descriptions) separately. For BAALHK 1980-2024 this gives a dataset of >1GB, mainly because of the `WaardeBepalingsmethode.Omschrijving` variable.

Todo:

- `WaardeBepalingsmethode.Code`, but showing all available values (Get information about possible values for different metadata keys #28). >> no
- Consider combining `ds.attrs` by merging Code/Omschrijving attrs as `f"{Code} ({Omschrijving})"`. >> no, add list of code/omschrijving keys/values to variables
- Consider automatically converting strings that contain only numeric values to integers. This would probably reduce `WaardeBepalingsmethode.Code` significantly. It might be wise to do this earlier in the process, although it might break some pipelines. >> not generic
- Consider always keeping some columns as variables (like kwaliteitswaardecode, status, maybe others), since that would make concatenating datasets much easier.
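The `ds.attrs` merge idea from the todo list can be sketched as follows. This is a minimal illustration on a made-up two-row dataframe (the column names follow the ddlpy convention, but the values and the attribute name are invented), not the library's actual behavior:

```python
import pandas as pd

# Made-up stand-in for a (simplified) measurements dataframe.
df = pd.DataFrame({
    "Meetwaarde.Waarde_Numeriek": [1.2, 1.3],
    "WaardeBepalingsmethode.Code": ["other:F037", "other:F037"],
    "WaardeBepalingsmethode.Omschrijving": ["Golfmeting", "Golfmeting"],
})

# Convert only the numeric variable, then attach the unique
# Code/Omschrijving pairs as a single f"{Code} ({Omschrijving})" attribute
# instead of carrying the long Omschrijving strings per timestep.
ds = df[["Meetwaarde.Waarde_Numeriek"]].to_xarray()
pairs = df[["WaardeBepalingsmethode.Code",
            "WaardeBepalingsmethode.Omschrijving"]].drop_duplicates()
ds.attrs["WaardeBepalingsmethode"] = [
    f"{code} ({oms})" for code, oms in pairs.itertuples(index=False)
]
print(ds.attrs["WaardeBepalingsmethode"])  # ['other:F037 (Golfmeting)']
```

This keeps the dataset small while the code-to-description mapping stays discoverable in the attrs.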
Example code that retrieves VLISSGN data with multiple Waardebepalingsmethode (changed on 07-09-1993):
```python
import ddlpy

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['VLISSGN'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
bool_groepering = locations['Groepering.Code'].isin(['NVT'])
selected = locations.loc[bool_grootheid & bool_hoedanigheid & bool_groepering & bool_stations]

tstart_dt = "1993-08-25 09:47:00"  # VLISSGN got new Waardebepalingsmethode in this year
tstop_dt = "1994-11-30 09:50:00"
measurements = ddlpy.measurements(selected.iloc[0], tstart_dt, tstop_dt)
simple = ddlpy.simplify_dataframe(measurements)

list_cols = ['WaardeBepalingsmethode.Code', 'WaardeBepalingsmethode.Omschrijving']
measurements[list_cols].drop_duplicates()
```
Prints:

```
time
1990-02-27 01:00:00+01:00    other:F039
1990-03-01 00:00:00+01:00    other:F027
Name: WaardeBepalingsmethode.Code, dtype: object
full [MB]: 1.801
simple [MB]: 0.267
```
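Size figures like these can be measured for any dataframe with `pandas.DataFrame.memory_usage`. A sketch with invented data (the helper `size_mb` is not part of ddlpy):

```python
import pandas as pd

def size_mb(df: pd.DataFrame) -> float:
    # deep=True also counts the Python string payloads, which dominate here.
    return df.memory_usage(deep=True).sum() / 1e6

# Repeated long description strings blow up the "full" dataframe; moving
# constant metadata out of the rows (as simplify_dataframe does) is where
# most of the memory is saved. drop_duplicates stands in for that here.
full = pd.DataFrame({"WaardeBepalingsmethode.Omschrijving":
                     ["Golfhoogte uit drukmeting"] * 10_000})
simple = full.drop_duplicates()
print(f"full [MB]: {size_mb(full):.3f}")
print(f"simple [MB]: {size_mb(simple):.3f}")
```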
Update 27-3-2023: With the `catalog_filter` argument added to `ddlpy.locations()` or the private `ddlpy.ddlpy.catalog()` function added in #87, retrieving the extended catalog is easy. We can also subset the catalog dataframe directly:
This can be used for adding metadata attrs, but the full sets of Waardebepalingsmethode and Parameter are way too large to add:
```
7 unique Compartiment available in requested subset
51 unique Eenheid available in requested subset
104 unique Grootheid available in requested subset
71 unique Hoedanigheid available in requested subset
34 unique MeetApparaat available in requested subset
895 unique Parameter available in requested subset
6 unique Typering available in requested subset
5 unique Groepering available in requested subset
510 unique WaardeBepalingsmethode available in requested subset
6 unique WaardeBewerkingsmethode available in requested subset
```
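Counts like the ones above can be derived from the catalog dataframe with `nunique`. A sketch on a made-up three-row catalog (the column names follow the `Facet.Code` convention, the data is invented):

```python
import pandas as pd

# Made-up stand-in for a subsetted (extended) locations catalog.
locs = pd.DataFrame({
    "Grootheid.Code": ["WATHTE", "WATHTE", "GELDHD"],
    "Eenheid.Code": ["cm", "cm", "mS/m"],
    "WaardeBepalingsmethode.Code": ["other:F001", "other:F007", "other:F001"],
})

# Count the distinct codes per facet in the requested subset.
for facet in ["Grootheid", "Eenheid", "WaardeBepalingsmethode"]:
    n = locs[f"{facet}.Code"].nunique()
    print(f"{n} unique {facet} available in requested subset")
```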
Instead, the unique values from the subsetted extended locations dataframe can be used. This is better than using the unique values from the measurements dataframe, since those will differ per station and time range.