Data preparation
The following is a guide for scientists and others planning to deliver data to Norwegian Polar Data Centre (NPDC) staff for publishing on the web, in the Npolar API system.
- Create UTF-8 text files
- Format data as CSV or JSON
- Consult the destination API's schema for variable names and format rules
- Bundle properties measured at the same space/time into documents (equivalent to one row of CSV)
The Npolar API system consists of searchable web data stores open to any kind of data originating from the Norwegian Polar Institute's activities.
When data is published in an Npolar API, it is machine readable in multiple formats by any client with web access.
The Npolar API is developed by NPDC staff on top of the powerful Lucene search engine and JSON database CouchDB.
Before publishing data you need to break it down to a simple 2-dimensional (tabular) structure where each data atom (equivalent to one row of CSV) contains:
- latitude
- longitude
- time ("measured")
- one or more properties (preferably scalar values, but vectors or arrays of objects are also possible)
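The four requirements above can be checked mechanically before delivery. The following is a minimal sketch, not part of the Npolar API: the function name `check_atom`, the coordinate ranges, and the strict `YYYY-MM-DDTHH:MM:SSZ` timestamp pattern are assumptions made for illustration, and the destination API's schema remains the authority on format rules.

```python
from datetime import datetime

# Keys every data atom must carry, per the list above.
REQUIRED = ("latitude", "longitude", "measured")


def in_range(value, low, high):
    """True if value is a number between low and high (inclusive)."""
    return isinstance(value, (int, float)) and low <= value <= high


def check_atom(doc):
    """Return a list of problems found in one data atom (empty if none)."""
    problems = [f"missing {key}" for key in REQUIRED if key not in doc]
    if not problems:
        # Assumption: decimal degrees, latitude -90..90, longitude -180..180.
        if not in_range(doc["latitude"], -90, 90):
            problems.append("latitude out of range")
        if not in_range(doc["longitude"], -180, 180):
            problems.append("longitude out of range")
        try:
            # Assumption: UTC timestamps shaped like 2000-09-04T16:45:10Z.
            datetime.strptime(doc["measured"], "%Y-%m-%dT%H:%M:%SZ")
        except (ValueError, TypeError):
            problems.append("measured is not a UTC timestamp")
    # At least one key besides position and time must be present.
    if not any(key not in REQUIRED for key in doc):
        problems.append("no measured property")
    return problems


atom = {
    "latitude": 77.5,
    "longitude": 3.0,
    "measured": "2000-09-04T16:45:10Z",
    "sea_water_salinity": 35.0057,
}
print(check_atom(atom))
```

Running such a check over every row before handing data to NPDC staff catches rows that lack a position, a time, or any measured property.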
Example of the above mould expressed in three formats for an oceanography profile:
CSV (tab separated):
latitude longitude measured sea_water_salinity sea_water_temperature sea_water_pressure_due_to_sea_water cruise station
77.5 3.0 2000-09-04T16:45:10Z 35.0057 3.0158 65.0 Framstrait-2000 59
JSON (array):
[{ "sea_water_temperature": 3.0158,
"station": "59",
"sea_water_pressure_due_to_sea_water": 65,
"measured": "2000-09-04T16:45:10Z",
"longitude": 3,
"latitude": 77.5,
"sea_water_salinity": 35.0057,
"cruise": "Framstrait-2000"
}]
And lastly: GeoJSON (straight from the Oceanography API).
All of the above documents are compatible with the variables defined in the Oceanography API's JSON schema.
Use any of the above data formats for points, and GeoJSON for other geometries.
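Moving between these formats is mostly mechanical. The sketch below reads the tab-separated sample row, turns it into JSON documents, and wraps a document as a GeoJSON point. The helper names `read_documents` and `to_feature` and the `TEXT_COLUMNS` set are assumptions for illustration, and the feature layout is plain generic GeoJSON, which may differ in detail from what the Oceanography API itself returns.

```python
import csv
import io
import json

# The sample row from this guide, with real tab characters between fields.
TSV = (
    "latitude\tlongitude\tmeasured\tsea_water_salinity\tsea_water_temperature\t"
    "sea_water_pressure_due_to_sea_water\tcruise\tstation\n"
    "77.5\t3.0\t2000-09-04T16:45:10Z\t35.0057\t3.0158\t65.0\tFramstrait-2000\t59\n"
)

# Assumption: these columns hold text; every other column is numeric.
TEXT_COLUMNS = {"measured", "cruise", "station"}


def read_documents(text):
    """Turn tab-separated text into a list of JSON-ready dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {key: value if key in TEXT_COLUMNS else float(value)
         for key, value in row.items()}
        for row in reader
    ]


def to_feature(doc):
    """Wrap one document as a GeoJSON Point feature (longitude comes first)."""
    properties = {k: v for k, v in doc.items()
                  if k not in ("latitude", "longitude")}
    return {
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [doc["longitude"], doc["latitude"]]},
        "properties": properties,
    }


documents = read_documents(TSV)
print(json.dumps(documents, indent=2))
print(json.dumps(to_feature(documents[0]), indent=2))
```

Note that GeoJSON orders coordinates as longitude, then latitude, the reverse of the column order used in the CSV example.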
By breaking down the data into space-time points of measurement, the same data model applies whether the position is fixed or shifting, and whether the measured time is fixed or varies (in contrast to, e.g., NOAA's approach).
Retrieving data that belongs together is simple. Here's the CSV of the entire CTD profile of station 59 in the Framstrait-2000 cruise.
Notice the filter-property=value parameters in the web address above; these make it easy to link to any combination of data for any property. The permanent address of the entire Framstrait-2000 dataset is then simply:
http://api.npolar.no/oceanography/?q=&filter-cruise=Framstrait-2000
curl
curl --compressed "http://api.npolar.no/oceanography/?q=&filter-cruise=Framstrait-2000&limit=all&format=csv" > fs2000-oceanography.csv
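Addresses of this kind can also be assembled programmatically. The sketch below only builds the URL and does not contact the server; the helper name `filter_url` is hypothetical, while the `q`, `filter-…`, `limit` and `format` parameters are the ones that appear in the addresses above.

```python
from urllib.parse import urlencode

BASE = "http://api.npolar.no/oceanography/"


def filter_url(filters, fmt=None, limit=None):
    """Build a query URL with one filter-<property>=<value> pair per filter."""
    params = [("q", "")]
    params += [(f"filter-{name}", value) for name, value in filters.items()]
    if limit is not None:
        params.append(("limit", limit))
    if fmt is not None:
        params.append(("format", fmt))
    return BASE + "?" + urlencode(params)


# Permanent address of the whole cruise dataset.
print(filter_url({"cruise": "Framstrait-2000"}))

# The full dataset as CSV, matching the address in the curl command.
print(filter_url({"cruise": "Framstrait-2000"}, fmt="csv", limit="all"))
```

Several filters can be passed at once, for example `{"cruise": "Framstrait-2000", "station": "59"}`, on the assumption that `station` is filterable in the same way as `cruise`.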