Feature/issue 196 Add new feature type to query the API for lake data (#224)

* Initial API queries for lake data

* Unit tests for lake data

* Updates after center point calculations

- Removed temp code to calculate a point in API
- Implemented unit test to test lake data retrieval
- Updated fixtures to load in lake data for testing

* Add read lake table permissions to lambda timeseries and track ingest roles

* Update documentation to include lake data

* Updated documentation to include info on lake centerpoints

---------

Co-authored-by: Frank Greguska <[email protected]>
nikki-t and frankinspace authored Aug 22, 2024
1 parent d7818e2 commit 7d46c4e
Showing 11 changed files with 489 additions and 103 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Issue 201 - Create table for tracking granule ingest status
- Issue 198 - Implement track ingest lambda function CMR and Hydrocron queries
- Issue 193 - Add new Dynamo table for prior lake data
- Issue 196 - Add new feature type to query the API for lake data
### Changed
### Deprecated
### Removed
70 changes: 70 additions & 0 deletions docs/examples.md
@@ -270,6 +270,56 @@ Will return GeoJSON:
}
```

## Get time series GeoJSON for a lake

Search for a single lake by ID.

[https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=geojson](https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=geojson)

Will return GeoJSON:

```json
{
"status": "200 OK",
"time": 391.613,
"hits": 1,
"results": {
"csv": "",
"geojson": {
"type": "FeatureCollection",
"features": [
{
"id": "0",
"type": "Feature",
"properties": {
"lake_id": "6350036102",
"time_str": "2024-07-25T22:48:23Z",
"wse": "260.802",
"area_total": "0.553409",
"quality_f": "1",
"collection_shortname": "SWOT_L2_HR_LakeSP_2.0",
"crid": "PIC0",
"PLD_version": "105",
"range_start_time": "2024-07-25T22:47:27Z",
"wse_units": "m",
"area_total_units": "km^2"
},
"geometry": {
"type": "Point",
"coordinates": [
-42.590727027987064,
-19.822613018107482
]
}
}
]
}
}
}
```

**NOTE:** Due to the size of the original polygons in the lake (L2_HR_LakeSP) shapefiles, only the calculated center point of each lake is returned. This facilitates conformance with the GeoJSON specification; center points should not be considered accurate.
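The request above can also be assembled programmatically. The helper below is a hypothetical sketch that only builds the query URL from the documented parameters; it does not call the API.

```python
from urllib.parse import urlencode

# Hypothetical helper: build a Hydrocron PriorLake time series URL from
# the documented query parameters (names taken from the example above).
BASE_URL = "https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries"

def build_lake_query(lake_id, start_time, end_time, fields, output="geojson"):
    """Assemble the query string for a PriorLake time series request."""
    params = {
        "feature": "PriorLake",
        "feature_id": lake_id,
        "start_time": start_time,
        "end_time": end_time,
        "fields": ",".join(fields),
        "output": output,
    }
    # Keep ',' and ':' literal so the URL matches the documented examples.
    return f"{BASE_URL}?{urlencode(params, safe=',:')}"

url = build_lake_query(
    "6350036102",
    "2024-07-20T00:00:00Z",
    "2024-07-26T00:00:00Z",
    ["lake_id", "time_str", "wse", "area_total"],
)
```

The resulting URL can then be passed to any HTTP client, for example `requests.get(url)`.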

## Get time series CSV for river reach

Search for a single river reach by ID.
@@ -310,6 +360,26 @@ Will return CSV:
}
```

## Get time series CSV for lake

Search for a single lake by ID.

[https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=csv](https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=csv)

Will return CSV:

```json
{
"status": "200 OK",
"time": 321.592,
"hits": 1,
"results": {
"csv": "lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time,wse_units,area_total_units\n6350036102,2024-07-25T22:48:23Z,260.802,0.553409,1,SWOT_L2_HR_LakeSP_2.0,PIC0,105,2024-07-25T22:47:27Z,m,km^2\n",
"geojson": {}
}
}
```
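Because the CSV payload arrives as a string inside the JSON response body, it can be parsed with the standard `csv` module. This sketch uses a hard-coded response shaped like the example above (fields abridged):

```python
import csv
import io

# Response body shaped like the documented example (abridged fields).
response_body = {
    "status": "200 OK",
    "hits": 1,
    "results": {
        "csv": ("lake_id,time_str,wse,area_total\n"
                "6350036102,2024-07-25T22:48:23Z,260.802,0.553409\n"),
        "geojson": {},
    },
}

# The "csv" value is a plain CSV string; wrap it in StringIO to parse it.
rows = list(csv.DictReader(io.StringIO(response_body["results"]["csv"])))
first = rows[0]
```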

## Accept headers

See the [documentation on the timeseries endpoint](timeseries.md) for an explanation of Accept headers.
1 change: 1 addition & 0 deletions docs/intro.md
@@ -11,3 +11,4 @@ Original SWOT data is archived at NASA's [Physical Oceanography Distributed Acti
Datasets included in Hydrocron:

- [SWOT Level 2 River Single-Pass Vector Data Product, Version 2.0](https://podaac.jpl.nasa.gov/dataset/SWOT_L2_HR_RiverSP_2.0)
- [SWOT Level 2 Lake Single-Pass Vector Data Product, Version 2.0](https://podaac.jpl.nasa.gov/dataset/SWOT_L2_HR_LakeSP_2.0)
13 changes: 12 additions & 1 deletion docs/overview.md
@@ -11,12 +11,23 @@ The main timeseries endpoint allows users to search by feature ID.
River reach and node ID numbers are defined in the [SWOT River Database (SWORD)](https://doi.org/10.1029/2021WR030054),
and can be browsed using the [SWORD Explorer Interactive Dashboard](https://www.swordexplorer.com/).

Lake ID numbers are defined in the PLD (Prior Lake Database) and can be located in the SWOT shapefiles; see the [SWOT Product Description Document for the L2_HR_LakeSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on lake identifiers.

SWOT may observe lakes and rivers that do not have an ID in the prior databases. In those cases, hydrology features are added to the Unassigned Lakes data product.
Hydrocron does not currently support Unassigned rivers and lakes.

Hydrocron currently includes data from these datasets:

- Reach and node shapefiles from the Level 2 KaRIn high rate river single pass vector product (L2_HR_RiverSP)
- PLD-oriented shapefiles from the Level 2 KaRIn high rate lake single pass vector product (L2_HR_LakeSP)

See this PO.DAAC [page](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on SWOT datasets.

## Limitations

Data return size is limited to 6 MB. If your query response is larger than this, a 413 error will be returned.
Data return size is limited to **6 MB**. If your query response is larger than this, a 413 error will be returned.

**For Lake data:** Due to the size of the original polygons in the lake (L2_HR_LakeSP) shapefiles, only the calculated center point of each lake is returned. This facilitates conformance with the GeoJSON specification; center points should not be considered accurate.
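As an illustration of what "calculated center point" means, the area-weighted centroid of a simple polygon ring can be computed with the standard shoelace formula. This is only a sketch; the actual Hydrocron implementation may differ.

```python
# Illustrative only: area-weighted centroid of a simple
# (non-self-intersecting) polygon ring given as [(x, y), ...] pairs.
def polygon_centroid(ring):
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area *= 0.5
    return cx / (6.0 * area), cy / (6.0 * area)

# A unit square has its centroid at (0.5, 0.5).
center = polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)])
```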

## Citation

23 changes: 20 additions & 3 deletions docs/timeseries.md
@@ -85,16 +85,18 @@ Content-Type: text/csv

### feature : string, required: yes

Type of feature being requested. Either: "Reach" or "Node"
Type of feature being requested. Either: "Reach", "Node", or "PriorLake"

### feature_id : string, required: yes

ID of the feature to retrieve

- Reaches have the format CBBBBBRRRRT (e.g., 78340600051)
- Nodes have the format CBBBBBRRRRNNNT (e.g., 12228200110861)
- PriorLakes have the format CBBNNNNNNT (e.g., 2710046612)
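Since each documented format is a fixed-length string of digits, a cheap client-side sanity check can be written before calling the API. This is a hypothetical helper derived only from the formats listed above:

```python
import re

# Digit-count patterns derived from the documented ID formats:
# CBBBBBRRRRT -> 11 digits, CBBBBBRRRRNNNT -> 14 digits, CBBNNNNNNT -> 10.
ID_PATTERNS = {
    "Reach": re.compile(r"^\d{11}$"),
    "Node": re.compile(r"^\d{14}$"),
    "PriorLake": re.compile(r"^\d{10}$"),
}

def looks_valid(feature, feature_id):
    """Return True if feature_id matches the documented shape."""
    pattern = ID_PATTERNS.get(feature)
    return bool(pattern and pattern.match(feature_id))
```

Note this only checks length and digit content; it does not verify that the ID actually exists in SWORD or the PLD.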

Please see the [SWOT Product Description Document for the L2_HR_RiverSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on identifiers.
Please see the [SWOT Product Description Document for the L2_HR_RiverSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on reach and node identifiers.
Please see the [SWOT Product Description Document for the L2_HR_LakeSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on lake identifiers.

### start_time : string, required: yes

@@ -136,7 +138,7 @@ The SWOT data fields to return in the request.

This is specified in the form of a comma separated list (without any spaces): `fields=reach_id,time_str,wse,slope`

Hydrocron includes additional fields beyond the source data shapefile attributes, including units fields on measurements, cycle and pass information, and SWORD and collection versions. **NOTE: Units are always returned for fields that have corresponding units stored in Hydrocron; they do not need to be requested.** The complete list of input fields available through Hydrocron is below:
Hydrocron includes additional fields beyond the source data shapefile attributes, including units fields on measurements, cycle and pass information, SWORD and PLD (the prior river and lake databases) versions, and collection versions. **NOTE: Units are always returned for fields that have corresponding units stored in Hydrocron; they do not need to be requested.** The complete list of input fields available through Hydrocron is below:

**Reach data fields**

@@ -196,6 +198,21 @@ Hydrocron includes additional fields beyond the source data shapefile attributes
'crid', 'geometry', 'sword_version', 'collection_shortname'
```

**Lake data fields**

```bash
'lake_id', 'reach_id', 'obs_id', 'overlap', 'n_overlap',
'time', 'time_tai', 'time_str', 'wse', 'wse_u', 'wse_r_u', 'wse_std',
'area_total', 'area_tot_u', 'area_detct', 'area_det_u',
'layovr_val', 'xtrk_dist', 'ds1_l', 'ds1_l_u', 'ds1_q', 'ds1_q_u',
'ds2_l', 'ds2_l_u', 'ds2_q', 'ds2_q_u',
'quality_f', 'dark_frac', 'ice_clim_f', 'ice_dyn_f', 'partial_f',
'xovr_cal_q', 'geoid_hght', 'solid_tide', 'load_tidef', 'load_tideg', 'pole_tide',
'dry_trop_c', 'wet_trop_c', 'iono_c', 'xovr_cal_c', 'lake_name', 'p_res_id',
'p_lon', 'p_lat', 'p_ref_wse', 'p_ref_area', 'p_date_t0', 'p_ds_t0', 'p_storage',
'cycle_id', 'pass_id', 'continent_id', 'range_start_time', 'range_end_time',
'crid', 'geometry', 'PLD_version', 'collection_shortname'
```

## Response Format

### Default
17 changes: 6 additions & 11 deletions hydrocron/api/controllers/timeseries.py
@@ -129,8 +129,8 @@ def validate_parameters(parameters):

error_message = ''

if parameters['feature'] not in ('Node', 'Reach'):
error_message = f'400: feature parameter should be Reach or Node, not: {parameters["feature"]}'
if parameters['feature'] not in ('Node', 'Reach', 'PriorLake'):
error_message = f'400: feature parameter should be Reach, Node, or PriorLake, not: {parameters["feature"]}'

elif not parameters['feature_id'].isdigit():
error_message = f'400: feature_id cannot contain letters: {parameters["feature_id"]}'
@@ -189,6 +189,8 @@ def is_fields_valid(feature, fields):
columns = constants.REACH_ALL_COLUMNS
elif feature == 'Node':
columns = constants.NODE_ALL_COLUMNS
elif feature == 'PriorLake':
columns = constants.PRIOR_LAKE_ALL_COLUMNS
else:
columns = []
return all(field in columns for field in fields)
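Read outside the diff context, the validation above amounts to a set-membership check. A standalone sketch, with an abridged stand-in for the real column list:

```python
# Abridged, hypothetical stand-in for constants.PRIOR_LAKE_ALL_COLUMNS.
PRIOR_LAKE_COLUMNS = {"lake_id", "time_str", "wse", "area_total"}

def fields_are_valid(fields, columns):
    """Every requested field must appear in the feature's column list."""
    return all(field in columns for field in fields)

ok = fields_are_valid(["lake_id", "wse"], PRIOR_LAKE_COLUMNS)
bad = fields_are_valid(["lake_id", "not_a_field"], PRIOR_LAKE_COLUMNS)
```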
@@ -241,6 +243,8 @@ def timeseries_get(feature, feature_id, start_time, end_time, output, fields):
results = data_repository.get_reach_series_by_feature_id(feature_id, start_time, end_time)
if feature.lower() == 'node':
results = data_repository.get_node_series_by_feature_id(feature_id, start_time, end_time)
if feature.lower() == 'priorlake':
results = data_repository.get_prior_lake_series_by_feature_id(feature_id, start_time, end_time)

if len(results['Items']) == 0:
data['http_code'] = '400 Bad Request'
@@ -343,15 +347,6 @@ def add_units(gdf, columns):
def get_response(results, hits, elapsed, return_type, output, compact):
"""Create and return HTTP response based on results.
:param results: Dictionary of SWOT timeseries results
:type results: dict
:param hits: Number of results returned from query
:type hits: int
:param elapsed: Number of seconds it took to query for results
:type elapsed: float
:param return_type: Accept request header
:type return_type: str
:param output: Output to return in request
:param results: Dictionary of SWOT timeseries results
:type results: dict
:param hits: Number of results returned from query
19 changes: 19 additions & 0 deletions hydrocron/api/data_access/db.py
@@ -57,6 +57,25 @@ def get_node_series_by_feature_id(self, feature_id, start_time, end_time): # no
)
return items

def get_prior_lake_series_by_feature_id(self, feature_id, start_time, end_time): # noqa: E501 # pylint: disable=W0613
"""
@param feature_id:
@param start_time:
@param end_time:
@return:
"""
table_name = constants.SWOT_PRIOR_LAKE_TABLE_NAME

hydrocron_table = self._dynamo_instance.Table(table_name)
hydrocron_table.load()

items = hydrocron_table.query(KeyConditionExpression=(
Key(constants.SWOT_PRIOR_LAKE_PARTITION_KEY).eq(feature_id) &
Key(constants.SWOT_PRIOR_LAKE_SORT_KEY).between(start_time, end_time))
)
return items
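The key condition above selects items with a matching partition key whose sort key falls in the inclusive [start_time, end_time] range; ISO 8601 timestamps compare correctly as plain strings. A pure-Python sketch of the same semantics, with hypothetical stand-in records and no live DynamoDB:

```python
# Stand-in items shaped like prior-lake records (hypothetical values).
table = [
    {"lake_id": "6350036102", "range_start_time": "2024-07-25T22:47:27Z"},
    {"lake_id": "6350036102", "range_start_time": "2024-08-01T00:00:00Z"},
    {"lake_id": "9999999999", "range_start_time": "2024-07-25T22:47:27Z"},
]

def query_prior_lake(items, feature_id, start_time, end_time):
    """Mimics Key(pk).eq(feature_id) & Key(sk).between(start, end)."""
    return [item for item in items
            if item["lake_id"] == feature_id
            and start_time <= item["range_start_time"] <= end_time]

hits = query_prior_lake(table, "6350036102",
                        "2024-07-20T00:00:00Z", "2024-07-26T00:00:00Z")
```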

def get_granule_ur(self, table_name, granule_ur):
"""
89 changes: 55 additions & 34 deletions tests/conftest.py
@@ -13,43 +13,37 @@

DB_TEST_TABLE_NAME = "hydrocron-swot-test-table"

TEST_SHAPEFILE_PATH = os.path.join(
TEST_SHAPEFILE_PATH_REACH = os.path.join(
os.path.dirname(os.path.realpath(__file__)),
'data',
'SWOT_L2_HR_RiverSP_Reach_548_011_NA_20230610T193337_20230610T193344_PIA1_01.zip' # noqa
)

TEST_SHAPEFILE_PATH_LAKE = os.path.join(
os.path.dirname(os.path.realpath(__file__)),
'data',
'SWOT_L2_HR_LakeSP_Prior_018_100_GR_20240713T111741_20240713T112027_PIC0_01.zip' # noqa
)

dynamo_test_proc = factories.dynamodb_proc(
dynamodb_dir=os.path.join(os.path.dirname(os.path.realpath(__file__)),
'dynamodb_local'), port=8000)

dynamo_db_resource = factories.dynamodb("dynamo_test_proc")


@pytest.fixture()
def hydrocron_dynamo_instance(request, dynamo_test_proc):
"""
Set up a connection to a local dynamodb instance and
create a table for testing
"""
dynamo_db = boto3.resource(
"dynamodb",
endpoint_url=f"http://{dynamo_test_proc.host}:{dynamo_test_proc.port}",
aws_access_key_id='fakeMyKeyId',
aws_secret_access_key='fakeSecretAccessKey',
region_name='us-west-2',
)

def create_tables(dynamo_db, table_name, feature_id, non_key_atts):
"""Create DynamoDB tables for testing."""

dynamo_db.create_table(
TableName=constants.SWOT_REACH_TABLE_NAME,
TableName=table_name,
AttributeDefinitions=[
{'AttributeName': 'reach_id', 'AttributeType': 'S'},
{'AttributeName': feature_id, 'AttributeType': 'S'},
{'AttributeName': 'range_start_time', 'AttributeType': 'S'},
{'AttributeName': 'granuleUR', 'AttributeType': 'S'}
],
KeySchema=[
{
'AttributeName': 'reach_id',
'AttributeName': feature_id,
'KeyType': 'HASH'
},
{
@@ -77,16 +71,7 @@ def hydrocron_dynamo_instance(request, dynamo_test_proc):
],
"Projection": {
"ProjectionType": "INCLUDE",
"NonKeyAttributes": [
"reach_id",
"collection_shortname",
"collection_version",
"crid",
"cycle_id",
"pass_id",
"continent_id",
"ingest_time"
]
"NonKeyAttributes": non_key_atts
},
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
@@ -96,16 +81,52 @@
]
)

hydro_table = HydrocronTable(dynamo_db, constants.SWOT_REACH_TABLE_NAME)

@pytest.fixture()
def hydrocron_dynamo_instance(request, dynamo_test_proc):
"""
Set up a connection to a local dynamodb instance and
create a table for testing
"""
dynamo_db = boto3.resource(
"dynamodb",
endpoint_url=f"http://{dynamo_test_proc.host}:{dynamo_test_proc.port}",
aws_access_key_id='fakeMyKeyId',
aws_secret_access_key='fakeSecretAccessKey',
region_name='us-west-2',
)

create_tables(
dynamo_db,
constants.SWOT_REACH_TABLE_NAME,
'reach_id',
['reach_id', 'collection_shortname', 'collection_version', 'crid', 'cycle_id', 'pass_id', 'continent_id', 'ingest_time']
)

create_tables(
dynamo_db,
constants.SWOT_PRIOR_LAKE_TABLE_NAME,
'lake_id',
['lake_id', 'collection_shortname', 'collection_version', 'crid', 'cycle_id', 'pass_id', 'continent_id', 'ingest_time']
)

# load reach table
reach_hydro_table = HydrocronTable(dynamo_db, constants.SWOT_REACH_TABLE_NAME)
reach_items = swot_shp.read_shapefile(
TEST_SHAPEFILE_PATH,
TEST_SHAPEFILE_PATH_REACH,
obscure_data=False,
columns=constants.REACH_DATA_COLUMNS)

for item_attrs in reach_items:
hydro_table.add_data(**item_attrs)
reach_hydro_table.add_data(**item_attrs)

# load lake table
lake_hydro_table = HydrocronTable(dynamo_db, constants.SWOT_PRIOR_LAKE_TABLE_NAME)
lake_items = swot_shp.read_shapefile(
TEST_SHAPEFILE_PATH_LAKE,
obscure_data=False,
columns=constants.PRIOR_LAKE_DATA_COLUMNS)
for item_attrs in lake_items:
lake_hydro_table.add_data(**item_attrs)

try:
request.cls.dynamo_db = dynamo_db
@@ -148,7 +169,7 @@ def hydrocron_dynamo_table(dynamo_db_resource):
hydro_table = HydrocronTable(dynamo_db_resource, DB_TEST_TABLE_NAME)

items = swot_shp.read_shapefile(
TEST_SHAPEFILE_PATH,
TEST_SHAPEFILE_PATH_REACH,
obscure_data=False,
columns=constants.REACH_DATA_COLUMNS)

