Feature/issue 196 Add new feature type to query the API for lake data (#224)

* Initial API queries for lake data

* Unit tests for lake data

* Updates after center point calculations

- Removed temp code to calculate a point in API
- Implemented unit test to test lake data retrieval
- Updated fixtures to load in lake data for testing

* Add read lake table permissions to lambda timeseries and track ingest roles

* Update documentation to include lake data

* Updated documentation to include info on lake centerpoints

---------

Co-authored-by: Frank Greguska <[email protected]>
nikki-t and frankinspace authored Aug 22, 2024
1 parent d7818e2 commit 7d46c4e
Showing 11 changed files with 489 additions and 103 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Issue 201 - Create table for tracking granule ingest status
- Issue 198 - Implement track ingest lambda function CMR and Hydrocron queries
- Issue 193 - Add new Dynamo table for prior lake data
- Issue 196 - Add new feature type to query the API for lake data
### Changed
### Deprecated
### Removed
70 changes: 70 additions & 0 deletions docs/examples.md
@@ -270,6 +270,56 @@ Will return GeoJSON:
}
```

## Get time series GeoJSON for a lake

Search for a single lake by ID.

[https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=geojson](https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=geojson)

Will return GeoJSON:

```json
{
"status": "200 OK",
"time": 391.613,
"hits": 1,
"results": {
"csv": "",
"geojson": {
"type": "FeatureCollection",
"features": [
{
"id": "0",
"type": "Feature",
"properties": {
"lake_id": "6350036102",
"time_str": "2024-07-25T22:48:23Z",
"wse": "260.802",
"area_total": "0.553409",
"quality_f": "1",
"collection_shortname": "SWOT_L2_HR_LakeSP_2.0",
"crid": "PIC0",
"PLD_version": "105",
"range_start_time": "2024-07-25T22:47:27Z",
"wse_units": "m",
"area_total_units": "km^2"
},
"geometry": {
"type": "Point",
"coordinates": [
-42.590727027987064,
-19.822613018107482
]
}
}
]
}
}
}
```

**NOTE:** Due to the size of the original polygons in the lake (L2_HR_LakeSP) shapefiles, only the calculated center point of each lake is returned. This facilitates conformance with the GeoJSON specification; center points should not be considered accurate.
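The request above can also be assembled programmatically. The helper below is a hypothetical sketch that only builds the query URL from the documented parameters; it does not call the API.

```python
from urllib.parse import urlencode

# Hypothetical helper: build a Hydrocron PriorLake time series URL from
# the documented query parameters (names taken from the example above).
BASE_URL = "https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries"

def build_lake_query(lake_id, start_time, end_time, fields, output="geojson"):
    """Assemble the query string for a PriorLake time series request."""
    params = {
        "feature": "PriorLake",
        "feature_id": lake_id,
        "start_time": start_time,
        "end_time": end_time,
        "fields": ",".join(fields),
        "output": output,
    }
    # Keep ',' and ':' literal so the URL matches the documented examples.
    return f"{BASE_URL}?{urlencode(params, safe=',:')}"

url = build_lake_query(
    "6350036102",
    "2024-07-20T00:00:00Z",
    "2024-07-26T00:00:00Z",
    ["lake_id", "time_str", "wse", "area_total"],
)
```

The resulting URL can then be passed to any HTTP client, for example `requests.get(url)`.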

## Get time series CSV for river reach

Search for a single river reach by ID.
@@ -310,6 +360,26 @@ Will return CSV:
}
```

## Get time series CSV for lake

Search for a single lake by ID.

[https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=csv](https://soto.podaac.earthdatacloud.nasa.gov/hydrocron/v1/timeseries?feature=PriorLake&feature_id=6350036102&start_time=2024-07-20T00:00:00Z&end_time=2024-07-26T00:00:00Z&fields=lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time&output=csv)

Will return CSV:

```json
{
"status": "200 OK",
"time": 321.592,
"hits": 1,
"results": {
"csv": "lake_id,time_str,wse,area_total,quality_f,collection_shortname,crid,PLD_version,range_start_time,wse_units,area_total_units\n6350036102,2024-07-25T22:48:23Z,260.802,0.553409,1,SWOT_L2_HR_LakeSP_2.0,PIC0,105,2024-07-25T22:47:27Z,m,km^2\n",
"geojson": {}
}
}
```
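Because the CSV payload arrives as a string inside the JSON response body, it can be parsed with the standard `csv` module. This sketch uses a hard-coded response shaped like the example above (fields abridged):

```python
import csv
import io

# Response body shaped like the documented example (abridged fields).
response_body = {
    "status": "200 OK",
    "hits": 1,
    "results": {
        "csv": ("lake_id,time_str,wse,area_total\n"
                "6350036102,2024-07-25T22:48:23Z,260.802,0.553409\n"),
        "geojson": {},
    },
}

# The "csv" value is a plain CSV string; wrap it in StringIO to parse it.
rows = list(csv.DictReader(io.StringIO(response_body["results"]["csv"])))
first = rows[0]
```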

## Accept headers

See the [documentation on the timeseries endpoint](timeseries.md) for an explanation of Accept headers.
1 change: 1 addition & 0 deletions docs/intro.md
@@ -11,3 +11,4 @@ Original SWOT data is archived at NASA's [Physical Oceanography Distributed Acti
Datasets included in Hydrocron:

- [SWOT Level 2 River Single-Pass Vector Data Product, Version 2.0](https://podaac.jpl.nasa.gov/dataset/SWOT_L2_HR_RiverSP_2.0)
- [SWOT Level 2 Lake Single-Pass Vector Data Product, Version 2.0](https://podaac.jpl.nasa.gov/dataset/SWOT_L2_HR_LakeSP_2.0)
13 changes: 12 additions & 1 deletion docs/overview.md
@@ -11,12 +11,23 @@ The main timeseries endpoint allows users to search by feature ID.
River reach and node ID numbers are defined in the [SWOT River Database (SWORD)](https://doi.org/10.1029/2021WR030054),
and can be browsed using the [SWORD Explorer Interactive Dashboard](https://www.swordexplorer.com/).

Lake ID numbers are defined in the PLD (Prior Lake Database) and can be located in the SWOT shapefiles; see the [SWOT Product Description Document for the L2_HR_LakeSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on lake identifiers.

SWOT may observe lakes and rivers that do not have an ID in the prior databases. In those cases, hydrology features are added to the Unassigned Lakes data product.
Hydrocron does not currently support Unassigned rivers and lakes.

Hydrocron currently includes data from these datasets:

- Reach and node shapefiles from the Level 2 KaRIn high rate river single pass vector product (L2_HR_RiverSP)
- PLD-oriented shapefiles from the Level 2 KaRIn high rate lake single pass vector product (L2_HR_LakeSP)

See this PO.DAAC [page](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on SWOT datasets.

## Limitations

Data return size is limited to 6 MB. If your query response is larger than this, a 413 error will be returned.
Data return size is limited to **6 MB**. If your query response is larger than this, a 413 error will be returned.

**For Lake data:** Due to the size of the original polygons in the lake (L2_HR_LakeSP) shapefiles, only the calculated center point of each lake is returned. This facilitates conformance with the GeoJSON specification; center points should not be considered accurate.
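As an illustration of what "calculated center point" means, the area-weighted centroid of a simple polygon ring can be computed with the standard shoelace formula. This is only a sketch; the actual Hydrocron implementation may differ.

```python
# Illustrative only: area-weighted centroid of a simple
# (non-self-intersecting) polygon ring given as [(x, y), ...] pairs.
def polygon_centroid(ring):
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area *= 0.5
    return cx / (6.0 * area), cy / (6.0 * area)

# A unit square has its centroid at (0.5, 0.5).
center = polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)])
```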

## Citation

23 changes: 20 additions & 3 deletions docs/timeseries.md
@@ -85,16 +85,18 @@ Content-Type: text/csv

### feature : string, required: yes

Type of feature being requested. Either: "Reach" or "Node"
Type of feature being requested. Either: "Reach", "Node", or "PriorLake"

### feature_id : string, required: yes

ID of the feature to retrieve

- Reaches have the format CBBBBBRRRRT (e.g., 78340600051)
- Nodes have the format CBBBBBRRRRNNNT (e.g., 12228200110861)
- PriorLakes have the format CBBNNNNNNT (e.g., 2710046612)
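Since each documented format is a fixed-length string of digits, a cheap client-side sanity check can be written before calling the API. This is a hypothetical helper derived only from the formats listed above:

```python
import re

# Digit-count patterns derived from the documented ID formats:
# CBBBBBRRRRT -> 11 digits, CBBBBBRRRRNNNT -> 14 digits, CBBNNNNNNT -> 10.
ID_PATTERNS = {
    "Reach": re.compile(r"^\d{11}$"),
    "Node": re.compile(r"^\d{14}$"),
    "PriorLake": re.compile(r"^\d{10}$"),
}

def looks_valid(feature, feature_id):
    """Return True if feature_id matches the documented shape."""
    pattern = ID_PATTERNS.get(feature)
    return bool(pattern and pattern.match(feature_id))
```

Note this only checks length and digit content; it does not verify that the ID actually exists in SWORD or the PLD.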

Please see the [SWOT Product Description Document for the L2_HR_RiverSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on identifiers.
Please see the [SWOT Product Description Document for the L2_HR_RiverSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on reach and node identifiers.
Please see the [SWOT Product Description Document for the L2_HR_LakeSP Dataset](https://podaac.jpl.nasa.gov/SWOT?tab=datasets-information&sections=about) for more information on lake identifiers.

### start_time : string, required: yes

@@ -136,7 +138,7 @@ The SWOT data fields to return in the request.

This is specified in the form of a comma separated list (without any spaces): `fields=reach_id,time_str,wse,slope`

Hydrocron includes additional fields beyond the source data shapefile attributes, including units fields on measurements, cycle and pass information, and SWORD and collection versions. **NOTE: Units are always returned for fields that have corresponding units stored in Hydrocron; they do not need to be requested.** The complete list of input fields available through Hydrocron is below:
Hydrocron includes additional fields beyond the source data shapefile attributes, including units fields on measurements, cycle and pass information, SWORD and PLD (the prior river and lake databases) versions, and collection versions. **NOTE: Units are always returned for fields that have corresponding units stored in Hydrocron; they do not need to be requested.** The complete list of input fields available through Hydrocron is below:

**Reach data fields**

@@ -196,6 +198,21 @@ Hydrocron includes additional fields beyond the source data shapefile attributes
'crid', 'geometry', 'sword_version', 'collection_shortname'
```

**Lake data fields**

```bash
'lake_id', 'reach_id', 'obs_id', 'overlap', 'n_overlap',
'time', 'time_tai', 'time_str', 'wse', 'wse_u', 'wse_r_u', 'wse_std',
'area_total', 'area_tot_u', 'area_detct', 'area_det_u',
'layovr_val', 'xtrk_dist', 'ds1_l', 'ds1_l_u', 'ds1_q', 'ds1_q_u',
'ds2_l', 'ds2_l_u', 'ds2_q', 'ds2_q_u',
'quality_f', 'dark_frac', 'ice_clim_f', 'ice_dyn_f', 'partial_f',
'xovr_cal_q', 'geoid_hght', 'solid_tide', 'load_tidef', 'load_tideg', 'pole_tide',
'dry_trop_c', 'wet_trop_c', 'iono_c', 'xovr_cal_c', 'lake_name', 'p_res_id',
'p_lon', 'p_lat', 'p_ref_wse', 'p_ref_area', 'p_date_t0', 'p_ds_t0', 'p_storage',
'cycle_id', 'pass_id', 'continent_id', 'range_start_time', 'range_end_time',
'crid', 'geometry', 'PLD_version', 'collection_shortname'
```

## Response Format

### Default
17 changes: 6 additions & 11 deletions hydrocron/api/controllers/timeseries.py
@@ -129,8 +129,8 @@ def validate_parameters(parameters):

error_message = ''

if parameters['feature'] not in ('Node', 'Reach'):
error_message = f'400: feature parameter should be Reach or Node, not: {parameters["feature"]}'
if parameters['feature'] not in ('Node', 'Reach', 'PriorLake'):
error_message = f'400: feature parameter should be Reach, Node, or PriorLake, not: {parameters["feature"]}'

elif not parameters['feature_id'].isdigit():
error_message = f'400: feature_id cannot contain letters: {parameters["feature_id"]}'
@@ -189,6 +189,8 @@ def is_fields_valid(feature, fields):
columns = constants.REACH_ALL_COLUMNS
elif feature == 'Node':
columns = constants.NODE_ALL_COLUMNS
elif feature == 'PriorLake':
columns = constants.PRIOR_LAKE_ALL_COLUMNS
else:
columns = []
return all(field in columns for field in fields)
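Read outside the diff context, the validation above amounts to a set-membership check. A standalone sketch, with an abridged stand-in for the real column list:

```python
# Abridged, hypothetical stand-in for constants.PRIOR_LAKE_ALL_COLUMNS.
PRIOR_LAKE_COLUMNS = {"lake_id", "time_str", "wse", "area_total"}

def fields_are_valid(fields, columns):
    """Every requested field must appear in the feature's column list."""
    return all(field in columns for field in fields)

ok = fields_are_valid(["lake_id", "wse"], PRIOR_LAKE_COLUMNS)
bad = fields_are_valid(["lake_id", "not_a_field"], PRIOR_LAKE_COLUMNS)
```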
@@ -241,6 +243,8 @@ def timeseries_get(feature, feature_id, start_time, end_time, output, fields):
results = data_repository.get_reach_series_by_feature_id(feature_id, start_time, end_time)
if feature.lower() == 'node':
results = data_repository.get_node_series_by_feature_id(feature_id, start_time, end_time)
if feature.lower() == 'priorlake':
results = data_repository.get_prior_lake_series_by_feature_id(feature_id, start_time, end_time)

if len(results['Items']) == 0:
data['http_code'] = '400 Bad Request'
@@ -343,15 +347,6 @@ def add_units(gdf, columns):
def get_response(results, hits, elapsed, return_type, output, compact):
"""Create and return HTTP response based on results.
:param results: Dictionary of SWOT timeseries results
:type results: dict
:param hits: Number of results returned from query
:type hits: int
:param elapsed: Number of seconds it took to query for results
:type elapsed: float
:param return_type: Accept request header
:type return_type: str
:param output: Output to return in request
:param results: Dictionary of SWOT timeseries results
:type results: dict
:param hits: Number of results returned from query
19 changes: 19 additions & 0 deletions hydrocron/api/data_access/db.py
@@ -57,6 +57,25 @@ def get_node_series_by_feature_id(self, feature_id, start_time, end_time): # no
)
return items

def get_prior_lake_series_by_feature_id(self, feature_id, start_time, end_time): # noqa: E501 # pylint: disable=W0613
"""
@param feature_id:
@param start_time:
@param end_time:
@return:
"""
table_name = constants.SWOT_PRIOR_LAKE_TABLE_NAME

hydrocron_table = self._dynamo_instance.Table(table_name)
hydrocron_table.load()

items = hydrocron_table.query(KeyConditionExpression=(
Key(constants.SWOT_PRIOR_LAKE_PARTITION_KEY).eq(feature_id) &
Key(constants.SWOT_PRIOR_LAKE_SORT_KEY).between(start_time, end_time))
)
return items
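The key condition above selects items with a matching partition key whose sort key falls in the inclusive [start_time, end_time] range; ISO 8601 timestamps compare correctly as plain strings. A pure-Python sketch of the same semantics, with hypothetical stand-in records and no live DynamoDB:

```python
# Stand-in items shaped like prior-lake records (hypothetical values).
table = [
    {"lake_id": "6350036102", "range_start_time": "2024-07-25T22:47:27Z"},
    {"lake_id": "6350036102", "range_start_time": "2024-08-01T00:00:00Z"},
    {"lake_id": "9999999999", "range_start_time": "2024-07-25T22:47:27Z"},
]

def query_prior_lake(items, feature_id, start_time, end_time):
    """Mimics Key(pk).eq(feature_id) & Key(sk).between(start, end)."""
    return [item for item in items
            if item["lake_id"] == feature_id
            and start_time <= item["range_start_time"] <= end_time]

hits = query_prior_lake(table, "6350036102",
                        "2024-07-20T00:00:00Z", "2024-07-26T00:00:00Z")
```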

def get_granule_ur(self, table_name, granule_ur):
"""
89 changes: 55 additions & 34 deletions tests/conftest.py
@@ -13,43 +13,37 @@

DB_TEST_TABLE_NAME = "hydrocron-swot-test-table"

TEST_SHAPEFILE_PATH = os.path.join(
TEST_SHAPEFILE_PATH_REACH = os.path.join(
os.path.dirname(os.path.realpath(__file__)),
'data',
'SWOT_L2_HR_RiverSP_Reach_548_011_NA_20230610T193337_20230610T193344_PIA1_01.zip' # noqa
)

TEST_SHAPEFILE_PATH_LAKE = os.path.join(
os.path.dirname(os.path.realpath(__file__)),
'data',
'SWOT_L2_HR_LakeSP_Prior_018_100_GR_20240713T111741_20240713T112027_PIC0_01.zip' # noqa
)

dynamo_test_proc = factories.dynamodb_proc(
dynamodb_dir=os.path.join(os.path.dirname(os.path.realpath(__file__)),
'dynamodb_local'), port=8000)

dynamo_db_resource = factories.dynamodb("dynamo_test_proc")


@pytest.fixture()
def hydrocron_dynamo_instance(request, dynamo_test_proc):
"""
Set up a connection to a local dynamodb instance and
create a table for testing
"""
dynamo_db = boto3.resource(
"dynamodb",
endpoint_url=f"http://{dynamo_test_proc.host}:{dynamo_test_proc.port}",
aws_access_key_id='fakeMyKeyId',
aws_secret_access_key='fakeSecretAccessKey',
region_name='us-west-2',
)

def create_tables(dynamo_db, table_name, feature_id, non_key_atts):
"""Create DynamoDB tables for testing."""

dynamo_db.create_table(
TableName=constants.SWOT_REACH_TABLE_NAME,
TableName=table_name,
AttributeDefinitions=[
{'AttributeName': 'reach_id', 'AttributeType': 'S'},
{'AttributeName': feature_id, 'AttributeType': 'S'},
{'AttributeName': 'range_start_time', 'AttributeType': 'S'},
{'AttributeName': 'granuleUR', 'AttributeType': 'S'}
],
KeySchema=[
{
'AttributeName': 'reach_id',
'AttributeName': feature_id,
'KeyType': 'HASH'
},
{
@@ -77,16 +71,7 @@ def hydrocron_dynamo_instance(request, dynamo_test_proc):
],
"Projection": {
"ProjectionType": "INCLUDE",
"NonKeyAttributes": [
"reach_id",
"collection_shortname",
"collection_version",
"crid",
"cycle_id",
"pass_id",
"continent_id",
"ingest_time"
]
"NonKeyAttributes": non_key_atts
},
"ProvisionedThroughput": {
"ReadCapacityUnits": 5,
@@ -96,16 +81,52 @@
]
)

hydro_table = HydrocronTable(dynamo_db, constants.SWOT_REACH_TABLE_NAME)

@pytest.fixture()
def hydrocron_dynamo_instance(request, dynamo_test_proc):
"""
Set up a connection to a local dynamodb instance and
create a table for testing
"""
dynamo_db = boto3.resource(
"dynamodb",
endpoint_url=f"http://{dynamo_test_proc.host}:{dynamo_test_proc.port}",
aws_access_key_id='fakeMyKeyId',
aws_secret_access_key='fakeSecretAccessKey',
region_name='us-west-2',
)

create_tables(
dynamo_db,
constants.SWOT_REACH_TABLE_NAME,
'reach_id',
['reach_id', 'collection_shortname', 'collection_version', 'crid', 'cycle_id', 'pass_id', 'continent_id', 'ingest_time']
)

create_tables(
dynamo_db,
constants.SWOT_PRIOR_LAKE_TABLE_NAME,
'lake_id',
['lake_id', 'collection_shortname', 'collection_version', 'crid', 'cycle_id', 'pass_id', 'continent_id', 'ingest_time']
)

# load reach table
reach_hydro_table = HydrocronTable(dynamo_db, constants.SWOT_REACH_TABLE_NAME)
reach_items = swot_shp.read_shapefile(
TEST_SHAPEFILE_PATH,
TEST_SHAPEFILE_PATH_REACH,
obscure_data=False,
columns=constants.REACH_DATA_COLUMNS)

for item_attrs in reach_items:
hydro_table.add_data(**item_attrs)
reach_hydro_table.add_data(**item_attrs)

# load lake table
lake_hydro_table = HydrocronTable(dynamo_db, constants.SWOT_PRIOR_LAKE_TABLE_NAME)
lake_items = swot_shp.read_shapefile(
TEST_SHAPEFILE_PATH_LAKE,
obscure_data=False,
columns=constants.PRIOR_LAKE_DATA_COLUMNS)
for item_attrs in lake_items:
lake_hydro_table.add_data(**item_attrs)

try:
request.cls.dynamo_db = dynamo_db
@@ -148,7 +169,7 @@ def hydrocron_dynamo_table(dynamo_db_resource):
hydro_table = HydrocronTable(dynamo_db_resource, DB_TEST_TABLE_NAME)

items = swot_shp.read_shapefile(
TEST_SHAPEFILE_PATH,
TEST_SHAPEFILE_PATH_REACH,
obscure_data=False,
columns=constants.REACH_DATA_COLUMNS)

