Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: option to load wind and solar resource data from HPC #414

Merged
merged 17 commits into from
Jan 22, 2025

Conversation

elenya-grant
Copy link
Collaborator

@elenya-grant elenya-grant commented Jan 9, 2025

Load wind and solar resource data from h5 datasets

Added the option and functionality to load wind and solar resource data from Wind Toolkit and NSRDB data files (usually hosted on HPC but can be downloaded locally too). This new feature is primarily based on work previously done by @dakotaramos. To use this functionality, the user must input renewable_resource_origin = HPC when creating the SiteInfo object. Did not add new tests because this can only be used if the h5 type datafile are downloaded locally or code is run on an HPC that hosts these datasets.

Also fixed bug in SiteInfo when setting year from input argument rather than from the input data dictionary.

PR Checklist

  • RELEASE.md has been updated to describe the changes made in this PR
  • Documentation
    • Docstrings are up-to-date
    • Related docs/ files are up-to-date, or added when necessary
    • Documentation has been rebuilt successfully
    • [-] Examples have been updated
  • Tests pass (If not, and this is expected, please elaborate in the tests section)
  • PR description thoroughly describes the new feature, bug fix, etc.

Related issues

Impacted areas of the software

  • hopp/simulation/technologies/resource/wind_toolkit_data.py
    • HPCWindData: new Resource type class to load wind resource data from Wind Toolkit data files hosted on HPC
  • hopp/simulation/technologies/resource/nsrdb_data.py
    • HPCSolarData: new Resource type class to load solar resource data from NSRDB data files hosted on HPC
  • hopp/simulation/technologies/resource/__init__.py
    • added imports for HPCSolarData and HPCWindData
  • hopp/simulation/sites/site_info.py
    • added renewable_resource_origin attribute to specify whether to download resource data from API call or load from HPC datasets
    • changed solar_resource and wind_resource to optional inputs to prevent redundant resource file loading
    • added wtk_source_path and nsrdb_source_path as optional inputs to specify WTK or NSRDB data set directories
    • fixed bug that sets resource year to 2012 if 'year' isn't in 'data' dictionary but input to SiteInfo as argument
  • pyproject.toml
    • added NREL-rex to dependencies list

Additional supporting information

Test results, if applicable

@elenya-grant elenya-grant changed the base branch from main to develop January 9, 2025 00:19
@elenya-grant elenya-grant marked this pull request as ready for review January 10, 2025 02:07
@elenya-grant elenya-grant requested review from RHammond2 and removed request for kbrunik January 13, 2025 20:49
nsrdb_source_path (Union[str,Path], optional): directory where NSRDB data is hosted on HPC. Defaults to "".
filepath (str, optional): filepath to NSRDB h5 file on HPC. Defaults to "".
- should be formatted as: /path/to/file/name_of_file.h5
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elenya-grant in the windtoolkit file you have verbiage for Raises: FileNotFoundError: in the docstrings. For consistency I'd add it into this files docstrings as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this!


if filepath == "" and wtk_source_path=="":
# use default filepaths based on resource year
if self.year < 2014:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good, I had a thought when looking at the if, elif statements for the year < or = 2014. Is it worth raising an error if the year is > 2014? I don't believe the normal WindData class has that functionality, but could be a good item to add just in case? no strong feelings either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a value error if the resource year is outside of valid options for that dataset (also added tests for this)

@dakotaramos
Copy link
Collaborator

This looks good to me, made two comments with possible additions / edits, other than that I think this can be approved and merged.

Copy link
Collaborator

@dakotaramos dakotaramos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to merge from my perspective, all comments from previous review have been addressed.

@johnjasa johnjasa marked this pull request as draft January 16, 2025 20:44
@elenya-grant elenya-grant marked this pull request as ready for review January 17, 2025 00:35
Comment on lines 160 to 190
def test_wtk_resource_filenotfound():
wtk_fake_dir = str(ROOT_DIR)
resource_year = 2012
wtk_fake_fpath = os.path.join(str(ROOT_DIR),f"wtk_conus_{resource_year}.h5")
with pytest.raises(FileNotFoundError) as err:
HPCWindData(lat = 35.201, lon = -101.945, year = resource_year, wind_turbine_hub_ht = 110, wtk_source_path=wtk_fake_dir)
assert str(err.value) == f"Cannot find Wind Toolkit .h5 file, filepath {wtk_fake_fpath} does not exist"

def test_wtk_resource_invalid_year():
wtk_fake_dir = str(ROOT_DIR)
resource_year = 2006
wtk_fake_fpath = os.path.join(str(ROOT_DIR),f"wtk_conus_{resource_year}.h5")
with pytest.raises(ValueError) as err:
HPCWindData(lat = 35.201, lon = -101.945, year = resource_year, wind_turbine_hub_ht = 110, wtk_source_path=wtk_fake_dir)
assert str(err.value) == f"Resource year for WIND Toolkit Data must be between 2007 and 2014 but {resource_year} was provided"

def test_nsrdb_resource_filenotfound():
nsrdb_fake_dir = str(ROOT_DIR)
resource_year = 2012
nsrdb_fake_fpath = os.path.join(str(ROOT_DIR),f"nsrdb_{resource_year}.h5")
with pytest.raises(FileNotFoundError) as err:
HPCSolarData(lat = 35.201, lon = -101.945, year = resource_year, nsrdb_source_path=nsrdb_fake_dir)
assert str(err.value) == f"Cannot find NSRDB .h5 file, filepath {nsrdb_fake_fpath} does not exist"

def test_nsrdb_resource_invalid_year():
nsrdb_fake_dir = str(ROOT_DIR)
resource_year = 2023
nsrdb_fake_fpath = os.path.join(str(ROOT_DIR),f"nsrdb_{resource_year}.h5")
with pytest.raises(ValueError) as err:
HPCSolarData(lat = 35.201, lon = -101.945, year = resource_year, nsrdb_source_path=nsrdb_fake_dir)
assert str(err.value) == f"Resource year for NSRDB Data must be between 1998 and 2022 but {resource_year} was provided"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love these tests, thank you!

Copy link
Collaborator

@RHammond2 RHammond2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are recommendations, including those involving the changes to the initialization routines to allow for more testing. As it stands this looks pretty good to me and could be merged as-is.

Comment on lines +9 to +12
NSRDB_DEP = "/datasets/NSRDB/deprecated_v3/nsrdb_"

# NOTE: Current version of PSM v3.2.2 which corresponds to /api/nsrdb/v2/solar/psm3-2-2-download
NSRDB_NEW = "/datasets/NSRDB/current/nsrdb_"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend converting these to Path objects (Path("file_path")) to make some later manipulations easier.


if filepath == "" and nsrdb_source_path=="":
# use default filepath
self.nsrdb_file = NSRDB_NEW + "{}.h5".format(self.year)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend the use of f-strings in cases like this so it reads a bit easier. f"{self.year}.h5". If you decide to make NSRDB_NEW a path, this becomes NSRDB_NEW / f"{self.year}.h5" as an example

self.nsrdb_file = filepath
elif filepath == "" and nsrdb_source_path != "":
# directory of h5 files (nsrdb_source_path) is provided by user
self.nsrdb_file = os.path.join(str(nsrdb_source_path),"nsrdb_{}.h5".format(self.year))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, for using path this would look like: Path(nsrdb_source_path) / f"nsrdb_{self.year}.h5". I also passed nsrdb_source_path to Path() because it could either be a Path or string, but passing a Path object to Path again won't cause any errors, so it's not worth checking first.

Comment on lines 101 to 104
# extract remaining datapoints: year, month, day, hour, minute, dn, df, gh, wspd,tdry, pres, tdew
# NOTE: datasets have readings at 0 and 30 minutes each hour, HOPP/SAM workflow requires only 30 minute reading values -> filter 0 minute readings with [1::2]
# NOTE: datasets are not auto shifted by timezone offset -> wrap extraction in SAMResource.roll_timeseries(input_array, timezone, #steps in an hour=1) to roll timezones
# NOTE: solar_resource.py code references solar_zenith_angle and RH = relative_humidity but I couldn't find them actually being utilized. Captured them below just in case.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move these extra long comments to multiple lines? I know we don't have any enforcement on this, but it's nice to be able to read comments without sideways scrolling.


# round to desired precision and convert to desired data type
# NOTE: unsure if SAM/PySAM is sensitive to data types and decimal precision.
# If not sensitive, can remove .astype() and round() to increase computational efficiency
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good way to find out? It would be nice to have it settled prior to merging, but if this is pretty small potatoes, I also get it.

import os
from hopp.simulation.technologies.resource.resource import Resource

WTK_V10_BASE = "/datasets/WIND/conus/v1.0.0/wtk_conus_"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about Path: you don't need to but it's recommended to use them in place of strings to make manipulations easier.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not done this yet because the base paths include the first part of the filename as well. I am also not sure whether rex (the tool used to load the resource data from .h5 files) is compatible with Path objects or if it requires strings (I will need to check on this).

self.format_data()


def calculate_heights_to_download(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of a good method to check in the test suite. Basically anything leading up to the download_resources() because we won't have access in the CI pipeline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test should be added when the ability to not download resource data is added to these classes (to avoid the file not found error)

Comment on lines +84 to +87
self.download_resource()

# Set wind resource data into SAM/PySAM digestible format
self.format_data()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want these in the __init__() method? I would assume it's that it's initialized, then downloaded, or would this pattern be more helpful to the intended workflow? An alternative is to have a flag in the __init__ that triggers downloading in the initialization or not, such as download: bool = True. This would also for passing false and testing paths and other surrounding items for correctness.

Copy link
Collaborator Author

@elenya-grant elenya-grant Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I have these in the __init__() method because it follows the same workflow as solar_resource.py and wind_resource.py. I think that if we want this change - this may be better suited for PR #415 since that includes the ability to initialize solar_resource.py and wind_resource.py with resource data (and therefore it's not downloaded). Or - I think this could be part of a future PR that includes other development to SiteInfo and the resource functions.

I created an issue (issue #423) to address this

Comment on lines 58 to 81
if filepath == "" and wtk_source_path=="":
# use default filepaths based on resource year
if self.year < 2014 and self.year>=2007:
self.wtk_file = WTK_V10_BASE + "{}.h5".format(self.year)
elif self.year == 2014:
self.wtk_file = WTK_V11_BASE + "{}.h5".format(self.year)
elif filepath != "" and wtk_source_path == "":
# filepath (full h5 filepath) is provided by user
if ".h5" not in filepath:
filepath = filepath + ".h5"
self.wtk_file = filepath
elif filepath == "" and wtk_source_path != "":
# directory of h5 files (wtk_source_path) is provided by user
self.wtk_file = os.path.join(str(wtk_source_path),"wtk_conus_{}.h5".format(self.year))
else:
# use default filepaths
if self.year < 2014:
self.wtk_file = WTK_V10_BASE + "{}.h5".format(self.year)
elif self.year == 2014:
self.wtk_file = WTK_V11_BASE + "{}.h5".format(self.year)

# Check for valid filepath for Wind Toolkit file
if not os.path.isfile(self.wtk_file):
raise FileNotFoundError(f"Cannot find Wind Toolkit .h5 file, filepath {self.wtk_file} does not exist")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the produced file paths are what's expected would also be helpful.

Copy link
Collaborator Author

@elenya-grant elenya-grant Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is partially accomplished in the tests starting with test_wtk_resource_filenotfound_ and test_nsrdb_resource_filenotfound_. This would also be easier to do if (like you suggested) - resource data was not automatically loaded in the __init__() function which will always throw a FileNotFound error when running tests locally

Comment on lines +75 to +82
if not os.path.isfile(self.nsrdb_file):
raise FileNotFoundError(f"Cannot find NSRDB .h5 file, filepath {self.nsrdb_file} does not exist")

# Pull data from HPC NSRDB dataset
self.download_resource()

# Set solar resource data into SAM/PySAM digestible format
self.format_data()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on these that I left in the WTK class.

@elenya-grant elenya-grant merged commit 6839320 into NREL:develop Jan 22, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants