Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exploration notebooks #13

Merged
merged 2 commits into from
Nov 15, 2024
Merged

Add exploration notebooks #13

merged 2 commits into from
Nov 15, 2024

Conversation

hannahker
Copy link
Collaborator

Adding some work to QA the raster stats outputs by comparing against other calculation methods. See #12 for one of the outcomes of this testing.

@hannahker hannahker requested a review from zackarno October 31, 2024 20:33

```python
# Or replace with other locally saved shapefile
gdf = gpd.read_file("data/eth_adm2_simplified.shp")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put this on blob? or if you want to share data folder we can also do that

import numpy as np

# os.chdir("..")
from src.utils.cloud_utils import get_cog_url
Copy link

@zackarno zackarno Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something weird happening here w/ the methods we are mixing to import local modules... getting the error:

{
	"name": "ImportError",
	"message": "cannot import name 'get_cog_url' from 'src.utils.cloud_utils' (/Users/zackarno/Documents/CHD/repos/hdx-floodscan/src/utils/cloud_utils.py)",
	"stack": "---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[5], line 12
      9 import numpy as np
     11 # os.chdir(\"..\")
---> 12 from src.utils.cloud_utils import get_cog_url
     13 from src.utils.raster_utils import upsample_raster, prep_raster
     15 load_dotenv()

ImportError: cannot import name 'get_cog_url' from 'src.utils.cloud_utils' (/Users/zackarno/Documents/CHD/repos/hdx-floodscan/src/utils/cloud_utils.py)"
}

To fix it I had to get

pip install -e .

working. However, to do this I had to copy over pyproject.toml from my other branch. I guess I have it tracked whereas you maybe don't? However, once it's in the repo I the pip install -e. works and then the error above goes away.

Is it better to gitignore the pyproject.toml or not ? I don't see a good reason we would, but IDK the convention

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry that's on me for not making these fully reproducible! I put these up as some quick-and-dirty notebooks and didn't put a ton of thought into the setup there. The os.chdir("..") is just my hack for making these work since I don't love the overhead of requiring pyproject.toml/setup.cfg files and doing the install.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries! i hear you on the pyproject.toml/setup.cfg overhead point - but I guess good if python team can choose a nice standard way to deal with it. I think if the files were just generated and git tracked there would not really be any overhead...

i believe some of this was handled by the cookiecutter repo (which you helped set up long ago!), however cookiecutter might be outdated now as i think it also gives you a bunch of anticipy configs which we don't necessarily want? Also perhaps the files were added to gitignore instead of tracked? I think gitignoring them is kind of annoying because if someone clones your repo then they don't have them!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree that if we're using that setup then those files should definitely be tracked. It's a great point to think about standard setups as our different kinds of python projects change -- eg. between pipelines and AA projects and dashboards. I remember the cookiecutter was set up in the context of AA projects so there are probably a number of things that don't really apply to pipelines. I think notebooks are a good example of how some of these use cases are different. For example, in AA projects we usually use notebooks as "the" output, so it makes sense we have a good repo structure around that. In this repo, I'm (perhaps a bit lazily..) less concerned with notebook setup since the main pipelines code is the main output here.

Although not a great excuse since it's still important to have others able to run the code -- especially when I ask them to review it! But at least in the former case with AA projects I think it's absolutely important to have a robust setup, whereas here I'm more comfortable with a bit more of a lightweight hack (eg. using os.chdir(..) to manage dependencies).

engine = create_engine(
f"postgresql+psycopg2://chdadmin:{AZURE_DB_PW_PROD}@chd-rasterstats-prod.postgres.database.azure.com/postgres"
)
gdf = gpd.read_file(f"data/{iso3.lower()}/{iso3.lower()}_adm{adm_level}.shp")
Copy link

@zackarno zackarno Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would need this data folder

Copy link

@zackarno zackarno Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how is this script controlling the working directory? Because for me this path won't work (even with the data) folder because the cwd is in the directory of code running the scripts (exploration in this case) not root directory of project. Therefore in my setup the correct path is "../data/{iso3.lower()}/{iso3.lower()}_adm{adm_level}.shp" ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the os.chdir("..") sets the working directory one level higher, so that this path works.


```python
# Or replace with other locally saved shapefile
gdf = gpd.read_file("data/eth_adm2_simplified.shp")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zackarno
Copy link

zackarno commented Nov 7, 2024

I think i figured out why the results are different in the test_grid_alignment script. I think something is going awry in the upsample function. I believe if you correctly (not sure if my example is perfect- but it works) supply a transform argument to .rio.reproject() function the results will be be the same.

I got the results to be the same by adding this to the up_sample function

 bounds = ds.rio.bounds()

    # Calculate the target transform using bounds and new dimensions
    target_transform = rasterio.transform.from_bounds(
        bounds[0],  # left
        bounds[1],  # bottom
        bounds[2],  # right
        bounds[3],  # top
        new_width,
        new_height,
    )

just before this part:

    # Forecast data will have 4 dims, since we have a leadtime
    nd = len(list(ds.dims))

and then later supplying target_transform like:

 ds_ = ds_.rio.reproject(
                ds_.rio.crs,
                shape=(new_height, new_width),
                resampling=Resampling.nearest,
                transform=target_transform,
                nodata=np.nan,
            )

not sure if I implemented 100% correctly since still wrapping my head around python xarray, rioxarray more generally, but either way i think it might be a good lead

@hannahker hannahker merged commit e621481 into main Nov 15, 2024
1 check passed
@hannahker hannahker deleted the exploration-notebooks branch November 15, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants