Add exploration notebooks #13
Conversation
```python
# Or replace with other locally saved shapefile
gdf = gpd.read_file("data/eth_adm2_simplified.shp")
```
Can we put this on blob? Or if you want to share the data folder we can also do that.
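For reference, reading the file from blob storage could look something like the sketch below. The account and container names are hypothetical, and geopandas can read zipped shapefiles directly from an HTTPS URL (optionally with a SAS token appended):

```python
# Hypothetical sketch: pointing geopandas at a blob-hosted shapefile instead
# of a local data/ folder. Account/container names here are made up.

def build_blob_url(account: str, container: str, blob_path: str,
                   sas_token: str = "") -> str:
    """Construct an HTTPS URL for an Azure blob, optionally with a SAS token."""
    url = f"https://{account}.blob.core.windows.net/{container}/{blob_path}"
    if sas_token:
        url += f"?{sas_token}"
    return url

# geopandas can read a zipped shapefile straight from a URL, e.g.:
# gdf = gpd.read_file(build_blob_url("devaccount", "polygons",
#                                    "eth_adm2_simplified.shp.zip"))
```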
```python
import numpy as np

# os.chdir("..")
from src.utils.cloud_utils import get_cog_url
```
Maybe something weird happening here w/ the methods we are mixing to import local modules... getting the error:

```
ImportError                               Traceback (most recent call last)
Cell In[5], line 12
      9 import numpy as np
     11 # os.chdir("..")
---> 12 from src.utils.cloud_utils import get_cog_url
     13 from src.utils.raster_utils import upsample_raster, prep_raster
     15 load_dotenv()

ImportError: cannot import name 'get_cog_url' from 'src.utils.cloud_utils' (/Users/zackarno/Documents/CHD/repos/hdx-floodscan/src/utils/cloud_utils.py)
```
To fix it I had to get `pip install -e .` working. However, to do this I had to copy over `pyproject.toml` from my other branch. I guess I have it tracked whereas you maybe don't? Once it's in the repo, the `pip install -e .` works and then the error above goes away.

Is it better to gitignore the `pyproject.toml` or not? I don't see a good reason we would, but IDK the convention.
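As a rough sketch, a minimal `pyproject.toml` that would make `src` importable via `pip install -e .` might look like the following. The project name and setuptools backend are assumptions for illustration, not taken from the repo:

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "hdx-floodscan"  # hypothetical name
version = "0.1.0"

[tool.setuptools.packages.find]
include = ["src*"]
```

With a file like this at the repo root, an editable install registers `src` on the path, so `from src.utils.cloud_utils import ...` resolves from anywhere.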
Oh sorry, that's on me for not making these fully reproducible! I put these up as some quick-and-dirty notebooks and didn't put a ton of thought into the setup there. The `os.chdir("..")` is just my hack for making these work, since I don't love the overhead of requiring `pyproject.toml`/`setup.cfg` files and doing the install.
No worries! I hear you on the `pyproject.toml`/`setup.cfg` overhead point, but I guess it would be good if the Python team can choose a nice standard way to deal with it. I think if the files were just generated and git-tracked there would not really be any overhead...

I believe some of this was handled by the cookiecutter repo (which you helped set up long ago!). However, the cookiecutter might be outdated now, as I think it also gives you a bunch of anticipy configs which we don't necessarily want? Also perhaps the files were added to gitignore instead of tracked? I think gitignoring them is kind of annoying, because if someone clones your repo then they don't have them!
Yes I agree that if we're using that setup then those files should definitely be tracked. It's a great point to think about standard setups as our different kinds of python projects change -- eg. between pipelines and AA projects and dashboards. I remember the cookiecutter was set up in the context of AA projects so there are probably a number of things that don't really apply to pipelines. I think notebooks are a good example of how some of these use cases are different. For example, in AA projects we usually use notebooks as "the" output, so it makes sense we have a good repo structure around that. In this repo, I'm (perhaps a bit lazily..) less concerned with notebook setup since the main pipelines code is the main output here.
Although that's not a great excuse, since it's still important to have others able to run the code -- especially when I ask them to review it! But at least in the former case with AA projects I think it's absolutely important to have a robust setup, whereas here I'm more comfortable with a bit more of a lightweight hack (eg. using `os.chdir("..")` to manage dependencies).
```python
engine = create_engine(
    f"postgresql+psycopg2://chdadmin:{AZURE_DB_PW_PROD}@chd-rasterstats-prod.postgres.database.azure.com/postgres"
)
gdf = gpd.read_file(f"data/{iso3.lower()}/{iso3.lower()}_adm{adm_level}.shp")
```
would need this data folder
Also, how is this script controlling the working directory? For me this path won't work (even with the `data` folder), because the cwd is the directory of the code running the scripts (`exploration` in this case), not the root directory of the project. Therefore in my setup the correct path is `"../data/{iso3.lower()}/{iso3.lower()}_adm{adm_level}.shp"`?
Using the `os.chdir("..")` sets the working directory one level higher, so that this path works.
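A cwd-independent alternative would be to anchor data paths to an explicit project root instead of relying on `os.chdir`. This is just a sketch; the `PROJECT_ROOT` env var is a hypothetical convention, not something the repo defines:

```python
import os
from pathlib import Path

# Hypothetical: resolve the project root from an env var, falling back to the
# parent of the cwd (i.e. a notebook running from exploration/).
ROOT = Path(os.environ.get("PROJECT_ROOT", Path.cwd().parent))

def data_path(*parts: str) -> Path:
    """Build a path under <root>/data that works from any working directory."""
    return ROOT.joinpath("data", *parts)

# e.g. gdf = gpd.read_file(data_path("eth", "eth_adm2.shp"))
```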
```python
# Or replace with other locally saved shapefile
gdf = gpd.read_file("data/eth_adm2_simplified.shp")
```
Need the `data` folder -- also flagged here: https://github.com/OCHA-DAP/ds-raster-stats/pull/13/files#r1829942399

I think I figured out why the results are different in the [...]. I got the results to be the same by adding this to the `up_sample` function just before this part [...], and then later supplying [...]. Not sure if I implemented it 100% correctly, since I'm still wrapping my head around Python xarray and rioxarray more generally, but either way I think it might be a good lead.
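As background on the upsampling step (this is a hand-rolled nearest-neighbour sketch in plain numpy, not the repo's `upsample_raster`/rioxarray implementation): nearest-neighbour upsampling repeats each coarse cell, so aggregate stats like the mean should be unchanged, which is one invariant the QA can check.

```python
import numpy as np

def upsample_nearest(arr: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsample: repeat each cell factor x factor times."""
    return np.repeat(np.repeat(arr, factor, axis=0), factor, axis=1)

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
fine = upsample_nearest(coarse, 2)  # 4x4 array of repeated 2x2 blocks

# The mean is preserved under nearest-neighbour upsampling:
# coarse.mean() == fine.mean() == 2.5
```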
Adding some work to QA the raster stats outputs by comparing against other calculation methods. See #12 for one of the outcomes of this testing.
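One cheap cross-check along these lines (a sketch with made-up toy data, not the PR's actual QA code) is to compute the same zonal mean with two independent methods and confirm they agree:

```python
import numpy as np

# Toy raster values and zone labels (0 = outside any admin zone).
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [0, 0, 2]])

# Method 1: boolean masking per zone.
mean_mask = {z: values[zones == z].mean() for z in (1, 2)}

# Method 2: np.bincount-based aggregation over the flattened arrays.
sums = np.bincount(zones.ravel(), weights=values.ravel())
counts = np.bincount(zones.ravel())
mean_bincount = {z: sums[z] / counts[z] for z in (1, 2)}

# The two independent methods should agree exactly on the toy data.
```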