Query population using city boundaries to avoid loading all country data #30

Open
Claudio9701 opened this issue Sep 20, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@Claudio9701
Collaborator

Current behaviour:

pop_search = up.download.search_hdx_dataset(country)
pop_country = up.download.get_hdx_dataset(pop_search, pop_index) # This takes too much time

Requested feature:

pop_search = up.download.search_hdx_dataset(country)
pop_country = up.download.get_hdx_dataset(pop_search, pop_index, mask=city_limits) # This is expected to be faster

Where city_limits represents either the city boundaries as a polygon or the city's total bounds as a bounding box.
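For illustration, a rough sketch of the two mask forms (hypothetical usage: the mask keyword is the requested feature and does not exist yet; the boundary file is made up):

    # Hypothetical sketch of the requested API; `mask` is the feature being requested.
    import geopandas as gpd
    import urbanpy as up
    from shapely.geometry import box

    city_limits = gpd.read_file("city_boundary.geojson")  # any city boundary GeoDataFrame

    pop_search = up.download.search_hdx_dataset(country)

    # Option A: pass the boundary polygon itself
    pop_city = up.download.get_hdx_dataset(pop_search, pop_index, mask=city_limits.geometry.unary_union)

    # Option B: pass only the city's bounding box
    pop_city = up.download.get_hdx_dataset(pop_search, pop_index, mask=box(*city_limits.total_bounds))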

@Claudio9701 Claudio9701 added the enhancement New feature or request label Sep 20, 2023
@jeronimoluza
Contributor

Hello!
I could take on this one, if no one is doing it.

Question: the mask parameter should always receive a polygon, right?

@Claudio9701
Collaborator Author

Hello @jeronimoluza , that's awesome. Thanks for the help. Feel free to make a PR 🚀.

@jeronimoluza
Contributor

Hi @Claudio9701!
I'm unable to run the current behavior.
The environment that conda env create -f environment.yml tries to build can no longer be created because Anaconda has dropped support for python=3.6*.
I was able to install Python 3.6.15 with pyenv, but could not install the required libraries because of dependency conflicts:

ERROR: Cannot install -r requirements.txt (line 1) because these package versions have conflicting dependencies.

Do you have any advice that can help me build the required workspace?

@Claudio9701
Collaborator Author

Claudio9701 commented Aug 21, 2024

Hi @jeronimoluza, thanks for your help. Could you try the following steps to set up the workspace:

Project Setup Instructions

  1. Create a project folder

  2. Create a virtual environment inside your folder

    conda create --name urbanpyEnv
  3. Activate the environment

    conda activate urbanpyEnv
  4. Install GeoPandas

    (urbanpyEnv) $ conda install geopandas descartes
  5. Install UrbanPy (last dev version)

    (urbanpyEnv) $ pip install urbanpy==0.2.2.dev1
  6. Install Docker (Optional)

    For Windows users, make sure to run the following command in PowerShell to avoid execution errors:

    Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CurrentUser
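As an optional sanity check after step 5, this small Python snippet (run inside the activated environment) confirms that urbanpy imports and prints the installed version:

    # Optional check: urbanpy is imported only to verify the install resolves.
    import importlib.metadata
    import urbanpy
    print(importlib.metadata.version("urbanpy"))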

@jeronimoluza
Contributor

I'm able to create the environment using Python 3.12, but not using Python 3.6 – I get a lot of dependency conflicts.
When I try up.download.hdx_fb_population("uruguay", "full") with Python 3.12, it fails because of code compatibility issues:

Traceback (most recent call last):
  File "/Users/jeronimoluza/jl_repos/urbanpy/run.py", line 4, in <module>
    pop = up.download.hdx_fb_population("uruguay", "full")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeronimoluza/jl_repos/urbanpy/urbanpy/download/download.py", line 432, in hdx_fb_population
    population = get_hdx_dataset(resources_df, dataset_ix)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeronimoluza/jl_repos/urbanpy/urbanpy/download/download.py", line 390, in get_hdx_dataset
    return pd.read_csv(urls)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/urbanpy/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/urbanpy/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/urbanpy/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/urbanpy/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/urbanpy/lib/python3.12/site-packages/pandas/io/common.py", line 719, in get_handle
    if _is_binary_mode(path_or_buf, mode) and "b" not in mode:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/urbanpy/lib/python3.12/site-packages/pandas/io/common.py", line 1181, in _is_binary_mode
    return isinstance(handle, _get_binary_io_classes()) or "b" in getattr(
                                                           ^^^^^^^^^^^^^^^
TypeError: argument of type 'method' is not iterable
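For what it's worth, one way this exact TypeError can appear is when pd.read_csv receives a pandas Series of URLs instead of a single URL string: pandas ends up checking "b" in handle.mode, and Series.mode is a method. This is only an assumption about the cause here, but a minimal sketch of the pattern:

    # Assumption about the failure mode, not a confirmed diagnosis.
    import pandas as pd

    urls = pd.Series(["https://example.com/data.csv"])  # hypothetical URL

    # pd.read_csv(urls)         # raises TypeError: argument of type 'method' is not iterable
    # pd.read_csv(urls.iloc[0]) # passing a single URL string avoids it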

@jeronimoluza
Contributor

Hi @Claudio9701,

I picked this up again after a week and realized the earlier error came from a bad function call inside one of my testing scripts 🥲.
I tried using spatial intersections to keep only the (lat, long) points from HDX that fall inside the polygon/multipolygon mask, but the HDX datasets are so large that it takes more than double the original time to return the dataset.

After playing around for a bit, I found that the following approach (which avoids generating shapely geometries for the lat, long points) is a possible solution:

    # Inside get_hdx_dataset; assumes `import pandas as pd` and
    # `from geopandas import GeoDataFrame` at module level.
    urls = resources_df.loc[ids, "url"]

    if isinstance(ids, list) and len(ids) > 1:
        df = pd.concat([pd.read_csv(url) for url in urls])
    else:
        # `urls` is a single URL string when `ids` is a scalar, or a
        # length-1 Series when `ids` is a one-element list.
        df = pd.read_csv(urls if isinstance(urls, str) else urls.iloc[0])

    if mask is not None:  # `if mask:` would raise for a GeoDataFrame (ambiguous truth value)
        if isinstance(mask, GeoDataFrame):
            mask = mask.unary_union  # dissolve to a single (multi)polygon
        minx, miny, maxx, maxy = mask.bounds

        # Cheap bounding-box filter on the raw lat/long columns,
        # without building shapely geometries for every point.
        df_filtered = df[
            (df["longitude"] >= minx)
            & (df["longitude"] <= maxx)
            & (df["latitude"] >= miny)
            & (df["latitude"] <= maxy)
        ]
        return df_filtered
    else:
        return df

The only problem is that it is not as precise as the point-intersection approach – the intersection method returns the points that intersect the mask, while this "bounds" method returns all the points inside the extent (bounding box) of the mask.


What do you think?

@Claudio9701
Collaborator Author

That's awesome, thanks! I like that solution: we can do the bounding-box filter and then apply a clip to the result.
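For illustration, a rough sketch of that combination (assuming the bounds-filtered DataFrame from the snippet above and a polygon mask; names are illustrative, not final API):

    # Sketch: cheap bounding-box filter first, then an exact clip against the polygon.
    import geopandas as gpd

    points = gpd.GeoDataFrame(
        df_filtered,
        geometry=gpd.points_from_xy(df_filtered["longitude"], df_filtered["latitude"]),
        crs="EPSG:4326",
    )
    pop_city = points.clip(mask)  # keeps only the points actually inside the mask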

Another option could be to read the GeoTIFF instead of the CSV from HDX. GeoTIFFs can be queried directly (for example, masked to a geometry) when reading with rasterio.
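For reference, a minimal sketch of that route (assuming a population GeoTIFF downloaded from HDX; the file name and city_polygon are placeholders):

    # Sketch: read only the pixels covered by the city geometry.
    import rasterio
    from rasterio.mask import mask as rio_mask

    with rasterio.open("population.tif") as src:  # hypothetical HDX GeoTIFF
        data, transform = rio_mask(src, [city_polygon], crop=True)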

I would say we close this issue with the first solution (bounds + clip) and decide later whether it's worth handling GeoTIFFs. I'm working on a function to bring FABDEM raster data into urbanpy.

@jeronimoluza
Copy link
Contributor

Just implemented and tested the bounds + clip solution! Sending a pull request now.
Exciting news about FABDEM! Let me know if I can help.
