Adapt urbanpy tutorial for schools in Florianópolis #5

Closed
wavingtowaves opened this issue Dec 19, 2022 · 20 comments

@wavingtowaves
Collaborator

wavingtowaves commented Dec 19, 2022

We will adapt the existing urbanpy tutorial for the city of Florianópolis as a first step toward creating predictions for the much larger state of Pará.

@wavingtowaves
Collaborator Author

Update 2023-01-19

Hi all 👋 here's an update on the progress so far.

I've recently committed a jupyter notebook focused on a spatial model for florianopolis. I've worked through over half of the tutorial, and in the process opened a couple of issues to help solve some data problems I ran into.

🚧 Blocker
@bitsandbricks I hit another blocker when working through the tutorial and could use your help.

When I try to run the line

es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')

I get an error

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Not sure why this is, since the code is identical to the tutorial and when I run

ba.total_bounds

for florianopolis, I get an output of

array([-48.613    , -27.847    , -48.3585929, -27.379    ])

This seems similar enough to the output in the tutorial, just without as many decimal places. Any thoughts on why this might be happening?

@bitsandbricks
Collaborator

I'll take a look!

Rob, can you share the notebook/script you are running to use as a reproducible example?

@wavingtowaves
Collaborator Author

Thanks so much @bitsandbricks!

The notebook is linked above and here. I've also set it up with pipenv files, so you can clone the repo locally and try running the notebook on your machine.

Not sure if you've worked with GitHub codespaces much, but you could try running a codespace like I show below and see if everything in the notebook runs through the codespace. Let me know if this works well for you to test. 👍🏻

Screenshot 2023-01-19 at 3 56 11 PM

@bitsandbricks
Collaborator

Ohh it was right in front of my eyes! Sorry Rob. I'll keep you posted

@bitsandbricks
Collaborator

Alright, good news: it worked for me, running your notebook on a codespace.

image

Bad news is, I didn't do anything besides clicking on the cells so I'm not sure what caused your problem!

Lazy guess: maybe the OSM backend was having a bad day when you tried to download data via urbanPy?
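If the failure really is intermittent on the server side, one low-effort workaround is to retry the download a few times with a growing pause. The sketch below is only an illustration: `call_with_retries` is a hypothetical helper (not part of urbanpy), and it assumes the error surfaces as a `ValueError` subclass, which is the case for `json.JSONDecodeError`.

```python
import time


def call_with_retries(func, attempts=4, base_delay=1.0, retry_on=(ValueError,)):
    """Call func(); on a listed exception, wait and try again with a doubled delay."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical usage with the call from this thread (not executed here):
# es = call_with_retries(
#     lambda: up.download.overpass_pois(bounds=ba.total_bounds, facilities="education")
# )
```

If the call still fails after the last attempt, the original exception is re-raised, so a persistent problem is not hidden.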

@Claudio9701
Collaborator

Claudio9701 commented Jan 25, 2023

Same thing on my end, hope this is not a common error.

urbanpy-florianopolis-education

Also, as a side note, the new version of urbanpy (in the master branch for the moment) has a new, more flexible function to download data from overpass. I put together a quick example of how to query education facilities inside Florianopolis in this colab notebook.

@wavingtowaves
Collaborator Author

wavingtowaves commented Feb 7, 2023

✅ Updates

Thanks to help from @bitsandbricks and @Claudio9701 I was able to make good progress on the model for florianopolis.

I created a count of the educational facilities in florianopolis (but see my question below)

Image

Also I created a map of the educational facilities in florianopolis. Next up is calculating the walking distances to educational facilities.

❓ Questions

  1. We have many different types of points of interest for educational facilities. Which should we include?
  • college
  • kindergarten
  • language_school
  • library
  • toy library
  • music school
  • school
  • university

For now I've selected: school, kindergarten, language school, and library. What do you think?

  2. Also, in terms of age groups, we can download:
  • the entire population of brazil
  • children (ages 0-5)
  • youth (ages 15-24)

What would be appropriate?

  • For @Claudio9701, does metric='haversine' make the most sense for our goal here?

CC: @bitsandbricks @Juliavieiradeandradedias

@Claudio9701
Collaborator

I think haversine distance (which accounts for the Earth's curvature) is a good measure for considerable distances, that is, for cities with a large area or when working at the country level. I usually use it for Lima and for countries with large distances and irregular/incomplete road networks.

Other options for distance calculation are euclidean (the most naive) and cityblock (See Image below). These two work really well for small distances and cities with a regular road network.

image
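To make the difference between the three metrics concrete, here is a small plain-Python sketch. The function names are illustrative (they are not urbanpy's API), and the Earth radius of 6371 km is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean(x1, y1, x2, y2):
    """Straight-line distance in whatever units the coordinates use."""
    return math.hypot(x2 - x1, y2 - y1)


def cityblock(x1, y1, x2, y2):
    """Sum of the axis-aligned offsets (Manhattan distance)."""
    return abs(x2 - x1) + abs(y2 - y1)


# Corners of the Florianópolis bounding box quoted earlier in this thread.
sw = (-48.613, -27.847)
ne = (-48.3585929, -27.379)
print("haversine (km):", haversine_km(*sw, *ne))
print("euclidean (degrees):", euclidean(*sw, *ne))
print("cityblock (degrees):", cityblock(*sw, *ne))
```

Note that euclidean and cityblock applied directly to lon/lat values return degrees, not kilometres; they only give meaningful lengths on projected coordinates, whereas haversine works on lon/lat as-is.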

@wavingtowaves
Collaborator Author

I appreciate that explanation 🙌

Even though florianopolis is a smaller area, I still think haversine will work 👍

@wavingtowaves
Collaborator Author

wavingtowaves commented Feb 7, 2023

Thanks to your help in #7, @Claudio9701, I was able to get the docker container spun up 🎉

❓ Follow-up questions:

  1. For some reason, when running

es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education')

I get the error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Full error message below. What do you think might be happening here? Something's happening with the JSON decoder, but I'm not sure how to resolve it.

  2. Do I need to update the code below that you sent over, or does it already know that the origin (point1) is the centroid and the destination (point2) is the school?

distance, duration = up.routing.osrm_route(origin=point1, destination=point2)

  3. With the docker container up and running, I'm getting this error:

Error: No such object: osrm_routing_server_south-america_brazil_sul_foot

Maybe this is due to some upstream issue with the JSON portion mentioned in my point 1 above.


JSONDecodeError                           Traceback (most recent call last)
File /opt/homebrew/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    970 try:
--> 971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
...
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
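For context on what this message means: "Expecting value: line 1 column 1 (char 0)" is what Python's `json` module reports when the text it is asked to parse does not begin with JSON at all, for example an empty body or an HTML error page. A minimal sketch that reproduces the message and shows what the body started with (`explain_body` is a hypothetical helper, and the sample strings are made up rather than real server responses):

```python
import json


def explain_body(text):
    """Try to parse a response body as JSON; return (data, None) or (None, reason)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        preview = text[:80].strip() or "<empty body>"
        return None, f"{err} | body starts with: {preview}"


# Made-up sample bodies: an empty reply, an HTML error page, and valid JSON.
for body in ["", "<html><body>Too many requests</body></html>", '{"elements": []}']:
    data, problem = explain_body(body)
    print(problem if problem else f"parsed OK: {data}")
```

Seeing the start of the body usually tells you whether the server sent nothing, an error page, or a rate-limit notice, which points at the network or the remote service rather than at the decoder.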

CC: @Juliavieiradeandradedias, @csmlo

@Claudio9701
Collaborator

  1. Looks like the request to the overpass api is not receiving a correct response. Could you try to reach the overpass api with requests or curl to see if it is something with the network?

  2. Yes, it needs to be updated. The steps are:

  • Generate hexagons and PoIs centroids
  • Calculate the nearest PoI to each hexagon (distance haversine)
  • Calculate the travel distance and duration from each hexagon to its nearest PoI (OSRM server)

These notebooks can help:

https://github.com/EL-BID/urbanpy/blob/master/notebooks/Creating%20an%20interactive%20webapp.ipynb

https://github.com/Claudio9701/urbanpy-brazil-demo/blob/master/Pop_Access_UrbanPy_Demo_BR.ipynb
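The second step above (nearest PoI per hexagon by haversine distance) can be sketched in plain Python as a brute-force search. This is only a stand-in to show the idea, not urbanpy's own nearest-neighbour helper, and the coordinates are made-up (lon, lat) pairs rather than real hexagon centroids or schools.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_poi(centroids, pois):
    """For each centroid, return (index of the nearest PoI, distance in km)."""
    result = []
    for centroid in centroids:
        distances = [haversine_km(centroid, poi) for poi in pois]
        idx = min(range(len(pois)), key=distances.__getitem__)
        result.append((idx, distances[idx]))
    return result


# Illustrative coordinates only (assumed, not taken from the notebook).
hex_centroids = [(-48.55, -27.60), (-48.45, -27.45)]
school_points = [(-48.54, -27.59), (-48.46, -27.44)]
for centroid, (idx, km) in zip(hex_centroids, nearest_poi(hex_centroids, school_points)):
    print(centroid, "-> nearest PoI index", idx, f"({km:.2f} km)")
```

The resulting index per hexagon is what the third step would then use to pick the destination for the OSRM routing call. Brute force is quadratic, so for large areas a spatial index would be the better choice.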

@bitsandbricks
Collaborator

Bravo Rob!

Back to your initial questions:

- We have many different types of points of interest for educational facilities. Which should we include?

Based on the OSM project definitions for their keys and values (here, I always have it around cause I keep forgetting the details :D) we want "school": "School and grounds - primary, middle and secondary schools"

This is a data layer that can definitely be replaced by an "official" list depending on specific needs (i.e only primary schools), but the OSM one will be fine for preliminary results

  • Also in terms of age groups [...] What would be appropriate?

In the same spirit, until we are asked for a specific range, we can go for the population in the compulsory schooling range (ages 6 to 14 in Brazil). Eyeballing the population pyramid I'd say that's a little bit under 7% of the entire population. Of course, this already vague number will differ from place to place, especially when contrasting rural vs urban areas, but it should be fine for a starting point. We can document the rationale and carry on!

@wavingtowaves
Collaborator Author

@Claudio9701 thanks so much for the pairing session today 🌟

A few quick updates:

  1. es = up.download.overpass_pois(bounds=ba.total_bounds, facilities='education') gave me an error a few more times, then it ran fine. Good to know that this call works sometimes and fails at other times. At least I know to expect this 👍

  2. When I run the code below to start the local server, is the process supposed to end? We were getting an error right away this morning. I'll paste fuller error message below:

Image


Error: No such object: osrm_routing_server_south-america_brazil_sul_foot
latest: Pulling from osrm/osrm-backend
Digest: sha256:af5d4a83fb90086a43b1ae2ca22872e6768766ad5fcbb07a29ff90ec644ee409
Status: Image is up to date for osrm/osrm-backend:latest
docker.io/osrm/osrm-backend:latest
/bin/sh: line 4: wget: command not found
docker: Error response from daemon: Conflict. The container name "/osrm_extract" is already in use by container "6aae95e70200660f1c8d7a5d3f609caeba7cbe42ef3a7257361f9bd152e1c2da". You have to remove (or rename) that container to be able to reuse that name.

docker: Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:5000 -> 0.0.0.0:0: listen tcp 0.0.0.0:5000: bind: address already in use.
time="2023-02-09T13:02:43-08:00" level=error msg="error waiting for container: context canceled"

Error: No such container: osrm_extract
docker: Error response from daemon: Conflict. The container name "/osrm_routing_server_south-america_brazil_sul_foot" is already in use by container "2ce75c74142e442dd48ae9fca40443482527d419f2424e49a3e694cdd1576e07". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.

@wavingtowaves
Collaborator Author

wavingtowaves commented Feb 14, 2023

Quick update!

I figured out what was going on with the port error we were running into w/ docker @Claudio9701.

We did need to run lsof -i:5000 to see what programs were already using this port. I did some investigating and found out that it's related to airplay on macs 😓 Glad we got this cleared up.

Screenshot 2023-02-14 at 2 34 53 PM
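As a complement to `lsof -i:5000`, a quick check from Python before starting the server can tell whether the port is already taken. This is a rough sketch: `port_is_free` is a hypothetical helper, and it only tests whether a TCP bind on the given host and port succeeds.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
        return True


if not port_is_free(5000):
    print("Port 5000 is busy; find the owner with: lsof -i:5000")
```

The default host mirrors the `0.0.0.0:5000` binding shown in the docker error message earlier in this thread.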

I'm able to connect to the OSRM server using the original code in the tutorial 🎉 I do need to have docker open and running on my machine for it to work:

up.routing.start_osrm_server('sul', 'south-america_brazil', 'foot')

❓ for @Claudio9701, what's the typical run time for the function below?

Currently I have h3_resolution set to 8 for the city of florianopolis. Is this too high? Right now the function below has been running for 30 min. I could also try spinning up a GitHub codespace to see if it runs faster on one of our servers. Let me know what you think.

distance_duration = hex_flor.apply(
    lambda row: up.routing.osrm_route(
        origin=row.geometry.centroid, 
        destination = schools.iloc[row['closest_school']]['geometry']
    ),
    result_type='expand',
    axis=1,
)

@csmlo
Collaborator

csmlo commented Feb 15, 2023

CC: @bitsandbricks on above issue for visibility.

@Claudio9701
Collaborator

Claudio9701 commented Feb 15, 2023

Hello Rob, great news you could spin up the docker container 👏🏼🙌🏽🚀!

The next version of urbanpy needs to give the user the ability to choose which port to run the osrm server on.

Regarding the other question, resolution 8 should be good for a small city like Florianópolis. I usually run this function with tqdm so I can have an idea of how much time it will take.

from tqdm.notebook import tqdm
tqdm.pandas()

df.progress_apply(...)

If this is taking too much time, I've also used pandarallel to speed up the calculation.

from pandarallel import pandarallel

pandarallel.initialize(progress_bar=True)

df.parallel_apply(...)

This also has a progress bar that gives you a hint of how long the processing will take. Both are installed using pip.

If it still takes too much time, you could filter out hexagons without population or with population below a certain threshold. But in my experience this is almost never necessary.
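Another way to get a rough idea of the total run time before committing to the full job is to time a small sample and extrapolate. The sketch below assumes each row costs about the same; `estimate_total_seconds` and the names in the usage comment are hypothetical, not part of urbanpy.

```python
import time


def estimate_total_seconds(func, items, sample_size=20):
    """Time func on the first few items and extrapolate linearly to all of them."""
    sample = items[:sample_size]
    if not sample:
        return 0.0
    start = time.perf_counter()
    for item in sample:
        func(item)
    per_item = (time.perf_counter() - start) / len(sample)
    return per_item * len(items)


# Hypothetical usage (route_one_hexagon and hexagon_rows are placeholders):
# seconds = estimate_total_seconds(route_one_hexagon, hexagon_rows)
# print(f"Estimated run time: {seconds / 60:.1f} min")
```

The estimate is only as good as the sample: the first calls can be slower (cold caches, server warm-up), so treat it as an order-of-magnitude guide.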

Hope you find this useful!

@bitsandbricks
Collaborator

Bump

@wavingtowaves
Collaborator Author

👋🏻 Just so I'm extra clear on which step/how to do this estimation for school-age children based on the population pyramid: does this look right to you, @bitsandbricks and @Claudio9701?

pop_flor = up.geom.filter_population(full_pop_brazil_southeast, flor)
pop_flor['population'] = pop_flor['population'].parallel_apply(lambda x: x*0.07)
pop_flor.head()

@bitsandbricks
Collaborator

It does to me!

@wavingtowaves
Collaborator Author

I am going to close this issue for our Florianópolis model since we have a notebook that does this analysis in our repo.

I'll create a new issue for re-running this model with INEP's databases.
