Download db dump #71
Comments
WIP. Some dumps are already available at https://parkendd.de/dumps
The city of Basel has also been supported by ParkAPI for a couple of days now. What needs to be done in order to have Basel's historical data accessible in this data dump directory?
I usually dump the database into CSV files manually with a script. Due to the disadvantageous database design, the database's size (which doesn't apply to Basel yet), and the script being written in Python, dumping requires at least 16 GB of RAM for the script alone, so it can't be done on the server. This is a process I usually do yearly, so Basel would be available from 01/2020 by default, but I think that isn't what you had in mind. A monthly update would require an automated process on the server and a better structure of the dump directory; the smaller time frames are probably better suited for an automated process. We could also think about adding an interface to pull historical data in small batches via the ParkAPI (after all, this is what happens currently, but only for the latest value).
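To illustrate the "small batches" idea, here is a hypothetical client-side sketch of pulling history month by month. The endpoint path and parameters mirror the existing timespan (forecast) request, but they are assumptions: no such historical interface exists in ParkAPI yet, and the lot id below is made up.

```python
# Hypothetical sketch of pulling historical data in monthly batches.
# The endpoint shape and parameter names are assumptions; ParkAPI does not
# offer such a historical interface yet.
from datetime import datetime, timedelta
import requests

BASE = "https://api.parkendd.de"  # public ParkAPI instance (assumed host)

def fetch_month(city, lot_id, start):
    """Fetch roughly one month of data for a single lot."""
    end = start + timedelta(days=31)
    resp = requests.get(
        f"{BASE}/{city}/{lot_id}/timespan",
        params={
            "from": start.isoformat(),
            "to": end.isoformat(),
            "version": "1.0",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: one month of (hypothetical) historical data for a Basel lot.
data = fetch_month("Basel", "baselelisabethen", datetime(2019, 1, 1))  # lot id is made up
```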
Currently, three students of Uni Bern's Open Data course (https://opendata.iwi.unibe.ch/vorlesung) are interested in visualizing ParkAPI data. For them, getting historical data is crucial. Could you maybe run the dump job for the Basel data so that at least some data is available for them? Maybe there's a different way of running export jobs, e.g. via Postgres SQL commands without involving Python? I cannot imagine that would use such enormous amounts of RAM. Can I help here? Ideally, the API would provide historical data. If this can be implemented before the end of March 2019, that would be marvelous! "My" three interested students will start their work around that time.
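For reference, a plain `COPY ... TO STDOUT WITH CSV` lets PostgreSQL serialize the CSV itself without building any Python data structures; the same statement can also be run straight from psql via `\copy`, with no Python involved at all. Below is a minimal sketch using psycopg2 to stream the output into a file; the table and column names are assumptions, since the actual schema isn't shown in this thread.

```python
# Sketch: let PostgreSQL produce the CSV via COPY, so nothing is loaded into
# Python data structures. Table/column names are assumptions.
import psycopg2

QUERY = """
COPY (
    SELECT timestamp, lot_id, data
    FROM parking_data              -- assumed table name
    WHERE lot_id LIKE 'basel%'     -- assumed id scheme for Basel lots
    ORDER BY timestamp
) TO STDOUT WITH CSV HEADER
"""

def main():
    conn = psycopg2.connect("dbname=parkapi")  # adjust connection string
    with conn, conn.cursor() as cur, open("basel.csv", "w") as out:
        # copy_expert streams rows straight from the server into the file,
        # keeping the Python process' memory footprint roughly constant.
        cur.copy_expert(QUERY, out)

if __name__ == "__main__":
    main()
```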
I dumped the current data into http://parkendd.de/dumps/Basel2019.tar.xz, so you can get it there. The RAM usage happens in the Python extraction script. I just tried to dump Hamburg on the server: PostgreSQL itself only requires 1–2 GB, but as soon as the Python script starts to load everything into its own data structures (i.e. json.loads), the memory usage skyrockets. The script can be found here: https://gist.github.com/jklmnn/6a31994bdbeada12a82e7d1847802caf (maybe one could improve the fetchone calls, but then I'd lose those neat progress bars). The historical data API could be implemented via the timespan request that is currently used for forecasts. There are two things we have to take care of:
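Regarding the fetchone calls and the progress bars: a server-side (named) cursor would stream the rows in batches instead of materialising everything, and tqdm can still show a progress bar if the total row count is fetched first. A rough sketch, again with assumed table and column names rather than the actual schema used in the gist:

```python
# Sketch of a lower-memory variant: a named (server-side) cursor streams rows
# in batches, and tqdm still provides a progress bar.
# Table/column names are assumptions, not the actual schema.
import csv
import json
import psycopg2
from tqdm import tqdm

BATCH = 10_000

conn = psycopg2.connect("dbname=parkapi")  # adjust connection string

with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM parking_data")  # assumed table name
    total = cur.fetchone()[0]

# A named cursor keeps the result set on the server and fetches it lazily.
with conn.cursor(name="dump") as cur, open("dump.csv", "w", newline="") as f:
    cur.itersize = BATCH
    cur.execute("SELECT timestamp, lot_id, data FROM parking_data ORDER BY timestamp")
    writer = csv.writer(f)
    writer.writerow(["timestamp", "lot_id", "free"])
    with tqdm(total=total) as bar:
        for timestamp, lot_id, data in cur:
            # data may arrive as a dict (jsonb) or a JSON string (text column).
            payload = json.loads(data) if isinstance(data, str) else data
            writer.writerow([timestamp, lot_id, payload.get("free")])
            bar.update(1)

conn.close()
```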
@kiliankoe seems to have already worked on this functionality; maybe he can provide an implementation we only need to adapt.
As Johannes mentioned, the timespan requests currently only return forecast data, even for past time ranges, which is unfortunate. For now I would recommend basing the work on the dump Johannes provided. Maybe he can re-export this data at the end of March to provide as much historic data for Basel as we have by then. By the way, how awesome is it that there's an Open Data course at the University of Bern‽
I'm sorry we still don't have an API for historical data. Since your students have started their work recently, I updated the static dump at http://parkendd.de/dumps/Basel2019.tar.xz to include the data up until today.
I'd say this feature is still a bit away, but I'm still going to list it here.
A great idea would be to offer a dump (or something similar) of the archived database. Storing this dump somewhere would also be valuable in case the database contents ever get pruned.