
Download db dump #71

Open
kiliankoe opened this issue Jul 5, 2015 · 7 comments
@kiliankoe
Member

kiliankoe commented Jul 5, 2015

I'd say this feature is still a bit away, but I'm still going to list it here.

It would be great to offer a dump (or something similar) of the archived database. Storing that dump somewhere is definitely also a good idea in case the database contents are ever pruned.

@kiliankoe kiliankoe added the db label Jul 14, 2015
@Mic92 Mic92 self-assigned this Aug 27, 2015
@jklmnn
Member

jklmnn commented Feb 7, 2017

WIP. Some dumps are already available at https://parkendd.de/dumps

@jklmnn jklmnn added this to the 0.3 milestone Feb 7, 2017
@jb3-2
Contributor

jb3-2 commented Feb 11, 2019

Since a couple of days ago, the city of Basel is also supported by ParkAPI. What needs to be done to make Basel's historical data accessible in this data dump directory?

@jklmnn
Member

jklmnn commented Feb 11, 2019

I usually dump the database into CSV files manually with a script. Due to the disadvantageous database design, the database's size (which doesn't apply to Basel yet), and the script being written in Python, dumping requires at least 16 GB of RAM for the script alone, so it can't be done on the server. This is a process I usually do yearly, so Basel would be available from 01/2020 by default, but I think this isn't what you had in mind.

A monthly update would require an automated process on the server and a better structure of the dump directory. The smaller time frames are probably better suited for an automated process.
Unfortunately I'm not sure if I'll find time to implement this soon.

We could also think about adding an interface to pull historical data in small batches via ParkAPI (after all, this is what happens currently, but only for the latest value).
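If such a batch interface ever materializes, a client could page through it in small time windows. A minimal sketch of the client side (the endpoint and parameter names below are pure assumptions, since no such API exists yet):

```python
from datetime import date, timedelta

def month_chunks(start, end):
    """Split the half-open range [start, end) into calendar-month
    windows, so each request stays small enough for the server."""
    chunks = []
    cur = start
    while cur < end:
        # jump to the first day of the next month
        nxt = (cur.replace(day=1) + timedelta(days=32)).replace(day=1)
        chunks.append((cur, min(nxt, end)))
        cur = nxt
    return chunks

# Hypothetical usage against a not-yet-existing endpoint:
# for lo, hi in month_chunks(date(2019, 1, 1), date(2019, 4, 1)):
#     requests.get("https://api.parkendd.de/Basel/history",
#                  params={"from": lo.isoformat(), "to": hi.isoformat()})
```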

@jb3-2
Contributor

jb3-2 commented Mar 11, 2019

Currently, three students of Uni Bern's Open Data course (https://opendata.iwi.unibe.ch/vorlesung) are interested in visualizing ParkAPI data. For them, getting historical data is crucial. Could you maybe run the dump job for the Basel data so that at least some data is available to them?

Maybe there's a different way of running export jobs, e.g. via PostgreSQL commands without involving Python? I can't imagine that would use such enormous amounts of RAM. Can I help here?

Ideally, the API would provide historical data. If this can be implemented before end of March 2019, that would be marvelous! "My" three interested students will start their work around that time.

@jklmnn
Member

jklmnn commented Mar 11, 2019

I dumped the current data into http://parkendd.de/dumps/Basel2019.tar.xz. So you can get it there.

Well, the RAM usage comes from the Python extraction script. I just tried to dump Hamburg on the server: PostgreSQL itself only requires 1–2 GB, but as soon as the Python script starts loading everything into its own data structures (i.e. json.loads), memory usage skyrockets. The script can be found here: https://gist.github.com/jklmnn/6a31994bdbeada12a82e7d1847802caf (maybe one could improve the fetchone calls, but then I'd lose those neat progress bars).
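For what it's worth, a lower-memory approach could stream rows through the cursor and write CSV incrementally, instead of loading the whole table into Python first. A sketch, with table and column names invented for illustration (they are not the actual schema):

```python
import csv
import json

def stream_csv(rows, out):
    """Write (timestamp, json_blob) rows to CSV one at a time, so
    memory usage stays flat regardless of table size."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "lot_id", "free"])
    for ts, blob in rows:
        data = json.loads(blob)  # parse only one row's JSON at a time
        for lot in data.get("lots", []):
            writer.writerow([ts, lot.get("id"), lot.get("free")])

# With psycopg2, a *named* cursor keeps the result set on the server
# instead of fetching it all into Python (table name assumed):
# cur = conn.cursor(name="dump")  # server-side cursor
# cur.execute("SELECT timestamp, data FROM parkapi ORDER BY timestamp")
# with open("dump.csv", "w", newline="") as f:
#     stream_csv(cur, f)
```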

The historical data API could be implemented via the timespan request that is currently used for forecasts. There are two things we have to take care of:

  • how timespan requests that overlap past and future should be handled
  • timespan sizes probably need to be constrained to avoid denial of service
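Those two points could be handled by a small validation helper that rejects oversized spans and splits the rest at "now". A sketch, with function names and the size limit made up for illustration:

```python
from datetime import datetime, timedelta

MAX_SPAN = timedelta(days=31)  # assumed DoS limit, not an agreed value

def split_timespan(start, end, now):
    """Split a requested timespan into a historical part and a
    forecast part; reject empty or oversized spans."""
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_SPAN:
        raise ValueError("timespan too large")
    past = (start, min(end, now)) if start < now else None
    future = (max(start, now), end) if end > now else None
    return past, future
```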

@kiliankoe seems to already have worked on this functionality, maybe he can provide an implementation we only need to adapt.

@kiliankoe
Member Author

As Johannes mentioned the timespan requests currently only return forecast data, even for past time ranges, which is unfortunate.
The API should definitely be providing historical data and this is fairly easy to achieve with the proposed changes in #167. That PR is unfinished however and requires some more work, probably by me, which unfortunately means it can't be started before April.
It should be possible to extend the current implementation as well to return historical data for past ranges in the timespan request, but unfortunately for the same underlying reason as above I can't work on that personally before April 😕

For now I would recommend basing the work on the dump Johannes provided. Maybe he can re-export this data at the end of March to provide as much historical data of Basel as we have by then.

By the way, how awesome is it that there's an Open Data course at the university of Bern‽

@jklmnn
Member

jklmnn commented Apr 4, 2019

I'm sorry we still don't have an API for historical data. Since your students started their work recently, I updated the static dump at http://parkendd.de/dumps/Basel2019.tar.xz to include data up until today.
