Data Service Abstraction to manage RAWR tiles #245
Comments
Yup, pretty much exactly! It does assume that there's knowledge of the tile pyramid upstream, but maybe that's ok? It also implies knowing that the tile represents the pyramid at some zooms but not at others, though.
Were you thinking of creating the data for the tile pyramid (the data fetcher part) concurrently?
I was just thinking that the same object could suffice for all the worker threads, and remove the need to download and process it multiple times. Reading through the source code, I'm wondering if perhaps it makes sense to hide the […]. That would mean loading the RAWR tiles on-the-fly and handling the locking manually, which adds complication. But it might be worth it, rather than moving big chunks of code around. What do you think?
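A minimal sketch of that on-the-fly loading idea with manual locking, assuming hypothetical `download_fn` and `index_fn` callables standing in for the real download and indexing code:

```python
import threading


class SharedRawrIndexCache(object):
    """Load and index each RAWR tile once, then share the read-only index
    across all worker threads."""

    def __init__(self, download_fn, index_fn):
        self.download_fn = download_fn  # e.g. fetch the payload from S3
        self.index_fn = index_fn        # e.g. build the in-memory index
        self.lock = threading.Lock()
        self.indexes = {}

    def get(self, parent_coord):
        # fast path: the index was already built, reads need no lock
        index = self.indexes.get(parent_coord)
        if index is not None:
            return index
        with self.lock:
            # re-check under the lock in case another thread got here first
            index = self.indexes.get(parent_coord)
            if index is None:
                payload = self.download_fn(parent_coord)
                index = self.index_fn(payload)
                self.indexes[parent_coord] = index
            return index
```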
Hhhmm, or maybe we introduce a new step in the chain that gets lifted up into a top level abstraction? Even with the existing postgresql query fetch, we have a step that converts the result of the query into data that is suitable for processing, ie from per table results into per layer groupings. We could mirror that with the rawr tiles pipeline, where we have a step that converts the results of the query, ie the payload from s3, into individual tiles for processing. It seems like this conversion would need to grow an awareness that it gets passed a list of tile coordinates, but that seems like it's probably ok to me. On the plus side, it already happens in "processing space" in terms of the system. It also feels like we have the different parts of the system having the right concerns this way, ie the data fetch just knows that it needs to get data for a particular region, and then we have downstream processing that extracts out smaller portions of that for data processing. This might prove useful if we ever want to add buffering back into the mix (maybe, hard to tell without having working code for it already).

More specifically, the data fetcher step for rawr tiles could just yield the s3 specific data. And then this new step, which we could call something like DataTransformer, DataConverter, or DataProcessingPreparer, is what would happen first in the processing chain. It would take the result of the data fetch, and then be responsible for creating the data shape that processing understands. For rawr tiles, this step is what would create the index first, and then extract out the individual tiles for processing.

Anyway, I think it's worth doing the code refactoring, because the existing data conversion step was more like temporary glue to prepare for rawr tiles, and rather than add a new layer of that to hold everything together, with some caching mru scheme that might have its own set of issues, altering the pipeline to handle it more directly seems like it would be a better option longer term.
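A rough sketch of what that preparer step might look like, using the DataProcessingPreparer name floated above; `unserialize_fn` and `extract_fn` are hypothetical stand-ins for the real RAWR index code:

```python
class RawrDataProcessingPreparer(object):
    """Sits between the data fetch and processing: takes the raw S3 payload
    for a RAWR tile and yields per-coordinate data for processing."""

    def __init__(self, unserialize_fn, extract_fn):
        self.unserialize_fn = unserialize_fn  # payload -> in-memory index
        self.extract_fn = extract_fn          # (index, coord) -> layer data

    def __call__(self, payload, coords):
        # build the index once per RAWR tile payload ...
        index = self.unserialize_fn(payload)
        # ... then pull out each job coordinate's data in "processing space"
        for coord in coords:
            yield coord, self.extract_fn(index, coord)
```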
@zerebubuth Is this complete as part of the RAWR tiles sprint from 2017Q4? |
Yes, close enough. In […]
@rmarianski said in a comment:
Yup, that sounds like a good idea. I was thinking that it would parallel the database pool, but I think it would need to be different. Database connections can't be shared between threads/processes, so the pool would need to hand out one each, whereas the RAWR data fetcher (once indexed) is read-only and thread-safe.
Would this be the kind of interface you were thinking of?
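A rough guess at the kind of interface meant here, assuming job tiles are plain (z, x, y) tuples; the names `RawrTileFetcherFactory`, `fetcher_for`, and `rawr_parent` are illustrative rather than taken from the actual code:

```python
def rawr_parent(tile, rawr_zoom):
    # the RAWR tile at rawr_zoom that contains this job tile
    # (assumes the job tile zoom is >= rawr_zoom)
    z, x, y = tile
    dz = z - rawr_zoom
    return (rawr_zoom, x >> dz, y >> dz)


class RawrTileFetcherFactory(object):
    """Hands out a data fetcher for the RAWR tile covering a given job tile.

    Unlike the database pool, the same fetcher can be handed to every worker
    thread, since the indexed RAWR data is read-only and thread-safe.
    """

    def __init__(self, rawr_source, rawr_zoom=10):
        self.rawr_source = rawr_source  # e.g. wraps the S3 download + index
        self.rawr_zoom = rawr_zoom
        self.cache = {}

    def fetcher_for(self, tile):
        parent = rawr_parent(tile, self.rawr_zoom)
        fetcher = self.cache.get(parent)
        if fetcher is None:
            fetcher = self.rawr_source.fetch_and_index(parent)
            self.cache[parent] = fetcher
        return fetcher
```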
And there'd be a similar one for `DatabaseFetcherFactory` (which ignores the `tile` parameter), and a third which switches between the two depending on the `tile.z` parameter?
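Following on from the sketch above, a hedged guess at that third, zoom-switching factory; again, every name here is illustrative rather than taken from the code:

```python
class ZoomSwitchingFetcherFactory(object):
    """Uses the database fetcher below the RAWR zoom and the RAWR tile
    fetcher at or above it."""

    def __init__(self, db_factory, rawr_factory, rawr_zoom=10):
        self.db_factory = db_factory
        self.rawr_factory = rawr_factory
        self.rawr_zoom = rawr_zoom

    def fetcher_for(self, tile):
        z = tile[0]  # tile is a (z, x, y) tuple, as in the sketch above
        if z < self.rawr_zoom:
            return self.db_factory.fetcher_for(tile)
        return self.rawr_factory.fetcher_for(tile)
```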