Pipelining merge sorter does not recompute parameters after data structures #206

Open
Mortal opened this issue May 24, 2016 · 1 comment
Mortal (Collaborator) commented May 24, 2016

When we introduced data structures to pipelining, we started calling set_available_memory twice on each node: (a) once before freezing data structures, and (b) once after freezing them.

However, the pipelining sorter was not adapted to this, so it currently uses the memory assigned in (a) to compute its sort parameters and completely ignores the assignment in (b).

Fortunately, the memory assigned to a node in (b) is greater than or equal to that assigned in (a), so the sorter is not at risk of exceeding its memory limit. Still, we should recompute the sort parameters when memory is assigned in (b).

Probably we can compute all merge sort parameters in begin() of the first phase.
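
A minimal sketch of that idea for a single node, assuming a hypothetical `sorter_input_node` class and a made-up `run_length` parameter (neither is the library's actual API): `set_available_memory` only records the most recent assignment, and the parameters are derived in `begin()`, so the value from (b) is the one that gets used. It does not address ordering between several nodes.

```cpp
#include <cstddef>

// Hypothetical stand-in for the sort parameters; not the library's real type.
struct sort_parameters {
    std::size_t run_length = 0; // items per sorted run formed in the first phase
};

// Hypothetical node that defers parameter computation until begin().
class sorter_input_node {
public:
    // Called twice by the framework: (a) before and (b) after data
    // structures are frozen. Only record the value; compute nothing yet.
    void set_available_memory(std::size_t bytes) { m_memory = bytes; }

    // When begin() runs, m_memory holds the last assignment, i.e. (b).
    void begin() { m_params.run_length = m_memory / item_size; }

    const sort_parameters & parameters() const { return m_params; }

private:
    static constexpr std::size_t item_size = 8; // assumed item size in bytes
    std::size_t m_memory = 0;
    sort_parameters m_params;
};
```

With assignments of 1024 and then 4096 bytes, the run length is computed from 4096 only.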

tyilo self-assigned this May 24, 2016

tyilo (Collaborator) commented May 24, 2016

There is no easy way to fix this.

The problem is that when begin is called on the input node (the first of the 3 nodes in the pipeline), it has to call begin on the merge sorter. Some time after this, push is called on the input node, which forwards it to the merge sorter. However, the calc node (the 2nd of the 3 nodes) is only notified about how many resources it can use after the data structures have been frozen, which happens after the push. At that point it can no longer change the parameters of the merge sorter, because the merge sorter has already started.

Two possible fixes:

  • Change how pipelining works
  • Allow calculating the parameters for the 3 phases in the merge sorter independently, each only just before that phase starts. (Note: phase 1 memory usage depends on the fanout, which in turn depends on phase 2 memory and file usage, so it is probably not possible to adjust phase 1 memory usage this way.)
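
The dependency in that note can be illustrated with a toy model. The formulas below are assumptions made purely for illustration (one buffer per merge input, phase 1 holding a run buffer plus those merge buffers), and `compute_fanout` and `compute_params` are hypothetical names, not the merge sorter's real computation. The point is only that changing the phase 2 memory changes the fanout, which changes what phase 1 needs.

```cpp
#include <algorithm>
#include <cstddef>

struct phase_params {
    std::size_t fanout;        // number of runs merged at once
    std::size_t phase1_memory; // bytes needed by phase 1 in this toy model
};

// Assumption: each merge input needs one buffer, so the fanout is bounded
// both by phase 2 memory and by the number of files that may be open.
inline std::size_t compute_fanout(std::size_t phase2_memory,
                                  std::size_t max_files,
                                  std::size_t buffer_size) {
    return std::min(phase2_memory / buffer_size, max_files);
}

// Assumption: phase 1 holds a run buffer plus one buffer per merge input,
// so its memory need cannot be fixed before the fanout is known.
inline phase_params compute_params(std::size_t run_buffer,
                                   std::size_t phase2_memory,
                                   std::size_t max_files,
                                   std::size_t buffer_size) {
    std::size_t fanout = compute_fanout(phase2_memory, max_files, buffer_size);
    return {fanout, run_buffer + fanout * buffer_size};
}
```

In this model, raising the phase 2 memory after phase 1 has started would imply a different phase 1 requirement than the one already committed to.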
