panic: Sweep line misses event to be removed #17

Open · daef opened this issue Apr 7, 2020 · 26 comments · May be fixed by #28

@daef commented Apr 7, 2020

When aggregating some simple (convex, <6 vertices) polygons via repeated unions I get this panic:

thread 'main' panicked at 'Sweep line misses event to be removed', src/main.rs:1:1

I managed to create a .geojson fixture that provokes the panic:

The black polygon is my aggregate, the orange one is the polygon I want to union with it.
Nudging the marked vertex makes the panic go away.

[image: the two polygons, with the problematic vertex marked]

EDIT: added image, added fixture, more information

@bluenote10 (Contributor)

Thanks for the detailed issue description!

Without having started any debugging, this is most likely exactly the issue I'm working on currently. The background: internally, the line segments of the sweep line are stored in a tree. When neighbouring line segments intersect or overlap, they get subdivided. This mutates segments that are already in the tree, and because of numerical instabilities the order can change. This is something trees don't like at all ;). Think of a tree storing the values 1 < 2 < 3, where the value 1 gets mutated to 2.5, so the stored sequence becomes 2.5, 2, 3. Searching the tree for 2.5 now fails.
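
A minimal standalone sketch of that failure mode, using a sorted Vec in place of the tree (not the library's actual code):

```rust
fn main() {
    // A sorted container, standing in for the sweep line tree.
    let mut keys = vec![1.0_f64, 2.0, 3.0];

    // Lookups work while the order invariant holds.
    assert!(keys.binary_search_by(|k| k.partial_cmp(&1.0).unwrap()).is_ok());

    // A "segment subdivision" mutates an element in place...
    keys[0] = 2.5; // stored order is now 2.5, 2.0, 3.0 -- invariant broken

    // ...and binary search can no longer find the mutated element.
    let result = keys.binary_search_by(|k| k.partial_cmp(&2.5).unwrap());
    println!("searching for the mutated key: {:?}", result); // Err(_)
}
```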

A solution that doesn't degrade performance is a bit tricky, but I have an idea. It will just take a little while, because it probably requires moving away from the splay tree to a customized data structure I'm currently working on (a kind of fixed-2-level B-tree that allows shuffle-sorting in these situations).

@daef (Author) commented Apr 8, 2020

Would it be possible to consistently break ties like that (same starting point, same direction) by another measurement (e.g. squared length, x component, ...)?

@bluenote10 (Contributor)

Yes, such a mechanism is part of the solution. The problem is that the situation can change between before and after computing overlaps/intersections:

  • Before: The shorter black segment is -- let's say -- above the orange segment, because the black endpoint might be slightly above the orange line. Note that floating point values form a sort of grid, and the black point might be the closest approximation to being "on the line" while still being technically "above the line" (see the numerical sketch below).
  • After: When computing the overlap we have to live with what the floating point grid can offer. The orange segment gets subdivided at the black point (technically it now has a slight kink due to being forced onto the grid). Now the black segment and the first part of the orange segment lie perfectly on top of each other, which means they get sorted according to the tie-breaking rules. If the tie-breaking rules put the orange segment on top of the black one, the order has changed. The subdivided orange segment is now in a wrong position in the tree, and the algorithm may fail to find it.
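
As a small numerical illustration of the "forced onto the grid" effect (a standalone sketch, not the crate's code): take two segments whose true intersection point is not representable in f64, snap it to the nearest grid point, and check which side of the original segment it ends up on.

```rust
/// Cross product sign: > 0 means p lies to the left of a->b, < 0 to the right,
/// 0 exactly on the line (within f64 arithmetic).
fn orient(a: (f64, f64), b: (f64, f64), p: (f64, f64)) -> f64 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

fn main() {
    // Segment a->b intersects the line y = x at (2/3, 2/3), which is not
    // representable in f64.
    let a = (0.0, 1.0);
    let b = (2.0, 0.0);
    let p = (2.0 / 3.0, 2.0 / 3.0); // the intersection, forced onto the f64 grid

    // The snapped point is not exactly on a->b: the tiny non-zero value decides
    // which side it lies on, and thereby the order in the sweep line.
    println!("orient(a, b, p) = {:e}", orient(a, b, p)); // about -1.1e-16, not 0
}
```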

@daef (Author) commented Apr 21, 2020

While using a release build and ignoring the missed events often works out, I also get results like this:

the union of yellow and green:
[image]
results in the orange and the red polygon (the dashed one being an 'interior'):
[image]
overlaid, it looks even worse:
[image]

but trying to (re)create this with a fixture, it magically works:
[image]

bug_cannot_reproduce.zip

EDIT: I finally found a difference: I use f32, the tests use f64. Maybe it would be nice to be able to explicitly create f32 fixtures, or to test all fixtures with both f32 and f64...

@bluenote10 (Contributor)

> EDIT: I finally found a difference: I use f32, the tests use f64. Maybe it would be nice to be able to explicitly create f32 fixtures, or to test all fixtures with both f32 and f64...

Ah good that you could make sense of it.

I'm still undecided whether it is a good idea to make the library generic in the underlying numeric type. If you look at the (most robust?) polygon clipping library Clipper, they have "solved" the problem by allowing only integers in a fixed range -- no float support at all. Since I'm not aware of a library that is generic and achieves a decent level of robustness, I initially thought the problem might be too tough for generics and that restricting to a single numeric type may be necessary. Currently I'm a bit more optimistic: I think I understand the problem in the issue above and I have an idea how to solve it. If that's the case, whether the test cases are in f32 or f64 shouldn't matter much, because I should have equivalent f64 test cases.

@daef (Author) commented Apr 23, 2020

I thought the bigger numerical instabilities arising from smaller floats would call for more robust code in the library -- so supporting them should be beneficial.

E.g. don't calculate intersection points, but instead reference the lines whose intersection is meant to be the point, together with geometric predicates; or maybe one could use symbolic perturbation à la Magalhães (10.1145/3139958.3140001) to decide. Or maybe throw memory at the problem and remember the originally calculated angle of the subdivided segment (so there's no need to recalculate a new, slightly-off angle).

You made me curious - how do you plan to solve this?

@bluenote10 (Contributor)

> I thought the bigger numerical instabilities arising from smaller floats would call for more robust code in the library -- so supporting them should be beneficial.

Yes, true: the issues will show up earlier, but they still fall into the same categories. Currently it simplifies the test code quite a bit not having to write everything generically there as well.

> E.g. don't calculate intersection points, but instead reference the lines whose intersection is meant to be the point, together with geometric predicates; or maybe one could use symbolic perturbation à la Magalhães (10.1145/3139958.3140001) to decide. Or maybe throw memory at the problem and remember the originally calculated angle of the subdivided segment (so there's no need to recalculate a new, slightly-off angle).

I already went down that rabbit hole of "higher precision intersections" ;). As a baseline I was using an arbitrary precision library: this makes the intersection computation roughly a factor of 100 slower, and since there aren't any pure Rust libraries, it is maybe not ideal. The best solution I came up with is this. It basically takes Shewchuk's idea behind his "adaptive" non-overlapping representations (used in "robust predicates" implementations) and limits them to just second order, i.e., whereas the original adaptive representations can go to much higher orders (=> slow like the full arbitrary precision library), the implementation I went for always reduces the results back to second order (=> stays relatively fast). In my experiments, having two f64 to represent the intersection was always sufficient to get exactly the same result as arbitrary precision, but it is probably possible to find edge cases where even that isn't sufficient any more. Performance-wise it was pretty okay: only a factor of ~10 slower than the pure f64 algorithm. If it turns out that higher precision for intersections is needed, I'd revive this idea.
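
For illustration, the basic building block behind such a two-f64 ("second order") representation is the error-free transformation of sums and products, e.g. Knuth's TwoSum and an FMA-based TwoProd (a generic sketch, not the implementation linked above):

```rust
/// Error-free transformation of a sum (Knuth's TwoSum): returns (s, e) with
/// s = fl(a + b) and a + b == s + e exactly.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let a_round = s - b;
    let b_round = s - a_round;
    let err = (a - a_round) + (b - b_round);
    (s, err)
}

/// Error-free transformation of a product: a * b == p + e exactly, using a
/// fused multiply-add to capture the rounding error.
fn two_prod(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;
    let e = a.mul_add(b, -p);
    (p, e)
}

fn main() {
    // The rounding error of a single sum/product is itself an f64, so a pair
    // (hi, lo) can carry roughly twice the precision of a single f64.
    let (s, e) = two_sum(1.0, 1e-17);
    println!("sum = {s}, error term = {e:e}");

    let x = 1.0 + 2f64.powi(-30);
    let (p, e) = two_prod(x, x);
    println!("product = {p}, error term = {e:e}");
}
```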

> You made me curious - how do you plan to solve this?

The "sweep line misses event to be removed" issue only has to do with the fact that the elements in the tree get mutated in a way that messes up the ordering. The obvious fix is to "re-sort" the elements if a mutation happened. Note that not a full sort is needed -- the fluctuations from mutations are tiny, so typically we'd only have to swap like two elements to repair the order again. As far as I can see, this is just a bit costly/ugly to do with trees though, because next/prev operations are O(log N), and bubble sorting them is a bit of a pain. I have created this special data structure for that purpose: It has O(1) next/prev and should be fairly straightforward to bubble sort, while still scaling reasonably well to the sizes of common sweep lines.

@anlumo commented Jun 3, 2020

Are there any updates on this? We're currently in the final stretch of finishing a release of our software, but we're blocked by this crash.

Is there anything we could do to help fix this bug?

@bluenote10 (Contributor)

Yeah, good question. Basically this issue is just a manifestation of a fundamental issue with the approach, and resolving it probably requires quite some effort. My plan was:

  1. Implement a data structure that is better suited for in-place rank modifications. This is finished in rust-array-stump.
  2. Switch the algorithm from the splay tree to this new data structure. This is prepared in this branch. The good news: this also brings a nice performance boost. The bad news: it breaks an existing test (probably a flaky one), so it is not easy to land on its own.
  3. Add functionality to rust-array-stump to support "rank fixing", e.g., bubble sorting after elements in the set have been modified in a way that violates the ordering. From here on, things are still to do.
  4. Find the right place to integrate this rank fixing into the algorithm. There are a few options for where and how to fix the order, and it is tricky to see where it needs to be done to lead to correct behavior.
  5. Adjust the logic of compare_segments to properly account for the rank changes. (I think I had started on this in this branch initially, but it is outdated now and only led to the realization that it can't be solved without fixing the tree mutation issue first.)

Unfortunately, the steps are all tangled up to some degree, e.g. the breaking test blocking step 2 probably needs the adjustments to compare_segments. The combined fix would probably turn into a fairly big rewrite, requiring some days of really deep debugging. Since my project depending on rust-geo-booleanop is low priority at the moment, I couldn't really motivate myself for such a deep dive ;).

@anlumo commented Jun 7, 2020

Is there some kind of quick fix that can be applied until the real fix is implemented?

@bluenote10 (Contributor)

If you are in a debug build and just want to get rid of the panic, you can comment out the debug_assert. In a release build there should be no crash, but most likely the algorithm result is broken.
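
For context: a debug_assert! only fires in debug builds, which is why a release build continues silently. A generic illustration of the mechanism (not the crate's actual source):

```rust
fn main() {
    let event_found = false; // pretend the sweep line lookup just failed

    // Panics in a debug build; compiled out entirely with `cargo build --release`.
    debug_assert!(event_found, "Sweep line misses event to be removed");

    // In a release build execution continues here, silently producing a
    // possibly broken result.
    println!("continuing with a possibly broken result");
}
```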

@daef (Author) commented Aug 18, 2020

I've implemented rank fixing for the new data structure and added the respective calls in subdivide_segments after possible mutations of the events. This all happens in a pretty naive (= unconditional) way, so I expect the performance to be suboptimal at the moment.
I guess the call could at least be skipped if possible_intersection returns 0.
(What would be a better type for possible_intersection to return instead of u8? Maybe an enum, as sketched below?)
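
For illustration, a hypothetical enum in place of the u8 return code (the names here are made up, not the crate's API):

```rust
/// Hypothetical replacement for possible_intersection's u8 return code.
#[derive(Debug, PartialEq, Eq)]
enum IntersectionKind {
    /// No intersection: nothing was subdivided, so rank fixing can be skipped.
    None,
    /// The segments cross in a single point; one or both were subdivided.
    Point,
    /// The segments overlap in a shared sub-segment.
    Overlap,
}

/// Only results that subdivide (and thus mutate) segments can break the order.
fn needs_rank_fix(kind: &IntersectionKind) -> bool {
    *kind != IntersectionKind::None
}

fn main() {
    assert!(!needs_rank_fix(&IntersectionKind::None));
    assert!(needs_rank_fix(&IntersectionKind::Point));
    assert!(needs_rank_fix(&IntersectionKind::Overlap));
}
```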

It could possibly be even faster to entangle the code even more and put that rank_fixing burden into possible_intersection somehow.

Right now I'm pretty happy that I've not broken any test-cases.

Maybe @anlumo could try to integrate this into his release last minute. jk scnr

On a more serious note, I'd love to know how to benchmark the performance. I've just stumbled upon cargo bench and am playing with it a bit, but I get many warnings (too few samples in the given 3 seconds) and I have no idea what I'm really doing :)

Edit: posted prematurely due to an accidental Ctrl+Return

@bluenote10 (Contributor)

> On a more serious note, I'd love to know how to benchmark the performance. I've just stumbled upon cargo bench and am playing with it a bit, but I get many warnings (too few samples in the given 3 seconds) and I have no idea what I'm really doing :)

You can first check out the master branch and run

cargo bench --bench benchmark -- --save-baseline master

there. Then switch to the feature branch and run

cargo bench --bench benchmark -- --baseline master

Then you should see the relative impact on each benchmark test case. Typically you can ignore the benchmark outlier warnings; Criterion (the benchmark tool) is a bit verbose in that regard. When in doubt, re-run the benchmarks (on my machine I'm actually observing quite some fluctuation from run to run, and stopping background processes can help).

@anlumo commented Aug 18, 2020

> Maybe @anlumo could try to integrate this into his release last minute. jk scnr

We've already fixed the issue by switching to geo-clipper, which is pretty much a drop-in replacement with the advantage of not crashing.

So, we're not in a hurry for this fix any more, although I'd prefer a pure Rust implementation if it's equivalent.

@daef (Author) commented Aug 19, 2020

OT: I should probably include (geo-)clipper in my comparison. How does it compare performance-wise?

@anlumo commented Aug 19, 2020

It feels faster than geo-booleanop, although that might be related to the C++ library being built with optimizations even for Rust debug builds. I don't have any objective numbers, though.

@anlumo commented Nov 14, 2020

Has there been any progress on this issue in the last 3 months?

@bluenote10 (Contributor)

Not from my side (my use case is on hold).

@daef (Author) commented Nov 24, 2020

My contribution should be pretty much ready to merge; at least it works for me :)

I'm successfully using bluenote's array stump instead of the splay tree, with better performance and fewer bugs, but still not completely free of 'sweep line misses event' panics. I didn't care to debug further, as it's good enough for the moment (for me).

edit: clarified what I'm talking about

@rmanoka commented Feb 15, 2021

I tried to debug this issue. It seems like this line of fn compare_segments always returns false if the contour_ids are also the same (that is, the compared events are identical in every compared property).

In this case, this ends up confusing the splay tree (or even a BST), because the Ord guarantee no longer holds: for two such events a, b, both compare_segments(a, b) and compare_segments(b, a) would return Greater. One possible fix is to compare, as a last resort, using pointer addresses. I tried adding this, and the panic goes away (and all existing tests still pass). Does this fix the issue? Please see the attached PR.
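
Roughly, such a last-resort tie-break looks like this (a sketch of the idea, not the PR's exact code):

```rust
use std::cmp::Ordering;
use std::rc::Rc;

// Stand-in for the sweep event type; the real one carries points, flags, etc.
struct SweepEvent {
    // ...
}

/// Sketch of a comparator with a final pointer-address tie-break, so that two
/// distinct but otherwise indistinguishable events still get a consistent,
/// antisymmetric order (compare(a, b) is the reverse of compare(b, a)).
fn compare_segments(a: &Rc<SweepEvent>, b: &Rc<SweepEvent>) -> Ordering {
    // ... the geometric comparison and existing tie-breaking rules go here ...

    // Last resort: fall back to the allocation addresses. Arbitrary, but it
    // gives the total order the tree's Ord contract needs.
    (Rc::as_ptr(a) as usize).cmp(&(Rc::as_ptr(b) as usize))
}

fn main() {
    let a = Rc::new(SweepEvent {});
    let b = Rc::new(SweepEvent {});
    assert_eq!(compare_segments(&a, &b), compare_segments(&b, &a).reverse());
}
```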

@bluenote10 (Contributor)

> Does this fix the issue?

I don't think so, because the order in the tree is fundamentally messed up, i.e., the tree no longer maintains its invariant (a <= b in agreement with the tree structure). What you suggest may indeed fix a particular test case, because it doesn't panic immediately, but as far as I can see it shifts the trouble to a later point: subsequent operations on the tree assume that the invariant holds, but it is violated, so any follow-up search / insert / splay operation could fail. In my opinion the "rank fixing" logic proposed above is needed to properly maintain the invariant of the data structure. And since this is very costly to do with a splay tree, I proposed to use these "array stumps", which can be re-sorted much more efficiently.

@rmanoka commented Feb 15, 2021

@bluenote10 In another local branch, I tried to fix the splay tree mutation issue. The mutation relevant to the tree ordering seems to happen only in divide_segment, via this call to set_other_event. I fixed it by passing a mut-ref of the splay tree itself into this function, removing the event from the tree, mutating it, and then re-inserting it. That solves the key mutation issue (the event is no longer mutated while inside the tree), and the complexity remains O(n log n).
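
In generic form, the remove-mutate-reinsert pattern looks roughly like this (a sketch with integer keys standing in for sweep events, not the branch's code):

```rust
use std::collections::BTreeSet;

/// Never change an element's ordering key while it is stored inside an ordered
/// container: take it out, mutate it, put it back.
fn mutate_key(set: &mut BTreeSet<i64>, old: i64, new: i64) {
    // Remove under the old key (the container can still find it)...
    if set.remove(&old) {
        // ...then re-insert under the new key.
        set.insert(new);
    }
}

fn main() {
    let mut sweep_line: BTreeSet<i64> = [1, 2, 3].into_iter().collect();
    // Analogue of divide_segment changing a segment's other_event:
    mutate_key(&mut sweep_line, 1, 25);
    assert!(sweep_line.contains(&25)); // still findable, invariant intact
    println!("{sweep_line:?}");
}
```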

However, the above-mentioned bug still remained, and it was still causing the panic on the test case mentioned here. I think the fix to compare_segments is important, as otherwise the comparison itself is flawed. I didn't add the tree-mutation fix to the PR, because it seemed unnecessary. Do you have other test cases that panic / fail? I could try testing them locally and see if it helps.

@rmanoka commented Feb 15, 2021

FWIW, this branch fixes the mutations as explained above. It also includes the fix to the comparator; removing that brings back the panic. I have yet to run any comprehensive benchmarks, but I'm also not a big fan of the SplayTree; does it perform poorly compared to, say, a vanilla BTreeMap?

@bluenote10 (Contributor)

> Do you have other test cases that panic / fail?

I vaguely remember that I had a few more test cases prepared, related to some open issues, and I never could get them all to work. Unfortunately, quite some time has passed, and I cannot remember the details any more. I still have several test cases lying around as untracked files in my working directory -- some of them were uncommitted because I couldn't get them to work. Here they are, in case it helps: uncommitted_test_cases.zip

> I fixed it by passing a mut-ref of the splay tree itself into this function, removing the event from the tree, mutating it, and then re-inserting it. That solves the key mutation issue (the event is no longer mutated while inside the tree), and the complexity remains O(n log n).

Yes, I considered that as well, but benchmarks didn't look good, because it's quite an overhead to remove+insert on every subdivision. But perhaps for now that is the best option. Good performance isn't worth much if the algorithm isn't correct.

However, I was also never 100% sure whether this actually gives full correctness, because if divide_segment does a remove + insert, the neighbor relationships can change during the course of an iteration, i.e., the roles of index_prev and index_next in the main subdivide function can change as a side effect of the possible_intersection call. That always seemed dangerous. What I had in mind was a "rank fixing" functionality in the array stump to which you can pass some additional references/indices; if the rank fixing needs to mess with the order of elements, the passed-in references/indices get modified accordingly so that they keep their neighbor semantics. But again, I'm largely guessing at the moment because of vague memory.

> I have yet to run any comprehensive benchmarks, but I'm also not a big fan of the SplayTree; does it perform poorly compared to, say, a vanilla BTreeMap?

Yeah, unfortunately the SplayTree is relatively slow, see my benchmarks here (except for the "find recent" case, which it is optimized for, but that doesn't seem to be very relevant in the Martinez context).

@rmanoka commented Feb 16, 2021

I added all the tests from the zip file. They all seem to pass now (I had to tweak the equality check in the tests to allow for permuted but equivalent components of multi-polygons and polygon interiors, and to allow shifts in polygon rings -- see the sketch below).
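
For illustration, a relaxed ring comparison that accepts cyclic shifts could look like this (a sketch; the real test helper may also need to handle reversed orientation and permuted polygon components):

```rust
/// Check whether two closed rings contain the same vertex sequence up to a
/// cyclic shift.
fn rings_equal_up_to_shift(a: &[(f64, f64)], b: &[(f64, f64)]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let n = a.len();
    if n == 0 {
        return true;
    }
    (0..n).any(|shift| (0..n).all(|i| a[i] == b[(i + shift) % n]))
}

fn main() {
    let ring_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)];
    let ring_b = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]; // shifted by 2
    assert!(rings_equal_up_to_shift(&ring_a, &ring_b));
    println!("rings match up to a cyclic shift");
}
```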

Interestingly, this is still without the mutation fix. I suppose that with arbitrary precision divide_segments indeed does not mutate the ordering of the tree, but I'm not sure why it works with f64. It is possible that we haven't found the right counter-example for the mutation issue; I will try playing around with f64::nextafter and see (a one-ulp nudge is sketched below).
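
For probing, a one-ulp nudge (a stand-in for nextafter; assumes a positive, finite input) can be done via the bit representation:

```rust
/// Return the next f64 above `x`; only valid for positive, finite, non-zero x.
fn next_up(x: f64) -> f64 {
    assert!(x.is_finite() && x > 0.0);
    f64::from_bits(x.to_bits() + 1)
}

fn main() {
    // Nudge a coordinate by one ulp to probe for order-breaking inputs.
    let x = 0.1_f64;
    let y = next_up(x);
    println!("{:.20} -> {:.20} (diff = {:e})", x, y, y - x);
}
```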

I suggest we merge the #28 PR as it fixes an independent bug.

@daef (Author) commented Feb 18, 2021

To me it seemed that using f32 instead of f64 brought up various issues that f64 also had -- just much earlier.

mockersf pushed a commit to vleue/polyanya that referenced this issue Mar 9, 2024
* swapped out geo-booleanop through geo-clipper, because of issue 21re/rust-geo-booleanop#17

* made clippy happy

* added mutually exclusive feature-flags "wasm-compatible" and "wasm-incompatible"
the wasm-compatible variant includes the lib geo-booleanop that is currently in some cases not reliable
the wasm-incompatible variant includes the lib geo-clipper seems to be more reliable, faster but not wasm-compatible

* cargo fmt
AnAverageGitUser added a commit to AnAverageGitUser/polyanya that referenced this issue Jul 17, 2024
using a new method to split up bounds
fixed line crossing that could not be handled by the CDT
fixed some CDT edges that were wrongfully determined as constrained edges
added 3 tests
locking dependencies for tag
rename as_navmeshes as as_navmesh
rename add_obstacle as queue_subtract
adding wasm-compatible and wasm-incompatible feature flags
using a few small changes from a fork of bvh2d (earlier error detection)
separate navmesh islands should now be possible
using different approach for passable/unpassable terrain (obstacles): using addition/set union and subtraction/set difference for more fine-grained control over the mesh
trying to make holes (unpassable polygons) possible in the navmesh
swapped out geo-booleanop through geo-clipper, because of issue 21re/rust-geo-booleanop#17