-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack-overflow in boost::read_graphviz_detail::parser::parse_subgraph #364
Comments
Without a reproducer I had to guess, but likely they test something like #include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graphviz.hpp>
int main() {
boost::adjacency_list<> g;
boost::dynamic_properties dp(boost::ignore_other_properties);
read_graphviz(std::cin, g, dp);
} With input like:
Which does exhibit the stackoverflow: AnalysisFirst, let's observe that subgraph parsing is fundamentally broken.
Now, this state of affairs is somewhat "known", in that many aspects of graphviz reading have been broken for a while. See this commented block of code https://github.com/boostorg/graph/blob/develop/include/boost/graph/graphviz.hpp#L452-L491 In my StackOverflow answers I have, on more than one occasion, chosen to write a whole custom Graphviz parser to work around things:
Somewhere along the way I studied the graphviz grammar exhaustively and wrote a ditto Spirit parser to accomodate all the grammar¹ https://github.com/sehe/spirit-graphviz. My main take-away would be mostly that graphviz grammar is "okayish" but its semantics are incredibly idiosyncratic and basically "black box" - only the reference implementation Why do I tell you this? Solution DirectionBGL doesn't document the subset of graphviz grammar they promise/wish to support. What's more, they do not document which semantics they give the result of a succesful parse. Add to this the mildly surprising fact that even the simplest of subgraphs does not parse correctly with current mainline I want to
If the answer to the above is overwhelmingly negative, I'd suggest to scrap the feature unless we can find sponsor to fix the implementation. It will be crucial to come up with the minimal useful set of grammar and semantics to support, because as sketched out above the graphviz feature set is vast and intransparent. Good thing is, likely nobody uses that? Risk AssessmentThe risk only applies to applications that allow BGL to parse graphviz input from untrusted sources. I would assume that the number of applications that this applies to is small, and likely to be found in academia or other research environments, where the resulting crash would not disrupt critical services. The state of the parser features makes it ostensibly hard to rely on for production code, which hopefully further reduces the number of applications exposed. I wouldn't go as far as to say that the risk is "low". To me it makes it a "medium" risk. I'd certainly discourage use of the read_graphviz feature in production code until mitigations are in place. Mitigation
In any other context I'd summarize it as regular "paying off technical debt" (perhaps more like "cutting losses" in his context?) ¹ (except treating HTML attributes) |
I note that you did not include replacing the existing implementation with your Spirit-based parser; why not? If there are existing unit tests for Otherwise, I would prefer to remove the broken code so that we get reports of missing features rather than reports of security vulnerabilities and crashing programs. |
And btw, thank you for the excellent and timely report. |
So... Apart from being a bit sick, I spent inordinate amounts of time to try to
It being almost intractable due to all things changing since 1.40.0. I ended up
14 is the number of revisions between 1.40.0 and now. Ironically (?) when defining
They all passed in all revisions...: old-spirit.zip
So the good news is, nothing seems to have regressed. The only people RELYING All this is starting to beg the question why the parser was replaced. I'd
I have not delved into issue history to see the "why's" but if you're so inclined, I imagine there will be something relevant around the time of this commit:
Side-Topic: Why Not Replace It Again?Re: "Why don't we use your Spirit implementation":
Simply put, I don't want to push my pet project. To me it looks that this is Next StepsNow that we've found that keyword-less subgraphs never worked, and people
|
In case you want you can follow my work/progress on here https://github.com/sehe/graph/tree/sehe-364 - assuming you agree with my plans unless you object here :) (and thank you for contributing 8dd8cd9 the .clang-format; it really puts my mind at ease trying to figure out the law of the land!) |
Oh look what I found spelunking the history at 8dd497a |
Thanks for your work. I'm in the mountains this weekend, so little time to look. My feeling is to drop support for keyword-less subgraphs first, to fix the bad behaviour? Then we can decide what to do next at leisure. |
On second thought, if the new parser works with the subgraph keyword, then it might not be any more difficult to make it work for implicit subgraphs too than it is to remove the support altogether. I had a quick look at your working branch and just want to make one request: please keep refactoring changes and behavioural changes in different pull requests. I value both, but they don't belong together. |
On Wed, Apr 24, 2024, at 1:52 AM, Jeremy W. Murphy wrote:
On second thought, if the new parser works *with* the subgraph keyword, then it might not be any more difficult to make it work for implicit subgraphs too than it is to remove the support altogether.
You absolutely read my mind! I was going to say the same thing, but probably only after I found the fix, because I can be a bit pig-headed like that.
I had a quick look at your working branch and just want to make one request: please keep refactoring changes and behavioural changes in different pull requests. I value both, but they don't belong together.
That's the reason it's not a PR. I admit naming the branch that can raise the wrong expectation :) In all honesty, the whole unit test refactor might be the "[build] one to throw away". Oh, and of course we'll need a separate PR for the actual fix. You know, once the parser does what it was supposed to, we **need** to have limits - because even if you remove the recursion (just iterative descent parser, I guess? lol) you will still want to guard against malicious payloads eating up precious resources.
Thanks for chiming in with the comment, and I'll show my work when I reach a next step!
…
—
Reply to this email directly, view it on GitHub <#364 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACPEALNHSHXBLTO6ETBTLLY63X5DAVCNFSM6AAAAABDOQHUJGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZTGY3TGMJUGM>.
You are receiving this because you commented.Message ID: ***@***.***>
|
PS. I'll create a new issue for the CSR graph problem as well, but no hurry I guess. It's not a defect [the docs don't actually promise validity of the bundle propertymaps after graph mutation and people with an understanding of CSR models will probably intuit the (lack of) guarantees, much like I did]. I consider this a QoL side-catch if I find a way to improve it beyond the current kludge to fix the unit test. |
Wait, are we both missing an obvious temporary solution to the security problem, which is to simply switch the default implementation back to the original Spirit-based one? |
On Sun, Apr 28, 2024, at 9:52 AM, Jeremy W. Murphy wrote:
Wait, are we both missing an obvious temporary solution to the security problem, which is to simply switch the default implementation back to the original Spirit-based one?
That would potentially break everyone. The feature sets are not the same.
Side question: what is the official c++03 support status for BGL these days?
|
Oh, OK. C++03: Not supported. Thanks for reminding me to update the metadata. |
Great. That simplifies and might actually give me some reason to consider contributing parser features currently missing (in the further future). (Oh and sorry for the mess of typo's. I type my previous reply on a phone and I did not notice how badly that went :|) |
Non-keyword graphs never worked (!). This was uncovered because of security issue boostorg#364. parse_subgraph() incorrectly dealt with first_token in the case where the `subgraph` keyword wasn't used.
Non-keyword graphs never worked (!). This was uncovered because of security issue boostorg#364. parse_subgraph() incorrectly dealt with first_token in the case where the `subgraph` keyword wasn't used.
Non-keyword graphs never worked (!). This was uncovered because of security issue boostorg#364. parse_subgraph() incorrectly dealt with first_token in the case where the `subgraph` keyword wasn't used.
The fix is in: sehe@1636d00 Note that the other commits contain the accompanying tests. I made them on top of the refactored If that's okay, I'll PR it PS. I still have to make a general recursion limiter of course, to actually fix the current issue, let's not forget about that |
A simple sketch of how a I'm not a big fan of making the cut-off point configurable. After all, if you bake it into the compiled library using a preprocessor define, you effectively defer the value to some distribution packager, which might lead to programs having different limits depending on the system they run on, or even breaking programs when the packaged library changes their configured limit. On the other hand making the limit fully dynamic risks reintroducing the attack vector when programs don't clamp the value to some safe range. I'm open to your suggestions here. |
Wonderful, thank you. Looks fine exactly as is. I do prefer making things configurable, but I'm fine with making it hard-coded on a first pass. |
Oh that's easily done. There will just be a dependency because the new tests use the refactored driver. I'll go make some PRs and also file the issue analysis on the CSR problem that i worked around in the previously disabled test. |
Non-keyword graphs never worked (!). This was uncovered because of security issue boostorg#364. parse_subgraph() incorrectly dealt with first_token in the case where the `subgraph` keyword wasn't used.
I broke it down into these PRs (in order): |
Thank you and apologies for not responding sooner. I would prefer that, instead of leaving With regard to |
I'll make some of these basic changes in situ now so that I can merge them, just in case you get held up. So you'll have to pull these branches before pushing again (if you need to). |
Non-keyword graphs never worked (!). This was uncovered because of security issue boostorg#364. parse_subgraph() incorrectly dealt with first_token in the case where the `subgraph` keyword wasn't used.
The comment actually applied to the entire block of includes following that comment.
I agree about the username. The comment actually links to the issue that I did already create? I vote to keep warning block of "naive" code, and my commit message already gave the rationale for the importance. I'll address these. Removing unused headers belongs with the "refactoring graphviz_test.cpp" PR, so I'll do it there. |
Fix security issue #364 and non-keyword subgraph parsing
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=66719
The last four lines repeat in a cycle until the stack overflows.
The text was updated successfully, but these errors were encountered: