-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manually control all newline characters #139
base: main
Are you sure you want to change the base?
Conversation
manually control all newline characters to reduce ambiguity for parenthesized expressions
(stmt_register | ||
(val_variable | ||
(identifier)))) | ||
(val_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stmt_register: ($) =>
prec.left(
-1,
seq(
KEYWORD().register,
field(
"plugin",
choice(alias($.unquoted, $.val_string), $.val_variable),
),
field("signature", optional($.val_record)),
),
),
The new result is more accurate for the rule
I really don't have a problem moving forward with this because even if it breaks something, you've been super helpful to fix those type of things. If this is going to make things better in the long run, a little temporary instability is a small price to pay. |
@fdncred I think this is helpful to decouple intertwined rules and enable finer grained control of the grammar, on the other side, it forces us to write more intricate rules when newline characters are allowed in them. Let's wait to see whether other people @CabalCrow @mrdgo have different opinions on this, during the on-going test on a broader range of scripts. |
Back when I was attempting a rewrite, I was trying to put the newlines in scanner due to the same issue. There were plenty of problem with comments and all other types of syntax that is allowed by nushell. Do note that the bash tree sitter also had some of the issue, so maybe this could be a limitation of what you can do with tree-sitter without relying on the scanner. |
@CabalCrow Yeah, external scanner was my initial thought and probably is the ultimate solution. Yet it turns out that most of known errors can still be solved without a scanner, comments are definitely on the list of attention here. I feel that keeping it all js for now helps to recognize the parts that really requires a scanner, for example the unquoted string rule already looks insanely complicated. |
Thanks for the PR and the request for review. I honestly feel some pain looking at all the places where you now explicitly allow newlines. On the other hand, it's a great advantage to solve it in js. I'll take a closer look tomorrow. |
@@ -52,7 +52,28 @@ use $"('s' + 't' + 'd')" | |||
(pipeline | |||
(pipe_element | |||
(expr_binary | |||
(val_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nested binary expression made left associative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the cleanup work alongside the newline control :)
[$.expr_binary_parenthesized], | ||
[$.nu_script, $._terminator], | ||
[$.pipeline], | ||
[$.pipeline_parenthesized], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate alphanumeric sorting!
What exactly do single-entry lists do, here? I thought of these as precedence rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For single-item lists, Treesitter will figure out how many times should each pattern repeat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't get it. You declare tokens that occur somewhere whithin a repeat()
, right? So pipeline
may repeatedly occur. What information is missing that treesitter learns from a conflict [$.pipeline]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quote the document of conflict:
conflicts - an array of arrays of rule names. Each inner array represents a set of rules that’s involved in an LR(1) conflict that is intended to exist in the grammar. When these conflicts occur at runtime, Tree-sitter will use the GLR algorithm to explore all of the possible interpretations. If multiple parses end up succeeding, Tree-sitter will pick the subtree whose corresponding rule has the highest total dynamic precedence.
The pipeline rule for example, has local conflicts with itself, at the newlines in the middle, it doesn't know whether there're more pipe elements ahead, thus a shift-reduce conflict.
You can try to remove the line and run tree-sitter generate again, it will tell you the detailed cases that cause the trouble.
optional( | ||
prec.right( | ||
seq( | ||
repeat($._newline), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this duplicate lines 172, 180, and 188?
(If those are the only references to $.parameter
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since $.parameter
is repeated, I feel it more clear that it only allows edge newlines on one side (here on the beginning). Shared newline across parameters probably wont cause any conflict though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope that tests wouldn't pass on conflict, I was just wondering. Consider it more a curious question than a request for change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's a good point, I probably need to use a helper function for those kind of repeated patterns, somewhat like list_entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't really need a helper function, it doesn't really hurt to tell treesitter twice that there may be newlines between parameters.
While attempting to fix #122 , I found the ambiguity of newline characters is a big obstacle for multiline expressions to be parsed accurately.
Current grammar put newline characters in
extras
(\s
), allowing arbitrary numbers of them to be present freely between any nodes. While some rules specifically announced newline characters, making them selected with higher priorities when a newline is matched in the exact sequence.This makes adding the rule for
expr_binary_parenthesized
extremely difficult since newline characters breaks the precedence order.This PR did quite a lot changes:
val_record
as register signaturePlease take time to review it with caution.
I'll continue to test it on more nushell scripts to make it more robust.
If you guys have better solutions, I'm more than happy to switch.