Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp loop indentation #642

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

Conversation

Thuna-Cing
Copy link

The goal of this PR is to indent CL:LOOP forms by parsing the body instead of using regexps on the surrounding context.

This is done in two steps:

  1. The new function sly--lisp-indent-loop-macro-partial-parse takes the current LOOP body and parses it, collecting each form in it as a cons cell (POS . OBJ) where POS is where the object starts and OBJ is the object's string representation as it is written in the buffer if it is an atom and nil if it is a compound form.
  2. The function sly--lisp-indent-loop-macro-1 was completely rewritten such that it uses a consume-forms sub-function in a cl-labels which recursively moves along this ((POS . OBJ)...) that we obtained before. It does this according to the spec given to it, the documentation of which is provided in the new variable sly--lisp-indent-loop-macro-allowed-bodies-alist, which also contains the specs of all clauses in a LOOP body. The goal was to make it be as one-to-one with the spec as it could be.

Some explanation of how consume-forms works: It receives four things: the current ((POS . OBJ)...) list (anything that came before we "moved across"), the spec that it is expecting, the column the current body begins at, and the offset relative to it. If it manages to parse until we reach the end of the forms, we return the current body plus the offset relative to it. If it parses it, finishes, and we still have left over forms, then it instead returns those forms. The sequence specs are such that the next subspec will be matched against these leftover forms instead. The entry point, the first consume-forms that is called is one that will match against clauses indefinitely. consume-forms may also return nil if it fails to match. (This is because the list ((POS . OBJ)...) is actually made to be terminated by a t and not nil.) All of these should guarantee that we indent the target line appropriately.

Now, a couple (or more) problems with the current state of this code:

  • It is slow. While this is not noticable with all calls, some calls are very obviously delayed. A simple paredit-reindent-defun is a great showcase of this problem. I suspect pulling out consume-forms into its own function will help with that, however some optimizations may need to be done even then.
  • sly--lisp-indent-loop-macro-allowed-bodies-alist is... fine, actually. It is by no means perfect, and I had to make some compromises in order to avoid having to bookkeep a bunch of stuff, but it is for the most part understandable. Some work on it may be done, if better ways to shape it are found, but it is decent.
  • sly--lisp-indent-loop-macro-partial-parse uses naive-ish code to determine what is and isn't a compound form, taking into account the standard reader macros. Some tests should be done to ensure that these work. Some of the more exotic cases, in case too difficult to handle, I think may be ignored, but as long as this code works then so should sly--lisp-indent-loop-macro-1, as far as I can tell.

TODOs:

  • sly-lisp-loop-body-forms-indentation should be made to be recognized in the spec, likely by changing the existing 2-indents by &body-indents. This brings up the issue of what the current 4-indents should be indented as, however. Possibly as a &special indent which defaults to double the &body-indent. There are also 0-indents, and while these can stay as hard-coded to 0, maybe a &join-indent could be introduced which would default to 0 instead. I don't know at this point if this is good or not, but some sort of a decision needs to be made.
  • sly-lisp-loop-indent-body-forms-relative-to-loop-start should... not exist. I have no idea why it even exists, and I have no appreciation whatsoever for it. I vote for simply removing it altogether. The implementation, if I have to allow for its existance, and in fact, the whole conceptual basis for the code is made void by this variable. I have no intentions to add support for it.
  • The terminating closing parenthesis is not indented appropriately when it is on its own line. I do not yet know why, it's something that I need to look into.
  • You will notice that I have changed some of the tests so that dos bodies are compound forms. This makes implementation possible and establishes consistency with the spec. The test indent-11 is also invalid, due to the when here's something I used to botch line. If some other invalid examples exist, they should either be fixed or explicitly be made examples of invalid loops. I can go through them, but then we move on to the next point:
  • Currently sly--lisp-indent-loop-macro-partial-parse only parses up to the form to be indented. This has two benefits:
    1. It is in theory safe; we cannot have any incomplete sexps until the form we are trying to indent, otherwise we would not be indenting LOOP but a different sexp instead. This seems to work so far but a more rigorous check is needed at some point.
    2. It allows for a very simple way to determine whether we are done with the recursive consume-forms or not: If we are at the end of the list of forms then we have figured out the indentation.
      This has a single problem however: The form
      when foo do
        (bar)
    
    should be indented as above if this concludes the clause but if it is followed by more and-chained clauses it should be
      when foo do
                 (bar)
               and do
                 (baz)
    
    instead. This is not possible without looking ahead, and I am not yet sure how to do this, aside from changing the way the function works to finding the indentation of every form (it only does for forms before and discards it immediately) and only returning the one we want. This is a challenge that I am not yet willing to take on.
  • The way the tests are laid out currently, the "correct" indentation for
      and if foo
            do (the-other)
    
    appears to be to take if as the starting point of the body - an inline body - however this is not consistent if we say that else if indents relative to else which would imply that if actually starts a not-inline body. This could possibly be reserved if we were to say that ands subclauses must start inline bodies but currently only the body itself can signal whether it is inline or not, not the thing starting it, so if this is indeed considered the "correct" indentation then this would take additional work to fix, though it would not be impossible. However, I suggest that we simply say and <clause> is indented relative to and.
  • In tests indent-69, indent-70, and indent-71, body comments are aligned against each other. I cannot see a reason why this would be the case; the only comments that align against each other are supposed to be single comments. The correct indentation should instead have the second body comment align against where the body would go.
    There is however a problem, in that a comment is currently treated the same as a compound form and moves along the spec, which is something that I need to fix at some point. It should not be too difficult hopefully.

These are all the things I can immediately think of, there may be and likely are many more issues and things that need to be ironed out. I did not want to end up going too out of scope in #471 so I have this as a PR, but it is actually a draft.

* test/sly-cl-indent-test.txt: Add a test for the indentation of the
else and end clauses in (nested) conditional clauses in loop.
* lib/sly-cl-indent.el (sly--lisp-indent-loop-macro-1): Indent else
and end clauses appropriately recognizing nesting.  (This relies on
the conditional line starting with the conditional keyword, so an `and
if` construct will not indent properly.)
@joaotavora
Copy link
Owner

Thanks for this work. But this is too sensitive a part of SLY to touch, IMO.

You will notice that I have changed some of the tests so that dos bodies are compound forms.

You cannot also just assume that the tests expectations are wrong just because you disagree with them. The many tests in sly-cl-indent-tests.txt are there in that way so that certain code bases indent consistently.
It doesn't matter if this way is really "better" or "more correct" with the spec (what spec?) If you change the indenter you'll provoke the wrath of users selecting a chunk of text in their buffers and pressing TAB. So unfortunately changing the tests is a non-starter.

You could however provide some kind of sly-thuna-cing-correct-loop-indentation variable that switches on "spec-correct" indentation the way you see it, but it would be off by default.

As to the implementation I can't comment. Replacing a regexp-based indenter with a sexp-based one sounds like a great idea, but the changes are too large for me to judge. I see big complicated code chunks being replaced by other big complicated code chunks. I've never been comfortable with this code, basically just inherited it from SLIME, and then Zach Shaftel did some tweaks. You may want to contact him.

Also, you should evaluate how many Emacs versions back this can work with. Emacs 25? 26?

@Thuna-Cing
Copy link
Author

You will notice that I have changed some of the tests so that dos bodies are compound forms.

You cannot also just assume that the tests expectations are wrong just because you disagree with them. The many tests in sly-cl-indent-tests.txt are there in that way so that certain code bases indent consistently. It doesn't matter if this way is really "better" or "more correct" with the spec (what spec?) If you change the indenter you'll provoke the wrath of users selecting a chunk of text in their buffers and pressing TAB. So unfortunately changing the tests is a non-starter.

This is valuable advice that I will try to keep in mind as much as possible. In this situation specifically, however, the problem is due to DO only accepting compound forms as its body (as per 6.1.5). The tests that I have changed are indented as though the atoms following DO are a part of it when that is not the case. Any code which would be misindented specifically due to these changes would signal an error upon execution (or even macroexpansion).

You could however provide some kind of sly-thuna-cing-correct-loop-indentation variable that switches on "spec-correct" indentation the way you see it, but it would be off by default.

This is an acceptable compromise, possibly, but it introduces the problem of having to either implement non-greedy or negative matching. Non-greedy, as is, is a non-starter. I can't even imagine how I would go about it. Negative is a band-aid but I think it should be fairly simple, though ugly.

As to the implementation I can't comment. Replacing a regexp-based indenter with a sexp-based one sounds like a great idea, but the changes are too large for me to judge. I see big complicated code chunks being replaced by other big complicated code chunks.

This is probably the biggest concern, and why the PR is marked as a draft. Ideally the indenting algorithm should be theoretically sound enough that all that needs to be audited is the implementation's correctness with respect to that. This is unfortunately not yet the case.

Also, you should evaluate how many Emacs versions back this can work with. Emacs 25? 26?

Off the top of my head: The limit, as is, is 29 due to the use of pos-bol, pos-eol, and some *-let macros. These can be easily replaced with some bulkier alternatives or pulled from compat-29. The next barrier is possibly scan-sexps which shows it as "27.1 or earlier" but I see commits related to it from decades back so I assume that's just a mistake. The next limit is pcases rx form, which was introduced in 26, though it can be changed with little effort to use the normal facilities as well I imagine. I don't know how backwards compatibility works in cl-lib so that might cause some additional problems. There might be other subtler problems that I am not yet aware of, however, so it's a valid question that needs to be answered.

@monnier
Copy link
Collaborator

monnier commented Jun 14, 2024 via email

@Thuna-Cing
Copy link
Author

Just shooting from the side-lines, I'm not at all familiar with that code.

Considering how little people there are that are familiar with the code, I suspect that makes most people, including me until I looked into it for this (and even then!). I welcome all help.

I'm also unfamiliar with its relationship to the cl-indent.el that ships with Emacs.

AFAICT, this code doesn't have any relationship to cl-indent.el. A quick look at common-lisp-loop-part-indentation, which is a mechanism separate from the rest of cl-indent.el, specifically for loop, reveals that it just looks at the line that is being indented and indents relative to the loop body, ignoring all surrounding context. All compound forms are offset, and everything else is flush.

You could however provide some kind of sly-thuna-cing-correct-loop-indentation variable that switches on
"spec-correct" indentation the way you see it, but it would be off by default.

This is an acceptable compromise, possibly, but it introduces the problem of having to either implement non-greedy or negative matching.

Hmm... I'd have expected all it took was to keep the old code (in addition to the new code) and add some ifs to select between the two. I guess you're thinking about mimicking the old/current indentation style using the new code?

I could check whether we are indenting a DO form and delegate the indentation to the old code. The problem happens when I am trying to indent anything else. This is a disagreement about what forms make up DOs body, so when I'm parsing I would need to pop off tokens differently, based on this variable, and if I am doing that then the code can handle the indentation without having to refer to the current code in the first place.

I know the idea of throwing away code you just wrote is probably not welcome, but I suspect indentation of loop should be able to use SMIE. I have played with using SMIE to indent ELisp code, and to some extent it kinda works, tho it would require changes in smie.el to support for "hooks" to provide specific indentation rules for specific special forms. And doing that would inevitably behave slightly differently in some cases, while at the same time being most likely slower than the current code. So I never pursued this idea in earnest. But in your case, you could use the existing indentation code for most things and use SMIE only when indenting loop (i.e. use a common-lisp-indent-function which internally uses SMIE). I have never tried such a thing, so I'm not sure how well it would work, but it might be worth a try.

I can part with the code I have written so far, though it would be a bit of a shame.

I looked into SMIE for a bit but there are a couple problems, both due to my lack of experience and to its slight shortcomings. I can't see how you represent something like do conditional-clause+ or when sexp clause in it. Maybe there is a trick that I am missing. In the TODOs I see there are (were?) some plans to allow consecutive non-terminals if one of them always starts or ends with a terminal (which clause falls under) but AFAICS that is not yet a thing, so I have to pull out the keyword from each clause and put it into conditional-clause such that I can do something like when sexp clause-keyword clause-body, but this makes the grammar extremely difficult to read. In a similar vein, I need to loop over all variations of collect, collecting, etc. make the individual lists (so (("collect" :sexp) ("collecting" :sexp) ("append" :sexp) ("appending" :sexp)...), and so on). This can (and in fact essentially must) be done via a loop, but that only makes the things less legible. Also, I couldn't find any references to SMIE in cl-indent.el. The only built-in languages that use it are those where each clause is separated explicitly, so those weren't much inspiration unfortunately. These are two of my attempts that I stopped after not being able to resolve the above issues: https://0x0.st/XctT.txt

@joaotavora
Copy link
Owner

The tests that I have changed are indented as though the atoms following DO are a part of it when that is not the case. Any code which would be misindented specifically due to these changes would signal an error upon execution (or even macroexpansion).

I don't follow this at all. Are you saying indentation matters for evaluation or macroexpansion?? Can you give an example?

The next limit is pcases rx form, which was introduced in 26,

I think 26 is a good target

AFAICT, this code doesn't have any relationship to cl-indent.el.

I think it does. It's a common ancestor to both SLY and SLIME's cl-indent.el. Not 100% sure but close to it. Check the history.

@Thuna-Cing
Copy link
Author

The tests that I have changed are indented as though the atoms following DO are a part of it when that is not the case. Any code which would be misindented specifically due to these changes would signal an error upon execution (or even macroexpansion).

I don't follow this at all. Are you saying indentation matters for evaluation or macroexpansion?? Can you give an example?

Oof, sorry, I misspoke. What I meant is that while the loop's indentation would lead the reader to assume that the non-compound forms following DO are a part of it, this is not the case, and the macroexpansion of that loop form will/should signal an error. My argument is thus that there is no benefit to keeping this misleading behaviour; no working code could possibly be affected by it - otherwise it would not be working to begin with.

@monnier
Copy link
Collaborator

monnier commented Jun 15, 2024 via email

@Thuna-Cing
Copy link
Author

I'm also unfamiliar with its relationship to the cl-indent.el that ships with Emacs. AFAICT, this code doesn't have any relationship to cl-indent.el. A quick look at common-lisp-loop-part-indentation, which is a mechanism separate from the rest of cl-indent.el, specifically for loop, reveals that it just looks at the line that is being indented and indents relative to the loop body, ignoring all surrounding context. All compound forms are offset, and everything else is flush.
So IIUC Emacs's cl-indent.el is not used at all with SLY?

That is my understanding, yes. Historical relations notwithstanding, the code I am replacing and the code in cl-indent.el are not the same.

Hmm... I'd have expected all it took was to keep the old code (in addition > to the new code) and add some ifs to select between the two. I guess > you're thinking about mimicking the old/current indentation style using > the new code? I could check whether we are indenting a DO form and delegate the indentation to the old code.
I can't see why. The ifs I'm taking about would only need to check the boolean configuration variable saying whether we should use the old code or the new code.

Ah, I see, so you are proposing that the new indenter be opt-in? That would work, yes. It could depend on a buffer-local variable like sly-indent-loop-experimental or something to that effect.

I can't see how you represent something like do conditional-clause+ or when sexp clause in it. Maybe there is a trick that I am missing.
Indeed you can't just use an existing grammar to write an SMIE grammar. I expect the grammar would deal with do and when in a more "naive" way as in: (clause ... ("when" clause) ("do" clause) ...) or probably even (clause ... (clause "when" clause) (clause "do" clause) ...) The way I have built SMIE grammars in the past was based more on adding ad-hoc elements bit by bit to make it behave in a particular way on specific tests. Using an existing grammar was usually more frustrating than anything (except for things like specifying relative precedences based on existing precedence tables).

I was hoping that that wouldn't be the case :/. My goal with this is correctness above all - that the indentation reflects the actual state of the LOOP form at (almost) all times - so something this naive and loose isn't a viable solution unfortunately.

@monnier
Copy link
Collaborator

monnier commented Jun 15, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants