Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add recovery support to the parser #102

Open
bwilkerson opened this issue Dec 1, 2020 · 13 comments
Open

Add recovery support to the parser #102

bwilkerson opened this issue Dec 1, 2020 · 13 comments
Labels
type-enhancement A request for a change that isn't a bug

Comments

@bwilkerson
Copy link
Member

We are using the YAML parser for development-time support, including code completion, and this requires the parser to be able to recover well in the face of errors in order to provide the best UX.

I believe that naively adding recovery would be a breaking change because the parser would begin to return guesses for what it thinks the user intended to type in situations where it previously would have returned null. I believe, however, that we can make it a non-breaking change by putting recovery behind a flag that is optionally passed to the loadYaml* functions (something like bool recover = false).

Extending the API in this way would also allow us to incrementally implement recovery based on which failure cases occur most often.

Does that seem like a reasonable approach?

@DanTup
Copy link
Contributor

DanTup commented Jan 5, 2021

I had a go at trying to implement some recovery for a simple case where a key is being typed but the : has not. This is common when wanting to provide code-completion:

dependencies:
  http:
  pat // complete here 
  test:

I don't know if I'm going about it the right way, but my current work is here:

DanTup@e565364

There's one thing in there that definitely doesn't feel right, which is stashing the current key node in a field. This is used to determine whether something the scanner believes is a value is actually not, for example:

one:
two

When allowing recovery, this would parse as (key: one, value: two) but I don't believe that's valid and was trying to detect this by comparing the indentation.

DanTup@e565364

Any pointers would be appreciated! (I can open as a PR if it would be better to iterate there).

@DanTup
Copy link
Contributor

DanTup commented Jan 11, 2021

@munificent @natebosch are either of you familiar enough to provide pointers on the above?

@munificent
Copy link
Member

@munificent @natebosch are either of you familiar enough to provide pointers on the above?

I'm not sorry. I haven't looked at this package in ages. @nex3 would be the resident expert, but I don't know if she ever works on this package anymore.

I agree with @bwilkerson though that it would be good to have support for opt-in error recovery.

@nex3
Copy link
Member

nex3 commented Jan 11, 2021

I don't really work on this anymore, but I'm happy to consult. In this example:

one:
two

I'm surprised that you're going for an error recovery that inserts a virtual indent rather than one that inserts a virtual colon. It seems easier, because you don't have to try to extend the block mapping context further than it would naturally go.

@jonasfj
Copy link
Member

jonasfj commented Jan 12, 2021

Does that seem like a reasonable approach?

Seems reasonable to me..

nit: It might be preferable to call it allowPartial: true mode, or call it recover: true mode and then throw a PartiallyUnderstoodYamlException extends YamlException that includes a list of spans where something went wrong and a best-effort AST for the invalid YAML string that was parsed.
This could reduce the risk that someone accidentally use the partial result from recover-mode in production 🙈

@DanTup
Copy link
Contributor

DanTup commented Jan 13, 2021

@nex thanks!

I'm surprised that you're going for an error recovery that inserts a virtual indent rather than one that inserts a virtual colon.

That's really what I was wanted to do, though I wasn't sure how to do it. I had another look and managed to simplify it - perhaps this is what you had in mind:

DanTup@8f1d54a

Specifically: DanTup@8f1d54a#diff-c57d8918fb3309443f9d0b214d779552570b12e9f03e27f2ff99f4a97742ed99L489-R498

This covers a common case, although it falls down when the missing colon is on the first package:

        dependencies:
          zero
          one: any

This one generates the following tokens (after I skipped this exception to recover):

// TokenType.streamStart
// TokenType.blockMappingStart
// TokenType.key
// TokenType.scalar               dependencies
// TokenType.value                :
// TokenType.scalar               zero
//           one
// ...

This seems to happen because the indenting is greater than the dependencies node so it looks like a multiline value and one is eaten as part of the scalar. By the time this is detected (the exception linked above) I think it may be too late to handle. Do you have any ideas for that? There's probably still a lot of value in handling the other cases even if this can't be handled.

I can file a PR if easier to comment on there. Thanks!

@DanTup
Copy link
Contributor

DanTup commented Jan 13, 2021

@jonasfj

It might be preferable to call it allowPartial: true mode, or call it recover: true mode and then throw a PartiallyUnderstoodYamlException extends YamlException that includes a list of spans where something went wrong and a best-effort AST for the invalid YAML string that was parsed.
This could reduce the risk that someone accidentally use the partial result from recover-mode in production 🙈

I don't have a real opinion on the API so happy to change - although we specifically want to use this in "production" for the analysis server code completion so having to catch exceptions feels a bit awkward. It is already opt-in, but if it's too subtle the flag could be renamed or perhaps a separate function (for ex. parseInvalidYaml, tryParseYaml or something?).

@jonasfj
Copy link
Member

jonasfj commented Jan 13, 2021

or perhaps a separate function (for ex. parseInvalidYaml, tryParseYaml or something?).

Yeah, that might be attractive. I assume this will only be relevant for people doing code-completion and the likes.

@bwilkerson
Copy link
Member Author

I'd much rather have a separate entry point than to throw an exception for non-exceptional behavior, though I honestly think that adding a named parameter would be just as effective at protecting current users from seeing changes.

If we do have a separate entry point, let's name it something like loadYamlWithRecovery. I think that's more clear than "invalid", which the input might not be, or "try", which implies that we might fail when the point is to always produce a tree that's as close to what the user intended as possible.

@DanTup
Copy link
Contributor

DanTup commented Jan 26, 2021

I honestly think that adding a named parameter would be just as effective at protecting current users from seeing changes

Looking again, I do think an argument makes more sense too. There are already several entry points (loadYaml, loadYamlNode, loadYamlDocument) so having separate functions for each of these would inflate the number a fair bit.

which implies that we might fail when the point is to always produce a tree that's as close to what the user intended as possible.

My current changes only recover from some fairly specific cases (half-typed lines without their colons). If it should always return something, there's more work to do. Would that be preferred to adding it gradually?

@jonasfj
Copy link
Member

jonasfj commented Jan 26, 2021

There are already several entry points (loadYaml, loadYamlNode, loadYamlDocument) so having separate functions for each of these would inflate the number a fair bit.

That's a good point, I'm sold 👍

@bwilkerson
Copy link
Member Author

If it should always return something, there's more work to do. Would that be preferred to adding it gradually?

No. I'd take "always" as an aspirational goal at this point.

@DanTup
Copy link
Contributor

DanTup commented Feb 6, 2021

I've opened a PR at #107 with the changes made so far if someone is able to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

6 participants