Add recovery support to the parser #102

bwilkerson · 2020-12-01T18:08:35Z

We are using the YAML parser for development-time support, including code completion, and this requires the parser to be able to recover well in the face of errors in order to provide the best UX.

I believe that naively adding recovery would be a breaking change because the parser would begin to return guesses for what it thinks the user intended to type in situations where it previously would have returned null. I believe, however, that we can make it a non-breaking change by putting recovery behind a flag that is optionally passed to the loadYaml* functions (something like bool recover = false).

Extending the API in this way would also allow us to incrementally implement recovery based on which failure cases occur most often.

Does that seem like a reasonable approach?

The text was updated successfully, but these errors were encountered:

DanTup · 2021-01-05T17:11:35Z

I had a go at trying to implement some recovery for a simple case where a key is being typed but the : has not. This is common when wanting to provide code-completion:

dependencies:
  http:
  pat // complete here 
  test:

I don't know if I'm going about it the right way, but my current work is here:

DanTup@e565364

There's one thing in there that definitely doesn't feel right, which is stashing the current key node in a field. This is used to determine whether something the scanner believes is a value is actually not, for example:

one:
two

When allowing recovery, this would parse as (key: one, value: two) but I don't believe that's valid and was trying to detect this by comparing the indentation.

DanTup@e565364

Any pointers would be appreciated! (I can open as a PR if it would be better to iterate there).

DanTup · 2021-01-11T18:32:37Z

@munificent @natebosch are either of you familiar enough to provide pointers on the above?

munificent · 2021-01-11T20:52:37Z

@munificent @natebosch are either of you familiar enough to provide pointers on the above?

I'm not sorry. I haven't looked at this package in ages. @nex3 would be the resident expert, but I don't know if she ever works on this package anymore.

I agree with @bwilkerson though that it would be good to have support for opt-in error recovery.

nex3 · 2021-01-11T21:38:26Z

I don't really work on this anymore, but I'm happy to consult. In this example:

one:
two

I'm surprised that you're going for an error recovery that inserts a virtual indent rather than one that inserts a virtual colon. It seems easier, because you don't have to try to extend the block mapping context further than it would naturally go.

jonasfj · 2021-01-12T20:40:24Z

Does that seem like a reasonable approach?

Seems reasonable to me..

nit: It might be preferable to call it allowPartial: true mode, or call it recover: true mode and then throw a PartiallyUnderstoodYamlException extends YamlException that includes a list of spans where something went wrong and a best-effort AST for the invalid YAML string that was parsed.
This could reduce the risk that someone accidentally use the partial result from recover-mode in production 🙈

DanTup · 2021-01-13T14:46:33Z

@nex thanks!

I'm surprised that you're going for an error recovery that inserts a virtual indent rather than one that inserts a virtual colon.

That's really what I was wanted to do, though I wasn't sure how to do it. I had another look and managed to simplify it - perhaps this is what you had in mind:

DanTup@8f1d54a

Specifically: DanTup@8f1d54a#diff-c57d8918fb3309443f9d0b214d779552570b12e9f03e27f2ff99f4a97742ed99L489-R498

This covers a common case, although it falls down when the missing colon is on the first package:

        dependencies:
          zero
          one: any

This one generates the following tokens (after I skipped this exception to recover):

// TokenType.streamStart
// TokenType.blockMappingStart
// TokenType.key
// TokenType.scalar               dependencies
// TokenType.value                :
// TokenType.scalar               zero
//           one
// ...

This seems to happen because the indenting is greater than the dependencies node so it looks like a multiline value and one is eaten as part of the scalar. By the time this is detected (the exception linked above) I think it may be too late to handle. Do you have any ideas for that? There's probably still a lot of value in handling the other cases even if this can't be handled.

I can file a PR if easier to comment on there. Thanks!

DanTup · 2021-01-13T14:50:27Z

@jonasfj

It might be preferable to call it allowPartial: true mode, or call it recover: true mode and then throw a PartiallyUnderstoodYamlException extends YamlException that includes a list of spans where something went wrong and a best-effort AST for the invalid YAML string that was parsed.
This could reduce the risk that someone accidentally use the partial result from recover-mode in production 🙈

I don't have a real opinion on the API so happy to change - although we specifically want to use this in "production" for the analysis server code completion so having to catch exceptions feels a bit awkward. It is already opt-in, but if it's too subtle the flag could be renamed or perhaps a separate function (for ex. parseInvalidYaml, tryParseYaml or something?).

jonasfj · 2021-01-13T15:13:51Z

or perhaps a separate function (for ex. parseInvalidYaml, tryParseYaml or something?).

Yeah, that might be attractive. I assume this will only be relevant for people doing code-completion and the likes.

bwilkerson · 2021-01-13T16:08:12Z

I'd much rather have a separate entry point than to throw an exception for non-exceptional behavior, though I honestly think that adding a named parameter would be just as effective at protecting current users from seeing changes.

If we do have a separate entry point, let's name it something like loadYamlWithRecovery. I think that's more clear than "invalid", which the input might not be, or "try", which implies that we might fail when the point is to always produce a tree that's as close to what the user intended as possible.

DanTup · 2021-01-26T11:08:26Z

I honestly think that adding a named parameter would be just as effective at protecting current users from seeing changes

Looking again, I do think an argument makes more sense too. There are already several entry points (loadYaml, loadYamlNode, loadYamlDocument) so having separate functions for each of these would inflate the number a fair bit.

which implies that we might fail when the point is to always produce a tree that's as close to what the user intended as possible.

My current changes only recover from some fairly specific cases (half-typed lines without their colons). If it should always return something, there's more work to do. Would that be preferred to adding it gradually?

jonasfj · 2021-01-26T11:23:28Z

There are already several entry points (loadYaml, loadYamlNode, loadYamlDocument) so having separate functions for each of these would inflate the number a fair bit.

That's a good point, I'm sold 👍

bwilkerson · 2021-01-26T15:50:58Z

If it should always return something, there's more work to do. Would that be preferred to adding it gradually?

No. I'd take "always" as an aspirational goal at this point.

DanTup · 2021-02-06T11:34:52Z

I've opened a PR at #107 with the changes made so far if someone is able to review.

DanTup mentioned this issue Feb 6, 2021

Attempt to recover from invalid YAML #107

Merged

sigurdm added the type-enhancement A request for a change that isn't a bug label Sep 30, 2021

bwilkerson mentioned this issue Dec 19, 2023

Completion in pubspec.yaml is not completing top-level properties dart-lang/sdk#54264

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add recovery support to the parser #102

Add recovery support to the parser #102

bwilkerson commented Dec 1, 2020

DanTup commented Jan 5, 2021

DanTup commented Jan 11, 2021

munificent commented Jan 11, 2021

nex3 commented Jan 11, 2021

jonasfj commented Jan 12, 2021

DanTup commented Jan 13, 2021

DanTup commented Jan 13, 2021

jonasfj commented Jan 13, 2021

bwilkerson commented Jan 13, 2021

DanTup commented Jan 26, 2021

jonasfj commented Jan 26, 2021

bwilkerson commented Jan 26, 2021

DanTup commented Feb 6, 2021

Add recovery support to the parser #102

Add recovery support to the parser #102

Comments

bwilkerson commented Dec 1, 2020

DanTup commented Jan 5, 2021

DanTup commented Jan 11, 2021

munificent commented Jan 11, 2021

nex3 commented Jan 11, 2021

jonasfj commented Jan 12, 2021

DanTup commented Jan 13, 2021

DanTup commented Jan 13, 2021

jonasfj commented Jan 13, 2021

bwilkerson commented Jan 13, 2021

DanTup commented Jan 26, 2021

jonasfj commented Jan 26, 2021

bwilkerson commented Jan 26, 2021

DanTup commented Feb 6, 2021