Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to recover from invalid YAML #107

Merged
merged 5 commits into from
Feb 12, 2021
Merged

Conversation

DanTup
Copy link
Contributor

@DanTup DanTup commented Feb 6, 2021

This is progress towards #102. Initially it covers from the case where a key: value is expected but so far only part of the key has been typed. This would usually raise an error "Expected ':'".

throw YamlException("Expected ':'.", _scanner.emptySpan);
if (_recover) {
_tokens.insert(key.tokenNumber - _tokensParsed,
Token(TokenType.key, key.location.pointSpan() as FileSpan));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to produce a diagnostic at this point, but then continue to parse the code. In order to do that we'll need to have a way of reporting diagnostics that doesn't involve throwing an exception.

We will eventually want recovery mode to never throw an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a change that starts this, although I have some questions:

  • I added a new YamlError class though it's fairly similar to YamlException - should the exceptions just be collected instead?
  • Should all errors this way (even where we don't currently recover)? For ex. replace all throw YamlExceptions with a call to a function that emits the error and then either recovers or throws (perhaps passing a function that performs the recovery, so that in future it could be made non-nullable to ensure no locations will throw YamlExceptions when recovery is enabled).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... should the exceptions just be collected instead?

I'd probably start with that until / unless we discover a reason to change it.

Should all errors this way (even where we don't currently recover)?

Eventually the parser should recover from all errors in recovery mode. Some of the errors are probably already things that the parser could safely proceed from, so they should be easy to convert, but proceeding from some of them (without first adding recovery-specific code) would likely produce a large number of useless follow-on diagnostics.

My advice is to continue to throw everywhere that we haven't tested the recover behavior. I'd structure the recovery code like this:

if (detectedAnError) {
  reportError(YamlException(...));
  // recover from the error.
}

where reportError either throws or stores the exception depending on whether we're in recovery mode. That way the recovery code will only be executed in recovery mode and is guaranteed to not slow down the parser when parsing valid input.

The code in server that calls the parser then needs to check the stored exceptions whether one was thrown or not.

... perhaps passing a function that performs the recovery ...

I've tried writing parsers with that behavior, but it turns out that it's really hard, in general, to recover well in all cases when operating at a distance like that. I'd recommend against doing it that way and recommend that we keep the recovery code in-line. (The original version of the shared Dart parser was implemented that way, and had to be converted in order to get reasonable recovery.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the exceptions just be collected instead?

I'd probably start with that until / unless we discover a reason to change it.

Presumably changing the types of errors later would be a breaking change, though I'm not sure if there would ever be reason to. I'll switch to the exceptions for now - perhaps if the errors were extended in future, it'd be useful to have any new information also on the exception anyway.

where reportError either throws or stores the exception depending on whether we're in recovery mode.

This is what I started originally, though (if I understand correctly) means reportError can't be called if the error doesn't currently have recovery code. I wasn't sure if that was ok - though it sounds like it probably is, so I'll update it.

What I had been thinking about was:

if (detectedAnError) {
  reportError(YamlException(), () {
    // recover from the error
  });
}

The recovery code is still in-line, but in the future making the callback non-null would (hopefully) make it harder to add code reporting new errors (assuming you call reportError) without considering recovery. I'm not sure if there's a real benefit there though (given YAML is fairly static, maybe there won't be much code added for new errors in future).

Scanner(String source, {Uri? sourceUrl})
: _scanner = SpanScanner.eager(source, sourceUrl: sourceUrl);
Scanner(String source,
{Uri? sourceUrl, bool recover = false, ErrorListener? errorListener})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't currently see any value in passing in an errorListener if recovery is false, nor any value in setting recovery to true if there's no errorListener. We might consider having the value of recovery be implied by errorListener != null (that is, make the error listener the only parameter, with null implying no recovery).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any value in setting recovery to true if there's no errorListener

In the server where I'm testing this, we care about recovery but not errors (diagnostics aren't produced here, it's in response to a completion request):

https://github.com/dart-lang/sdk/blob/d55dc9c42a4362307e4deb5c97eea2db43d951e4/pkg/analysis_server/lib/src/services/completion/yaml/yaml_completion_generator.dart#L87-L95

I could create a NullErrorListener and we could pass that, although it may seem a bit strange to pass a NullErrorListener in order to trigger recovery. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be a bit odd. We might still want to add an assert that the error listener is always null when recovery it false, because that combination doesn't really make sense. If recovery is false (and there's an error in the YAML) then the single exception will always be thrown and there's no value to having a listener.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

lib/yaml.dart Outdated
///
/// If [errorListener] is supplied, its onError method will be called for each
/// error. If [recover] is false, parsing will end early so only the first error
/// may be emitted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this paragraph is currently true. There's currently only one thrown exception that's sent to the error listener. Which means that any thrown exception needs to be caught and added to the list of exceptions to report, unless it was already reported to the error listener. I think it would be cleaner if exceptions that are going to be thrown are not reported to the error listener so that clients don't need to remember to make the extra check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

@bwilkerson bwilkerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@bwilkerson bwilkerson merged commit b4e14c5 into dart-lang:master Feb 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants