[DONE] How do we refactor our errors? #636

olivereanderson · 2022-01-28T15:24:53Z

olivereanderson
Jan 28, 2022

Introduction

In discussion #460 we discussed some of the pitfalls of our current error types and some ideas about how we can improve the situation.
This new discussion is about settling on a concrete strategy that we follow.

Here are some ideas:

Guidelines strategy 1:

Follow the same guidelines as influxdb_iox.

The aforementioned guidelines already cover the pros of this approach, so I will only list some cons as they apply to identity.rs.
Cons:

Adds snafu (which is not as widely used as thiserror and anyhow/eyre) to everyone's dependencies if we go for this approach. Furthermore snafu is not yet stable (current version is 0.7) and thus subject to change.
Sets some details in stone. If it turns out one of the variants in a (local) error enum is not needed it is a breaking change to remove it.
A bit less ergonomic than having fewer errors.

Open questions:

How do we avoid introducing irrelevant information when these errors cross module and/or crate boundaries?
Is there any way in which we can make these new guidelines improve the errors thrown in the javascript bindings?
Does the error handling precision offered by this approach outweigh the complexity it introduces?

Guidelines strategy 2:

Similar to 1, but not necessarily using snafu and not necessarily as strict.

Prefer to use one error enum per module (or at least module folder in our libraries structure)
Introduce a global opaque error report type that we return when we know that the error exists to be logged and not to (significantly) alter control flow. The Rust error handling project group is working on introducing such a type to the standard library, but as far as I am aware this is not stable yet. For the time being we can either just use eyre::Report/anyhow::Error or introduce our own.
Prefer wrapping the Report type in errors over String/&'static str in order to add more context at runtime (and optionally also a backtrace).

Pros:

Gives library consumers (relatively) precise information regarding how a given function may fail compared to the errors we are currently returning.
Gives library consumers the ability to obtain better logs than what is currently possible.
Possibly easier to be vague when this is desired compared to strategy 1.

Cons:

Might be less ergonomic if we don't use snafu than strategy 1.
Might be hard to decide when to be vague vs when to be precise in practice.
More vague than strategy 1 hence not as easy to encourage.

Open questions:
These are the same as in strategy 1.

Guidelines strategy 3:

Only use an abstract error type IdentityError that has a method kind that returns an enum of possible error categories (similar to https://doc.rust-lang.org/std/io/struct.Error.html#method.kind). The ErrorKind enum should only have variants that broadly describe the kind of failure that occurred like IO, ConversionFailure, MissingEntity, InvalidProof, InvalidData, DuplicationAttempt etc.
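A minimal sketch of what such a type could look like, modelled on the shape of std::io::Error. The field layout, the constructor names and the parse_timestamp helper are assumptions made purely for illustration; only IdentityError, kind and the variant names come from the description above.

```rust
use std::error::Error;
use std::fmt;

/// Broad failure categories, using the variant names suggested above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum ErrorKind {
    Io,
    ConversionFailure,
    MissingEntity,
    InvalidProof,
    InvalidData,
    DuplicationAttempt,
}

/// Opaque error: callers inspect it through `kind()` instead of matching variants.
#[derive(Debug)]
pub struct IdentityError {
    kind: ErrorKind,
    message: String,
    source: Option<Box<dyn Error + Send + Sync + 'static>>,
}

impl IdentityError {
    /// Hypothetical constructor, not an existing API.
    pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        Self { kind, message: message.into(), source: None }
    }

    /// Attaches the underlying cause without exposing its concrete type.
    pub fn with_source(mut self, source: impl Error + Send + Sync + 'static) -> Self {
        self.source = Some(Box::new(source));
        self
    }

    pub fn kind(&self) -> ErrorKind {
        self.kind
    }
}

impl fmt::Display for IdentityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

impl Error for IdentityError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

/// Hypothetical fallible function, used only to show how a kind gets assigned.
pub fn parse_timestamp(input: &str) -> Result<u64, IdentityError> {
    input.parse::<u64>().map_err(|e| {
        IdentityError::new(ErrorKind::InvalidData, format!("`{}` is not a Unix timestamp", input))
            .with_source(e)
    })
}
```

A caller would then branch on err.kind() and fall back to logging for every category it does not care about.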

Pros:

Less complex than all other alternatives (including what we currently have).
Having broad definitions of the "kind" of errors in this library makes it easier for users to understand what they might care about when handling an error compared to the current setup where there are a few enums with many variants most of which are designed for very specific situations.
Might offer an improvement in the Wasm bindings as it would allow the WasmError to also provide the error category.
Still provides some context that may be enough for callers to decide on how to alter their control flow in many situations.
Can possibly provide more context if necessary by implementing more methods that return Option<ContextDependentInformation> in a non breaking manner.

Cons:

Providing additional error information in terms of data structures (such as for instance a Timestamp, a DIDUrl etc.) that may be used by callers when reacting to an error at runtime will be at best awkward.
The possibilities for extension discussed under Pros may not scale very well.
An error returned from any given function will (still) most likely only cover a fraction of the total number of possible error kinds, but this does not get reflected in the function signature.

Guidelines strategy 4:

Try to improve the error types we already have. This would typically entail removing variants that represent errors from other crates. For instance these variants in identity_account::Error:

  /// Caused by errors from the [identity_core] crate.
  #[error(transparent)]
  CoreError(#[from] identity_core::Error),
  /// Caused by errors from the [identity_did] crate.
  #[error(transparent)]
  DIDError(#[from] identity_did::Error),
  /// Caused by errors from the [identity_credential] crate.
  #[error(transparent)]
  CredentialError(#[from] identity_credential::Error),
  /// Caused by errors from the [identity_iota] crate.
  #[error(transparent)]
  IotaError(#[from] identity_iota::Error),

should all be removed and then the internal code that receives these errors maps them to something that is a better fit in this context.
Furthermore we can no longer expose errors from non-stable crates in these enums so they need to be wrapped in opaque types.
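To make the mapping idea concrete, here is a rough sketch that uses only the standard library. DidParseError stands in for a lower-level error such as identity_did::Error, and AccountError, parse_did and load_identity are hypothetical names, not the crate's actual API.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Stand-in for a lower-level error (hypothetical shape).
#[derive(Debug)]
pub struct DidParseError(pub String);

impl fmt::Display for DidParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "malformed DID `{}`", self.0)
    }
}

impl StdError for DidParseError {}

/// Account-level error without blanket conversions from other crates.
#[derive(Debug)]
#[non_exhaustive]
pub enum AccountError {
    /// The DID supplied when loading an identity could not be parsed.
    InvalidIdentityDid { source: DidParseError },
    /// No identity is stored under the requested DID.
    IdentityNotFound { did: String },
}

impl fmt::Display for AccountError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidIdentityDid { .. } => f.write_str("cannot load identity: invalid DID"),
            Self::IdentityNotFound { did } => write!(f, "no identity stored for `{}`", did),
        }
    }
}

impl StdError for AccountError {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Self::InvalidIdentityDid { source } => Some(source),
            Self::IdentityNotFound { .. } => None,
        }
    }
}

/// Hypothetical lower-level function.
fn parse_did(input: &str) -> Result<String, DidParseError> {
    if input.starts_with("did:") {
        Ok(input.to_owned())
    } else {
        Err(DidParseError(input.to_owned()))
    }
}

pub fn load_identity(input: &str, stored: &[&str]) -> Result<String, AccountError> {
    // Explicit mapping instead of a blanket conversion: the variant says why the DID mattered here.
    let did = parse_did(input).map_err(|source| AccountError::InvalidIdentityDid { source })?;
    if stored.contains(&did.as_str()) {
        Ok(did)
    } else {
        Err(AccountError::IdentityNotFound { did })
    }
}
```

The cost is visible in load_identity: every call site needs a map_err, which is the ergonomics trade-off mentioned under Cons.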

Pros:

More familiar than the other approaches (at least to those that are already familiar with this library).
Can provide better additional information for Rust users to work with (at runtime) compared to strategy 3.
If this strategy "works" it might be simpler than strategies 1 & 2 while still providing similar benefits.

Cons:

The one error per crate style does not necessarily scale very well.
It might not be possible to get the number of variants in these enums to a reasonable number without giving up information in which case we might as well go with strategy 3.
To make this work we will probably have to give up some ergonomics that we are currently enjoying in which case we might as well go with either strategy 1 or 2 (or 3 if the ergonomics of propagating errors at very low cost is that important).

Any thoughts on the proposed ideas or any other suggestions would be greatly appreciated.

Edit: Additional strategies after collecting feedback.

Guidelines strategy 5:

Following @cycraig's suggestion copied here:

Reduce the number of top-level errors by grouping related variants. This may also be achieved by defining sub-error enums for specific cases (or modules) like with DIDError. (Edit: for future reference, the decision of when to define sub-error enums or inline the error message is likely based on how often that message is repeated and how many distinct variants there are. One instance => inline the message, multiple tends towards sub-error enums).

E.g.

  #[error("Invalid Document - Missing Message Id")]
  InvalidDocumentMessageId,
  #[error("Invalid Document - Signing Verification Method Type Not Supported")]
  InvalidDocumentSigningMethodType,

can become:

  #[error("invalid DID Document - {0}")]
  InvalidDocument(&'static str), // or String if dynamic values are required.

return Err(Error::InvalidDocument("missing message ID"));
return Err(Error::InvalidDocument("signing verification method type not supported"));

Improve error messages to include more information where appropriate, rather than generic, obscure messages.

See: https://www.morling.dev/blog/whats-in-a-good-error-message/

E.g.

  #[error("Invalid Root Document")]
  InvalidRootDocument,

needs more context:

  #[error("invalid root document - {0}")]
  InvalidRootDocument(&'static str),

    // The previous message id must be null.
    if !document.metadata.previous_message_id.is_null() {
      return Err(Error::InvalidRootDocument("not first in chain due to non-null previous_message_id"));
    }

    // Validate the hash of the public key matches the DID tag.
    let signature: &Signature = document.try_signature()?;
    let method: &IotaVerificationMethod = document.try_resolve_method(signature)?;
    let public: PublicKey = method.key_data().try_decode()?.into();
    if document.id().tag() != IotaDID::encode_key(public.as_ref()) {
      return Err(Error::InvalidRootDocument("signature verification method does not match DID tag"));
    }

Change errors to include a source where appropriate, to enable error chaining/context for application error handlers. This is supported by the discussions in the following Rust Error Handling Project issues:

Specifically this comment which defines the project's official guidance as:

Return source errors via Error::source unless the source error's message is included in your own error message in Display.

Replace blanket #[from] implementations of external errors with errors for specific cases as mentioned in Strategy 4. NOTE: this may actually increase the number of variants, so it does not have to be a hard rule.
Mark exposed error enums as #[non_exhaustive] to prevent technical breaking changes from introducing new variants, similar to DIDError.
Consider using anyhow/eyre to construct a report including context for error messages in the Wasm bindings. (Edit: to be clear this is because while Rust can report the context by using anyhow/eyre, Wasm needs an alternative.)

Suggestions 1-5 can be done incrementally and do not require a major all-at-once refactor, so we can easily explore whether or not they work in a single crate/module independently or combined. Suggestion 6 is confined to the Wasm bindings where we have more leeway.
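The suggestion about exposing a source has no example above, so here is a small sketch of it using only the standard library. DecodeError, the InvalidRootDocument struct and render_chain are hypothetical names. The Display impl deliberately leaves out the cause's message, following the quoted guidance, and render_chain approximates what a report type from anyhow/eyre would print.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level cause.
#[derive(Debug)]
pub struct DecodeError;

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid base58 character")
    }
}

impl Error for DecodeError {}

/// Higher-level error that keeps its cause reachable through `source()`.
#[derive(Debug)]
pub struct InvalidRootDocument {
    pub reason: &'static str,
    pub source: DecodeError,
}

impl fmt::Display for InvalidRootDocument {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The cause's message is not repeated here because it is returned via `source()`.
        write!(f, "invalid root document - {}", self.reason)
    }
}

impl Error for InvalidRootDocument {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walks the chain and joins the messages, roughly what an application-side report does.
pub fn render_chain(error: &(dyn Error + 'static)) -> String {
    let mut out = error.to_string();
    let mut current = error.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}
```

With thiserror the same effect should be achievable by naming the field source, as in the WordCountError example further down the thread.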

Guidelines strategy 6:

Use something similar to strategy 3 for the javascript bindings and another strategy for Rust. This needs to be split into several sub-strategies, which we can call W3-Rk, where k is a number between 1 and 5 referencing the correspondingly numbered strategy above.

Note that this does not necessarily exclude also including more context in message as suggested in this quote from @cycraig:

Consider using anyhow/eyre to construct a report including context for error messages in the Wasm bindings. (Edit: to be clear this is because while Rust can report the context by using anyhow/eyre, Wasm needs an alternative.)

Pros:

Unless we can come up with a better way to document what name might be in the JS bindings then knowing that it can be one of say 10 things is an improvement for many developers using the JS bindings (see @abdulmth's comment below).
More specific details about the error can still be found in message. We could even consider making the first line of message be full name: <current error.name>.

Cons:
I will directly quote @cycraig here:

I also foresee long arguments about what the kind groups should be. With how large and diverse the scope of the codebase is currently, I do not imagine this would be productive nor massively improve how useful the errors are to end-developers when compared to just improving error messages and adding context.

olivereanderson · 2022-01-31T08:04:07Z

olivereanderson
Jan 31, 2022
Author

In #460 we also discussed introducing a FatalError. This is of course an additional type we can introduce regardless of which approach we decide on.

1 reply

cycraig Feb 1, 2022

I'm still opposed to FatalError: if an error is truly fatal/unrecoverable (like out-of-memory) we would panic, and we should never panic if we can help it. Returning a proper error variant with context and a message would be better in my opinion.

olivereanderson · 2022-01-31T09:04:56Z

olivereanderson
Jan 31, 2022
Author

Here are a couple of libraries that use snafu and follow the snafu philosophy:

https://github.com/influxdata/influxdb_iox
https://github.com/Enet4/dicom-rs (with an article on how they refactored to snafu)
https://github.com/sp1ff/mpdpopm (the author also wrote a nice article Rust Error Handling which is worth reading).

kube-rs follows the snafu philosophy, but uses thiserror instead of snafu: see this PR: kube-rs/kube#686. It is worth noting that they also wanted to migrate away from working with a single error enum.

I also find the pattern of using a combination of thiserror together with anyhow as seen in the following lesson from Zero To Production in Rust interesting.

olivereanderson · 2022-01-31T09:30:53Z

olivereanderson
Jan 31, 2022
Author

An error type that fits the description in strategy 3 is built in this blog post by Jane Lusby from the Rust error handling working group.

olivereanderson · 2022-01-31T10:14:09Z

olivereanderson
Jan 31, 2022
Author

With regards to the question: How do we avoid introducing irrelevant information when these errors cross module and/or crate boundaries?
from strategy 1 & 2 let me explain a bit more what I meant:

So suppose we have a module foo with an error FooError with various variants, like FooError::X, FooError::Y etc., then we call foo::foo_function in our bar module. Then the simplest thing to do is to add a variant for FooError in BarError, but now we have reintroduced our original problem, just more locally. I have essentially only propagated the error from foo without giving any idea of which variants from FooError are relevant in the bar module.

PhilippGackstatter · 2022-01-31T14:07:28Z

PhilippGackstatter
Jan 31, 2022

Generally, 1 and 2 seem good to me. I wouldn't be so strict as to prescribe only using ensure! and .fail() when using snafu, but that's a smaller issue.

One of the most important parts to me is how wrapping errors works. In the account you can come across a wide variety of errors, from client, to storage, to DID document problems ("verification method not found", ...). These will currently be our highest-level errors exposed to developers and so the guideline should work well for those.

From what I understand snafu should make that wrapping process relatively simple. Snafu seems to work best when errors are wrapped all the way to the source, in order to get a "semantic backtrace", as it was described in one of the blog posts. That seems very helpful for debugging and logging, at the cost of making the errors rather large.

The consequence seems to be that if you have a relatively fragmented error landscape on the lower-levels, then the higher-level would have as many variants as there are fragmented errors. It is a downside insofar as the developer of the higher-level needs to write a lot of wrapping code. The upside is that it is very explicit and no information is lost. Basically, what you meant with "How do we avoid introducing irrelevant information when these errors cross module and/or crate boundaries?".

Does the error handling precision offered by this approach outweigh the complexity it introduces?

I think we fundamentally want our errors to be matchable and informative without throwing errors that cannot occur, and this approach achieves both, so overall I think it's worth giving it a shot.

Is there any way in which we can make these new guidelines improve the errors thrown in the javascript bindings?

We can write a custom report function that produces a nice representation for JS. There's an example towards the end of that blog you linked.
We might still want to improve the error.name for JS, which is not so easy I believe. In my mind, it would be nice if it would return something like OutermostError.InnerError.InnermostError so it would be possible to match on it. It would be even better if we could wrap proper JS Error objects in each other based on those names, but I don't think that's possible? Getting those error type names though is not that easy and would probably require writing a macro that replaces IntoStaticStr and extracts the type name with std::any::type_name which is quite a hack.
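A rough sketch of the dotted-name idea, written as a plain trait rather than a macro. NamedError, inner and name_path are hypothetical names. The output format of std::any::type_name is explicitly not guaranteed by the standard library, and the last-segment trick below would break for generic types, so treat this as an illustration of the hack rather than something to rely on.

```rust
use std::any::type_name;

/// Hypothetical trait: each error reports its own name and the error it wraps, if any.
pub trait NamedError {
    fn name(&self) -> &'static str {
        // `type_name` typically yields a path like `my_crate::module::StorageError`;
        // keep only the last path segment.
        let full = type_name::<Self>();
        full.rsplit("::").next().unwrap_or(full)
    }

    fn inner(&self) -> Option<&dyn NamedError> {
        None
    }
}

pub struct InnermostError;
pub struct InnerError(pub InnermostError);
pub struct OutermostError(pub InnerError);

impl NamedError for InnermostError {}

impl NamedError for InnerError {
    fn inner(&self) -> Option<&dyn NamedError> {
        Some(&self.0)
    }
}

impl NamedError for OutermostError {
    fn inner(&self) -> Option<&dyn NamedError> {
        Some(&self.0)
    }
}

/// Joins the names from the outermost error down to the innermost one.
pub fn name_path(error: &dyn NamedError) -> String {
    let mut parts = vec![error.name()];
    let mut current = error.inner();
    while let Some(e) = current {
        parts.push(e.name());
        current = e.inner();
    }
    parts.join(".")
}
```

The resulting string could then be assigned to error.name on the JS side. For enums one would presumably also want the variant name, which this sketch does not cover.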

For guideline 3 I don't see how this is better than what we have now.

Guideline 4 is also interesting, since we map the lower-level errors to instances of the higher-level module. Still, we have way more errors to handle than a function can possibly return, and this seemed like one of the main goals of the refactor to me.

TL;DR: The snafu solution seems like the best trade-off, even if it's not perfect.

1 reply

olivereanderson Jan 31, 2022
Author

The consequence seems to be that if you have a relatively fragmented error landscape on the lower-levels, then the higher-level would have as many variants as there are fragmented errors. It is a downside insofar as the developer of the higher-level needs to write a lot of wrapping code. The upside is that it is very explicit and no information is lost.

From what I can tell influxdb_iox and dicom-rs don't roll out all the variants from the lower-level error enums in the higher level functions/code, but instead wrap the lower level errors in a few variants that are named according to the context in which they can occur in the higher level module/code. I think that if we decide to go with strategy 1 or 2, then we should also adopt this pattern. This means that if one wants to extract all the information from the source, then one will have to do some nested matches, but by having descriptive names for each variant in every error enum one will know what one should be looking for while descending the levels.
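Here is a small sketch of that pattern using the FooError/BarError names from my earlier comment. The variant names ResolveDocument and PublishUpdate, and the way foo_function is made to fail, are invented for the example.

```rust
/// Lower-level error from the `foo` module.
#[derive(Debug, PartialEq)]
pub enum FooError {
    X,
    Y,
}

/// Higher-level error: variants are named after the context in `bar`,
/// not after the module the failure came from.
#[derive(Debug, PartialEq)]
pub enum BarError {
    /// `foo_function` failed while resolving the document.
    ResolveDocument { source: FooError },
    /// `foo_function` failed while publishing the update.
    PublishUpdate { source: FooError },
}

/// Hypothetical lower-level function; the argument only simulates a failure.
pub fn foo_function(fail_with: Option<FooError>) -> Result<(), FooError> {
    match fail_with {
        Some(error) => Err(error),
        None => Ok(()),
    }
}

pub fn bar_function(resolve: Option<FooError>, publish: Option<FooError>) -> Result<(), BarError> {
    foo_function(resolve).map_err(|source| BarError::ResolveDocument { source })?;
    foo_function(publish).map_err(|source| BarError::PublishUpdate { source })?;
    Ok(())
}

/// A caller that wants the full detail descends one level at a time.
pub fn describe(error: &BarError) -> &'static str {
    match error {
        BarError::ResolveDocument { source: FooError::X } => "resolution failed with X",
        BarError::ResolveDocument { .. } => "resolution failed",
        BarError::PublishUpdate { .. } => "publishing failed",
    }
}
```

Callers who only care about the context can stop at the outer match, and only those who need more pay for the nesting.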

Getting those error type names though is not that easy and would probably require writing a macro that replaces IntoStaticStr and extracts the type name with std::any::type_name which is quite a hack.

This is an interesting idea worth considering. I also agree that since Strategy 1, 2 and also 3 would all produce better error reports the JS bindings will also benefit from this.

For guideline 3 I don't see how this is better than what we have now.

It is not so much better in terms of providing relatively precise guarantees with regards to how a given function in the library might fail, but it is an improvement in terms of simplicity. If the vast majority of callers are only interested in either propagating the error upstream and/or logging it, then a simpler error type is better. On the other hand we might not be able to exclude the possibilities for certain libraries/applications to want to handle our errors in much more detail and then Strategy 1 or 2 is probably the way to go.

coodos · 2022-02-01T13:59:01Z

coodos
Feb 1, 2022
Collaborator

I feel that 3 is the best approach for most people to rely on: it provides a quick hint on what went wrong and the user can investigate from there.

While working on Identity Suite I ran into a weird issue: I had accidentally deleted the line that signed and published my updated DID (with a few methods added to it), and the error it gave was so vague that I could not find what was going wrong for quite a long time.

If we have concise output that makes it extremely simple for most consumers of the library to deduce what went wrong then I think there is nothing better than that approach.

Would love to hear your thoughts on this :)

2 replies

olivereanderson Feb 1, 2022
Author

Thanks for your input @coodos!
I think many programmers like having failures sharing common characteristics grouped together in classes or categories as they are often handled in a similar manner. This is something we definitely should take into account when developing this library.

At the same time we also want to support applications that could recover from a failure if they had enough information at runtime, but this should ideally not push esoteric error handling on every other user.

If there turns out to be a clear split between the preferences of developers using the Javascript bindings and developers using the Rust library directly we might consider going for different approaches for each language we support. What do you think of this idea?

coodos Feb 1, 2022
Collaborator

I have to agree with you there, there are ways in which Rust and JS developers and development practices are fundamentally different, thus it is natural that they will have different approaches to the errors that are raised as well.

I think that having different styles of error handling for different languages is a pretty nice idea. If this does not add considerable burden for you then honestly I think you should just go for it!

cycraig · 2022-02-01T16:22:02Z

cycraig
Feb 1, 2022

It's rather difficult to analyse any of the given strategies without concrete examples of what the code would look like and how that would affect ergonomics and development "pain", both internally and for consumers of the library.

There's quite a lot of background and context missing from previous discussions so I'll only comment on the approaches presented.

Strategy 1

Follow the same guidelines as influxdb_iox.

If you take a look at that guide, it's from over two years ago, which is around the same time thiserror only just released 1.0, before being widely adopted. In fact, their project is also using thiserror in some places now: https://github.com/influxdata/influxdb_iox/search?q=thiserror . My assumption then, is that their snafu errors are either sufficient or it's just legacy code that isn't worth migrating at the moment.

As @PhilippGackstatter mentioned, several of the guidelines such as using ensure! and .fail() are snafu idiosyncrasies, while others we already follow, such as a custom Result type.

I don't see any significant benefits to snafu as things like defining a source field to retain a context/error chain for anyhow/eyre can also be achieved with thiserror.

Example from https://nick.groenen.me/posts/rust-error-handling/

use thiserror::Error;

/// WordCountError enumerates all possible errors returned by this library.
#[derive(Error, Debug)]
pub enum WordCountError {
    /// Represents an empty source. For example, an empty text file being given
    /// as input to `count_words()`.
    #[error("Source contains no data")]
    EmptySource,

    /// Represents a failure to read from input.
    #[error("Read error")]
    ReadError { source: std::io::Error },

    /// Represents all other cases of `std::io::Error`.
    #[error(transparent)]
    IOError(#[from] std::io::Error),
}

Furthermore snafu is not yet stable (current version is 0.7) and thus subject to change.

This is a problem if we ever aim to provide a stable API, which includes not exposing errors from pre-1.0 crates, so I don't think snafu should get a special exemption here if its derived structure/behaviour could change: https://rust-lang.github.io/api-guidelines/necessities.html#public-dependencies-of-a-stable-crate-are-stable-c-stable

This alone would be reason enough not to switch from thiserror to snafu.

Strategy 2

Prefer to use one error enum per module (or at least module folder in our libraries structure)

The advantage of module-level errors is specificity, in that the possible errors are more narrowly defined; the disadvantage is that they are more unwieldy for developers using our library, since they have to handle multiple error types (unless they use anyhow). This was raised in a somewhat-related Stronghold error PR: iotaledger/stronghold.rs#269 (review)

I.e. module-level errors may hinder library developers that use identity.rs due to multiple exposed types but it probably won't affect application developers using opaque errors much.

That said, I have already used a module-level error where it made sense (and due to required trait bounds) with DIDError.

#[derive(Debug, thiserror::Error, strum::IntoStaticStr)]
#[non_exhaustive]
pub enum DIDError {
  #[error("Invalid Authority")]
  InvalidAuthority,
  #[error("Invalid Fragment")]
  InvalidFragment,
  #[error("Invalid Method Id")]
  InvalidMethodId,
  #[error("Invalid Method Name")]
  InvalidMethodName,
  #[error("Invalid Path")]
  InvalidPath,
  #[error("Invalid Query")]
  InvalidQuery,
  #[error("Invalid Scheme")]
  InvalidScheme,

  #[error("{0}")]
  Other(&'static str),
}

Introduce a global opaque error report type [...]

I'm pretty sure the ecosystem is aligned on this: as a library, we definitely should not use opaque error crates like anyhow or eyre. We should not introduce our own Report type either when there exist ways of cooperating with existing reports from anyhow/eyre in application code, by using source fields for instance.

Strategy 3

Only use an abstract error type IdentityError that has a method kind that returns an enum of possible error categories [...]

While I have seen some crates like clap do this, it's usually only from a single module. I think it's infeasible to do this across the eight crates we have now without ending up in the same position, or worse it ends up strongly-coupling all of them and removes the distinction between what errors can be thrown from each crate.

I also foresee long arguments about what the kind groups should be. With how large and diverse the scope of the codebase is currently, I do not imagine this would be productive nor massively improve how useful the errors are to end-developers when compared to just improving error messages and adding context.

Can possibly provide more context if necessary by implementing more methods that return Option<ContextDependentInformation> in a non breaking manner.

I don't think implementing custom methods to return context aligns with the current error handling ecosystem when there are existing approaches like source for anyhow and eyre.

Providing additional error information in terms of data structures (such as for instance a Timestamp, a DIDUrl etc.) that may be used by callers when reacting to an error at runtime will be at best awkward.

Typically we just format them in a string if we want to add that information to the error message.

Strategy 4

I am most aligned with improving our current errors.

The one error per crate style does not necessarily scale very well.

I'm not convinced the one-error-per-module approach is a significant improvement either though (see my Strategy 2 comment above). We would go from ~6 error types to more than two dozen quite easily.

So what should we do?

In my opinion, we should:

Reduce the number of top-level errors by grouping related variants. This may also be achieved by defining sub-error enums for specific cases (or modules) like with DIDError. (Edit: for future reference, the decision of when to define sub-error enums or inline the error message is likely based on how often that message is repeated and how many distinct variants there are. One instance => inline the message, multiple tends towards sub-error enums).

E.g.

  #[error("Invalid Document - Missing Message Id")]
  InvalidDocumentMessageId,
  #[error("Invalid Document - Signing Verification Method Type Not Supported")]
  InvalidDocumentSigningMethodType,

can become:

  #[error("invalid DID Document - {0}")]
  InvalidDocument(&'static str), // or String if dynamic values are required.

return Err(Error::InvalidDocument("missing message ID"));
return Err(Error::InvalidDocument("signing verification method type not supported"));

Improve error messages to include more information where appropriate, rather than generic, obscure messages.

See: https://www.morling.dev/blog/whats-in-a-good-error-message/

E.g.

  #[error("Invalid Root Document")]
  InvalidRootDocument,

needs more context:

  #[error("invalid root document - {0}")]
  InvalidRootDocument(&'static str),

    // The previous message id must be null.
    if !document.metadata.previous_message_id.is_null() {
      return Err(Error::InvalidRootDocument("not first in chain due to non-null previous_message_id"));
    }

    // Validate the hash of the public key matches the DID tag.
    let signature: &Signature = document.try_signature()?;
    let method: &IotaVerificationMethod = document.try_resolve_method(signature)?;
    let public: PublicKey = method.key_data().try_decode()?.into();
    if document.id().tag() != IotaDID::encode_key(public.as_ref()) {
      return Err(Error::InvalidRootDocument("signature verification method does not match DID tag"));
    }

Change errors to include a source where appropriate, to enable error chaining/context for application error handlers. This is supported by the discussions in the following Rust Error Handling Project issues:

Specifically this comment which defines the project's official guidance as:

Return source errors via Error::source unless the source error's message is included in your own error message in Display.

Replace blanket #[from] implementations of external errors with errors for specific cases as mentioned in Strategy 4. NOTE: this may actually increase the number of variants, so it does not have to be a hard rule.
Mark exposed error enums as #[non_exhaustive] to prevent technical breaking changes from introducing new variants, similar to DIDError.
Consider using anyhow/eyre to construct a report including context for error messages in the Wasm bindings. (Edit: to be clear this is because while Rust can report the context by using anyhow/eyre, Wasm needs an alternative.)

Suggestions 1-5 can be done incrementally and do not require a major all-at-once refactor, so we can easily explore whether or not they work in a single crate/module independently or combined. Suggestion 6 is confined to the Wasm bindings where we have more leeway.
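As a rough illustration of how the grouping and #[non_exhaustive] suggestions could combine, here is a hand-written sketch without thiserror. The DIDError here is a trimmed copy of the enum shown earlier, while the top-level Error shape and is_fragment_problem are assumptions for the example.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Sub-error enum for one area (trimmed to a few variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
pub enum DIDError {
    InvalidAuthority,
    InvalidFragment,
    Other(&'static str),
}

impl fmt::Display for DIDError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidAuthority => f.write_str("invalid authority"),
            Self::InvalidFragment => f.write_str("invalid fragment"),
            Self::Other(message) => f.write_str(message),
        }
    }
}

impl StdError for DIDError {}

/// Top-level error: one grouped variant instead of one variant per DID failure.
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    InvalidDid(DIDError),
    InvalidDocument(&'static str),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The sub-error's message is available through `source()`, so it is not repeated.
            Self::InvalidDid(_) => f.write_str("invalid DID"),
            Self::InvalidDocument(reason) => write!(f, "invalid DID document - {}", reason),
        }
    }
}

impl StdError for Error {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Self::InvalidDid(error) => Some(error),
            Self::InvalidDocument(_) => None,
        }
    }
}

/// Downstream crates have to keep a wildcard arm when matching a
/// `#[non_exhaustive]` enum, so adding variants later does not break them.
pub fn is_fragment_problem(error: &Error) -> bool {
    match error {
        Error::InvalidDid(DIDError::InvalidFragment) => true,
        _ => false,
    }
}
```

Note that inside the defining crate the attribute has no effect on matching; the wildcard requirement only applies to other crates.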

With regards to multiple module-level errors vs crate-level errors, I'm still undecided. See comment under Strategy 2 above. I would not support a massive refactor to module-level errors but if a proof-of-concept proves successful I would be happy to be proven wrong and go with an incremental refactor.

7 replies

HenriqueNogara Feb 1, 2022

I agree with Craig here. I think that if we decide to go for one-per-module errors, we will end up having dozens of enums and that might not be ideal for our API, even though I also see the benefits considering what we currently have. I would also be hesitant to use snafu considering it's not stable. Grouping together errors, providing source where appropriate and more informative messages seems to be the best option to me.

olivereanderson Feb 1, 2022
Author

EDIT: @HenriqueNogara commented just before I finished writing this. This is a reply to @cycraig.

I hope we can agree now that we should keep error types for the identity library and not replace them with opaque errors.

To be clear I never said we should remove all (or even most) of our error types in favour of opaque errors (unless one would interpret Strategy 3 as doing exactly that). What I meant is that there are certain places where I think we just as well could have used Box<dyn Error> or anyhow. One example is the invalid timestamp variant here: https://github.com/iotaledger/identity.rs/blob/dev/identity-core/src/error.rs#L50. This is an example of exposing an error from an unstable crate which we can fix by either wrapping it inside our own tuple struct, or placing it in an opaque error. Another alternative would be to spread this error out into several more variants in our top-level enum. It all depends on what helps the library consumer the most. And that is exactly the point (as you also mentioned) of the blog post I linked above. It doesn't have to be thiserror or anyhow; one can use both.

Sure. Downstream library developers that expose error enums (not opaque errors like anyhow or Box<dyn Error>) and use identity.rs need to map our error variants to theirs, e.g. #[from] identity::account::Error.

Thanks for the example. I am still not convinced libraries should (re)-expose errors from their dependencies. It might be better to handle them internally. By that I mean create one's own error types that are responsible for carrying the relevant information from the dependency. There are of course exceptions, perhaps especially if the dependency is very familiar in the ecosystem.

but I think it should be balanced against the reality of the time and effort it would take to achieve module-level errors versus the perceived benefit to developers using the identity library. I maintain that, in my experience at least, developers typically only handle specific errors after they experience them, and not all possible errors that could happen - which is why having a large error enum (while not ideal) is not completely horrible

I totally agree with everything you wrote here. The question is which of Strategy 3 and 4 is most in line with this reality. I like Strategy 3, because it is in some sense higher level than all the others, and it also has the lowest cost to the happy path for fallible functions that return Result<()>.

I simply (perhaps strongly) consider the other points I raised such as improving error messages and adding context far more immediately beneficial.

I am not completely opposed to trying this out either and it would be really nice if this would work out decently, but again I am not sure we can commit to crate level errors as more functionality gets added to the crates and it is better doing a substantial refactor now than an enormous overhaul after adding even more functionality.

Please be aware that I really appreciate your input and concerns and I will seriously consider everything you have advocated for. Error handling is just really hard to get right and I feel that whatever we do it will be trying to figure out what is the least bad option.

I started out thinking that Strategy 1 or maybe 2 would be the best way to go, but after reading the response from @coodos and your last comment I am now most in favour of Strategy 3 (provided we can come up with some reasonable error categories).

cycraig Feb 2, 2022

What I meant is that there are certain places where I think we just as well could have used Box<dyn Error> or anyhow. One example is the invalid timestamp variant here: https://github.com/iotaledger/identity.rs/blob/dev/identity-core/src/error.rs#L50. This is an example of exposing an error from an unstable crate which we can fix by either wrapping it inside our own tuple struct, or placing it in an opaque error.

You gave two quotes on the argument between using thiserror for libraries and anyhow for applications. Not leaking pre-1.0 crates in our API is a completely different discussion and for that I agree we need to erase the type, either by Box<dyn Error> or just converting it directly to a string message.
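The two ways of erasing the type mentioned here can be sketched roughly as follows. This is only an illustration using the standard library: `UnstableTimestampError` is a hypothetical stand-in for an error from a pre-1.0 dependency, and the `CoreError` variants and the `erase`/`stringify` helpers are made-up names, not the crate's actual API.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for an error type owned by a pre-1.0 dependency.
#[derive(Debug)]
pub struct UnstableTimestampError;

impl fmt::Display for UnstableTimestampError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "input is out of range")
    }
}

impl Error for UnstableTimestampError {}

/// Public error: the dependency's type never appears in this definition.
#[derive(Debug)]
pub enum CoreError {
    /// Option A: keep the cause, but type-erased.
    InvalidTimestamp(Box<dyn Error + Send + Sync + 'static>),
    /// Option B: keep only the rendered message.
    InvalidTimestampMessage(String),
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CoreError::InvalidTimestamp(_) => write!(f, "invalid timestamp"),
            CoreError::InvalidTimestampMessage(msg) => write!(f, "invalid timestamp: {}", msg),
        }
    }
}

impl Error for CoreError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CoreError::InvalidTimestamp(cause) => {
                // Drop the Send + Sync bounds to match the trait signature.
                let cause: &(dyn Error + 'static) = &**cause;
                Some(cause)
            }
            CoreError::InvalidTimestampMessage(_) => None,
        }
    }
}

/// Option A: box the dependency error behind a trait object.
pub fn erase(err: UnstableTimestampError) -> CoreError {
    CoreError::InvalidTimestamp(Box::new(err))
}

/// Option B: convert the dependency error directly to a string message.
pub fn stringify(err: UnstableTimestampError) -> CoreError {
    CoreError::InvalidTimestampMessage(err.to_string())
}
```

With option A the cause stays reachable through `source()`, so it can still be logged as part of an error chain. Option B is simpler but discards the cause entirely.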

I am still not convinced libraries should (re)-expose errors from their dependencies. It might be better to handle them internally.

We do not get to dictate the error handling strategies of other libraries, nor should we pretend that the identity library errors deserve special treatment by developers using it above all others. We re-export errors, many other libraries re-export errors, thiserror makes re-exporting errors easy.

The question is which of Strategy 3 and 4 is most in line with this reality. I like Strategy 3, because it is in some sense higher level than all the others, and it also has the lowest cost to the happy path for fallible functions that return Result<()>.

I have already given my opinion on Strategy 3, see above.

I am not completely opposed to trying this out either and it would be really nice if this would work out decently, but again I am not sure we can commit to crate level errors as more functionality gets added to the crates and it is better doing a substantial refactor now than an enormous overhaul after adding even more functionality.

New functionality should test out the new error handling guidelines.

PhilippGackstatter Feb 2, 2022

developers typically only handle specific errors after they experience them, and not all possible errors that could happen - which is why having a large error enum (while not ideal) is not completely horrible

I think that's an excellent point. If it were different, JS and Python wouldn't be as successful as they are, given how extremely hidden errors are in those languages. Web APIs also tend to have very generic variants, such as InvalidAction, InvalidParameterCombination, ValidationError. So it's generally not unheard of to be overly broad with errors. Consider the resources AWS has to engineer these APIs, yet still they apparently came to the conclusion that it's fine to be imprecise. (To be fair, they specify per-function errors, but those are in addition to those common errors).

If we take that as the assumption, then it's okay for a function to return more errors according to its signature than it practically can. As long as matchability is ensured, which it is with a single enum per crate, then a developer can experience the error and handle that specific variant.

I definitely also acknowledge the problems with exposing fragmented errors to our users, while single error enums are much easier to integrate. And I like the improvements @cycraig suggested for guideline 4.

I agree with the guideline of not re-exposing non-stable errors from dependencies, but mapping them to an internal variant instead, and either including a string representation of the error or wrapping it in an opaque error.

An open question to me is still, "wrap it or map it?" Or, the issue mentioned in the original post under guideline 4, that is, how do we convert, say, CoreError into identity_account::error::Error. And can we avoid multiple ways of reporting the same error with this approach? For instance Error::IotaError(identity_iota::Error::InvalidDoc(identity_did::Error::InvalidMethodEmbedded)) and Error::DIDError(identity_did::Error::InvalidMethodEmbedded) in the account is a real example I've confused myself with in the past.

Mapping would give users a better overview of what broad error categories can occur (with the known caveat of "overcatching"), without having to dive into every single crate that we wrap. Mapping would allow us to better enable what @abdulmth would like for Wasm, e.g. an overview of what errors can occur. If we map, we still need to keep the original error around, otherwise we lose too much information or end up with a very large number of variants. This sounds like guideline 3, with the addition of keeping the original error in a Box<dyn Error>, or more realistically, in an anyhow::Error (so we don't have to reimplement downcasting). So basically:

pub struct AccountError {
    source: anyhow::Error,
    kind: AccountErrorKind
}

That enables someone to still access the original error:

if let Some(error) = account_error.source() {
    if let Some(error) = error.downcast_ref::<CoreError>() {
        match error { ... }
    }
}

Even with this approach, there are still multiple ways we could receive an InvalidMethodEmbedded error from lower-level crates. So it doesn't really address that issue, and except for being more performant and better for Wasm, it's not a huge improvement over just wrapping things the way we do now. Any thoughts on this from anyone?

Edit: Just realized that Error::source of course only returns &dyn Error, not the anyhow::Error, so downcasting like that is not possible. It would be possible if we implemented our own source_ method.
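A rough sketch of the kind-plus-source shape with its own accessor might look like the following. To stay dependency-free it swaps anyhow::Error for a plain `Box<dyn Error + Send + Sync>`. The `downcast_source` accessor, the `AccountErrorKind` variants and the trimmed-down `CoreError` are all hypothetical names chosen for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Trimmed-down stand-in for a lower-level crate's error.
#[derive(Debug, PartialEq)]
pub enum CoreError {
    InvalidMethodEmbedded,
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid embedded method")
    }
}

impl Error for CoreError {}

/// Broad categories (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccountErrorKind {
    Did,
    Storage,
}

#[derive(Debug)]
pub struct AccountError {
    kind: AccountErrorKind,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl AccountError {
    pub fn new<E>(kind: AccountErrorKind, source: E) -> Self
    where
        E: Error + Send + Sync + 'static,
    {
        Self { kind, source: Box::new(source) }
    }

    pub fn kind(&self) -> AccountErrorKind {
        self.kind
    }

    /// Hypothetical accessor: try to view the stored cause as a concrete type.
    pub fn downcast_source<T: Error + 'static>(&self) -> Option<&T> {
        self.source.downcast_ref::<T>()
    }
}

impl fmt::Display for AccountError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.kind {
            AccountErrorKind::Did => write!(f, "DID error"),
            AccountErrorKind::Storage => write!(f, "storage error"),
        }
    }
}

impl Error for AccountError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Drop the Send + Sync bounds to match the trait signature.
        let cause: &(dyn Error + 'static) = &*self.source;
        Some(cause)
    }
}
```

A caller can branch on `kind()` for coarse handling and fall back to `downcast_source::<CoreError>()` when the precise cause matters. As noted above, this does not prevent the same leaf error from arriving via different paths.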

olivereanderson Feb 2, 2022
Author

I think that's an excellent point. If it were different, JS and Python wouldn't be as successful as they are, given how extremely hidden errors are in those languages. Web APIs also tend to have very generic variants, such as InvalidAction, InvalidParameterCombination, ValidationError. So it's generally not unheard of to be overly broad with errors.

This is basically the motivation behind Strategy 3. Of all the suggestions above it is the most similar to the more or less established error handling pattern in many popular programming languages. I am curious to see whether something like this becomes (relatively) common within the Rust ecosystem as time passes.

So it doesn't really address that issue, and except for being more performant and better for Wasm, it's not a huge improvement over just wrapping things the way we do now. Any thoughts on this from anyone?

I still think something like this (basically an improved Strategy 3) could be viable, but at the same time it also requires a major refactor and as you and others have pointed out it is not clear whether its advantages will outweigh the disadvantages.

abdulmth · 2022-02-01T22:48:52Z

abdulmth
Feb 1, 2022

Regarding the Wasm bindings, to my understanding the bindings consume the Rust errors and convert them to JavaScript errors. Currently this is done by throwing an Error that has name and message properties.

For example the following code:

    try {
        await account.createService({
            fragment: "my-service-1",
            type: "MyCustomService",
            endpoint: "invalid-url"
        })
    }catch(e){
        if(e instanceof Error){
            console.log(e.name)
            console.log(e.message)
        }
    }

will print:

InvalidUrl
Invalid Url: relative URL without a base

Although the error message is helpful here, the error is still quite unexpected for a JavaScript developer since we don't expose any documentation about the errors that could happen in JS, which includes all errors in all crates.

It might be helpful in this case if JS errors are more categorized. For example the name of the above mentioned error could be ParseError and we only document and expose a few (maybe up to 10) of these top-level errors. For example ParseError, TangleError, EncodingError, StorageError, InvalidInputError, VerificationError, etc.
This makes it much easier to match against these error names, for example:

    try {
        ...
        account.publish()
    }catch(e){
        if(e instanceof Error && e.name === "TangleError"){
            ShowTangleStateError(e.message);
        } else{
            ...
        }
    }

In this case the developer doesn't care about what happened exactly as long as it has something to do with the tangle, so maybe they can continue by checking their Tangle node or show an error screen to the user.

Also one thing to keep in mind is using custom error types instead of error.name. To my understanding this is not possible using wasm-bindgen since we can't extend JS classes in Rust, see here.
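On the Rust side, the categorisation suggested above could be sketched as a mapping from fine-grained variants to a small set of documented category names. Everything here is hypothetical: the `ErrorCategory` enum, the `AccountError` variants and the `js_fields` helper are illustrative names, and no wasm-bindgen calls are shown.

```rust
use std::fmt;

/// Hypothetical top-level categories that the bindings would document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCategory {
    Parse,
    Tangle,
    Storage,
}

impl ErrorCategory {
    /// The string a binding layer could assign to the JS error's `name`.
    pub fn name(self) -> &'static str {
        match self {
            ErrorCategory::Parse => "ParseError",
            ErrorCategory::Tangle => "TangleError",
            ErrorCategory::Storage => "StorageError",
        }
    }
}

/// Stand-in for a fine-grained Rust error enum (variants are illustrative).
#[derive(Debug)]
pub enum AccountError {
    InvalidUrl(String),
    NodeUnreachable,
    MessageRejected,
    StorageLocked,
}

impl fmt::Display for AccountError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AccountError::InvalidUrl(reason) => write!(f, "Invalid Url: {}", reason),
            AccountError::NodeUnreachable => write!(f, "node unreachable"),
            AccountError::MessageRejected => write!(f, "message rejected by node"),
            AccountError::StorageLocked => write!(f, "storage is locked"),
        }
    }
}

impl AccountError {
    /// Collapse a precise variant into one of the documented categories.
    pub fn category(&self) -> ErrorCategory {
        match self {
            AccountError::InvalidUrl(_) => ErrorCategory::Parse,
            AccountError::NodeUnreachable | AccountError::MessageRejected => ErrorCategory::Tangle,
            AccountError::StorageLocked => ErrorCategory::Storage,
        }
    }

    /// The (name, message) pair a binding layer might copy onto a JS `Error`.
    pub fn js_fields(&self) -> (&'static str, String) {
        (self.category().name(), self.to_string())
    }
}
```

Because the match in `category` is exhaustive, adding a new variant forces a decision about which category it belongs to. The cost is that distinct failures such as `NodeUnreachable` and `MessageRejected` become indistinguishable by name.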

2 replies

PhilippGackstatter Feb 2, 2022

I'm not sure what purpose the categories would actually serve, though.

To match on them and change control flow, they are too broad. TangleError (assuming this is basically a client error) tells you too little about what happened. It could be that your network connection is down, which your application cannot do much about, and you might want to shut down the application in that case. If the error was on a node instead, you could ask a different node. Two different situations which give you the same TangleError.
For logging an informative message, the categories aren't needed at all.

I think what might be slightly more helpful is the precise error name that you can match on. The downside is that it's not (reasonably) possible to list all errors ahead of time, so you're kind of driving in the dark. I don't think that's a huge problem, quoting @cycraig:

developers typically only handle specific errors after they experience them, and not all possible errors that could happen - which is why having a large error enum (while not ideal) is not completely horrible

We could reasonably achieve that with a macro that implements this:

use strum;
use thiserror;

#[derive(Debug, thiserror::Error, strum::IntoStaticStr)]
pub enum CoreError {
    #[error("ordered set duplicate")]
    OrderedSetDuplicate,
    #[error("method not found")]
    MethodNotFound,
}

#[derive(Debug, thiserror::Error, strum::IntoStaticStr)]
pub enum IotaError {
    #[error("core error: {0}")]
    CoreError(#[from] CoreError),
}

#[derive(Debug, thiserror::Error, strum::IntoStaticStr)]
pub enum AccountError {
    #[error("storage error")]
    StorageError,
    #[error("iota error: {0}")]
    IotaError(#[from] IotaError),
}

pub trait LeafError {
    fn leaf_error(&self) -> &'static str;
}

impl LeafError for CoreError {
    fn leaf_error(&self) -> &'static str {
        self.into()
    }
}

impl LeafError for IotaError {
    fn leaf_error(&self) -> &'static str {
        match self {
            IotaError::CoreError(err) => err.leaf_error(),
        }
    }
}

impl LeafError for AccountError {
    fn leaf_error(&self) -> &'static str {
        match self {
            err @ AccountError::StorageError => err.into(),
            AccountError::IotaError(err) => err.leaf_error(),
        }
    }
}

fn main() {
    let err: CoreError = CoreError::OrderedSetDuplicate;

    let err: IotaError = IotaError::from(err);

    let err: AccountError = AccountError::from(err);

    println!("The JS error instance's fields would be:");
    println!("name: {}", err.leaf_error());
    println!("message: {}", err);
}

which prints

The JS error instance's fields would be:
name: OrderedSetDuplicate
message: iota error: core error: ordered set duplicate

Now you can match on OrderedSetDuplicate, for instance, similar to how you would match on CoreError::OrderedSetDuplicate in Rust. That at least enables you to do control flow based on precise errors.

Thoughts?

olivereanderson Feb 3, 2022
Author

This is a nice idea. I'm all for trying this out as long as point 4 from @cycraig's suggestion ("Replace blanket #[from] implementations of external errors with errors for specific cases as mentioned in Strategy 4. NOTE: this may actually increase the number of variants, so it does not have to be a hard rule.") is still applied where it makes sense. By that I mean that we do our best to convey the context where the functionality from the external crate failed.
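To illustrate what replacing a blanket conversion with case-specific errors could look like, here is a small standard-library sketch. `ConfigError`, its variants and the two parse functions are hypothetical examples, not code from the library. The external error (`ParseIntError`) is never converted via a blanket `From`; instead each call site picks the variant that describes what was being attempted.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Illustrative error enum: one variant per failing operation, not one per external type.
#[derive(Debug)]
pub enum ConfigError {
    InvalidPort { input: String, source: ParseIntError },
    InvalidRetryCount { input: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidPort { input, .. } => write!(f, "invalid port {:?}", input),
            ConfigError::InvalidRetryCount { input, .. } => {
                write!(f, "invalid retry count {:?}", input)
            }
        }
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::InvalidPort { source, .. }
            | ConfigError::InvalidRetryCount { source, .. } => Some(source),
        }
    }
}

/// The context (which setting failed, and on what input) is attached at the call site.
pub fn parse_port(input: &str) -> Result<u16, ConfigError> {
    input.parse::<u16>().map_err(|source| ConfigError::InvalidPort {
        input: input.to_owned(),
        source,
    })
}

pub fn parse_retry_count(input: &str) -> Result<u32, ConfigError> {
    input.parse::<u32>().map_err(|source| ConfigError::InvalidRetryCount {
        input: input.to_owned(),
        source,
    })
}
```

This does add a variant per context, which matches the note that the rule may increase the number of variants. In return, a caller who sees `InvalidPort` knows what failed without inspecting the underlying parse error.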

olivereanderson · 2022-02-02T17:04:55Z

olivereanderson
Feb 2, 2022
Author

We settled on implementing Strategy 5 and re-evaluating if more should be done to improve error handling in the JS bindings once this is done. Progress on implementing the new strategy will be tracked in issue #534.

0 replies
