Rust Error Handling

Introduction

Error handling in Rust has been a topic of discussion for years. There's a working group devoted to it that seems active, but to be honest I haven't heard a lot from them (naturally, a few days after I wrote this, they popped up on Reddit). There has, however, been a steady stream of articles & posts about best practices in this regard. I surveyed them while reconsidering my own error-handling strategy for mpdpopm & found myself taking pieces from each, but no one author really captured my thinking. With this, I throw my own thoughts on the pile.

Context

There's broad agreement on the first level of the taxonomy of errors: recoverable versus unrecoverable. In the latter case, when you've reached a condition that "should never happen", you panic. This is covered in the first section on error handling in the Rust Book. You typically see this in examples, prototypes & tests, but none other than fasterthanli.me argues that it has a place in library code:

"Here's my personal take on this:

panicking is fine in application code
panicking is fine in library code too, when it indicates either of:
- the library has a bug (it makes assumptions that turned out to be false)
- the library is being misused in a way that we could not check with the type system.
In general, panicking is fine if nothing useful can be done when that error occurs except for quitting."

– Improving error handling - panics vs. proper errors

I'm a huge fan of Amos' work (I'm working through his series on building an ELF packer right now, in fact), but over the years I've been burned enough times by libraries that decided for me that they should exit when I was perfectly capable of handling the situation, so I can't quite sign up for this.

The Rust Book argues that this is fine:

use std::net::IpAddr;
fn main() {
    let home: IpAddr = "127.0.0.1".parse().unwrap(); // a literal: can't fail at run time
}

and I suppose I agree, but anything more than that is out for library code, in my opinion.
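To make the distinction concrete, here's a minimal sketch. The function names `parse_peer` and `loopback` are hypothetical. When the input comes from the caller, a bad value is an ordinary recoverable failure and goes back as an `Err`. Only a value the library itself controls justifies `unwrap()`/`expect()`.

```rust
use std::net::{AddrParseError, IpAddr};

/// Hypothetical library function: the address is supplied by the caller,
/// so a malformed value is returned as an error rather than unwrapped.
pub fn parse_peer(addr: &str) -> Result<IpAddr, AddrParseError> {
    addr.trim().parse()
}

/// Hypothetical helper: the literal is fixed by the library author, so the
/// `expect` can only fire if the library itself has a bug.
pub fn loopback() -> IpAddr {
    "127.0.0.1".parse().expect("hard-coded literal is a valid address")
}
```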

The Interesting Part

That brings us to the interesting aspect of Rust error handling: recoverable errors. These are, of course, modeled by return types of std::result::Result:

pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

where, by convention (Result itself places no trait bound on E), the generic parameter E is a type that implements std::error::Error:

pub trait Error: Debug + Display {
    fn source(&self) -> Option<&(dyn Error + 'static)> { ... }
    fn backtrace(&self) -> Option<&Backtrace> { ... } // unstable (nightly-only) at the time of writing
    // ...
}

Thing is, as an author, committing to returning Result<my thing, my error type> leaves a lot of room for interpretation: hence the long list of posts regarding how to do it.

An Emerging Consensus

Two years ago, a consensus began to emerge that the correct taxonomy for recoverable errors is application code versus library code: libraries should concern themselves with producing fine-grained errors and applications with reporting them. Furthermore (said the consensus), applications should use anyhow & libraries thiserror (both of which reached 1.0 in October of 2019).

For instance, in the Reddit conversation "2019 Q4: Error patterns - `Snafu` vs `err-derive + anyhow`", dtolnay himself (author of anyhow, thiserror, serde and many others) says:

"There are two pretty opposite desires among consumers of error libraries, one where errors are 'whatever, just make it easy' and a different where every error type is artisanally designed… Usually application code tends toward the first kind and library code tends toward the second kind."

In the Reddit conversation "Best error-handling practices - Jan 2020", the top-voted answer (by u/Kyrenite) states:

"I'm going to give a simple, opinionated answer:

Libraries -> thiserror
Applications -> anyhow

… it's just about the most straightforward sensible implementation of both of these ideas, and it's convenient that it is in two separate crates rather than one because they are logically two separate problems."

Nick Groenen argued in "Rust: Structuring and handling errors in 2020" (spring 2020) that "libraries should focus on producing meaningful, structured error types/variants" and that "applications mainly consume errors".

The "libraries produce & applications consume" view seemed for a time to have become normative.

Or has it?

I am heartened by the number of authors I have seen question this, because I don't think this is the right way to come at the problem.

In the Reddit conversation discussing Nick's post, u/Yaahallo (who is on the Rust error handling working group) notes: "I think the bit about error handling being different depending on if you're writing a library vs an application is simplification that's common in the rust community but also a source of confusion.

The reasons for using anyhow vs thiserror aren't really based on if it's a library or an application, it's actually about whether or not you need to handle errors or report them."

In "Error Handling in Rust - A Deep Dive" earlier this year, Luca Palmieri argues that it is "time to address a common Rust myth:

anyhow is for applications, thiserror is for libraries.

It is not the right framing to discuss error handling. You need to reason about intent."

This, in my opinion, is exactly right. You should regard the error path as just as legitimate as the happy path, and think in terms of your consumer's intent. True, applications & libraries will generally have different needs in this regard, but that's a consequence of a more essential principle: your error return value is as legitimate as your success return value, and should likewise be designed to be convenient to your caller.

I don't see reporting & responding to errors as disjoint– your caller will in general want to do both. With regard to the first, today the idea is that, well, an Error implementation must also implement Display & Debug; use the former for messages intended for users & the latter for those intended for developers.
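A minimal sketch of that convention, using a hypothetical `QuotaExceeded` error. `Debug` is derived and shows the raw fields for developers. `Display` is hand-written as the user-facing message. `source()` keeps its default.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type. `Debug` is derived (developer-facing detail);
/// `Display` is written by hand as the message a user would see.
#[derive(Debug)]
pub struct QuotaExceeded {
    pub used: u64,
    pub limit: u64,
}

impl fmt::Display for QuotaExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "storage quota exceeded ({} of {} bytes used)", self.used, self.limit)
    }
}

// All trait methods (including `source()`) keep their default implementations.
impl Error for QuotaExceeded {}
```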

I'm watching Jane Lusby's talk at RustConf 2020 as I write this, and I agree that this is likely not enough structure to handle all the scenarios in which one might want to report an error. That said, I'm leery of building up that much structure in my error types just to support ever-more-featureful error pretty-printing libraries.

I don't have a lot to say on the subject, so I want to turn to the second item: responding to errors.

How Your Callers Are Really Going to Handle Your Artisanally Crafted Errors

So let us consider errors strictly from the perspective of allowing our callers to reason about them. The route taken by many Rust libraries today is to reason that "I can't know who is calling me, so I need to provide as much flexibility as possible in allowing them to reason about my failure modes". I love dtolnay's description of such error types as "artisanal": they provide a lovingly-crafted enumeration that implements Error. It has dozens of bespoke variants that describe each possible way of failing in great detail. Their authors have, consciously or not, adopted the first guideline of Simonas Kazlauskas' "Context-preserving error handling": "Each unique fallible expression should have at least one unique error."

In fact, your callers are going to spit on your lovingly crafted Error implementation. If you're lucky, they will log it on their way out of their function (using your Debug implementation). If you're really lucky, they'll demote your precious type to a Box<dyn Error> and shove it into the source field of their error type– then they'll bail. In other words, most of your callers, presented with failure on the part of your method, will perhaps attach a bit of context & fail their method. As will their caller, and theirs & so on.
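Here's a sketch of that typical caller. The names `LoadError` and `load_settings` are hypothetical. It adds one line of context, demotes the callee's error to a `Box<dyn Error>` held as its `source`, and bails.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical caller-side error: one line of context, plus the callee's
/// error type-erased into a `Box<dyn Error>` and exposed through `source()`.
#[derive(Debug)]
pub struct LoadError {
    context: String,
    source: Box<dyn Error>,
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for LoadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Hypothetical function: on failure, attach context and propagate.
pub fn load_settings(text: &str) -> Result<u32, LoadError> {
    text.trim().parse::<u32>().map_err(|e| LoadError {
        context: format!("couldn't read setting from {:?}", text),
        source: Box::new(e),
    })
}
```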

These errors will bubble up, one at a time, hopefully building a chain of Error implementations, until they reach a natural choke point in the ambient application. It could be main(), a thread function, a command loop of some sort, or perhaps the boundary of your physical abstraction, at which point you need to translate the top-most Error implementation to a protocol-specific error. Imagine a web service handler that, having discovered that one of its callees has failed, now needs to map the top-most error to an HTTP status.
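At such a choke point, reporting usually amounts to walking the chain through `source()`. Here's a minimal sketch. The function name `report` and the "outer: inner" format are assumptions, not a standard.

```rust
use std::error::Error;

/// Hypothetical choke-point helper: render the top-most error followed by
/// each error reachable through `source()`, separated by ": ".
pub fn report(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(cause) = cur {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        cur = cause.source();
    }
    out
}
```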

Guess what: your enumeration with forty-seven carefully-crafted variants is going to get mapped into a few buckets:

you called me out of contract: here's a detailed message describing what you got wrong
you called me in contract, but I still failed: but hey– it was a resource problem: try again & I might succeed
you called me in contract, but I'm never going to succeed– give up: I have a bug, or I only discovered at run-time that your request can't be accommodated, &c
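As an illustration, here's a sketch of a web handler collapsing a fine-grained error into those three buckets. `StoreError` and its variants are hypothetical. The choice of 400 / 503 / 500 is one reasonable mapping, not a rule.

```rust
/// Hypothetical fine-grained error from some storage library.
#[derive(Debug)]
pub enum StoreError {
    EmptyKey,
    KeyTooLong { len: usize, max: usize },
    ConnectionReset,
    Timeout { after_ms: u64 },
    CorruptIndex,
    Unsupported(String),
}

/// The three buckets as HTTP statuses: called out of contract (400),
/// resource problem, try again (503), give up (500).
pub fn http_status(err: &StoreError) -> u16 {
    match err {
        StoreError::EmptyKey | StoreError::KeyTooLong { .. } => 400,
        StoreError::ConnectionReset | StoreError::Timeout { .. } => 503,
        StoreError::CorruptIndex | StoreError::Unsupported(_) => 500,
    }
}
```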

Furthermore, there is a real cost to providing a detailed error type. If you've gone the usual route & implemented Error on an enumeration, and you publish, congratulations: any addition to your enumeration is now a breaking change for your consumers (unless you marked it #[non_exhaustive]). If you've provided a source() implementation that uses the concrete type of the underlying error you got, congratulations again: you're now leaking details of your implementation to your callers (does your caller really need to know that you're using crate A instead of crate B in your implementation?)
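A sketch of the `#[non_exhaustive]` escape hatch on a hypothetical `FetchError`. The attribute only binds *downstream* crates: their `match` must include a wildcard arm, so adding a variant later doesn't break them. Inside the defining crate the attribute has no effect, so the `_` arm below just shows the shape a consumer's code would take.

```rust
/// Hypothetical public error enum; variants may be added in future releases.
#[derive(Debug)]
#[non_exhaustive]
pub enum FetchError {
    InvalidUrl(String),
    Unavailable,
}

/// How a consumer in another crate would have to match on it.
pub fn is_retryable(err: &FetchError) -> bool {
    match err {
        FetchError::Unavailable => true,
        // Mandatory outside the defining crate; also covers future variants.
        _ => false,
    }
}
```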

Luca (Error Handling in Rust - A Deep Dive) goes on to say: "Reason carefully about your usecase and the assumptions you can afford to make in order to design the most appropriate error type - sometimes Box<dyn std::error::Error> or anyhow::Error are the most appropriate choice, even for libraries."

So What Do I Do?

If you are writing a simple function, and there's really a "go/no go" sense to the contract, you should seriously consider returning a Result<my thing, Box<dyn Error>>; it has a lot of advantages:

no loss of reportability (such as it is)
you can make gross changes to your implementation without changing the interface
no leakage of implementation details
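A minimal sketch of such a go/no-go function; `parse_port` is hypothetical. The `?` operator and `.into()` rely on the standard library's `From` impls, which turn any `Error` type (and a plain `&str`) into a `Box<dyn Error>`. The concrete failure types never appear in the signature.

```rust
use std::error::Error;

/// Hypothetical go/no-go function: any failure just means "no usable port".
pub fn parse_port(s: &str) -> Result<u16, Box<dyn Error>> {
    // A `ParseIntError` (bad digits, out of range) is boxed by `?`.
    let port: u16 = s.trim().parse()?;
    if port == 0 {
        // A string message is boxed via `From<&str> for Box<dyn Error>`.
        return Err("port 0 is reserved".into());
    }
    Ok(port)
}
```

Swapping the parsing for, say, a lookup in a config file would change none of the function's callers.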

That situation isn't very common, though, so more generally, I'd suggest:

an enumeration that implements std::error::Error
with a few carefully chosen variants

The variants should be selected not by the failure modes of your implementation, but by what your caller would find interesting if they're the last stop in the chain of errors being propagated up the stack. A good rule of thumb: if you're considering a new variant, and your caller would pretty much have to respond to it in the same way as an existing mode of failure, then reuse the existing variant with different parameters.
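Here's a sketch of that rule of thumb. `PlaylistError` and its variants are hypothetical. There are two variants, chosen by how a caller would react. `NotFound` is reused for every kind of lookup, and a parameter says which one.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical library error: variants reflect the caller's response,
/// not the place inside the implementation where things went wrong.
#[derive(Debug)]
pub enum PlaylistError {
    /// Something the caller asked for doesn't exist; `what` names the kind.
    NotFound { what: &'static str, name: String },
    /// A resource problem; retrying later may succeed.
    Unavailable(String),
}

impl fmt::Display for PlaylistError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlaylistError::NotFound { what, name } => write!(f, "no such {}: {}", what, name),
            PlaylistError::Unavailable(why) => write!(f, "temporarily unavailable: {}", why),
        }
    }
}

impl Error for PlaylistError {}
```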

You'll want to attach a backtrace to your errors– they're essential for troubleshooting. Use either the standard library's std::backtrace::Backtrace (nightly-only at the time of writing; it was later stabilized in Rust 1.65) or the backtrace crate.
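A minimal sketch with `std::backtrace::Backtrace`, assuming Rust 1.65 or later. `ConfigError` is hypothetical. `Backtrace::capture()` only does the expensive capture when `RUST_BACKTRACE` or `RUST_LIB_BACKTRACE` is set. Otherwise it returns a cheap "disabled" placeholder, so it's reasonable to call in every constructor.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};
use std::fmt;

/// Hypothetical error that records where it was created.
#[derive(Debug)]
pub struct ConfigError {
    pub message: String,
    pub backtrace: Backtrace,
}

impl ConfigError {
    pub fn new(message: impl Into<String>) -> Self {
        ConfigError { message: message.into(), backtrace: Backtrace::capture() }
    }

    /// True only if a backtrace was actually captured (depends on env vars).
    pub fn has_backtrace(&self) -> bool {
        self.backtrace.status() == BacktraceStatus::Captured
    }
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl std::error::Error for ConfigError {}
```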

You'll want to map conveniently from the errors your callees return to the error you return. There's a temptation to implement From<OtherErrorType>, which makes it easy to just use the ? operator. Simonas suggests not doing this: "These implementations make it way too easy to miss violations of the guideline above [Each unique fallible expression should have at least one unique error]. The mental overhead this implementation adds is huge and is not worth it over saving a couple of map_errs in code."

I don't like blanket rules like this, but he's got a point– in your From implementation all you have is the knowledge that you've got, say, an io::Error. If you call, say, six functions in your implementation that return io::Error you can't really distinguish between any of them.
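Here's a sketch of that loss of information. `ParseIntError` stands in for `io::Error` so nothing touches the filesystem, and `SizeError` and `parse_size` are hypothetical. With a blanket `From`, `?` works on both parses, but a bad width and a bad height come out as identical errors.

```rust
use std::num::ParseIntError;

/// Hypothetical error with a blanket `From<ParseIntError>`.
#[derive(Debug)]
pub enum SizeError {
    BadNumber(ParseIntError),
    MissingSeparator,
}

impl From<ParseIntError> for SizeError {
    fn from(err: ParseIntError) -> Self {
        SizeError::BadNumber(err)
    }
}

/// Parses "WIDTHxHEIGHT". Both `?`s go through the same `From` impl,
/// so the caller can't tell which of the two numbers was malformed.
pub fn parse_size(s: &str) -> Result<(u32, u32), SizeError> {
    let (w, h) = s.split_once('x').ok_or(SizeError::MissingSeparator)?;
    let width: u32 = w.parse()?;
    let height: u32 = h.parse()?;
    Ok((width, height))
}
```

Using `map_err` at each call site instead would let each parse carry its own context.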

A better approach is adding context at your call site:

use std::fs::File;

pub fn foo(name: &str) -> Result<Thing, MyError> {
    // Attach the file name as context; the io::Error becomes the variant's source.
    let f = File::open(name)
        .map_err(|err| MyError::OpenFailed { name: String::from(name), source: err })?;
    // .. and so on.
}

The idea here, of course, is that the returned io::Error will be slotted into your MyError::OpenFailed variant as the source field. Think hard about whether you want to keep that as an io::Error or a Box<dyn Error>. Simonas goes on to say "consider… keep[ing] error data mostly private. Having a unique error for each failure mode necessarily exposes (through the public API) the implementation details of the library which in turn may make evolution of the library difficult."
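For completeness, here's one way `MyError` might be defined so that `foo` compiles, keeping the `io::Error` as a concrete `source`. `Thing` is a hypothetical stand-in for whatever `foo` really returns, and the `Display` wording is an assumption.

```rust
use std::error::Error;
use std::fmt;
use std::fs::File;
use std::io;

/// Hypothetical error type for `foo`.
#[derive(Debug)]
pub enum MyError {
    /// Context (which file) plus the underlying io::Error as the source.
    OpenFailed { name: String, source: io::Error },
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyError::OpenFailed { name, .. } => write!(f, "couldn't open {}", name),
        }
    }
}

impl Error for MyError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            MyError::OpenFailed { source, .. } => Some(source),
        }
    }
}

/// Hypothetical stand-in for whatever `foo` produces.
pub struct Thing {
    pub file: File,
}

pub fn foo(name: &str) -> Result<Thing, MyError> {
    let f = File::open(name).map_err(|err| MyError::OpenFailed {
        name: String::from(name),
        source: err,
    })?;
    Ok(Thing { file: f })
}
```

Changing `source: io::Error` to `source: Box<dyn Error + Send + Sync>` would hide the concrete type at the cost of callers no longer being able to inspect it without downcasting.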

In my experience, this isn't a serious problem: if you update your implementation to use a different Error type, it will only break consumers that depend on the details of that particular type (unlikely).

Implementations

I will defer to BurntSushi, in [4]: "The simplest answer, and the one I've stood by for years for core ecosystem libraries, is to not use any error handling helper crate at all. Just stick with the standard library's Error trait and you'll be fine. Most or all of the error handling helper crates are compatible with this approach since they all rely on the Error trait to work. (I believe failure is the only prominent such library with a more muddled compatibility story.)"

Furthermore, in [6], he notes that procedural macros (such as those used by thiserror & Snafu) can blow up compilation times: "They can have a big impact on compilation times. If you're already using proc macros somewhere, or if your dependency tree is already necessarily large, then thiserror seems like a great choice. But otherwise, writing out the impls by hand is very easy to do…In terms of compile times, the thiserror version took 5 seconds and 7.5 seconds in debug and release mode, respectively. Without thiserror took 0.37 and 0.39 seconds in debug and release mode, respectively. That's a fairly sizable improvement.

I think using thiserror makes a lot of sense in many circumstances. For example, if I were starting a new project in Rust at work, then using thiserror as a foundation to make quickly building error types in all my internal libraries would be a really nice win. Because writing boiler plate would otherwise (I think) discourage writing out more structured errors, where as thiserror makes it deliciously easy and painless. But I think if I were writing an open source library for others to use, then I think avoiding passing on this compile time hit to everyone else in exchange for a couple minutes of writing some very easy code is well worth it."

I haven't seen order-of-magnitude improvements in compilation times, but when I removed Snafu from mpdpopm in favor of hand-coded Error enums I saw a 21% speed-up in compilation times.
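To give a sense of how little hand-written code is involved, here's a sketch of a "transparent" wrapper. `Opaque` is hypothetical. It adds no message of its own and forwards `Display` and `source()` straight through to the inner error. That mirrors what the thiserror documentation describes for `#[error(transparent)]`, but it is not thiserror's generated code.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical opaque wrapper; the inner error's concrete type is hidden.
#[derive(Debug)]
pub struct Opaque(Box<dyn Error + Send + Sync>);

impl fmt::Display for Opaque {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Forward the message unchanged.
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for Opaque {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Forward to the inner error's source, not the inner error itself.
        self.0.source()
    }
}

impl From<ParseIntError> for Opaque {
    fn from(err: ParseIntError) -> Self {
        Opaque(Box::new(err))
    }
}
```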

If you're committed to an error crate, the two I've found most consistent with these principles are thiserror & Snafu:

feature               thiserror      snafu
can attach context    need map_err   yes
transparent mapping   yes ¹          yes ²
Display impl.         yes            yes
Debug impl.           derived        derived
source() impl.        yes ³          yes ³
backtrace             yes ⁴          yes ⁵
downloads             25.7M          2.2M

I personally prefer Snafu for the explicit context() & because it provides backtraces on stable.

List of References

[1] Gallant, Andrew. "Error Handling in Rust", May 2015. https://blog.burntsushi.net/rust-error-handling (retrieved December 4, 2021)– a bit dated, but hey, it's BurntSushi.
[2] Klabnik, Steve & Nichols, Carol. "The Rust Book", "Unrecoverable Errors with panic!" https://doc.rust-lang.org/book/ch09-01-unrecoverable-errors-with-panic.html (retrieved December 4, 2021)
[3] Amos. "Improving error handling - panics vs. proper errors" https://fasterthanli.me/series/making-our-own-ping/part-10 (retrieved December 4, 2021)
[4] "Best error-handling practices - Jan 2020". https://www.reddit.com/r/rust/comments/ej67aa/best_errorhandling_practices_jan_2020/ (retrieved December 5, 2021)
[5] Groenen, Nick. "Rust: Structuring and handling errors in 2020", May 2020. https://nick.groenen.me/posts/rust-error-handling/#libraries-versus-applications (retrieved December 4, 2021)
[6] "Rust: Structuring and handling errors in 2020" (companion Reddit discussion) https://www.reddit.com/r/rust/comments/gj8inf/rust_structuring_and_handling_errors_in_2020/ (retrieved December 5, 2021)
[7] Palmieri, Luca. "Error Handling in Rust - A Deep Dive", May 2021. https://www.lpalmieri.com/posts/error-handling-rust (retrieved December 5, 2021)
[8] "2019 Q4: Error patterns - `Snafu` vs `err-derive + anyhow`". https://www.reddit.com/r/rust/comments/dfs1zk/2019_q4_error_patterns_snafu_vs_errderive_anyhow/ (retrieved December 5, 2021)
[9] Kazlauskas, Simonas. "Context-preserving error handling", January 2020. https://kazlauskas.me/entries/errors (retrieved December 5, 2021)
[10] Wuyts, Yoshua. "Error Handling Survey - 2019-11-13". https://blog.yoshuawuyts.com/error-handling-survey/#analysis (retrieved December 5, 2021)


Footnotes:

¹ need to use #[from] on the source error member, or you can just say #[error(transparent)] to forward the source() & Display methods straight through to an underlying error

² need to use #[snafu(context(false))] if you're not using .context() on the underlying error

³ if you have a field named "source" or one marked as the source (#[source] in thiserror, #[snafu(source)] in Snafu)

⁴ need to have a field named "backtrace" (I think you can control both with decorations, too). Only on nightly.

⁵ need to have a field named "backtrace" (I think you can control both with decorations, too). Provides its own Backtrace struct.

Unwound Stack