Live Fast – Die Young

– the story of Erlang Error Handling

I recently had the honor of being a guest on the Cofinpro Podcast, where we talked about Erlang among other related topics. One topic we touched on only briefly, in a single sentence, was Erlang's error handling.

The way an Erlang developer handles errors and exceptions might surprise developers coming from other languages. Someone who writes software in a language like C or Java is used to thinking about all the ways the software might fail and handling each of those failures to prevent a crash.

An Erlang developer does not try to prevent failure and hardly thinks about failure prevention at all. What an Erlang developer thinks about is the aftermath of failures and crashes. This post gives a first overview of how Erlang treats the crash site.

Let It Crash

Before developing software in Erlang, I was used to writing software defensively. That means I thought hard about what could go wrong and what should happen instead. Such defensive code is littered with argument and type checks, try-catch-finally blocks and log messages.

In most languages, working with multiple processes is painful and error-prone, so many programs never run more than one. If that single process dies from an unhandled error, the whole program crashes and leaves the user out in the rain.

So a good developer tests, checks and tries to prove that the software works in every possible case (at least every case one can think of). The result is code in which error checking is tangled up with the business logic.

In Erlang, you just let it crash. It is as simple as that, and more or less the opposite of defensive programming. Erlang processes are cheap and are often used the way objects are used in other languages, so an Erlang developer simply lets a failing process crash and die. The code that solves the problem cares only about solving the problem: while writing it, the developer assumes that all input is valid and that failure will not happen.
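To make the contrast concrete, here is a small sketch of both styles applied to one task: turning a string into a TCP port number. The module `port_parser` and both function names are hypothetical and exist only for illustration; `string:to_integer/1` and `list_to_integer/1` are standard Erlang. The defensive version anticipates every bad input and returns an error tuple that each caller must then inspect. The let-it-crash version writes down only the happy path and lets bad input raise an exception that kills the calling process.

```erlang
%% port_parser.erl: hypothetical module, for illustration only.
-module(port_parser).
-export([defensive_port/1, assertive_port/1]).

%% Defensive style: anticipate every bad input and turn it into an
%% error value that every caller now has to check.
defensive_port(Value) when is_list(Value) ->
    case string:to_integer(Value) of
        {Port, ""} when Port > 0, Port < 65536 -> {ok, Port};
        _ -> {error, invalid_port}
    end;
defensive_port(_) ->
    {error, not_a_string}.

%% Let-it-crash style: write down only the happy path.
%% Non-numeric input makes list_to_integer/1 raise badarg; an
%% out-of-range number fails the match against true (badmatch).
%% Either way the calling process crashes.
assertive_port(Value) ->
    Port = list_to_integer(Value),
    true = Port > 0 andalso Port < 65536,
    Port.
```

For example, `port_parser:assertive_port("8080")` returns `8080`, while `port_parser:assertive_port("abc")` raises `badarg`. The assertive version stays short and readable; deciding what happens after the crash is left to another process.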

Let Someone Else Fix It

Among all these processes, an Erlang developer sets up monitoring processes. Those monitors contain no business logic; their only job is to watch the health of other processes. When a process crashes and dies, its monitor knows what to do about it. This monitoring also works across machine boundaries, which matters because a truly fault-tolerant system cannot run on a single machine.
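A hand-written monitor can be sketched in a few lines with the built-in `spawn_monitor/1`: the runtime sends the watching process a `{'DOWN', Ref, process, Pid, Reason}` message when the watched process exits. The module `watcher`, its `start/2` function and the simple restart limit are assumptions made for this illustration, not a standard API; real systems normally rely on OTP supervisors instead of a loop like this. Monitors are not limited to the local node: `erlang:monitor/2` also accepts the pid of a process on another node, and if the connection to that node is lost the monitor delivers a `'DOWN'` message with reason `noconnection`.

```erlang
%% watcher.erl: hypothetical minimal monitor, for illustration only.
-module(watcher).
-export([start/2]).

%% Run Fun in its own monitored process. Each time that process dies
%% abnormally, start a fresh one, up to MaxRestarts times.
%% Note: this blocks the calling process until the story ends.
start(Fun, MaxRestarts) ->
    loop(Fun, MaxRestarts, 0).

loop(Fun, MaxRestarts, Restarts) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        %% The worker finished its job and exited normally.
        {'DOWN', Ref, process, Pid, normal} ->
            {finished, Restarts};
        %% The worker crashed and we still have restarts left.
        {'DOWN', Ref, process, Pid, _Reason} when Restarts < MaxRestarts ->
            loop(Fun, MaxRestarts, Restarts + 1);
        %% The worker keeps crashing: stop and report the last reason.
        {'DOWN', Ref, process, Pid, Reason} ->
            {gave_up, Reason}
    end.
```

The worker function contains no error handling at all. For example, `watcher:start(fun() -> exit(boom) end, 3)` runs the failing function four times and then returns `{gave_up, boom}`.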

That is the second part of an Erlang application. Alongside the problem-solving business logic sits the failure-handling and error-correcting code, which is often generic and can therefore be reused in future applications.
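Erlang/OTP ships this generic part as behaviours, and the `supervisor` behaviour is the standard way to restart crashed processes. The sketch below assumes a hypothetical module `demo_sup` that supervises a hypothetical worker registered as `demo_worker`, which doubles numbers and contains no error handling at all. A production worker would usually be a `gen_server`; a plain `spawn_link` process is used here only to keep the example in a single module.

```erlang
%% demo_sup.erl: hypothetical supervisor and worker, for illustration only.
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).
-export([start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: only the crashed child is restarted.
%% If more than 5 restarts happen within 10 seconds, the supervisor
%% gives up, terminates its children and then itself.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => demo_worker,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => 5000,
              type => worker},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; the new process is linked to it.
start_worker() ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    register(demo_worker, Pid),
    {ok, Pid}.

%% Business logic only: a non-number crashes the worker with badarith.
worker_loop() ->
    receive
        {double, From, N} ->
            From ! {result, N * 2},
            worker_loop()
    end.
```

After `demo_sup:start_link()`, sending `{double, self(), not_a_number}` to `demo_worker` crashes the worker, and the supervisor registers a fresh one under the same name. Only `init/1` is specific to this application; all of the restart logic comes from OTP's `supervisor` module.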

In my opinion, this is a nice separation of concerns: the code that solves the problem is kept apart from the code that fixes failures.

For further reading, I recommend Programming Erlang by Joe Armstrong.