Is Reliable Multithreaded Software Possible?

Until earlier this month, I’d overlooked a most interesting May 2006 article in Embedded Systems Design magazine by Mark Bereit titled “Escape the Software Development Paradigm Trap”. The article opines that the methods we use to design embedded software, particularly multitasked software with interrupt service routines and/or real-time operating systems, are fundamentally incompatible with reliability.

Here’s the critical analogy, in Bereit’s words:

Imagine for a minute that I’ve invented the Universal Bolt. This is a metal object for joining threaded holes that can extend or collapse to fit a variety of lengths. It can expand or contract to fit holes of different diameters. The really cool feature is that I have replaced the bolt’s spiral ridge with a series of extendable probes that can accommodate different thread pitches. You no longer need to stock a variety of bolts of different sizes and lengths and thread spacings because my Universal Bolt can be used in place of any of them.

Because it’s able to change configurations extremely quickly, a single Universal Bolt can take the place of many conventional bolts simultaneously. What we do is rig up a clever and very fast dispatcher device that quickly moves the [Universal Bolt] from hole to hole. If the dispatcher is fast enough, my Universal Bolt can spend a moment in each hole in turn and get the whole way through your [mechanical] product so fast that it returns to each hole before the joint has had a chance to separate.

You’d have to be crazy to fly in an airplane designed this way. “If anything caused the dispatcher to derail, the entire product would collapse in a second.” Yet this analogy describes the design of most products powered by embedded computers.

A fast and complex thread dispatcher keeps moving one simple and stupid integer-computation unit all over a big system tending to tasks [and ISRs] rapidly enough that they all get done. And if that dispatcher ever once leads the CPU into an invalid memory address the whole thing crashes to a halt.
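
To make that failure mode concrete, here is a minimal sketch of my own, not code from Bereit’s article. It’s written in Python rather than embedded C, and it models a cooperative round-robin dispatcher rather than a real preemptive RTOS. The task names, the `run_dispatcher` function, and the use of `MemoryError` to stand in for an invalid memory access are all assumptions made for illustration.

```python
from collections import deque


def blink_led(log):
    # Hypothetical well-behaved task: does one step of work, then yields the CPU.
    while True:
        log.append("blink")
        yield


def read_sensor(log):
    # A second hypothetical well-behaved task.
    while True:
        log.append("read")
        yield


def faulty_task(log, fail_on_step):
    # Stands in for a task that eventually touches an invalid address.
    step = 0
    while True:
        step += 1
        if step == fail_on_step:
            raise MemoryError("invalid address")
        log.append("faulty-ok")
        yield


def run_dispatcher(tasks, max_slices):
    """Give each task one time slice in turn; return the slices completed."""
    ready = deque(tasks)
    completed = 0
    while ready and completed < max_slices:
        task = ready.popleft()
        next(task)  # an unhandled fault here stops every task, not just one
        ready.append(task)
        completed += 1
    return completed


log = []
tasks = [blink_led(log), read_sensor(log), faulty_task(log, fail_on_step=2)]
try:
    run_dispatcher(tasks, max_slices=100)
    halted = False
except MemoryError:
    halted = True

print(halted, log)
```

The two healthy tasks have done nothing wrong, yet they stop running the moment the third one faults, because all three share the one dispatcher and the one “CPU”. Real systems add watchdogs and fault handlers, but the shared dependency is the same.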

Clearly, we need a new paradigm for reliable embedded software architecture. My thoughts on that are coming to this space in 2010.
