Firmware-Specific Bug #7: Deadlock

A deadlock is a circular dependency between two or more tasks. For example, if Task 1 has already acquired A and is blocked waiting for B while Task 2 has previously acquired B and is blocked waiting for A, neither task will awake. Circular dependencies can occur at several levels in the architecture of a multithreaded system (for example, each task is waiting for an event only the other will send) but here I am concerned with the common problem of resource deadlocks involving mutexes.

Best Practice: Two simple programming practices are each able to entirely prevent resource deadlocks in embedded systems. The first technique, which I recommend over the other, is to never attempt or require the simultaneous acquisition of two or more mutexes. Holding one mutex while blocking for another mutex turns out to be a necessary condition for deadlock. Holding one mutex is never, by itself, a cause of deadlock.[3]

In my view, the practice of acquiring only one mutex at a time is also consistent with an excellent architectural practice of always pushing the acquisition and release of mutexes into the leaf nodes of your code. The leaf nodes are the device drivers and reentrant libraries. This keeps the mutex acquisition and release code out of the task-level algorithmics and helps to minimize the amount of code inside critical sections.[4]

The second technique is to assign an ordering to all of the mutexes in the system (for example, alphabetical order by mutex handle variable name) and to always acquire multiple mutexes in that same order. This technique will definitely remove all resource deadlocks but comes with an execution-time price. I recommend removing deadlocks this way only when you’re dealing with large bodies of legacy code that can’t be easily refactored to eliminate the multiple-mutex dependency.

Footnotes

[3] In theory, the task that wants the mutex could starve while a series of higher priority tasks take turns with the mutex. However, the rate monotonic analysis can be used to ensure this doesn’t happen to tasks with deadlines that must be met.

[4] An additional benefit of this architectural pattern is that it reduces the number of programmers on the team who must remember to use and correctly use each mutex. Other benefits are that each mutex handle can be hidden inside the leaf node that uses it and that doing this allows for easier switches between interrupt disables and mutex acquisition as appropriate to balance performance and task prioritization. One of the most famous priority inversions happened on Mars in 1997. Glitches were observed in Earth-based testing that could not be reproduced and were not attributed to priority inversion until after the problems on Mars forced investigation. For more details, read Glenn Reave’s “What really happened on Mars?” account.

Firmware-Specific Bug #6

Firmware-Specific Bug #8

2 thoughts on “Firmware-Specific Bug #7: Deadlock

Leave a Reply

Your email address will not be published. Required fields are marked *