In this era of 140 characters or less, it has been well and concisely stated that, “RELIABILITY concerns ACCIDENTAL errors causing failures, whereas SECURITY concerns INTENTIONAL errors causing failures.” In this column I expand on this statement, especially as regards the design of embedded systems and their place in our network-connected and safety-concious modern world.
As the designers of embedded systems, the first thing we must accomplish on any project is to make the hardware and software work. That is to say we need to make the system behave as it was designed to. The first iteration of this is often flaky; certain uses or perturbations of the system by testers can easily dislodge the system into a non-working state. In common parlance, “expect bugs.”
Given time, tightening cycles of debug and test can get us past the bugs and through to a shippable product. But is a debugged system good enough? Neither reliability nor security can be tested into a product. Each must be designed in from the start. So let’s take a closer look at these two important design aspects for modern embedded systems and then I’ll bring them back together at the end.
Reliable Embedded Systems
A product can be stable yet lack reliability. Consider, for example, an anti-lock braking computer installed in a car. The software in the anti-lock brakes may be bug-free, but how does it function if a critical input sensor fails?
Reliable systems are robust in the face of adverse run-time environments. Reliable systems are able to work around errors encountered as they occur to the system in the field–so that the number and impact of failures are minimized. One key strategy for building reliable systems is to eliminate single-points-of-failure. For example, redundancy could be added around that critical input sensor–perhaps by adding a second sensor in parallel with the first.
Another aspect of reliability that is under the complete control of designers (at least when they consider it from the start) are the “fail-safe” mechanisms. Perhaps a suitable but lower-cost alternative to a redundant sensor is detection of the failed sensor with a fall back to mechanical braking.
Failure Mode and Effect Analysis (FMEA) is one of the most effective and important design processes used by engineers serious about designing reliability into their systems. Following this process, each possible failure point is traced from the root failure outward to its effects. In an FMEA, numerical weights can be applied to the likelihoods of each failure as well as the seriousness of consequences. An FMEA can thus help guide you to a cost effective but higher reliability design by highlighting the most valuable places to insert the redundancy, fail-safes, or other elements that reinforce the system’s overall reliability.
In certain industries, reliability is a key driver of product safety. And that is why you see these techniques and FMEA and other design for reliability processes being applied by the designers of safety-critical automotive, medical, avionics, nuclear, and industrial systems. The same techniques can, of course, be used to make any type of embedded system more reliable.
Regardless of your industry, it is typically difficult or impossible to make your product as reliable via patches. There’s no way to add hardware like that redundant sensor, so your options may reduce to a fail-safe that is helpful but less reliable overall. Reliability cannot be patched or tested or debugged into your system. Rather, reliability must be designed in from the start.
Secure Embedded Systems
A product can also be stable yet lack security. For example, an office printer is the kind of product most of us purchase and use without giving a minute of thought to security. The software in the printer may be bug-free, but is it able to prevent a would-be eavesdropper from capturing a remote electronic copy of everything you print, including your sensitive financial documents?
Secure systems are robust in the face of persistent attack. Secure systems are able to keep hackers out by design. One key strategy for building secure systems is to validate all inputs, especially those arriving over an open network connection. For example, security could be added to a printer by ensuring against buffer overflows and encrypting and digitally signing firmware updates.
One of the unfortunate facts of designing secure embedded systems is that the hackers who want to get in only need to find and exploit a single weakness. Adding layers of security is good, but if even any one of those layers remains fundamentally weak, a sufficiently motivated attacker will eventually find and breach that defense. But that’s not an excuse for not trying.
For years, the largest printer maker in the world apparently gave little thought to the security of the firmware in its home/office printers, even as it was putting tens of millions of tempting targets out into the world. Now the security of those printers has been breached by security researchers with a reasonable awareness of embedded systems design. Said one of the lead researchers, “We can actually modify the firmware of the printer as part of a legitimate document. It renders correctly, and at the end of the job there’s a firmware update. … In a super-secure environment where there’s a firewall and no access — the government, Wall Street — you could send a résumé to print out.”
Security is a brave new world for many embedded systems designers. For decades we have relied on the fact that the microcontrollers and Flash memory and real-time operating systems and other less mainstream technologies we use will protect our products from attack. Or that we can gain enough “security by obscurity” by keeping our communications protocols and firmware upgrade processes secret. But we no longer live in that world. You must adapt.
Consider the implications of an insecure design of an automotive safety system that is connected to another Internet-connected computer in the car via CAN; or the insecure design of an implanted medical device; or the insecure design of your product.
Too often, the ability to upgrade a product’s firmware in the field is the very vector that’s used to attack. This can happen even when a primary purpose for including remote firmware updates is motivated by security. For example, as I’ve learned in my work as an expert witness in numerous cases involving reverse engineering of the techniques and technology of satellite television piracy, much of that piracy has been empowered by the same software patching mechanism that allowed the broadcasters to perform security upgrades and electronic countermeasures. Ironically, had the security smart cards in those set-top boxes had only masked ROM images the overall system security may have been higher. This was certainly not what the designers of the system had in mind. But security is also an arms race.
Like reliability, security must be designed in from the start. Security can’t be patched or tested or debugged in. You simply can’t add security as effectively once the product ships. For example, an attacker who wished to exploit a current weakness in your office printer or smart card might download his hack software into your device and write-protect his sectors of the flash today so that his code could remain resident even as you applied security patches.
Reliable and Secure Embedded Systems
It is important to note at this point that reliable systems are inherently more secure. And that, vice versa, secure systems are inherently more reliable. So, although, design for reliability and design for security will often individually yield different results–there is also an overlap between them.
An investment in reliability, for example, generally pays off in security. Why? Well, because a more reliable system is more robust in its handling of all errors, whether they are accidental or intentional. An anti-lock braking system with a fall back to mechanical braking for increased reliability is also more secure against an attack against that critical hardware input sensor. Similarly, those printers wouldn’t be at risk of fuser-induced fire in the case of a security breach if they were never at risk of fire in the case of any misbehavior of the software.
Consider, importantly, that one of the first things a hacker intent on breaching the security of your embedded device might do is to perform a (mental, at least) fault tree analysis of your system. This attacker would then target her time, talents, and other resources at one or more single points of failure she considers most likely to fail in a useful way.
Because a fault tree analysis starts from the general goal and works inward deductively toward the identification of one or more choke points that might produce the desired erroneous outcome, attention paid to increasing reliability such as via FMEA usually reduces choke points and makes the attacker’s job considerably more difficult. Where security can break down even in a reliable system is where the possibility of an attacker’s intentionally induced failure is ignored in the FMEA weighting and thus possible layers of protection are omitted.
Similarly, an investment in security may pay off in greater reliability–even without a directed focus on reliability. For example, if you secure your firmware upgrade process to accept only encrypted and digitally signed binary images you’ll be adding a layer of protection against an inadvertently corrupted binary causing an accidental error and product failure. Anything you do to improve the security of communications (i.e., checksums, prevention of buffer overflows, etc.) can have a similar effect on reliability.
The Only Way Forward
Each year it becomes increasingly important for all of us in the embedded systems design community to learn to design reliable and secure products. If you don’t, it might be your product making the wrong kind of headlines and your source code and design documents being poured over by lawyers. It is no longer acceptable to stick your head in the sand on these issues.