Barr Code

Tuesday, June 23, 2009

Firmware Disasters

First, an Airbus A330 fell out of the sky. Then two D.C. Metro trains collided. Several hundred people have been killed and injured in these disastrous system failures. Did bugs in embedded software play a role in either or both disasters?

An incident on an earlier (October 2008) Airbus A330 flight may offer clues to the crash of Air France 447:

Qantas Flight 72 had been airborne for three hours, flying uneventfully on autopilot from Singapore to Perth, Australia. But as the in-flight dinner service wrapped up, the aircraft's flight-control computer went crazy. The plane abruptly entered a smooth 650-ft. dive (which the crew sensed was not being caused by turbulence) that sent dozens of people smashing into the airplane's luggage bins and ceiling. More than 100 of the 300 people on board were hurt, with broken bones, neck and spinal injuries, and severe lacerations splattering blood throughout the cabin. (Article, Time Magazine, June 3, 2009)


Authorities have blamed a pair of simultaneous computer failures for that event in the fly-by-wire A330. First, one of three redundant air data inertial reference units (ADIRUs) began giving bad angle of attack (AOA) data. Simultaneously, a voting algorithm failed to work as designed. It was intended to handle precisely such a failure in one of the three units by relying only on the matching data from the other two; instead, the flight computer made decisions solely on the basis of the one failed ADIRU!

(A later analysis by Airbus "found data fingerprints suggesting similar ADIRU problems had occurred on a total of four flights. One of the earlier instances, in fact, included a September 2006 event on the same [equipment] that entered the uncommanded dive in October [2008]." Ibid.)
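
To make the voting idea concrete, here is a minimal sketch of one common way to tolerate a single bad reading out of three: mid-value (median) selection. This is not the Airbus algorithm, whose details are not given in the articles quoted above. The function name and the integer units are assumptions made purely for illustration.

```c
#include <stdint.h>

// Hypothetical voter: returns the median of three redundant readings
// (e.g., angle of attack in tenths of a degree; units are an assumption).
// A single wildly wrong input can never be selected as the output.
int32_t mid_value_select(int32_t a, int32_t b, int32_t c)
{
    int32_t const lo = (a < b) ? a : b;
    int32_t const hi = (a < b) ? b : a;

    if (c < lo)
    {
        return lo;  // c is the smallest, so lo is the median
    }
    if (c > hi)
    {
        return hi;  // c is the largest, so hi is the median
    }
    return c;       // c lies between the other two
}
```

With readings of 21, 20, and 500, the voter returns 21 regardless of which of the three units produced the outlier.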

Much of the attention in the publicly disclosed details of the Air France 447 crash has focused on the failure of one of several air speed indicators. Were there three of those as well? If so, was the same flight computer to blame for failing to recognize which to trust and which was unreliable?

It is too early in the investigation of yesterday's collision between two D.C. Metro red line trains, in which a stopped train was rear-ended and heavily damaged by a moving train on the same track, to place blame. But a WashingtonPost.com article headlined "Collision was Supposed to be Impossible" says it all:

Metro was designed with a fail-safe computerized signal system that is supposed to prevent trains from colliding.


and

During morning and afternoon rush hours, all trains except longer eight-car trains typically operate in automatic mode, meaning their movements are controlled by computerized systems and the central Operations Control Center. Both trains in yesterday's crash [about 5pm] were six-car trains. (Article, Washington Post, June 23, 2009)


Are bugs in embedded software to blame for these two disasters? You can bet the lawyers are already looking into it.


Thursday, May 21, 2009

Annual Embedded Engineer Survey -- Call for Participation

VDC Research is conducting its annual survey of embedded engineers. If you are involved in the engineering of embedded systems, you should take the survey. The research covers embedded software, hardware, tools, and development practices.

VDC will provide all respondents who complete the survey:

* A summary of the 2009 survey findings
* A chance to win a $100 Amazon.com gift certificate
* Instant access to a summary of VDC's 2008 survey findings

To begin the survey, go to:
http://www.vdcresearch.com/misc/surveys/09esdt/?RID=M


Tuesday, April 07, 2009

Embedded Systems Conference Wrap-Up

I spent last week in San Jose, at the Embedded Systems Conference (ESC). As I have come to expect (this was my twelfth consecutive year as a speaker), the event remains the place for embedded systems developers to be. There is no other similar event for learning about the latest processors, middleware and programming techniques; running into old friends and making new friends; and meeting with vendors past and possibly future. Everybody who's anybody in the embedded systems community is typically at this key trade show.

This 21st ESC was smaller than those of the past few years. Vendor booths mostly filled the main hall at San Jose's McEnery Convention Center, as well as the area outside the main hall entrance. But that makes the total square footage much smaller than in recent years. Recall that ESC moved from San Jose to San Francisco for a few years because McEnery was insufficient (and everyone abhorred the "tennis bubble" expansion floor)--ESC only came back to Silicon Valley proper after McEnery was expanded via the adjacent Marriott hotel and conference center. This shrinkage seems to be recession-related, as I am told that "cancellation revenues" (i.e., payments by vendors who had previously committed to renting space but didn't show) hit record levels.

However, overall attendance "felt" healthy. All of the paid courses were well attended (e.g., my course on Embedded C Coding Standards drew over 120 people to a room set with 100 chairs). And thanks to the "more intimate" venue consolidation, booth traffic on the show floor seemed quite reasonable. I am told that over 3,000 of the free "Engineer Survival Kit" bags were given out to an estimated 5,000 overall show-floor attendees.

Netrino had a huge presence at the show this year, including a booth.

In addition to my three paid courses, two Netrino staff engineers and I also made three "open to the public" presentations in the ESC Theater on the show floor. These more theatrical productions (Adventures in Satellite TV Piracy, RTOS MythBusters, and This Code Stinks--The Worst Embedded Code We've Ever Seen) were almost as much fun to create as they were to deliver. All three were quite popular with 200-250 attendees, including many standing around in the surrounding aisles. Everyone I talked with thought they were educational and fun at the same time--which is exactly what we were aiming for. These events and our booth worked nicely to drive traffic to each other, making it a great show for us all around.

David Markey has a nice wrap-up of the show floor games and goodies in his blog at Product Design & Development. Elsewhere, TechInsights has a page of links to vendor announcements of new products, services, partnerships, and initiatives.

Of the announcements, Microsoft's renewed focus on embedded systems as an important category and a shift in the way they are approaching the market was the most interesting to me. I heard through the grapevine that Microsoft is noticing their past large Windows CE and XPE customers tend not to be sticky (i.e., top 10 customers shift every year). And their positioning seems to be maturing with an aim toward changing that. I like the clarity and coherence of Microsoft's embedded product rearrangement, which--because of the "POSReady" and "NavReady" packages, in particular--reminds me quite a bit of Sun's Java 2 Micro Edition horizontal+vertical arrangement of a few years back.


Monday, April 06, 2009

Coding Standard Rule #10: Don't Use the Comma Delimiter Within Variable Declarations

Rule: The comma (‘,’) shall not be used to declare more than one variable within a single declaration.

Example (don’t):

char * x, y; // did you want y to be a pointer or not?


Reasoning: The cost of placing each declaration on a line of its own is low. By contrast, the risk that you've made a mistake and the compiler or a maintainer won't understand your intentions is high.
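
A minimal sketch of what goes wrong and what the rule asks for instead. The function and variable names are hypothetical; the two functions only report the size of the second variable so the difference in its type is visible.

```c
#include <stddef.h>

// The declaration the rule forbids: only x is a pointer; y is a plain char.
size_t size_of_y_comma_declared(void)
{
    char * x = NULL, y = '\0';

    (void)x;
    return sizeof(y);   // always 1, because y is a char
}

// One declaration per line: both variables are unmistakably pointers.
size_t size_of_y_declared_separately(void)
{
    char * x = NULL;
    char * y = NULL;

    (void)x;
    return sizeof(y);   // size of a pointer to char
}
```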


Friday, April 03, 2009

Coding Standard Rule #9: Don't Create Function-Like Macros

Rule: Parameterized macros shall not be used if an inline function can be written to accomplish the same task.

Example:

#define MAX(A, B) ((A) > (B) ? (A) : (B)) // Don’t do this ...
inline int max(int a, int b) // ... if you can do this instead.


Reasoning: There are a lot of risks associated with the use of preprocessor #defines, and many of them relate to the creation of parameterized macros. The extensive use of parentheses (as shown in the example) is important, but does not eliminate the unintended double increment possibility of a call such as MAX(i++, j++). Other risks of macro misuse include comparison of signed and unsigned data or any test of floating-point data. The C++ keyword inline was added to the C standard in the 1999 ISO update.
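
The double-increment risk can be sketched as follows. The helper names (max_i32, max_demo_t, demo_macro, demo_inline) are hypothetical, and static inline is used here so that the example links under C99 without a separate external definition.

```c
#include <stdint.h>

#define MAX(A, B) ((A) > (B) ? (A) : (B))   // the macro the rule warns against

// Type-checked alternative; each argument is evaluated exactly once.
static inline int32_t max_i32(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}

typedef struct
{
    int32_t result;     // value produced by the max operation
    int32_t i_after;    // value of i once the call has completed
} max_demo_t;

max_demo_t demo_macro(void)
{
    int32_t i = 5;
    int32_t j = 3;
    max_demo_t out;

    // Expands to ((i++) > (j++) ? (i++) : (j++)), so i is incremented twice.
    out.result  = MAX(i++, j++);
    out.i_after = i;
    return out;
}

max_demo_t demo_inline(void)
{
    int32_t i = 5;
    int32_t j = 3;
    max_demo_t out;

    // Arguments are evaluated once each, before the function body runs.
    out.result  = max_i32(i++, j++);
    out.i_after = i;
    return out;
}
```

Starting from i = 5 and j = 3, the macro version yields a result of 6 and leaves i at 7, while the inline function yields 5 and leaves i at 6.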


Thursday, April 02, 2009

Coding Standard Rule #8: Don't Mix Signed and Unsigned Data

Rule: Signed integers shall not be combined with unsigned integers in comparisons or expressions. In support of this, decimal constants meant to be unsigned should be declared with a ‘u’ at the end.

Example (don’t):

unsigned int a = 6u;
int b = -9;

if (a + b < 4)
{
// This correct path would be executed
// if -9 + 6 were -3 < 4, as anticipated.
}
else
{
// This incorrect path is actually executed
// because b is converted to unsigned int before
// the addition, so a + b wraps to UINT_MAX - 2.
}

Reasoning: Several details of the manipulation of binary data within signed integer containers are implementation-defined behaviors of the C standard. Additionally, the results of mixing signed and unsigned data can lead to data-dependent bugs.
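
A minimal sketch of the data-dependent surprise and one way to avoid it by keeping a single signedness throughout. Both function names are hypothetical.

```c
#include <stdbool.h>

// Mixed signedness: b is converted to unsigned int before the addition,
// so a negative b turns the sum into a very large positive number.
bool mixed_sum_below_four(unsigned int a, int b)
{
    return (a + b) < 4u;
}

// Same signedness on both sides: the arithmetic behaves as anticipated.
bool signed_sum_below_four(int a, int b)
{
    return (a + b) < 4;
}
```

For the inputs 6 and -9, the mixed version reports false while the all-signed version reports true. For two small non-negative inputs both versions agree, which is why this kind of bug can go unnoticed in testing.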


Wednesday, April 01, 2009

Coding Standard Rule #7: Don't Mix Bit-Wise Operators and Signed Data

Rule: None of the bit-wise operators (i.e., &, |, ~, ^, <<, and >>) shall be used to manipulate signed integer data.

Example (don’t):

int8_t signed_data = -4;
signed_data >>= 1; // not necessarily -2


Reasoning: The C standard does not specify the underlying format of signed data (e.g., 2’s complement) and leaves the effect of some bit-wise operators to be defined by the compiler author.
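
One way to follow the rule, sketched under the assumption of a C99 compiler: use ordinary arithmetic on signed values and reserve the bit-wise operators for unsigned ones. The function names are hypothetical.

```c
#include <stdint.h>

// Signed data: use division, which C99 defines as truncating toward zero.
int8_t halve_signed(int8_t value)
{
    return (int8_t)(value / 2);
}

// Unsigned data: shifting is fully defined and portable.
uint8_t halve_unsigned(uint8_t value)
{
    return (uint8_t)((unsigned int)value >> 1u);
}
```

Note that division and an arithmetic right shift differ for odd negative values: -3 / 2 is -1, whereas a compiler that implements >> as an arithmetic shift would produce -2.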


Tuesday, March 31, 2009

Coding Standard Rule #6: Use C99's Fixed-Width Integer Type Names

This is the sixth in a continuing series of blog posts describing simple coding rules that help keep bugs out of embedded C programs.

Rule: Whenever the width, in bits or bytes, of an integer value matters in the program, fixed width data types shall be used in place of char, short, int, long, or long long. The signed and unsigned fixed width integer types shall be as shown in the table below.






Integer Width        Signed Type   Unsigned Type
8 bits / 1 byte      int8_t        uint8_t
16 bits / 2 bytes    int16_t       uint16_t
32 bits / 4 bytes    int32_t       uint32_t
64 bits / 8 bytes    int64_t       uint64_t

Reasoning: The ISO C standard allows implementation-defined widths for char, short, int, long, and long long types, which leads to portability problems. Though the 1999 standard did not change this underlying issue, it did introduce the uniform type names shown in the table, which are defined in the new header file stdint.h. These are the names to use even if you have to create the typedefs by hand.
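
A short sketch of a case where width matters: placing a 16-bit value into a byte buffer in big-endian order, as a communication protocol might require. The function names and the byte order are assumptions for illustration.

```c
#include <stdint.h>

// Store a 16-bit value as two bytes, most significant byte first.
void put_u16_be(uint8_t * p_buf, uint16_t value)
{
    p_buf[0] = (uint8_t)(value >> 8);
    p_buf[1] = (uint8_t)(value & 0xFFu);
}

// Rebuild the 16-bit value from two bytes, most significant byte first.
uint16_t get_u16_be(uint8_t const * p_buf)
{
    // Widen to unsigned int first so the shift never touches signed data.
    return (uint16_t)(((unsigned int)p_buf[0] << 8) | p_buf[1]);
}
```

Because the types state their widths, the code reads the same on an 8-bit microcontroller and a 64-bit desktop; had it been written with short or int, the number of bytes involved would depend on the compiler.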


Monday, March 30, 2009

Coding Standard Rule #5: Only use Comments for Commenting

This is the fifth in a continuing series of blog posts describing simple coding rules that help keep bugs out of embedded C programs.

Rule: Comments shall neither be nested nor used to disable a block of code, even temporarily. To temporarily disable a block of code, use the preprocessor’s conditional compilation feature (e.g., #if 0 … #endif).

Example (don’t):
/*
a = a + 1;

/* comment */
b = b + 1;
*/

Reasoning: Nested comments and commented-out code both run the risk of allowing unexpected snippets of code to be compiled into the final executable. This can happen, for example, in the case of sequences such as the above.
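
In the sequence above, the first */ ends the comment, so b = b + 1; is compiled after all and the final */ is left dangling. A minimal sketch of the conditional-compilation alternative follows; the function name and the disabled statement are hypothetical.

```c
int step_counter(int a)
{
    a = a + 1;

#if 0 // temporarily disabled; a comment inside the block causes no trouble
    /* comment */
    a = a + 10;
#endif

    return a;
}
```

Changing the 0 to a 1 re-enables the block without touching any comment delimiters.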


Friday, March 27, 2009

Coding Standard Rule #4: Use volatile Whenever Possible

This is the fourth in a continuing series of blog posts describing simple coding rules that help keep bugs out of embedded C programs.

Rule: The volatile keyword shall be used whenever appropriate, including:

  • To declare a global variable accessible (by current use or scope) by any interrupt service routine,
  • To declare a global variable accessible (by current use or scope) by two or more tasks, and
  • To declare a pointer to a memory-mapped I/O peripheral register set (e.g., timer_t volatile * const p_timer ).


Reasoning: Proper use of volatile eliminates a whole class of difficult-to-detect bugs by preventing the compiler from making optimizations that would eliminate requested reads or writes to variables or registers that may be changed at any time by a parallel-running entity.

Anecdotal evidence suggests that programmers unfamiliar with the volatile keyword think their compiler’s optimization feature is more broken than helpful and disable optimization. The authors suspect, based on experience consulting with numerous companies, that the vast majority of embedded systems contain bugs waiting to happen due to a shortage of volatile keywords. These kinds of bugs often exhibit themselves as “glitches” or only after changes are made to a “proven” code base.
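
The first case in the rule, a global shared with an interrupt service routine, might look like the sketch below. The names adc_isr and poll_sample are hypothetical, the ISR is an ordinary function here because interrupt syntax is compiler-specific, and the constant stands in for a read of a real hardware register.

```c
#include <stdbool.h>
#include <stdint.h>

// Shared between the ISR and the main loop, so both are volatile.
static volatile bool     g_b_sample_ready = false;
static volatile uint16_t g_latest_sample  = 0u;

// Hypothetical interrupt service routine.
void adc_isr(void)
{
    g_latest_sample  = 0x1234u;     // stand-in for reading the converter
    g_b_sample_ready = true;
}

// Called from the main loop; returns true if a new sample was copied out.
bool poll_sample(uint16_t * p_sample)
{
    if (!g_b_sample_ready)
    {
        return false;
    }

    *p_sample        = g_latest_sample;
    g_b_sample_ready = false;
    return true;
}
```

Without volatile, an optimizing compiler would be free to read g_b_sample_ready once and never again inside a polling loop. Note that volatile only forces the reads and writes to happen; it does not by itself make a multi-step access atomic.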
