Saturday, February 06, 2010

Efficient C Tip #11 - Avoid passing parameters by using more small functions

This is the eleventh in a series of tips on writing efficient C for embedded systems. Today's topic will, I suspect, be slightly controversial. This post is based upon two basic observations:
  1. Passing parameters to functions is costly.
  2. Conditional branch instructions can be very costly on CPUs that have instruction caches (even with branch prediction).
I don't think that too many people will disagree with me on the above. Despite this I too often see a style of coding that incurs these costs unnecessarily. I think it's best illustrated by a (real world) example. The issue is one that will be familiar to most of you.  An embedded system contains a number of discrete LEDs (say 3), and the requirement is to write some code to allow higher level code to either turn on, turn off, or toggle a particular LED. The way I often see this coded is as follows:

typedef enum
{
  LED1, LED2, LED3
} LED_NO;

typedef enum
{
  LED_OFF, LED_ON, LED_TOGGLE
} LED_ACTION;

void led(LED_NO led_no, LED_ACTION led_action)
{
  switch (led_no)
  {
  case LED1:
    switch (led_action)
    {
      case LED_OFF:
        PORTB_PORTB0 = 0;
      break;

      case LED_ON:
        PORTB_PORTB0 = 1;
      break;

      case LED_TOGGLE:
        PORTB_PORTB0 ^= 1;
      break;

      default:
      break;
    }
    break;

  case LED2:
    ...
  }
}


So what's wrong with this you ask? Well in a nutshell the parameters passed to the function are used strictly to control the order of execution. There is no code common to any pair or group of parameters. When faced with a situation such as this, I instead implement the code as a large number of very small functions. For example:

void led1_Off(void)
{
  PORTB_PORTB0 = 0;
}

void led1_On(void)
{
  PORTB_PORTB0 = 1;
}

void led1_Toggle(void)
{
  PORTB_PORTB0 ^= 1;
}
...
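
To experiment with the two styles on a desktop machine, a minimal sketch follows. It assumes a plain variable, port_b0, standing in for the memory-mapped bit PORTB_PORTB0 (an assumption made only so that the code builds on a PC), and it covers LED1 only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped port bit PORTB_PORTB0. */
static volatile uint8_t port_b0;

typedef enum { LED_OFF, LED_ON, LED_TOGGLE } LED_ACTION;

/* Single-function style: the parameter only selects which statement runs. */
void led1(LED_ACTION led_action)
{
  switch (led_action)
  {
    case LED_OFF:    port_b0 = 0;  break;
    case LED_ON:     port_b0 = 1;  break;
    case LED_TOGGLE: port_b0 ^= 1; break;
    default:                       break;
  }
}

/* Multi-function style: no parameter to pass, no branch to evaluate. */
void led1_Off(void)    { port_b0 = 0; }
void led1_On(void)     { port_b0 = 1; }
void led1_Toggle(void) { port_b0 ^= 1; }
```

Both styles leave the port in the same state; the difference lies in what the caller has to pass and what the callee has to decode.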

Let's compare the two approaches.

Efficiency


This blog posting is supposedly about efficiency, so let's start with the results. I coded these two approaches up together with a main() function that exercised all 9 possible combinations. I then turned full speed optimization on and looked at the results for an AVR processor.

Single function approach: 78 bytes for main(), 94 bytes for the LED code. Execution time 208 cycles.

Multiple function approach: 42 bytes for main(), 54 bytes for the LED code. Execution time 96 cycles.

Clearly my approach is significantly more efficient.

Usability


By usability I'm referring to the case where someone else needs to use your code. They know they need to say toggle LED2 so they hunt around and find the file led.h. The question is, once they have opened up led.h, how quickly can they determine what they have to do in order to toggle LED2? In the single function case they are presented with just one function (which is a plus), but then they have to locate the enumerations and work out the parameters that need to be passed to the function (which is a minus). In the multiple function case, they have to search through a list of functions looking for the correct one. However once they have found it, it's very clear what the function does.

For me, I think it is a toss up between the two approaches as to which is more usable.

Maintainability


In this case the multiple function approach is the big winner. To see how this is, consider what happens to the single function case when one adds an LED or adds an action. The single function case just explodes in size, whereas with the multi-function approach one simply adds more very simple functions.

Conclusions

If you buy my analysis then clearly the multi-function approach is superior in both efficiency and maintainability - two areas that are dear to my heart. Now granted this is a fairly extreme example. However in my experience if you look through a reasonable amount of code you will soon discover a function that essentially does one thing or another based upon a function parameter. When you locate such a function you might want to try breaking it into two functions in the manner described here - I think you'll be pleased with the results.

****
As the readership of this blog has grown I must say I have been really impressed with the many insightful comments that have been posted. I know I learn a lot from them, and so I suspect, do a lot of the other readers. Thus for those of you that have commented in the past - thank you. For those of you yet to post a comment, I encourage you to take the plunge!



Tuesday, February 02, 2010

Is GCC a 'good' compiler?

It seems that barely a month goes by when I'm not asked my opinion on compilers. Sometimes I'm simply asked what compilers I use, while other times I'm asked my opinion on specific compilers - with GCC being by far the most asked about compiler. I've resisted writing about this topic because quite frankly it's the sort of topic that people get very passionate about - and by passionate I mean frothing at the mouth passionate. It seems that some folks simply can't accept the fact that someone doesn't agree with them that XYZ is simply the best compiler ever. Notwithstanding this, the volume of inquiries has reached the point where I really feel the need to break my silence.

First of all let's make some general observations.
  1. Despite the fact that I've been doing this for nearly 30 years and also despite the fact that as a consultant I probably use a wider variety of compilers than someone that works for an employer, the simple fact is that I've only had cause to use in anger a limited number of compilers. Thus are Rowley compilers any good? Well their website is decent, the documentation is OK and the IDE very nice. However I've never built a real project with their tools and so I really don't know whether Rowley compilers are any good.
  2. Many vendors provide compilers for many targets. As such it's a good bet that if their 8051 compiler is very good, then their ARM compiler is also likely to be excellent. However it isn't a given. Thus while I wholeheartedly endorse the Keil 8051 compiler, I have no opinion on their ARM compiler.
  3. Compilers vary in price from 'free' to 'cheap' to 'expensive' to 'they have got to be joking'. I've put all of these costs in quotes, because as you'll see below, one's perspective on what constitutes 'free' or 'expensive' is not easily defined.
So, enough with the preamble. Let's start with the 'free' and 'cheap' compilers, including GCC. Well for me the bottom line (literally) is that I can't afford to use these compilers. The reason is quite simple. I'm a high priced consultant. I can charge high hourly rates in part because I have exceptionally high productivity. Part of the way I achieve my high productivity is by not wasting my time (and hence my client's money) on stupid issues unrelated to the problem at hand. Given that a compiler / linker is such a frequently used tool, and given that I'm also the sort of engineer who pushes his tools hard, it's absolutely essential to me that when I run into a compiler issue I can pick up the phone and get an intelligent response ASAP. One simply can't do that with 'free' or 'cheap' compilers, and thus too often one is reduced to browsing the Internet to find the solution to a problem. When this happens, then my 'free' compiler rapidly starts to cost an arm and a leg.

What always amazes me about this topic is that so few employers / engineers seem to understand this. It seems that too many folks will eschew paying $2000 for a compiler - and then happily let their engineers bang their heads against a problem for a week - at a rate of at least $1000 a day.

Thus for me, the answer to the question 'Is GCC a good compiler?' is 'no, it isn't'. Of course if you are a student, or indeed anyone who is cash poor and time rich, then by all means use GCC. I'm sure you'll be very pleased with the results and that you'll find it to be a good compiler for you.

What then of the 'expensive' and 'they have got to be joking' categories? Rather interestingly, although based on limited experience, I've found that the very expensive compiler vendors ($10K+) also have lousy support. Instead it's the 'expensive' vendors that actually seem to offer the best combination of functionality, code quality, support and price - and it's this category that I tend to use the most.

Finally, regarding which compiler vendor I use. I happen to be a fan of IAR compilers. I've always found their code quality to be at least 'good'. Their linker is probably the easiest and most powerful linker I've ever used. Their support is very good (thanks Steve :-)). Finally their IDE is easy to use and has a very consistent look and feel across a wide range of processors, which is important to me as I tend to switch between architectures a lot.



Monday, February 01, 2010

Goto heresy

Today's post is prompted by an email I received from Michael Burns. With his permission I have reproduced his email below.

Hi Nigel,

What is your opinion on the usage of goto in C?

Sometimes when a routine has many conditions [usually for error handling] I have used a do {..} while(0); loop with breaks thus avoiding both deep nesting and repeated checks with a status variable.

For example:

unsigned int XXX_ExampleRoutine (unsigned int XXX_instance, unsigned int *XXX_handle)
{
unsigned int status;

do
{
  if (!XXX_IsValidXXXInstance (XXX_instance))
  {
   status = XXX_INVALID_INSTANCE;
   break;
  }

  if (XXX_handle == NULL)
  {
   status = XXX_INVALID_ARGUMENT;
   break;
  }

  status = XXX_AddRequest (XXX_instance, XXX_handle);
  if (status != XXX_STATUS_OK)
  {
   break;
  }

  etc

  } while (0);

  return status;
}

But perhaps the do {..} while(0); loop is just an excuse not to use goto?

My (slightly edited) response to him was as follows:

I rarely use a goto statement. While I dislike them for their potential abuse (for example a number of years ago I looked at a Flash driver from AMD that was absolutely littered with them), I also think they have their place. Furthermore I think folks that scream ‘the goto statement is banned’ and then happily allow the use of ‘break’ and ‘continue’ are deluding themselves.

Turning to your example code. As you have pointed out, coding this without using ‘break’ or ‘goto’ can rapidly lead to code that is a nightmare to follow. Indeed once one gets beyond about four or five tests of the type you are performing, I’d say that the code becomes impossible to follow unless you use either the style you have espoused or a goto statement. I’d also make the case that in this situation a goto is actually better. To illustrate my point, I have modified your code slightly, in much the way someone might who wasn’t paying attention:

unsigned int XXX_ExampleRoutine (unsigned int XXX_instance, unsigned int *XXX_handle)
{
  unsigned int status;

  do
  {
   if (!XXX_IsValidXXXInstance (XXX_instance))
   {
    status = XXX_INVALID_INSTANCE;
    break;
   }

   if (XXX_handle == NULL)
   {
    status = XXX_INVALID_ARGUMENT;
    break;
   }

   do
   {
     status = XXX_AddRequest (XXX_instance, XXX_handle);
     if (status == XXX_STATUS_BAD)
     {
      break;
     }
   } while (status == XXX_SOME_STATUS);

   etc
  } while (0);

  return status;
}


In this case, the break wouldn’t work as desired, whereas if you had coded it with a goto, the code would still work as intended.
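
For comparison, a minimal sketch of the goto form follows. The status codes and the two helper functions are hypothetical stubs, invented here only so that the example compiles; the retry loop mirrors the modified listing above.

```c
#include <stddef.h>

/* Hypothetical status codes, assumed purely for illustration. */
#define XXX_STATUS_OK         (0u)
#define XXX_STATUS_BAD        (1u)
#define XXX_SOME_STATUS       (2u)
#define XXX_INVALID_INSTANCE  (3u)
#define XXX_INVALID_ARGUMENT  (4u)

/* Stub: pretend that instances 0..3 are valid. */
static int XXX_IsValidXXXInstance(unsigned int XXX_instance)
{
  return XXX_instance < 4u;
}

/* Stub: always succeeds and hands back a made-up handle value. */
static unsigned int XXX_AddRequest(unsigned int XXX_instance, unsigned int *XXX_handle)
{
  *XXX_handle = XXX_instance + 100u;
  return XXX_STATUS_OK;
}

unsigned int XXX_ExampleRoutine(unsigned int XXX_instance, unsigned int *XXX_handle)
{
  unsigned int status;

  if (!XXX_IsValidXXXInstance(XXX_instance))
  {
    status = XXX_INVALID_INSTANCE;
    goto done;
  }

  if (XXX_handle == NULL)
  {
    status = XXX_INVALID_ARGUMENT;
    goto done;
  }

  do
  {
    status = XXX_AddRequest(XXX_instance, XXX_handle);
    if (status == XXX_STATUS_BAD)
    {
      goto done;  /* leaves the whole routine, not just the inner loop */
    }
  } while (status == XXX_SOME_STATUS);

  /* further steps would follow here */

done:
  return status;
}
```

Because the goto names its destination, wrapping one of the steps in a retry loop does not change where the error path ends up.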

I guess the bottom line for me is that K&R put a goto statement into the language for a reason (while leaving out lots of other features). Like just about everything else in C, the goto statement can be abused – but when applied intelligently it has its place.

In his reply Michael commented that MISRA doesn't allow goto or continue, and limits break statements to loop termination. I've already posted my comments on MISRA compliance - and I think my position here is consistent with what I wrote then - which in a nutshell is this. MISRA compliance is all well and good, but when it prevents you from implementing something in the most robust manner, then I think it's incumbent upon one as a professional to do what is best rather than what has been mandated by a committee. If that includes using a goto, then so be it.


Sunday, January 24, 2010

Voltage gradients in embedded systems

Today's post was prompted by an excellent comment from Phil Ouellette in a recent newsletter from Jack Ganssle. In a nutshell Phil was advocating strobing switches with an alternating voltage waveform, rather than a direct voltage in order to minimize corrosion and premature switch failure. This happens to be an area in which I have some experience and so I thought I'd extend the concept a little bit and also give you some food for thought.

The basic idea behind Phil Ouellette's comments is that if one has a bias voltage between two pins (such as between switch contacts), and if this bias is always in one direction (i.e. DC), then the bias can act so as to drive an electrochemical reaction. The exact results of this electrochemical reaction vary (corrosion, dendrite growth etc.), but the net result is normally the same - namely an unwanted short circuit between two pins.

These problems arise particularly on products that have to operate in humid environments and / or products that have to spend a very long time under power over their expected operational life.

So what can be done about this? Well the most important thing to understand is that it is voltage gradient (volts per meter) that is the driving force in this problem and that furthermore, voltage gradient is a vector quantity and thus its direction is important. With this understanding, it should be clear that to minimize these types of problems, one has to minimize the integral of the voltage gradient over time. To do this one has three basic choices:
  1. Minimize the voltage
  2. Maximize the separation
  3. Modulate the voltage
 Let's take a look at each of these in turn:

Minimize the voltage
Clearly technology is progressing in the right direction for us here, as 5V systems are rapidly becoming extinct and 1.8V systems are becoming more commonplace. Thus all other things being equal, the voltage gradient between any two pins on a 5V system will be 2.8 times greater than on a 1.8V system. Thus if you are designing a system where corrosion is a concern you will do yourself a big favor by opting for as low a voltage system as you can. However, see the caveat below.

Note also that given that what counts is minimizing the voltage over time, it follows that you can normally improve the system performance by powering down systems that are not needed at any given time. This also of course saves you power and thus is a highly recommended step.

Maximize the separation
This is by far and away the toughest problem. Twenty years ago, ICs ran on 5V and had 0.1 inch lead spacing, giving a maximum voltage gradient between pins of 5 / 0.00254 = 1969 V/m. Today, a typical 1.8V IC has a lead spacing of 0.5 mm giving a maximum voltage gradient of 1.8 / 0.0005 = 3600 V/m. Thus the voltage gradient between pins on a typical IC has gone up - despite the decrease in the operating voltage. Thus if selecting a low voltage part means that you must use a fine lead pitch part, then you are almost certainly shooting yourself in the foot!
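
The arithmetic above is easy to repeat for other parts. As a small sketch (the function name and its parameters are illustrative, not taken from any library), the gradient is simply the pin-to-pin voltage divided by the lead spacing in meters:

```c
/* Voltage gradient in volts per meter between two adjacent pins.
   The spacing is given in millimeters and converted to meters. */
double voltage_gradient(double volts, double pin_spacing_mm)
{
  return volts / (pin_spacing_mm / 1000.0);
}
```

Calling voltage_gradient(5.0, 2.54) reproduces the roughly 1969 V/m figure for the 5V part with 0.1 inch spacing, and voltage_gradient(1.8, 0.5) reproduces the 3600 V/m figure for the 1.8V part.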

Another area where you can increase separation without too much pain is the selection of passive components. For example a 1206 resistor has a much lower voltage gradient than an 0402 resistor, and an axial leaded capacitor is usually preferable to a radially leaded device. As a result, when I'm designing systems that have potential corrosion issues I really prefer to use larger components. Of course this can put you into conflict with marketing and production.

Modulate the voltage
The method suggested by Phil Ouellette is reasonably straightforward for something like a switch. However, if one generalizes the problem to all the components on the board, then it becomes a much more complex problem. For example, consider an address bus to an external memory. The most significant address lines will presumably change state at a much lower frequency than the low order address lines. Indeed it is not uncommon for a system that has booted up and is running normally to be in a situation where the top two or three address lines never change state. Now if they are all at the same state (for example high), then no voltage gradient exists between them and so there is no problem. However if the lines are say High - Low - High, then the voltage gradient is as bad as it gets - and you have a potential problem. There are of course various solutions to this particular problem. The easiest solution is to ensure that the most significant address lines are routed to non contiguous pins on the memory chip (sometimes known as address scrambling) so that high frequency address lines are adjacent to low frequency lines. A much more difficult problem is to link the application so that all address lines are guaranteed to toggle at a reasonable frequency...

Another interesting example comes when one performs microcontroller port pin assignment. Normally one has little choice about certain pin assignments - but for the remainder one has free rein. The next time you find yourself in this position you may want to try performing the assignment so as to minimize the voltage gradients. I think you will find it to be a very challenging task.

Anyway, I hope this has given you some food for thought. Please let us know via the comments section if you have faced any of these types of problems and how you solved them.


Monday, January 11, 2010

A tutorial on lookup tables in C

A while back I wrote a blog posting on using lookup tables as a means of writing efficient C. Since then, every day someone looking for basic information on lookup tables ends up on this blog - and I suspect goes away empty handed. To help make their visits a bit more fruitful I thought I'd offer some basic information on how best to implement lookup tables in C. Given that this blog is about embedded systems, my answers are of course embedded systems centric.

So what is a lookup table? Well a lookup table is simply an initialized array that contains precalculated information. They are typically used to avoid performing complex (and hence time consuming) calculations. For example, it is well known that the speed of CRC calculations may be significantly increased by use of a lookup table. A suitable lookup table for computing the CRC used in SMBUS calculations is shown below. (Note that the SMBUS consortium refers to their CRC as a PEC)

uint8_t pec_Update(uint8_t pec)
{
static const __flash uint8_t lookup[256] =
{
0x00U, 0x07U, 0x0EU, 0x09U, 0x1CU, 0x1BU, 0x12U, 0x15U,
0x38U, 0x3FU, 0x36U, 0x31U, 0x24U, 0x23U, 0x2AU, 0x2DU,
0x70U, 0x77U, 0x7EU, 0x79U, 0x6CU, 0x6BU, 0x62U, 0x65U,
0x48U, 0x4FU, 0x46U, 0x41U, 0x54U, 0x53U, 0x5AU, 0x5DU,
0xE0U, 0xE7U, 0xEEU, 0xE9U, 0xFCU, 0xFBU, 0xF2U, 0xF5U,
0xD8U, 0xDFU, 0xD6U, 0xD1U, 0xC4U, 0xC3U, 0xCAU, 0xCDU,
0x90U, 0x97U, 0x9EU, 0x99U, 0x8CU, 0x8BU, 0x82U, 0x85U,
0xA8U, 0xAFU, 0xA6U, 0xA1U, 0xB4U, 0xB3U, 0xBAU, 0xBDU,
0xC7U, 0xC0U, 0xC9U, 0xCEU, 0xDBU, 0xDCU, 0xD5U, 0xD2U,
0xFFU, 0xF8U, 0xF1U, 0xF6U, 0xE3U, 0xE4U, 0xEDU, 0xEAU,
0xB7U, 0xB0U, 0xB9U, 0xBEU, 0xABU, 0xACU, 0xA5U, 0xA2U,
0x8FU, 0x88U, 0x81U, 0x86U, 0x93U, 0x94U, 0x9DU, 0x9AU,
0x27U, 0x20U, 0x29U, 0x2EU, 0x3BU, 0x3CU, 0x35U, 0x32U,
0x1FU, 0x18U, 0x11U, 0x16U, 0x03U, 0x04U, 0x0DU, 0x0AU,
0x57U, 0x50U, 0x59U, 0x5EU, 0x4BU, 0x4CU, 0x45U, 0x42U,
0x6FU, 0x68U, 0x61U, 0x66U, 0x73U, 0x74U, 0x7DU, 0x7AU,
0x89U, 0x8EU, 0x87U, 0x80U, 0x95U, 0x92U, 0x9BU, 0x9CU,
0xB1U, 0xB6U, 0xBFU, 0xB8U, 0xADU, 0xAAU, 0xA3U, 0xA4U,
0xF9U, 0xFEU, 0xF7U, 0xF0U, 0xE5U, 0xE2U, 0xEBU, 0xECU,
0xC1U, 0xC6U, 0xCFU, 0xC8U, 0xDDU, 0xDAU, 0xD3U, 0xD4U,
0x69U, 0x6EU, 0x67U, 0x60U, 0x75U, 0x72U, 0x7BU, 0x7CU,
0x51U, 0x56U, 0x5FU, 0x58U, 0x4DU, 0x4AU, 0x43U, 0x44U,
0x19U, 0x1EU, 0x17U, 0x10U, 0x05U, 0x02U, 0x0BU, 0x0CU,
0x21U, 0x26U, 0x2FU, 0x28U, 0x3DU, 0x3AU, 0x33U, 0x34U,
0x4EU, 0x49U, 0x40U, 0x47U, 0x52U, 0x55U, 0x5CU, 0x5BU,
0x76U, 0x71U, 0x78U, 0x7FU, 0x6AU, 0x6DU, 0x64U, 0x63U,
0x3EU, 0x39U, 0x30U, 0x37U, 0x22U, 0x25U, 0x2CU, 0x2BU,
0x06U, 0x01U, 0x08U, 0x0FU, 0x1AU, 0x1DU, 0x14U, 0x13U,
0xAEU, 0xA9U, 0xA0U, 0xA7U, 0xB2U, 0xB5U, 0xBCU, 0xBBU,
0x96U, 0x91U, 0x98U, 0x9FU, 0x8AU, 0x8DU, 0x84U, 0x83U,
0xDEU, 0xD9U, 0xD0U, 0xD7U, 0xC2U, 0xC5U, 0xCCU, 0xCBU,
0xE6U, 0xE1U, 0xE8U, 0xEFU, 0xFAU, 0xFDU, 0xF4U, 0xF3U
};

pec = lookup[pec];
return pec;
}
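
A table like this is normally generated rather than typed in by hand. As a sketch, and assuming the usual SMBus PEC definition (a CRC-8 with polynomial x^8 + x^2 + x + 1, i.e. 0x07, processed most significant bit first), each entry can be recomputed with the hypothetical helper below, which is handy for checking the table for typing errors.

```c
#include <stdint.h>

/* Compute one table entry bit by bit, MSB first, using the
   CRC-8 polynomial x^8 + x^2 + x + 1 (0x07). */
uint8_t pec_TableEntry(uint8_t index)
{
  uint8_t crc = index;
  uint8_t bit;

  for (bit = 0u; bit < 8u; bit++)
  {
    if ((crc & 0x80u) != 0u)
    {
      /* Top bit set: shift it out and fold in the polynomial. */
      crc = (uint8_t)((uint8_t)(crc << 1) ^ 0x07u);
    }
    else
    {
      crc = (uint8_t)(crc << 1);
    }
  }
  return crc;
}
```

For instance, pec_TableEntry(1) gives 0x07 and pec_TableEntry(255) gives 0xF3, which agree with the second and last entries of the table above.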

There are several things to note about this declaration.

The use of static
If static was omitted, then this table would be allocated and initialized on the stack every time the function is called. This is very slow (and hence self defeating) and will most likely lead to a stack overflow on smaller systems. As a result, a lookup table that is not declared static is almost certainly a mistake. The only exception that I am aware of to this rule is when the lookup table must be used by multiple modules, and hence must be declared so as to have global scope.

The use of const
By definition a lookup table is used to read data. As a result, writing to a lookup table is almost always a mistake. (There are exceptions, but you really need to know what you are doing if you are dynamically altering lookup tables). Thus to help catch unintended writes to a lookup table, one should always declare the array as const.

Note that sometimes this is superfluous if the array is forced into Flash, as described below.

The use of __flash
If one provides no memory modifier (such as __flash) then many embedded systems compilers will copy the array into RAM (even though it is declared as const). Given that RAM is normally a much more precious resource than Flash, then this is a very bad thing. As a result, one should give a memory specifier such as __flash to force the array to be kept in Flash. Note that the syntax for doing so varies by compiler vendor. __flash is an IAR extension. I've also seen CODE (Keil) and ROM (Microchip) among others.

The use of a size specific data type such as uint8_t
Almost by definition lookup tables can consume a lot of space. As a result it is very important that you be aware of exactly how much space is being consumed. The best way to do this is to use the C99 data types so that you know for sure what the underlying storage unit is. As a result, if your data type is 'int' then I'd suggest that you are doing yourself a disservice.

Avoidance of incomplete array declarations
You should also note I have explicitly declared the array size as 256. I could of course have omitted this and had the declaration read as static const __flash uint8_t lookup[] = { ...};
However, I strongly recommend that you do not do this with lookup tables, as this is your first line of defense against inadvertently declaring the table with the wrong number of initializers.

Range Checking
In this case, range checking of the array indexer is unnecessary as it is an 8-bit entity and the table is 256 bytes. Thus by definition it is not possible to index beyond the end of the array. However, in general one should always range check the indexer before performing the lookup. If you make your index variable unsigned then you can make the check one-sided which aids in keeping the computation speed high. For example:

#define TABLE_SIZE (27u)

uint8_t lookup(uint8_t index)
{
  uint16_t value = 0u;

  /* Named 'table' so that it does not hide the function name */
  static const __flash uint16_t table[TABLE_SIZE] =
  {
    946u, 2786u, ... 89u
  };

  if (index < TABLE_SIZE)
  {
    value = table[index];
  }
  else
  {
    //Handle error
  }

  ...
}
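
A complete version that builds on a desktop compiler might look like the sketch below. The table contents and the four-entry size are placeholders assumed for the example, the __flash qualifier is left out because it is a compiler extension, and reporting the error through a bool return value is just one possible choice.

```c
#include <stdint.h>
#include <stdbool.h>

#define TABLE_SIZE (4u)

/* Returns true and writes *value when index is in range;
   returns false and leaves *value untouched otherwise. */
bool table_Lookup(uint8_t index, uint16_t *value)
{
  /* Placeholder contents, assumed only for this example. */
  static const uint16_t table[TABLE_SIZE] =
  {
    946u, 2786u, 15u, 89u
  };

  if (index < TABLE_SIZE)  /* unsigned index: one comparison covers both ends */
  {
    *value = table[index];
    return true;
  }

  return false;            /* the caller decides how to handle the error */
}
```

Because the index is unsigned, there is no need for a separate test against zero.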

Examples
So where do I use lookup tables? I've already mentioned CRC calculations as a common application. Probably my most common usage is for implementing jump tables. I wrote an extremely detailed article about this which I recommend you read if this is your interest. The third area where I often implement lookup tables is when I need to know the value of some complex function, where the independent variable has a limited set of values. To put this into plain English: if I have a typical 8 or 10 bit analog-to-digital converter (ADC) and I need to compute say 6.5 * ln(X) where X is the ADC reading, then I'll often just declare a lookup table that contains the values of 6.5 * ln(X) for all possible X (0 - 255 in the case of an 8 bit ADC). In this case all I need do is index the lookup table with the value of the ADC and I have my result. (The really observant reader will have noticed that 0 is an invalid input to the ln() function and so my previous statement is not entirely correct. Although this can be handled in several ways including range checking or the use of NaN (Not a Number), I mention it so as to point out that lookup tables do not absolve you of taking care of corner conditions).
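
To make the ADC case concrete, here is a deliberately shortened sketch. It assumes a hypothetical 3-bit converter so that the whole table fits on the page, stores 6.5 * ln(X) in hundredths as an integer, and treats a reading of 0 as an error rather than storing a value for it. The function name and the scaling are choices made for this example only.

```c
#include <stdint.h>
#include <stdbool.h>

#define ADC_STEPS (8u)  /* hypothetical 3-bit ADC; an 8-bit part would need 256 entries */

/* Writes 6.5 * ln(reading), in hundredths, to *result.
   Returns false for a reading of 0 (ln is undefined there)
   and for any reading outside the table. */
bool scaled_ln_Lookup(uint8_t reading, uint16_t *result)
{
  static const uint16_t table[ADC_STEPS] =
  {
    0u,     /* index 0 is never returned, see the check below */
    0u,     /* 6.5 * ln(1) =  0.00 */
    451u,   /* 6.5 * ln(2) =  4.51 */
    714u,   /* 6.5 * ln(3) =  7.14 */
    901u,   /* 6.5 * ln(4) =  9.01 */
    1046u,  /* 6.5 * ln(5) = 10.46 */
    1165u,  /* 6.5 * ln(6) = 11.65 */
    1265u   /* 6.5 * ln(7) = 12.65 */
  };

  if ((reading == 0u) || (reading >= ADC_STEPS))
  {
    return false;
  }

  *result = table[reading];
  return true;
}
```

The corner case costs one extra comparison, and the caller is told explicitly that no valid result exists.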

Once you get the hang of using lookup tables, particularly if you embrace the idea of very large lookup tables, then you'll quickly begin to wonder how you ever got along without them.

I'm sure that visitors to this blog would also appreciate hearing about other real world examples of the use of lookup tables - so feel free to tell the world about your experiences in the comments section.


Thursday, December 31, 2009

The best search terms of 2009

One of the interesting things about writing a blog is looking at the search terms that result in people visiting the blog. The vast majority of the search terms are quite reasonable. However, every once in a while a term pops up that brings a wry smile to my face. With that being said, I thought I'd share with you the 'best' search terms of the year. The terms appear below, together with my take on them...

Stay away from embedded systems
What can I say? This just conjured up images of someone's mother telling them that going into embedded systems would be the ruin of them. It certainly was for me.

Crazy enough to use unsigned
I guess I must be a raving lunatic then.

Clueless consultant
I winced when I saw this. I'll just note for the record that at least it wasn't paired with my name!

Personality as it relates to programming eprom
Huh?

Why is c so complicated?
A profound question indeed. So many responses came to mind, but at the end of the day none seemed adequate...

Should I correct grammer and spelling on my blog comments?
Notwithstanding the irony that 'grammar' is misspelled I found this to be an unintentionally revealing insight into the minds of those that blog!


With that I will say goodbye to 2009 and welcome to 2010. I hope 2010 is a better year for the industry as a whole and for my readers in particular. I'll be back to my 'regular' topics with my next posting. As always, thanks for reading.


Wednesday, December 30, 2009

Terrorist engineers

From time to time I comment on things related to engineers (as opposed to engineering). This is one of those times!

Anyway as you may know, someone tried to blow an airliner out of the sky the other day. What you may not know is that once again the perpetrator was an engineer. I say 'once again' because as this opinion piece in the New Scientist discusses, engineers are distressingly common in terrorist groups. Anyway, I suggest you read the article as it addresses some of the 'obvious' reasons, while suggesting something insightful about engineers as a group. I also suggest you read the comments as many of them are very thought provoking.

My next posting will be a little more cheerful.


Thursday, December 24, 2009

Hardware costs versus development costs

Earlier this year I posted about how to solve the problem of PIC stack overflow. As part of that article I asked the question as to why does anybody use a PIC anyway when there are superior architectures such as the AVR available? Well, various people have linked to the posting and so I get a regular stream of visitors to it, some of whom weigh in on the topic. The other day, one visitor offered as a reason for using the PIC the fact that they are cheap. Is this the case I asked myself? So I did some rough comparisons of the AVR & 8 bit PIC processors - and sure enough PIC processors are cheaper to buy. For example comparing the ATmega168P vs the PIC16F1936 (two roughly equivalent processors - the AVR has more memory and a lot more throughput, the PIC has more peripherals) I found that Digikey was selling the AVR for $2.60 per 980 and the PIC for $1.66 per 1600. A considerable difference - or is it?

Most of the products I work on have sales volumes of 1000 pieces a year, with a product life of 5 years. Thus if I chose the AVR for such a product, then my client would be paying approximately $1000 a year more for say 5 years. Applying a very modest 5% discount rate, this translates to a Net Present Value of $4,329.
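
The Net Present Value figure can be checked with a few lines of code. The sketch below assumes the extra cost is paid at the end of each year, which is the convention that reproduces the figure quoted above; the function name is illustrative.

```c
/* Present value of a fixed extra cost paid at the end of each year. */
double net_present_value(double yearly_cost, double discount_rate, unsigned int years)
{
  double npv = 0.0;
  double factor = 1.0;
  unsigned int year;

  for (year = 1u; year <= years; year++)
  {
    factor *= (1.0 + discount_rate);  /* (1 + rate) raised to the power 'year' */
    npv += yearly_cost / factor;      /* discount that year's cost to today */
  }
  return npv;
}
```

With a yearly cost of 1000.0, a rate of 0.05 and 5 years, the result is a little over 4329, in line with the $4,329 above.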

This is when it gets interesting. Does the AVR's architecture allow a programmer to be more productive? Well, clearly this is a somewhat subjective matter. However my sense is that the AVR does indeed allow greater programmer productivity. The reasons are as follows:
  1. The AVR's architecture lends itself by design to being coded in a HLL. The PIC does not. As a result, programming the PIC in a HLL is always a challenge and one is far more likely to have to resort to assembly language - with the attendant drop in productivity.
  2. The AVR's inherent higher processing speed means that one has to be less concerned with fine tuning code in order to get the job done. Fine tuning code can be very time consuming.
  3. The AVR's greater code density means that one is less likely to be concerned with making the application fit in the available memory (something that can be extremely time consuming when things get tight).
  4. The AVR's superior interrupt structure means that interrupt handling is far easier. Again interrupts are something that can consume inordinate amounts of time when things aren't working.
Now if one is a skilled PIC programmer and a novice on the AVR, then your experience will probably offset what I have postulated are inherent inefficiencies in the PIC architecture. However what about someone such as myself who is equally comfortable in both arenas? In this case the question becomes - how many more days will it take to code up the PIC versus the AVR and what do those days cost?

Of course if you are a salaried employee, then your salary is 'hidden'. However when you are a consultant the cost of extra days is clearly visible to the client. In this case, if using an AVR reduces the number of development days by even a few, the cost difference between the two processors rapidly dwindles to nothing. Thus the bottom line is - I'm not sure that PICs are cheaper. However I suspect that in many organizations what counts is the BOM cost of a product - and perhaps this finally explains why PICs are more popular than AVRs.

As always, I'm interested in your thoughts.

Monday, December 21, 2009

Automated kiosks

I'm not prone to rants on this blog, but I think it's time to vent about automated kiosks. Automated kiosks are popping up everywhere - airline check in, the movies, grocery stores and so on. While it's true that as a consumer I can't say I particularly like these beasts, I think it's the engineer in me that's really ticked off. Why is this you ask? Well I have several beefs with them.

Processing Speed
I am constantly amazed at how slow most of these kiosks are. It's ridiculous the number of times I've stood in front of a kiosk while it tells me 'System Processing'. What exactly does 'System Processing' mean - and how in the age of Gigahertz processors do I find myself having to wait for a computer to do something as pedestrian as process a credit card payment?

Lack of Parallel Processing
Why is it that these terminals do tasks sequentially that could (and should) be done in parallel? For example, at my local grocery store coupons are only printed after payment has been accepted. Why aren't they printed on the fly?

Availability
The up time of these kiosks seems to be amazingly bad. My local cinema has three kiosks. I have never seen all three working at the same time. The availability of check-in kiosks at airports can be even worse.

So what to make of this? Well I think the reasons for these problems become apparent if one notices that these kiosks all have a common hardware platform (a CPU with a color flat panel display, a touch screen, a card reader and a printer) and all are trying to solve a common problem (provide an easy to use interface to a big database). I don't know for sure, but I think it's highly likely that most of these kiosks are running a Windows X86 platform and that they are programmed in VC++ by folks who do VC++ PC programming. In short they are computers and not embedded systems, and as such are programmed using the usual PC mindset. No wonder they are so bad!

Before I leave this topic, I'll mention that there is one class of kiosk that for the most part doesn't have the aforementioned problems - and that is Automated Teller Machines (ATMs). While I can find the user interface on ATMs quite maddening at times, I've never gone to use an ATM and found myself staring at the Windows logo. My suspicion is that when it comes to ATMs, banks worked out a long time ago that they needed robust systems with high availability and thus they went the embedded systems approach as opposed to the PC approach. Without a doubt this is a much more expensive approach - but boy can I tell the difference!

Update
I went Christmas shopping today. I went to pay for car parking only to be faced with a kiosk stating 'Printing ticket' and a piece of paper stuck to the kiosk saying 'Out of order'. I tracked down another kiosk and successfully paid the parking fee. When I went to exit the car park, the card reader could not read the ticket. A (human) attendant had to manually let me out...


Friday, December 18, 2009

Century Post

My blogging software tells me that this is the one hundredth posting to this blog. I have to admit I'm somewhat astonished to find that not only have I found so many things to write about, but that the list of topics seems to be growing rather than shrinking. Anyway, rather than post on my (un)usual topics, I thought I'd mark the occasion by letting regular readers know what has been going on and where stack-overflow is going in the year ahead.

First off, I suspect that many of you have noticed that my blog posting rate has dropped in the last month or so. This has been primarily because as part of running my consulting business, I have been working on a major website overhaul which has consumed a lot of my time, but which has just been completed. If you want to see some of the things I get involved in, then do please take a look. I'd appreciate any feedback you may have about the site. (It's unclear if the hosting change has completely propagated through the DNS hierarchy. If you find yourself looking at a mostly blue website, then you are on the new site).

I'd also like to mention some upcoming changes to EmbeddedGurus in general, and this blog in particular. EmbeddedGurus is becoming a popular place and so a redesign is in the works to reflect its increased stature. I suspect that Michael Barr will have more to say on this topic in the near future. With regard to stack-overflow, it's clear to me that the blog has outgrown its current template. As such, as part of the EmbeddedGurus overhaul I'm hoping to switch to a better template which is not only more suited to my posts, but which will allow visitors to more easily find previous posts, as well as allow them to search the blog.

Finally, to those of you that answered my request for reader feedback a few months back: you have not been forgotten! I hope to incorporate as many of your suggestions as possible in 2010.

As always, thanks for reading.


Friday, December 04, 2009

Effective C Tip #8 - Structure Comparison

This is the eighth in a series of tips on writing effective C. Today's topic concerns how best to compare two structures (naturally of the same type) for equality. The conventional wisdom is that the only safe way to do this is by explicit member by member comparison, and that one should avoid doing a straight bit (actually byte) comparison using memcmp() because of the problem of padding bytes.

To see why this argument is advanced, one must understand that a compiler is free to place pad bytes between members of a structure so as to produce more favorable alignment of the data in memory. Furthermore, the compiler is not obligated to initialize these pad bytes to any particular value. This code fragment illustrates the problem:
typedef struct
{
 uint8_t x;
 uint8_t pad1;  /* Compiler added padding */
 uint8_t y;
 uint8_t pad2;  /* Compiler added padding */

} COORD;

void foo(void)
{
 COORD p1, p2;
 p1.x = p2.x = 3;
 p1.y = p2.y = 4;
 /* Note pad bytes are not initialized */
 if (memcmp(&p1, &p2, sizeof(p1)) != 0)
 {
   /* We may get here */
 }
 ...
}


Thus, it's clear that to avoid these kinds of problems, we must do a member by member comparison. However, before you rush off and start writing these member by member comparison functions, you need to be aware of a gigantic weakness with this approach. To see what I mean, consider the comparison function for my COORD structure. A reasonable implementation might look like this:

bool are_equal(COORD *p1, COORD *p2)
{
  return ((p1->x == p2->x) && (p1->y == p2->y));
}

void foo(void)
{

 COORD p1, p2;
 p1.x = p2.x = 3;
 p1.y = p2.y = 4;

 if (!are_equal(&p1, &p2))
 {
   /* We should never get here */
 }
 ...
}


Now consider what happens if I add a third member z to the COORD structure. My structure definition and function foo() become:
typedef struct
{
 uint8_t x;
 uint8_t pad1;  /* Compiler added padding */
 uint8_t y;
 uint8_t pad2;  /* Compiler added padding */
 uint8_t z;
 uint8_t pad3;  /* Compiler added padding */
} COORD;

void foo(void)
{

 COORD p1, p2;
 p1.x = p2.x = 3;
 p1.y = p2.y = 4;
 p1.z = 6;
 p2.z = 5;


 if (!are_equal(&p1, &p2))
 {
   /* We will not get here, even though p1.z and p2.z differ */
 }
 ...
}

The problem is that I now have to remember to also update the comparison function. Now clearly in a simple case like this, it isn't a big deal. However, in the real world where you might have a 500 line file, with the comparison function buried miles away from the structure declaration, it is way too easy to forget to update the comparison function. The compiler is of no help. Furthermore it's my experience that all too often these sorts of problems can exist for a long time before they are caught. Thus the bottom line is that member by member comparison has its own set of problems.

So what do I suggest? Well, I think the following is a reasonable approach:
  1. If there is no way that your structure can change (presumably because of outside constraints such as hardware), then use a member by member comparison.
  2. If you are working on a system where structure members are aligned on byte boundaries (which is true to the best of my knowledge for all 8 bit processors, and also most 16 bit processors), then use memcmp(). However, you need to think about doing this very carefully if there is the possibility of the code being ported to a platform where alignment is not on an 8 bit boundary.
  3. If you are working on a system that aligns on a non 8 bit boundary, then you must either use member by member comparison, or take steps to ensure that all the bytes of a structure are initialized using memset() before you start assigning values to the actual members. If you do this, then you can probably use memcmp() with a reasonable amount of confidence.
  4. If speed is a priority, then clearly memcmp() is the way to go. Just make sure you aren't going to fall into a pothole as you blaze down the road.
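As a minimal sketch of the memset() approach in item 3, the following zeroes every byte of a structure (padding included) before the members are assigned, and only then relies on memcmp(). The SAMPLE type and the function names are hypothetical and chosen purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure whose layout is likely to contain padding */
typedef struct
{
    uint8_t  x;
    uint32_t y;   /* Many compilers insert pad bytes before y */
} SAMPLE;

/* Zero every byte, padding included, before assigning the members */
static void sample_init(SAMPLE *p, uint8_t x, uint32_t y)
{
    memset(p, 0, sizeof(*p));
    p->x = x;
    p->y = y;
}

/* Byte-wise comparison; only meaningful if both were set up as above */
static bool sample_equal(const SAMPLE *a, const SAMPLE *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Note that the C standard does not promise that pad bytes keep their value after a later store to the structure, which is why this only buys a reasonable amount of confidence rather than certainty.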
Before I leave this topic, I should mention a few esoteric things for you to consider.

If you use the memcmp() approach you are checking for bit equality rather than value equality. Now most of the time they are the same. Sometimes however, they are not. To illustrate this, consider a structure that contains a parameter that is a boolean. If in one structure the parameter has a value of 1, and in the other structure it has a value of 2, then clearly they differ at the bit level, but are essentially the same at a value level. What should you do in this case? Well clearly it's implementation dependent. It does however illustrate the perils of structure comparison.
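To make the bit versus value distinction concrete, here is a small sketch. The SETTINGS type and both function names are hypothetical; the 'boolean' is assumed to be stored in a uint8_t where any non-zero value means true.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure holding a 'boolean' stored as a uint8_t */
typedef struct
{
    uint8_t enabled;   /* Any non-zero value means 'true' */
    uint8_t level;
} SETTINGS;

/* Bit equality: true only if every byte matches */
static bool settings_bits_equal(const SETTINGS *a, const SETTINGS *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}

/* Value equality: normalize the flag with !! before comparing */
static bool settings_values_equal(const SETTINGS *a, const SETTINGS *b)
{
    return ((!!a->enabled == !!b->enabled) && (a->level == b->level));
}
```

With enabled set to 1 in one structure and 2 in the other, the first function reports a difference while the second reports equality. Which answer is 'right' depends on the application.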

Finally I should mention issues associated with structures that contain pointers. CS guys like to distinguish between deep and shallow structure comparison. I rarely write code where a deep comparison is required, and so for me it's mostly a non-issue.


Thursday, November 26, 2009

Keeping your EEPROM data valid through firmware updates

Back when embedded systems used EPROM (no that is not a typo for my younger readers) rather than Flash, the likelihood of the code being updated in the field was close to nil. Today however, it is common for embedded systems to contain mechanisms to allow the code to be updated easily. Like most people, I embraced this feature enthusiastically. However, after I'd implemented a few systems that were field upgradable, I discovered that the ability to update in the field had an unexpected impact on my EEPROM data. To see what I mean, read on...

Most of the embedded systems I work on contain EEPROM. One of the prime uses for this EEPROM is for storing configuration / calibration information for the system. As a result, I often store data in EEPROM as a series of data structures at fixed locations, with gaps in between them. Thus, my EEPROM map might look something like this:

#define CAL_DATA_LOCATION     0x0010
#define CONFIG_DATA_LOCATION    0x0200
...
#define SYSTEM_PARAMS_LOCATION    0x1000

typedef struct
{
    uint32_t param1;
    uint16_t param2;
    ...
    uint8_t     spare[10];
} CALIBRATION_DATA;

__eeprom CALIBRATION_DATA Cal_Data @ CAL_DATA_LOCATION;
__eeprom CONFIGURATION_DATA Config_Data @ CONFIG_DATA_LOCATION;
...
__eeprom SYSTEM_DATA System_Data @ SYSTEM_PARAMS_LOCATION;

As you can see, I was smart enough to allow room for growth within the structure via the spare[] array. (I have intentionally omitted support related to corruption detection to avoid complicating the issue at hand). As a result I thought I was all set if at some time a SW update caused me to have to use more parameters in a given EEPROM structure. Well I went along in this blissful state of ignorance for a few years until the real world intruded in a rather ugly way. Here's what happened. The firmware upgrade didn't require me to add any new parameters to the EEPROM, per se, but it did require that the data type of some of the parameters be changed. For example, my CALIBRATION_DATA structure example might have to change to this:

typedef struct
{
    float     param1;
    uint16_t param2;
    ...
    uint8_t     spare[10];
} CALIBRATION_DATA;

Thus param1 has changed from a uint32_t type to a float. Thus when the new code powered up, it had to read param1 as a uint32_t, and then convert it to a float type and write it back to the EEPROM. This clearly was quite straightforward. However, where the problem came was the next time the system powered up. I realized that without some sort of logic in place, I would re-read param1, treat it as a uint32_t (even though it is a float), 'convert' it to a float and write it back to EEPROM. Clearly I needed some method of signaling that I had already performed the requisite upgrade. As I pondered this problem, I realized that it was even more complicated. Let us denote the two versions of CALIBRATION_DATA as version 1 and version 2 respectively. Furthermore, let's assume that in version 3 of the code, param1 gets changed to a double (thus shifting all the other parameters down and consuming some of the spare allocation). I.e. it looks like this:

typedef struct
{
    double     param1;
    uint16_t param2;
    ...
    uint8_t     spare[6];
} CALIBRATION_DATA;

In this case, we must not only be able to handle the upgrade from version 2 to version 3 - but also directly from version 1 to version 3. (You could of course require that users perform all upgrades in order. While I recognize that sometimes this is unavoidable, I suspect that most times it's because the developer has backed themselves in to the sort of corner I describe here).

Anyway, with this insight in hand, I realized that I needed a generic system for both tagging an EEPROM structure with the version of software that created it, together with a means of providing arbitrary updates. This is how I do it.

Step 1.
Make the first location of each EEPROM structure a version field. This version field contains the firmware version that created the structure. By making it the first location in the EEPROM data structure, you ensure that you can always read it regardless of what else happens to the structure. Thus my CALIBRATION_DATA structure now looks something like this:

typedef struct
{
    uint16_t version;
    uint32_t param1;
    uint16_t param2;
    ...
    uint8_t     spare[10];
} CALIBRATION_DATA;

Step 2.
Add code to handle the upgrades. This code must be called before any parameters are used from EEPROM. The code looks something like this:

void eeprom_Update(void)
{
    if (Cal_Data.version != SW_VERSION)
    {
        switch (Cal_Data.version)
        {
        case 0x100:
            /* Do necessary steps to perform upgrade */
        break;

        case 0x200:
            /* Do necessary steps to perform upgrade */
        break;

        default:
        break;
        }

        /* Update the EEPROM version number */
        Cal_Data.version = SW_VERSION;
    }
}
}

Incidentally, I find that this is often one of those cases where falling through case statements is really useful. Of course doing this is usually banned and so one ends up with much more clumsy code than would otherwise be required.
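For what it's worth, here is a sketch of what the fall through version might look like. It works on a RAM image rather than real EEPROM, and the CAL_IMAGE type, its fields, the version numbers and the conversion steps are all assumptions made for illustration. Each case upgrades the data by exactly one version and then falls into the next, so a version 1 image is carried all the way to version 3 in a single call.

```c
#include <stdint.h>

#define SW_VERSION 0x300u   /* Assumed current firmware version */

/* RAM stand-in for an EEPROM structure; field names are hypothetical */
typedef struct
{
    uint16_t version;
    uint16_t gain;      /* v1: raw counts, v2 onwards: counts * 10 */
    uint16_t offset;    /* Assumed to be added in v3 from the spare area */
} CAL_IMAGE;

static void cal_upgrade(CAL_IMAGE *cal)
{
    switch (cal->version)
    {
    case 0x100:
        cal->gain = (uint16_t)(cal->gain * 10u);   /* v1 -> v2 */
        /* Intentional fall through */
    case 0x200:
        cal->offset = 0u;                          /* v2 -> v3 */
        /* Intentional fall through */
    case SW_VERSION:
        break;          /* Already current: nothing more to do */

    default:
        return;         /* Unknown version: leave the data untouched */
    }

    /* Tag the data with the version that now owns it */
    cal->version = SW_VERSION;
}
```

Adding a version 4 then means adding one more case at the bottom of the chain rather than revisiting every existing upgrade path.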

An Apology
Regular readers will no doubt have noticed that this is my first post in a while. A deadly combination of vacation and urgent projects with tight deadlines conspired to prevent me from blogging at my usual pace.


Friday, November 06, 2009

Eye, Aye I!

Today's post should probably be called 'Thoughts on non-descriptive variable names', but once in a while I have to let my creative side out!

Anyway, the motivation for today's post is actually Michael Barr's latest blog posting concerning analysis of the source code for a breathalyzer. Since I do expert witness work, as well as develop products, I was keen to see what the experts in this case had to say. One snippet from the expert for the plaintiffs caught my eye. In appendix B of their report, Draeger made the following statement concerning general code issues:
Non descriptive variable names – i, j, dummy and temp
This touched upon something where I seem to be at odds with the conventional wisdom. I'll illustrate what I mean. Consider initializing an array to zero (I'll ignore that we could use a library function for this). I would code it like this:

uint8_t buffer[BUFSIZE];
uint8_t i;

for (i = 0; i < BUFSIZE; i++)
{
   buffer[i] = 0;
}

This code would be rejected by many coding standards (and apparently would offend Draeger), as the loop variable 'i' is not descriptive. To be 'correct', I should instead code it like this:

uint8_t buffer[BUFSIZE];
uint8_t buffer_index;

for (buffer_index = 0; buffer_index < BUFSIZE; buffer_index++)
{
   buffer[buffer_index] = 0;
}

So for me, the question is, does the second approach buy me anything - or indeed cost me anything? Well clearly, this is a matter of opinion. However I'd make the following observations:
  1. I think my code is clear, concise and easily understood by even the most unskilled programmer.
  2. Is the variable name 'buffer_index' clearer - yes but only to a native English speaker. It's my experience that there are a lot of non-native English speakers in the industry.
  3. Personally, I find the use of similar words in close proximity (buffer[buffer_index]) to be a bit harder to read, and very easy to mis-read if there are other variables around prefixed with buffer.
I'd also make the observation that many coding standards require variable names to be at least 3 characters long, and as a result I've seen code that looks like this:


uint8_t buffer[BUFSIZE];
uint8_t iii;

for (iii = 0; iii < BUFSIZE; iii++)
{
   buffer[iii] = 0;
}

Clearly in this case, the person is addressing the letter of the standard (if you'll pardon the pun), but not the spirit. Where the standard requires the variable names to be meaningful, I've also seen this done:


uint8_t buffer[BUFSIZE];
uint8_t idx;

for (idx = 0; idx < BUFSIZE; idx++)
{
   buffer[idx] = 0;
}

This code meets the letter of the standard, and arguably the spirit. Is it really any more understandable than my original code? I don't think so - but I'll be interested to get your comments.


Monday, November 02, 2009

Lowering power consumption tip #3 - Using Relays

This is the third in a series of tips on lowering power consumption in embedded systems. Today's topic concerns relays. It may be just the markets that I operate in, but relays seem to crop up in a very large percentage of the designs that I work on. If this is true for you, then today's tip should be very helpful in reducing the power consumption of your system.

I'll start by observing that relays consume a lot of power - at least in comparison to silicon based components, and thus anything that can be done to minimize their power consumption typically has a large impact on the overall consumption of the system. That being said, usually the thing that will reduce a relay's power consumption the most is to simply use a latching relay. (A latching relay is designed to maintain its state once power is removed from its coil. Thus it only consumes power when switching - much like a CMOS gate). However, latching relays cannot be used in circumstances where it is important that the relays revert to a known state in the event of a loss of power. Most embedded systems that I work on require the relays to have this property. Thus in these cases, what can be done to minimize the relay's power consumption?

If you look at the data sheet for a relay, you will see a plethora of parameters. However, the one of most interest is the operating current. (Relays are current operated devices. That is, it is the current flowing through the relay coil that generates a magnetic field, which in turn produces the force that moves the relay armature). This current is the current required to actuate (pull-in) the relay. Not much can be done about this. However, once a relay is actuated, the current required to hold the relay in this state is typically anywhere between a third and two thirds less than the pull-in current. This current is called the holding current - and may or may not appear on the data sheet. Despite the fact that the holding current is so much less than the pull-in current, almost every design I see (including many of mine I might add) eschews the power savings that are up for grabs and instead simply puts the pull-in current through the relay the whole time the relay is activated.

So why is this? Well, the answer is that it turns out it isn't trivial to switch from the pull-in current to the holding current. To see what I mean - read on!

The typical hardware to drive a relay consists of a microcontroller port pin connected to the gate of an N channel FET (BJTs are used, but if you are interested in reducing power, a FET is the way to go). The FET in turn is connected to the relay coil. Thus to turn the relay on, one need only configure the microcontroller port pin as an output and drive it high - a trivial exercise.

To use the holding current approach, you need to do the following.
  1. Connect the FET to a microcontroller port pin that can generate a PWM waveform. The hardware is otherwise unchanged.
  2. To turn the relay on, drive the port pin high as before.
  3. Delay for the pull in time of the relay. The pull in time is typically of the order of 10 - 100 ms.
  4. Switch the port pin over to a PWM output. The PWM duty cycle of course dictates the effective current through the relay, and this is how you set the holding current. The other important parameter is the PWM frequency. Its period should be at most one tenth of the pull-in time. For example, a relay that has a pull-in time of 10 ms would require a PWM period of no more than 1 ms, giving a PWM frequency of at least 1 kHz. You can of course use higher frequencies - but then you are burning unnecessary power in charging and discharging the gate of the FET.
  5. To turn the relay off, you must disable the PWM output and then drive the port pin low.
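The steps above can be sketched as follows. Everything hardware related here is a stand-in: pin_write(), pwm_start(), pwm_stop() and delay_ms() are hypothetical names that simply record what was requested, and the pull-in time and holding duty cycle (the PWM depth) are assumed figures that would really come from the relay data sheet. A blocking delay is used only to keep the sketch short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed relay figures; take real values from the data sheet */
#define RELAY_PULL_IN_MS    10u   /* Pull-in time */
#define RELAY_HOLD_DUTY_PCT 50u   /* Duty cycle giving the holding current */

/* Lowest acceptable PWM frequency: period <= pull-in time / 10 */
#define RELAY_PWM_MIN_HZ    (10000u / RELAY_PULL_IN_MS)

/* Stand-ins for the port and timer hardware (all names hypothetical) */
static bool     pin_high;
static bool     pwm_running;
static uint32_t pwm_hz;
static uint8_t  pwm_duty_pct;
static uint32_t elapsed_ms;

static void pin_write(bool level) { pin_high = level; }
static void delay_ms(uint32_t ms) { elapsed_ms += ms; }
static void pwm_stop(void)        { pwm_running = false; }

static void pwm_start(uint32_t hz, uint8_t duty_pct)
{
    pwm_hz = hz;
    pwm_duty_pct = duty_pct;
    pwm_running = true;
}

void relay_on(void)
{
    pin_write(true);                 /* Step 2: full pull-in current */
    delay_ms(RELAY_PULL_IN_MS);      /* Step 3: wait for the armature */
    pwm_start(RELAY_PWM_MIN_HZ, RELAY_HOLD_DUTY_PCT);   /* Step 4 */
}

void relay_off(void)
{
    pwm_stop();                      /* Step 5: disable the PWM... */
    pin_write(false);                /* ...then drive the pin low */
}
```

With a 10 ms pull-in time the macro works out to 1000 Hz, matching the example in step 4. In a real system the delay would more likely be handled by a timer or scheduler so that the CPU can sleep in the meantime.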
Looking at this, it really doesn't seem too hard. However compared to simply setting and clearing a port pin, it's certainly a lot of work. Given that management doesn't normally award points for reducing the power consumption of an embedded system, but does reward getting the system delivered on time, it's hardly surprising that most systems don't use this technique. Perhaps this post will start a tiny movement towards rectifying this situation.


Monday, October 26, 2009

Embedded systems boot times

Last week saw the release of Windows 7. Looking over the new features, the one that struck me the most was the effort that Microsoft had put into decreasing the boot time of the OS. If the reports are to be believed, then Windows 7 boots dramatically faster than its predecessors - to which I say about time! Almost contemporaneously with the Windows 7 announcement I took delivery of a beautiful new Tektronix Mixed Signal Oscilloscope. It's a model MSO2024 with four analog channels, 16 digital channels, a huge color display, great user interface, tremendous connectivity etc. Despite all this, I'm disappointed with the product. The reason - it takes 75 seconds to boot. Now if I'm preparing a major debug session, then this 75 seconds isn't terrible. However, most of the time when I turn a scope on, I'm interested in just getting a quick look at a signal - and then I'm done. For this usage mode, the MSO2024 fails miserably.

Now I'd like to think that this scope is an oddball in this respect - but it isn't. I purchased a big fancy flat screen TV last year - and it takes about 5 seconds to boot from standby (i.e. powered, yet 'off') to being 'on'. Maybe it's my type A personality, but I find that time unacceptable (in part because I'm never sure if I've actually turned the thing on, or whether the remote control signal missed its mark).

Now without a doubt, these long boot times are a function of large processors, huge memories, complex RTOSes etc. However, I also think they are equally a result of poor design by the engineers (or maybe poor specification by the marketing department).

Thus the bottom line - think about the boot time of your product. Your end user will appreciate you doing so.

Incidentally, if there is sufficient interest, I may publish some tips on how to minimize boot times in future blog postings.


Tuesday, October 20, 2009

Whither white space?

I was looking over some code I wrote a year ago in preparation for making some minor enhancements, when I noticed that in one place I had two blank lines between functions, instead of my coding standard mandated one line. I immediately and instinctively corrected it - as is my norm. However having done so, I paused to consider what I'd just done - and why.

On the one hand, this was a clear violation of my coding standard - and so it must be corrected. However, the violation wasn't doing any harm, per se, and correcting it came at a cost - namely that someone (probably me) browsing the version control system at a later date will see that the file has been touched - and may choose to investigate what was changed - only to find out that it was a simple white space correction. (I appreciate that version control systems can be set up to ignore white space. I choose to not use that option).

Now I suspect that readers of this blog will be divided. Some will think I was quite right to eliminate the extra line, whereas others are thinking - doesn't this guy have better things to do in life? Which brings me to my point!

Some people are completely anal retentive when it comes to white space. They are very careful on indentation, alignment of comments, use of blank lines and so on. I fall squarely into this category - as does Jean Labrosse of MicroC/OS-II fame. Others could not care less about white space. They will arbitrarily have 6 blank lines between two functions, and then no lines between the next two functions. Their comments are usually aligned all over the place, and they rarely use space between e.g. the elements of a for loop statement. Finally, there's the third (and largest group) who fall somewhere in between these two extremes.

Now I look at a lot of code, and having done so, I think I can make a sweeping generalization, which I'll call the "Nigel Jones white space principle". Succinctly put, it states:

White space discipline is highly correlated with coding discipline.

That is, those who are careless about white space are often careless about a lot of other things. The converse seems to apply. As a result, when I look at code, literally the first thing I note is how well disciplined was the author in the use of white space. If the code is cleanly and consistently laid out, then I get a good first impression, and the chances are the code will be first rate.

Now I am unsure which is the cause and which is the effect here. In other words, does white space discipline lead to more disciplined code overall, or is it the other way around? Regardless, if your code looks like a mess, then I'd humbly suggest that you literally clean up your act - your career will thank you!


Monday, October 12, 2009

Effective C Tip #7 - Use strongly typed function parameters

This is the seventh in a series of tips on writing effective C. Today's topic concerns function parameters, and more to the point, how you should choose them in order to make your code considerably more resilient to parameter passing errors.  What do I mean by parameter passing errors? Well consider a function that is intended to draw a rectangle on a display. The lousy way to design this function interface would be something like this:
void draw_rect(int x1, int y1, int x2, int y2, int color, int fill)
{
...
}
I must have seen a function like this many times. So what's wrong with this you ask? Well in computer jargon the parameters are too weakly typed. To put it into plain English, it's way too easy to pass a Y ordinate when you are supposed to pass an X ordinate, or indeed to pass a color when you are supposed to be passing an ordinate or a fill pattern. Although in this case (and indeed in most cases) these types of mistakes are clearly discernible at run time, I'm a firm believer in catching as many problems at compile time as possible. So how do I do this? Well there are various things one can do. The most powerful technique is to use considerably more meaningful data types. In this case, I'd do something like this:
typedef struct
{
  int x;
  int y;
} COORDINATE;

typedef enum
{
 Red, Black, Green, Purple, /* ... */ Yellow
} COLOR;

typedef enum
{
 Solid, Dotted, Dashed, /* ... */ Morse
} FILL_PATTERN;

void draw_rect(COORDINATE p1, COORDINATE p2, COLOR color, FILL_PATTERN fill)
{
...
}
Now clearly it's highly likely that your compiler will complain if you attempt to pass a coordinate to a color and so on - and thus this is a definite improvement. However, nothing I've done here will prevent the X & Y ordinates being interchanged. Unfortunately, most of the time you are out of luck on this one - except in the case where you are dealing with certain sizes of display panels with resolutions such as 320 * 64, 320 * 128 and so on. In these cases, the X ordinate must be represented by a uint16_t whereas the Y ordinate may be represented by a uint8_t. In which case my COORDINATE data type becomes:

typedef struct
{
  uint16_t x;
  uint8_t y;
} COORDINATE;

This will at least cut down on the incidence of parameters being passed incorrectly.
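As a sketch of how this looks at the call site, the following uses C99 designated initializers so that each ordinate is named where it is set. The enumerations are shortened, and the body of draw_rect() is a hypothetical stand-in that just returns the enclosed area so that the call can be checked; demo() is likewise only for illustration.

```c
#include <stdint.h>

typedef struct
{
    uint16_t x;
    uint8_t  y;
} COORDINATE;

typedef enum { Red, Black, Green } COLOR;              /* Shortened list */
typedef enum { Solid, Dotted, Dashed } FILL_PATTERN;   /* Shortened list */

/* Hypothetical body: returns the enclosed area instead of drawing */
static uint32_t draw_rect(COORDINATE p1, COORDINATE p2,
                          COLOR color, FILL_PATTERN fill)
{
    (void)color;
    (void)fill;
    uint32_t width  = (uint32_t)(p2.x - p1.x);
    uint32_t height = (uint32_t)(p2.y - p1.y);
    return width * height;
}

static uint32_t demo(void)
{
    /* Naming .x and .y makes an interchanged ordinate easy to spot */
    COORDINATE top_left     = { .x = 10u,  .y = 5u  };
    COORDINATE bottom_right = { .x = 310u, .y = 60u };

    return draw_rect(top_left, bottom_right, Green, Solid);
}
```

Passing a COORDINATE where a COLOR is expected is a hard error in C. Whether interchanging the color and fill arguments is diagnosed depends on the compiler and its warning level, since C enumerations convert freely to one another; a tool such as PC-Lint is more likely to flag it.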

Although you probably will not get much help from the compiler, you can also often get a degree of protection by declaring appropriate parameters as const. A good example of this is the standard C function memcpy(). If like me, you find yourself wondering if it's memcpy(to, from) or memcpy(from, to), then an examination of the function prototype tells you all you need to know:
void *memcpy(void *s1, const void *s2, size_t n);

That is, the first parameter is simply declared as a void * pointer, whereas the second parameter is declared as a pointer to const void. In short the second parameter points to what we are reading from, and hence memcpy is indeed memcpy(to, from). Now I'm sure that many of you are thinking to yourself - so what, the real solution to this is to give meaningful names to the function prototype. For example:
void *memcpy(void *destination, const void *source, size_t nos_bytes);

Although I agree wholeheartedly with this sentiment, I'll make two observations:
  1. You are assuming that the person reading your code is sufficiently fluent in the language (English in this case) that the names are meaningful to them.
  2. Your idea of a meaningful label may not be shared by others. I've noticed that this is particularly the case with software, as it seems that all too often the ability to write code and the ability to put a meaningful sentence together are inversely correlated.
The final technique that I employ concerns psychology! Now one can argue that the failure to pass parameters correctly is due to laziness on the part of the caller. At the end of the day, this is indeed the case. However, I suspect that in many cases, it's not because the caller was lazy, but rather it's because the caller thought they knew what the function parameter ordering is (or should be). A classic example of this of course concerns dates. Being from the UK (or more relevantly - Europe), I grew up thinking of dates as being day / month / year. Here in the USA, they of course use the month / day / year format. Thus when designing a function that needs to be passed the day, month and year, in what order should one declare the parameters? Well in my opinion it's year, month, day. That is the function should look like this:

void foo(int16_t year, MONTH month, uint8_t day)

There are several things to note:
  1. By putting the year first, one causes both Europeans and Americans to think twice. This is where the psychology comes in!
  2. I've made the year signed - because it can indeed be negative, whereas the month and day cannot.
  3. I've made the month a MONTH data type, thus considerably increasing the likelihood that an attempt to pass a day when a month is required will be flagged by the compiler.
  4. I've made the day yet another data type (that maps well on to its expected range). Furthermore, attempts to pass most year values to this parameter will result in a compilation warning.
 Thus I've used a combination of psychology and good coding practice to achieve a more robust function interface.
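Here is a sketch of such an interface in use. The MONTH enumeration and the date_is_valid() function are hypothetical, and the leap year rule assumes the Gregorian calendar applied to all years, which is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    January = 1, February, March, April, May, June,
    July, August, September, October, November, December
} MONTH;

/* Hypothetical validity check using the year, month, day ordering */
static bool date_is_valid(int16_t year, MONTH month, uint8_t day)
{
    static const uint8_t days_in_month[] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    bool leap = ((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0);
    uint8_t limit;

    if ((month < January) || (month > December))
    {
        return false;
    }

    limit = days_in_month[month - January];
    if ((month == February) && leap)
    {
        limit = 29u;
    }

    return (day >= 1u) && (day <= limit);
}
```

A call such as date_is_valid(2009, October, 12) reads unambiguously to both Europeans and Americans, which is the point of the exercise.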

Thus the bottom line when it comes to designing function interfaces:
  1. Use strongly typed parameters.
  2. Use const where you can.
  3. Don't assume that what is 'natural' to you is 'natural' to everyone.
  4. Do indeed use descriptive parameter names - but don't assume that everyone will understand them.
  5. Apply some pop psychology if necessary.
 I hope you find this useful.

Wednesday, October 07, 2009

Is MISRA compliance worthwhile?

I had been planning on talking about MISRA C compliance in a month or two from now. However, in the comments section of my recent post about bitfields, Anand posed the following question:
However I am writing to ask your opinion about MISRA C compliance. This is the first time a client company has asked us for MISRA C compliance and I am not quite sure where to start. I started reading the guidelines from the web, however soon realised that its an enormous task and I would never get through all of them. So what I would like to know from you is how realistic is it to expect to comply with all the guidelines? Since my compiled code size is less than 2KB so should we even bother with MISRA C compliance?
Your valuable insights would be highly appreciated
I first gave my thoughts about MISRA C in an article published in 2002 in Embedded Systems Programming magazine (now Embedded Systems Design). Looking back over the article, I don't see anything in it that I disagree with now. However, there are certainly some things that I'd add, having attempted to adhere to the MISRA guidelines in several large projects. After I have given my thoughts, I'll try and address Anand's questions.

I'll start by noting that since I wrote the aforementioned article, MISRA released a second edition of their guidelines in 2004. The second edition was a major revision, and attempted to address many of the ambiguities in the 1998 version. As such, if someone is asking for MISRA compliance, they usually mean the 2004 rules; however it would behoove you to check!

Most of the MISRA rules can be checked via static analysis (i.e. by a compiler-like tool) and indeed many more compilers now come with a MISRA checking option. Thus for those of you that are using such a compiler, conformance to most of the rules may be checked by simply using the correct compiler switch. For those of you that don't have such a compiler, a great alternative is my favorite tool - PC-Lint from Gimpel - which brings me nicely to the main point I wish to make. MISRA attempts to protect you from the darkest, nastiest corners of the C language - which is exactly what PC-Lint attempts to do as well. However, MISRA C attempts to do it by banning various constructs, whereas PC-Lint attempts to detect and inform you when you are using a construct in a potentially dangerous way. To put it another way, MISRA treats you like a child and PC-Lint treats you like an adult. As a result, I'll take code that is 'Lint free' over code that is 'MISRA compliant' any day. I'd also add that making code Lint free is often a lot more challenging than making it MISRA compliant.

Now does this mean that I think the MISRA rules are not worthwhile? Absolutely not! Indeed the vast majority of the rules are pretty much good programming practice in a codified form. For example, rule 2.2 states: "Source code shall only use ISO9899:1990 'C' style comments". Now regardless of whether you agree with this rule or not, it's my opinion that source code that contains a mixture of C style comments and C++ style comments reflects a lack of discipline on the author's behalf. Thus I like this rule - and I adhere to it.

Where I start to run into problems with MISRA is with rules such as 20.6 "The macro offsetof in stddef.h shall not be used". I wrote an article in 2004 for Embedded Systems Programming magazine entitled "Learn a new trick with the offsetof() macro". The examples I give in the article are elegant and robust solutions to certain common classes of problems in embedded systems. Solving these problems without using the offsetof() macro is hard and / or tedious and / or dangerous. In short the medicine prescribed by MISRA is worse than the supposed disease.
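
To give a flavor of the sort of problem where offsetof() earns its keep, here is a minimal sketch of accessing individual members of a structure held in non-volatile memory. It is not necessarily one of the examples from the article: the NV_PARAMS layout, the nv_write() / nv_read() routines and the RAM array standing in for an EEPROM are all assumptions made so that the sketch is self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the parameters held in non-volatile memory */
typedef struct
{
    uint8_t  brightness;
    uint16_t timeout_ms;
    uint8_t  serial_no[4];
} NV_PARAMS;

/* A RAM array standing in for a real EEPROM so the sketch runs on a PC */
static uint8_t Nv_memory[sizeof(NV_PARAMS)];

/* Hypothetical driver calls: copy 'len' bytes to / from 'address' */
static void nv_write(size_t address, const void *src, size_t len)
{
    memcpy(&Nv_memory[address], src, len);
}

static void nv_read(size_t address, void *dst, size_t len)
{
    memcpy(dst, &Nv_memory[address], len);
}

/* Access one member; offsetof() supplies its address within the block
   and sizeof supplies its length, so neither is ever hand-calculated. */
#define NV_WRITE_MEMBER(member, src) \
    nv_write(offsetof(NV_PARAMS, member), (src), sizeof(((NV_PARAMS *)0)->member))

#define NV_READ_MEMBER(member, dst) \
    nv_read(offsetof(NV_PARAMS, member), (dst), sizeof(((NV_PARAMS *)0)->member))
```

If a member is later added to or removed from NV_PARAMS, every address follows automatically. The alternative is a table of hand-maintained offsets, which is exactly the sort of thing that goes stale.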

Putting this all together, my feelings on MISRA are as follows.
  1. Its intentions are excellent - and I wholeheartedly support them.
  2. Most of its rules are really good.
  3. There are times when I just have to say - sorry your attempts to make my code 'safer' are actually having the opposite effect and so as an experienced embedded systems engineer I'm choosing to ignore the rule in the interest of making a safer product. Note that I don't do this on a whim!
Now clearly, when it comes to my third point, I can certainly be accused of hubris. However, at the end of the day my clients typically hire me for my experience / knowledge and not for my ability to follow a rule book.

As a final note, I must say that I think the MISRA committee has overall done a very fine job. Trying to come up with a set of rules for the vastly disparate embedded systems industry (I know they are only really aimed at the car industry) is essentially an impossible task.

So with all of the above as a preamble, I think I can address Anand's questions.

Where to Start?
Order a copy of the guidelines from MISRA. You can get an electronic copy for £10 (about $15). If you are using a C compiler that has a MISRA checking switch, then turn it on. If not, buy a copy of PC-Lint. If you are not already using PC-Lint then you should be. It will be the best $389 you ever spent.

Then What?
Take a snapshot of your code base in your version control system before starting.
The first time you check your code for compliance, you will undoubtedly get an enormous number of errors. However, I think you will find that half a dozen rules are creating 90% of the problems. Thus you should be able to knock the error count down fairly quickly to something manageable. At that point you will be into some of the tougher problems. My recommendation is not to blow off the MISRA errors, but rather to understand why MISRA thinks your constructs are unsafe. Only once you are convinced that your particular instance is indeed safe should you choose to ignore the violation.

Is it worth it for 2K of object code?
Yes - and no. With an executable image of 2K, the chances are most of your code is very tightly tied to the hardware. In my experience, the closer you get to the hardware, the harder it is to achieve MISRA compliance, simply because interaction with the hardware typically relies upon extensions to standard C - and these extensions violate MISRA rule 1.1. Thus you have no hope of making your code compliant, literally starting with the first rule. (The MISRA committee aren't stupid, and so they have a formal method for allowing you to waive certain rules - and this is a clear example of why it's necessary. However, it's a tough way to start out). Does this mean that the exercise is pointless though? Probably not, as at the end of the day you'll probably have cleaner, more portable, more easily maintained code. However, I seriously doubt that it will be compliant.

Finally, I'll answer a question that you didn't ask. What should I say to a client that asks for MISRA compliance? Well the first thing to determine is whether compliance is desired or required. If it's the former, then what they are probably asking for is work that conforms to industry best practices. In which case what I normally do is explain to them that the code I deliver will be Lint free - and that this is a far higher hurdle to cross than MISRA compliance. If however MISRA compliance is a must, then you have no option other than to bite the bullet and get to work. It would probably make sense to retain a consultant for a day or two to help you get up to speed quickly.

A taxonomy of bug types in embedded systems

Over the next few months I'll be touching upon the subjects of debugging and testing embedded systems. Although much has been written about these topics (often by companies looking to sell you something), I've always been struck by the fact that many of these discussions treat errors as if they were all cut from the same cloth. Clearly this is foolhardy, as it's my experience that understanding what class of error you have is key to adopting an effective debugging and testing strategy. With that being said, my taxonomy of embedded systems errors appears below, arranged roughly in the order that one encounters them in an embedded project. I might also add that the difficulty in solving these problems also roughly follows the order I've listed, with syntax errors being trivial to identify and fix, while race conditions can be extremely difficult to identify (even if the fix is fairly easy).

Group 1 - Building a linked image
Syntax errors
Language errors
Build environment problems (make file dependencies, linker configurations)

Group 2 - Getting the board up and running
Hardware configuration errors (failure to setup peripherals correctly)
Run time environment errors (stack & heap allocation, memory models etc)
Software configuration errors (failure to use library code correctly)

Group 3 - Knocking off the obvious mistakes
Coding errors (initialization, pointer dereferencing, N + 1 issues etc)
Algorithmic errors.

Group 4 - Background / Foreground issues
Re-entrancy
Atomicity
Interrupt response times

Group 5 - Timing related
Resource allocation mistakes
Priority / scheduling issues
Deadlocks
Priority inversion
Race conditions

It's my intention over the next few months to discuss how I set about solving these sorts of problems, so it's important that I've got the groups right. Thus if anyone thinks this taxonomy is missing an important group, then perhaps you could let me know via the comments section or email.

Saturday, October 03, 2009

Consulting as a leading economic indicator - update #1

At the beginning of September, in the wake of the dismal jobs report for EEs posted by the IEEE, I wrote an article postulating that consulting is a leading economic indicator for our industry. I also promised an update around the end of September.

The bottom line - it's still very quiet. I've asked some fellow consultants their opinion on this issue. The response has been very guarded optimism, in that they are seeing an uptick in interest, even if it isn't directly being translated into a lot of work yet. So for those of you out there looking for gainful employment, I'm afraid I really don't have any good news to report. The best I can give you is that the bad news hasn't got worse.

Changing topics, if you have not read Mike Barr's recent posting on binary literals, then I strongly recommend that you do so. It would have fitted very nicely into my series on effective C tips - so if you find my effective C tips series useful, then go take a look.

Thursday, October 01, 2009

Effective C Tip #6 - Creating a flags variable

This is the sixth in a series of tips on writing effective C. Today I'm going to address the topic of creating what I call a flags variable. A flags variable is nothing more than an integral type that I wish to treat as an array of bits, where each bit represents a flag or boolean value. I find these particularly valuable in three situations:
  1. When the CPU has part of its address space that is much faster to access than other regions. Examples are the zero page on 6805 type processors, and the lower 256 bytes of RAM on AVR processors. Depending upon your compiler, you may also want to do this with the bit addressable RAM region of the 8051.
  2. When I'm running short on RAM and thus assigning an entire byte or integer to store a single boolean flag is a waste I can't afford.
  3. When I have a number of related flags where it just makes sense to group them together.
The basic approach is to use bitfields. Now I'm not a huge fan of bitfields - particularly when someone tries to use them to map onto hardware registers. However, for this application they work very well. As usual however, the devil is in the details. To show you what I mean, I'll first show you a typical implementation of mine, and then explain what I'm doing and why.

typedef union
{
    uint8_t     all_flags;      /* Allows us to refer to the flags 'en masse' */
    struct
    {
        uint8_t foo : 1,        /* Explanation of foo */
                bar : 1,        /* Explanation of bar */
                spare5 : 1,     /* Unused */
                spare4 : 1,     /* Unused */
                spare3 : 1,     /* Unused */
                spare2 : 1,     /* Unused */
                spare1 : 1,     /* Unused */
                spare0 : 1;     /* Unused */
    };
} EX_FLAGS;

static EX_FLAGS    Flags;  /* Allocation for the Flags */

Flags.all_flags = 0U; /* Clear all flags */

...

Flags.bar = 1U; /* Set the bar flag */


There are several things to note here.
Use of a union
The first thing to note is that I have used a union of an integral type (uint8_t) and a structure of bitfields. This allows me to access all the flags 'en masse'. This is particularly useful for clearing all the flags as shown in the example code.  Note that our friends at MISRA disallow unions. However, in my opinion, this is a decent example of where they make for better code - except see the caveat below.
Use of integral type
Standard C only guarantees int, signed int and unsigned int (plus _Bool from C99 onwards) as the base type of a bitfield; support for any other type is implementation-defined. However, many embedded systems compilers allow you to use any integral type as the base type for a bitfield. This is particularly valuable on 8-bit processors. In this case I have taken advantage of the language extension to use a uint8_t type.
Use of an anonymous structure
You will note that the bitfield structure is unnamed, and as such is an anonymous structure. Anonymous structures are part of C++ - but not standard C. However, many C compilers support this construct and so I use it as I feel it makes the underlying code a lot easier to read.
Naming of unused flags
If you look at the way I have named the unused flags, it looks a little odd. That is the first unused flag is spare5, the next spare4 and so on down to spare0. Now I rarely do things on a whim, and indeed this is a good example. So why do I do it this way? Well, there are two reasons:
  1. When I first create the structure, I label all the flags, starting from spare7 down to spare0. This inherently ensures that I name precisely the correct number of flags in the structure. To see why this is useful, take the above code and allocate an extra flag in the bitfield structure. Then compile and see if you get a compilation error or warning. Whether you will or not depends upon whether your compiler allows bitfields to cross the storage unit boundary. If it does, then your compiler will allocate two bytes, and the all_flags member of the union will not cover all of the flags. This can come as a nasty surprise (and perhaps explains why MISRA is wary of unions). You can prevent this from happening by naming the flags as shown.
  2. When it becomes necessary to allocate a new flag, I simply replace the topmost unused flag (in this example that would be spare5) with its new name, e.g. zap. The remainder of the structure is unchanged. If instead I had named the topmost unused flag 'spare0', the next 'spare1' and so on, then the code would give a completely misleading picture of how many spare bits are left for future use after I had taken one of the unused flags.
If you look at what I have done here, it's interesting to note that I have relied upon two extensions to standard C (which violates the MISRA requirement for no use of compiler extensions) and I have also violated a third MISRA tenet via the use of a union. I would not be surprised if I've also violated a few other rules as well. Now I don't do these things lightly, and so I only use this construct when I see real benefit in doing so. I'll leave it for another day to discuss my overall philosophy regarding adherence to the MISRA guidelines. It is of course up to you the reader to make the determination as to whether this is indeed effective C.
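
For compilers that support C11, the 'nasty surprise' described above can also be caught at build time. The following is a sketch under that assumption: the _Static_assert line and the three small helper functions (flags_clear(), flags_set_bar() and flags_any_set()) are illustrative additions, and a uint8_t bitfield base type is still relied upon.

```c
#include <stdint.h>

typedef union
{
    uint8_t     all_flags;      /* Allows us to refer to the flags 'en masse' */
    struct
    {
        uint8_t foo : 1,        /* Explanation of foo */
                bar : 1,        /* Explanation of bar */
                spare5 : 1,     /* Unused */
                spare4 : 1,     /* Unused */
                spare3 : 1,     /* Unused */
                spare2 : 1,     /* Unused */
                spare1 : 1,     /* Unused */
                spare0 : 1;     /* Unused */
    };
} EX_FLAGS;

/* Refuse to build if the bitfields no longer fit within all_flags,
   e.g. because a ninth flag has been added. */
_Static_assert(sizeof(EX_FLAGS) == sizeof(uint8_t),
               "EX_FLAGS bitfields have outgrown all_flags");

static EX_FLAGS Flags;  /* Allocation for the Flags */

/* Clear every flag in a single write */
void flags_clear(void)
{
    Flags.all_flags = 0U;
}

/* Set the bar flag on its own */
void flags_set_bar(void)
{
    Flags.bar = 1U;
}

/* Returns 1 if any flag at all is set, otherwise 0 */
uint8_t flags_any_set(void)
{
    return (uint8_t)(Flags.all_flags != 0U);
}
```

Note that the position of foo and bar within all_flags is still implementation-defined, so all_flags should only be used for 'all or nothing' operations such as these.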

Tuesday, September 29, 2009

The consultant's dilemma

Today I’m going to talk about an interesting ethical dilemma that is faced by all engineers at various times in their careers but which consultants face much more frequently because of the nature of the work. The situation is as follows:

A (potential) client has a new project that they wish to pursue and they have brought you in to discuss its feasibility, risk, development costs etc. At a certain point in the discussion, the topic of CPU architecture comes up. In rare cases, there is only one CPU that makes sense for the job. However in the majority of cases, it’s clear that there are a number of potential candidates that could get the job done and the client is interested in your opinion as to which way to go. In my experience you have the following options:

  1. Recommend your favorite architecture
  2. Recommend that time be spent investigating the optimal architecture
  3. Recommend the architecture that you are most interested in gaining experience on in order to develop your career.

Let’s take a look at these options:

Favorite Architecture
The advantage of going with your favorite architecture is that presumably you are highly experienced with the processor family and that you already have all the requisite tools in order to allow you to quickly and effectively develop the solution. The downsides to this approach are:
  1. It leads to antiquated architectures hanging around for ever. The prime example of this is of course the 8051.
  2. It means that your skill set can stagnate over time.
  3. It also may mean that the client pays more for the hardware than they would if a more optimal solution was used. This comes about when e.g. an ARM processor is used when an HC08 would have done quite nicely.

Architecture Investigation
With this approach you are essentially asking the client to pay you to work out what the optimal solution is to their problem. Sometimes this is just a few days’ work and other times it’s a lot more. This is often a tough sell because clients expect the consultant to instantly know what the best architecture for their application is. Furthermore, at the end of the day the consultant may end up recommending an architecture for which they have little experience. Whether you think this is reasonable or not depends on how you view consultants.

Career Development
In the 25+ years I’ve been doing this, I’ve only come across a few blatant cases where it’s clear that an architecture was chosen because that’s what the lead engineer wanted to play with next. My experience is that engineers are way more likely to be too conservative and stick with their favorite architecture than they are to go this route. Nevertheless if you are in the position of asking an engineer (and particularly a consultant) for a CPU architecture recommendation, then you must be aware that this does go on. Your best defense against this is to closely question why a particular architecture is being recommended.

So what do I do when faced with this issue? Well you’ll be pleased to know that I have never recommended an architecture in order to further my career. The decision as to whether to recommend my favorite architecture or to suggest an investigation comes down to one of cost. If the client will be building 500 of the widgets a year, then development costs will dwarf hardware costs and I’ll go with my favorite architecture. Conversely if they will be building 10,000 widgets a year, then an investigation is a must. The middle area is where it gets tricky!

I’d be interested in hearing how you have handled this dilemma.

Thursday, September 24, 2009

Minimizing memory use in embedded systems Tip #3 - Don't use printf()

This is the third in a series of tips on minimizing memory consumption in embedded systems.

If you are like me, the first C program you saw was K&R’s famous ‘hello, world’ code, reproduced below:

main()
{
    printf("hello, world\n");
}

In my opinion, this program has done incalculable harm to the realm of embedded systems programming! I appreciate that this is a rather extreme statement – but as is usual I have my reasons ...

The interesting thing about this code is that it introduces printf() – and as such gives the impression that printf() is an important (and useful) part of the C language. Well I suppose it is / was for those programming computers. However for those programming embedded systems, printf() and its brethren (sprintf, vsprintf, scanf etc) are in general a disaster waiting to happen for the unwary. Here is why:

Code Size

The printf() functions are immensely sophisticated functions, and as such consume an incredible amount of code space. I have clear memories of an early 8051 compiler’s printf() function consuming 8K of code space (and this was at a time when an 8K program was a decent size). Since then, compiler vendors have put a lot of effort into addressing this issue. For example IAR allows you to specify the functionality (and hence size) of printf() as a library option. Notwithstanding this, if your available code space is less than 32K the chances are you really shouldn’t be using printf(). But what if you need some of the features of printf()? Well in that case I recommend you write your own formatting function. For example I often find that I have a small microcontroller project that needs to talk over a serial link using an ASCII protocol. In cases like these, the easy thing to do is to generate the requisite string using a complex format string with sprintf(). However, with a little bit of ingenuity you should be able to create the string using a series of calls to simple formatting routines. I can guarantee that you’ll end up with more compact code.
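
As a sketch of what 'a series of calls to simple formatting routines' might look like, the code below builds a short ASCII message without going near sprintf(). The routine names (fmt_str(), fmt_u16_dec(), build_report()) and the "T=...,H=..." message format are made up for this illustration, and the caller is assumed to supply a large enough buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy string 's' (without its terminator) to 'buf'.
   Returns the number of characters written. */
size_t fmt_str(char *buf, const char *s)
{
    size_t n = 0U;

    while (s[n] != '\0')
    {
        buf[n] = s[n];
        ++n;
    }
    return n;
}

/* Write 'value' as unsigned decimal ASCII to 'buf' (no terminator).
   'buf' needs room for up to 5 characters. Returns the number written. */
size_t fmt_u16_dec(char *buf, uint16_t value)
{
    char   tmp[5];      /* Digits, least significant first */
    size_t n = 0U;
    size_t i;

    do
    {
        tmp[n++] = (char)('0' + (value % 10U));
        value /= 10U;
    } while (value != 0U);

    for (i = 0U; i < n; ++i)
    {
        buf[i] = tmp[n - 1U - i];   /* Reverse into the output buffer */
    }
    return n;
}

/* Build "T=<temp>,H=<hum>\r\n" plus a terminator; 'buf' needs 17 bytes */
size_t build_report(char *buf, uint16_t temp, uint16_t hum)
{
    size_t n = 0U;

    n += fmt_str(&buf[n], "T=");
    n += fmt_u16_dec(&buf[n], temp);
    n += fmt_str(&buf[n], ",H=");
    n += fmt_u16_dec(&buf[n], hum);
    n += fmt_str(&buf[n], "\r\n");
    buf[n] = '\0';
    return n;
}
```

Each routine does one small job and uses only a handful of bytes of stack, which is the point of the exercise. How much code space this saves compared with your library's sprintf() is something to measure on your own toolchain.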

Stack Size

Barely a day goes by that someone doesn’t end up on this blog because they have a stack overflow caused by printf(), sprintf() or vsprintf(). Why is this? Well if you are ever feeling bored one day, try and write the printf() function. If you do, you’ll soon find that it is not only difficult, but also that it requires a large amount of space for the function arguments, a lot of temporary buffer space for doing the formatting as well as a large number of intermediate variables. In short, it needs a tremendous amount of stack space. Indeed I have had embedded systems that need a mere 32 bytes of stack space prior to using printf() – and 200+ bytes after I’ve added in printf(). The bottom line is that for small embedded systems, formatted output needs a ridiculous amount of stack space – and that as a result stack overflow is a real possibility.

Variable length arguments

I’m sure most people use sprintf() etc without fully appreciating that these functions use a variable length argument list. I’ll leave for another day the full implications of this. However for now you should just consider that MISRA bans the use of variable length arguments – and that you should take this as a strong hint to avoid these functions in embedded systems.

Execution time

The execution time of printf() can be spectacularly long. For example the ‘hello world’ program given in the introduction requires 1000 cycles on an AVR CPU. Changing it to the almost as trivial function shown below increases the execution to 6371 cycles:

#include <stdio.h>

int main( void )
{
    int i = 89;

    printf("hello, world %d\n", i);
}

Lest you think this is an indictment of the AVR processor, the same code for a generic ARM processor still takes a whopping 1738 cycles. In short, printf() and its brethren can take a really long time to execute.

Now does the above mean you should always eschew formatted output functions? No! Indeed I recommend the use of vsprintf() here for certain classes of problem. What I do recommend is that you think long and hard before using these functions to ensure that you really understand what you are doing (and getting) when you use them.

Tuesday, September 22, 2009

Lowering power consumption tip #2 - modulate LEDs

This is the second in a series of tips on lowering power consumption in embedded systems.

LEDs are found on a huge percentage of embedded systems. Furthermore their current consumption can often be a very large percentage of the overall power budget for a system. As such reducing the power consumption of LEDs can have a dramatic impact on the overall system power consumption. So how can this be done you ask? Well, it turns out that LEDs are highly amenable to high power strobing. That is, pulsing an LED at say 100 mA with a 10% on time (average current 10 mA) will cause it to appear as bright as an LED that is being statically powered at 20 mA. However, like most things, this tradeoff does not come for free, as to take advantage of it, you have to be aware of the following:
  • LEDs are very prone to overheating failures. Thus putting a constant 100 mA through a 20 mA LED will rapidly lead to its failure. Thus any system that intentionally puts 100 mA through a 20 mA LED needs to be designed such that it can never allow 100 mA to flow for more than a few milliseconds at a time. Be aware that this limit can easily be exceeded when halting at a breakpoint in a debugger - so design the circuit accordingly!
  • The eye is very sensitive to flicker, and so the modulation frequency needs to be high enough that it is imperceptible.
  • You can't sink these large currents into a typical microcontroller port pin. Thus an external driver is essential.
  • If the LED current is indeed a large portion of the overall power budget then you have to be aware that the pulsed 100 mA current can put tremendous strain on the power supply.
Clearly then, this technique needs to be used with care. However, if you plan to do this from the start, then the hardware details are not typically that onerous and the firmware implementation details are normally straightforward. What I do is drive the LED off a spare PWM output. I typically set the frequency at about 1 kHz, and then set the PWM depth to obtain the desired current flow. Doing it this way imposes no overhead on the firmware and requires just a few setup instructions to get working. Furthermore a software crash is unlikely to freeze the PWM output in the on condition. Incidentally, as well as lowering your overall power consumption, this technique has two other benefits:
  • You get brightness control for free. Indeed by modulating the PWM depth you can achieve all sorts of neat effects. I have actually used this to convey multiple state information on a single LED. My experience is that it's quite easy to differentiate between four states (off, dim, on, bright). Thus next time you need to get more mileage out of the ubiquitous debug LED, consider adding brightness control to it.
  • It can allow you to run LEDs off unregulated power. Thus as the supply voltage changes, you can simply adjust the PWM depth to compensate, thus maintaining quasi constant brightness. This actually gives you a further power saving because you no longer have to accept the efficiency losses of the power supply.
Anyway, give it a try on your next project. I think you'll like it.
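
The arithmetic for setting the PWM depth is simple enough to sketch. The two functions below are hypothetical helpers rather than code for any particular microcontroller: the first converts a desired average current and a known peak current into a duty cycle in parts per thousand, and the second scales that to a compare value for a PWM timer that counts from 0 to 'top'.

```c
#include <stdint.h>

/* Duty cycle (parts per thousand) that gives an average current of
   'avg_ma' when the LED is pulsed at 'peak_ma'. Saturates at 1000 and
   returns 0 if 'peak_ma' is 0. */
uint16_t led_duty_permille(uint16_t avg_ma, uint16_t peak_ma)
{
    if (peak_ma == 0U)
    {
        return 0U;
    }
    if (avg_ma >= peak_ma)
    {
        return 1000U;
    }
    return (uint16_t)(((uint32_t)avg_ma * 1000U) / peak_ma);
}

/* Scale a duty cycle to the compare value of a PWM timer whose counter
   runs from 0 to 'top' (rounds down). */
uint16_t led_pwm_compare(uint16_t duty_permille, uint16_t top)
{
    return (uint16_t)(((uint32_t)duty_permille * top) / 1000U);
}
```

With the figures used above, led_duty_permille(10U, 100U) gives 100, i.e. the 10% on time. When running from unregulated power, the same calculation can be repeated whenever the peak current is re-estimated from the measured supply voltage.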

Friday, September 18, 2009

FRAM in embedded systems

In a previous post I mentioned that I had recently attended a seminar put on by TI. One of the things that was mentioned briefly in the seminar was that TI will soon be releasing members of its popular MSP430 line containing Ferroelectric RAM or FRAM as it is usually referred to. There's an informative, but poor production quality video on the TI website that describes FRAM's properties. (To view it, just enter the search term 'FRAM' at ti.com. You have to register first, otherwise I'd give you the direct link). Alternatively, Wikipedia has a nice write up as well.

The basic properties of FRAM are quite tantalizing - non-volatile, fast and symmetric read / write times, very low power and essentially immune to light, radiation, magnetic fields etc. Although its speed and density aren't good enough yet to replace other memory types at the high end, the same is not true for MSP430 class microcontrollers.

From what was said at the seminar it seems likely that TI will soon introduce versions of the MSP430 that contain only FRAM and that you the engineer will be able to partition it as you see fit between code and data storage. Furthermore, the data storage is inherently non-volatile, and so the data storage part can presumably be further divided between scratch storage and configuration parameters.

This is all very interesting, but what are the advantages of FRAM over today's typical configuration of Flash + SRAM + EEPROM? Well TI has identified what they consider to be several key areas, namely:
  • Data logging applications. They point out (quite correctly) that with FRAM there is no need to worry about wear leveling algorithms, and that data can be stored (written) 1000 times faster than Flash or EEPROM. While this is all true, I'm actually a bit skeptical that this will be a huge game changer. Why? Well if I can write data 1000 times faster, then I'm going to fill the memory 1000 times faster as well. To put it another way, all the data logging systems I've ever worked on that use low end processors (such as the MSP430) have logged no more than about a dozen data items, no faster than a couple of times a second. In short, high write speeds aren't important. However, I do concede that obviating the need for wear leveling algorithms is very nice.
  • High Security applications. One of the fields that I work in is smartcards. Smartcards are used extensively in the fields of access control, conditional access systems for pay TV, smart purses and so on. The key feature of smart cards is their security. One way to attack a smart card is via differential power analysis. The basic idea is that by measuring the cycle by cycle change in the power consumption of the card, it is possible to determine what it's doing. Given that FRAM essentially consumes the same (and very low) power when it is read and written, it makes it very hard to perform a DPA attack on it. However, for most general purpose applications, this benefit is zero.
  • Low power. For me this is a huge benefit. The ability to write to FRAM at less than 2V will undoubtedly allow me to extend the battery life of some of the systems that I design. Furthermore the amount of energy required to write a byte of FRAM is minuscule compared to Flash or EEPROM. I think TI should be commended for their relentless pursuit of low power in their MSP430 line.
  • Lack of data corruption. Yes folks, believe it or not TI is actually claiming that FRAM eliminates the possibility for data corruption that is associated with other non-volatile memories. Upon hearing this I couldn't make up my mind whether to blame the marketing department or the hardware guys. Regardless, it's clearly not true. While I concede that the fast write times significantly reduce the probability of data corruption occurring, they most certainly do not eliminate it. Until the silicon vendors come up with a mechanism for guaranteeing that an arbitrarily sized block of data can be written atomically regardless of what power is doing, then memory will always be prone to corruption.
So do I see any downsides to FRAM usage in microcontrollers? Not really. However I do expect that it will reveal weaknesses in a lot of code (which is of course a good thing). I expect that this will come about because today when a system powers up, the contents of RAM are quasi random. Code that relies on a location not being a certain value on start-up thus has a high probability of working. However, with FRAM, that location will contain whatever you last wrote to it - with all that it implies. As a result, I expect people writing for FRAM systems will get religion in a hurry about data initialization. Anyway, once some parts are out, I hope to be able to have a play with them. If I do I'll undoubtedly write about my experiences.
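
One way of being explicit about initialization when data survives a power cycle is to guard it with a signature, as in the sketch below. Everything here is an assumption made for illustration: the NV_STATE layout, the signature value and the function names. On a real part the structure would also have to be placed in memory that the start-up code leaves alone, which is toolchain specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define NV_SIGNATURE    0xA55AU     /* Arbitrary marker value */

typedef struct
{
    uint16_t signature;     /* NV_SIGNATURE once the block has been set up */
    uint16_t boot_count;    /* Number of times the system has started */
    uint8_t  mode;          /* Example configuration parameter */
} NV_STATE;

/* Stand-in for a structure located in non-volatile memory */
static NV_STATE Nv_state;

/* Call once at start-up. Writes defaults only if the signature is
   missing. Returns true if the defaults had to be written. */
bool nv_state_init(void)
{
    bool first_time = (Nv_state.signature != NV_SIGNATURE);

    if (first_time)
    {
        Nv_state.boot_count = 0U;
        Nv_state.mode = 0U;
        Nv_state.signature = NV_SIGNATURE;  /* Written last, on purpose */
    }
    ++Nv_state.boot_count;
    return first_time;
}

/* Accessor for the boot count */
uint16_t nv_boot_count(void)
{
    return Nv_state.boot_count;
}
```

Note that a signature only tells you whether the block has ever been initialized. It does nothing to detect a partially written update, which is the corruption problem discussed above.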

Tuesday, September 15, 2009

A 'C' Test: The 0x10 best questions for would-be embedded programmers (reprised)

In May 2000 Embedded Systems Programming magazine (now Embedded Systems Design) published an article I had written entitled 'A 'C' Test: The 0x10 best questions for would-be embedded programmers'. I received a lot of mail about it at the time (including I might add a decent amount of hate mail) and much to my amazement I continue to get mail about it to this day. The article has been shamelessly copied all over the web while its title is a popular search term that drives people to this blog.

I mention this for two reasons. Firstly, there is a slightly revised version here that is in Word format and so is suitable for customization. Secondly, be aware that this test has been widely publicized so be very suspicious of someone who does really well on it! To illustrate my point, when I wrote the article I was doing some work at a large company and was sharing an office with a fellow consultant, Nelson. Naturally I had Nelson proofread the article. Fast forward a few months and Nelson goes off to an interview with a potential new client. Well it so happens that the interview occurs on the same day that my article is published and so the interviewer proceeds to use it verbatim on Nelson. Nelson of course aces the test leaving the interviewer astounded. Needless to say, we both found this to be very amusing! Alas, I've never had anyone in the intervening 9+ years hit me with it. Maybe next week...

Sunday, September 13, 2009

Reader feedback

If I'm to believe the numbers for this blog, I'm getting both a large number of page views per day as well as a significant number of readers coming back on a regular basis to see what I have to say. While the page view statistics are nice, I actually value the returning reader far more than I do the one-time visitor who drops in looking for a solution to a particular problem. Thus I find myself in a bit of a quandary. While the page view statistics give me a very good idea about what is driving first time visitors to this site, I really don't have a clue as to why anyone actually bothers to come back, or indeed what they are hoping to see on their next visit. Thus if you are a regular reader I'd be obliged if you could give me some feedback on what you (dis)like about this blog, and perhaps more importantly - what you'd like me to address in future postings. Feel free to use the comment section or to email me if you'd prefer your thoughts to be private.

Thanks!

Thursday, September 10, 2009

Observations on the relevance of C++ to embedded systems

My fellow blogger Mike Barr recently wrote an article entitled 'Real men program in C'. Given that his blogs are cross posted at embedded.com, it was soon picked up by reddit et al and the usual language wars started - with all that these wars usually entail. Personally I don't get very worked up on this subject and so I didn't participate. However it did dovetail rather nicely with a conversation I had recently with Dan Saks. I had asked Dan for his thoughts on the difficulty (impossibility!) of inlining global functions in C. The conversation was interesting in its own right, but at the end Dan posed the question 'Why don't you program it in C++?' (since for the uninitiated, C++ allows you to quite nicely inline a class's public functions). I'll leave for another day my response and also my thoughts on C++. However, it did get me thinking a lot about this issue.

Now although I have many thoughts on this topic, the one that I'd like to share with you today is my observation that there is an incredible dearth of example C++ code for embedded systems. What do I mean by this? Well like most of you, I regularly download example code from vendors' sites - and it's nearly always written in C and not C++. I'd previously explained this away by assuming that it was because I do a lot of work in the 8/16 bit realm, and that smaller processors are more likely to be programmed in C than C++. However, yesterday I attended a seminar put on by TI. There were several things of interest in the seminar, including TI's proprietary RF networking protocol SimpliciTI and also their recently acquired Cortex-M3 line from Luminary. The FAE encouraged us to look at the code that was available for both of these entities - and so I did.

What I found is that the SimpliciTI code is all written in C, as was all the Luminary code I looked at including their impressive graphics library. Hmmmm thought I - is this an aberration or is this the norm? For my next stop I went over to the Micrium web site where they offer a fine array of products including an RTOS, a variety of protocol stacks, a graphics library and so on. All the ones I looked at were written in C. Same story over at Segger. OK, thought I, what about the compiler vendors? A sampling of the code examples at the IAR and Keil websites (for their respective ARM product lines) showed them to be all in C. Finally I headed over to the Green Hills website to check out their enormous Networking and Communications product line. I chose half a dozen products at random. In all cases where the language was specified, it was ANSI C.

Is this a true random sample - of course not. However it does suggest to me that the industry hasn't exactly embraced C++. Now it's debatable whether the tool vendors and silicon suppliers should lead the industry or whether they should reflect reality. Regardless of your perspective on this, it's clear to me that I'll know C++ has been embraced by the embedded community only when the majority of the publicly available code is written in C++. Personally, if it hasn't happened by now, I don't think it's going to.

Friday, September 04, 2009

Minimizing memory use in embedded systems tip#2 - Be completely consistent in your coding style

This is the second in a series of postings on how to minimize the memory consumption of an embedded system.

As the title suggests, you'll often get a nice reduction in code size if you are completely consistent in your HLL coding style. To show how this works, it's necessary to take a trip into assembly language.

When you write in assembly language you soon find that you perform the same series of instructions over and over again. For example, to add two numbers together, you might have pseudo assembly language code that looks something like this:

LD X, operand1 ; X points to operand 1
LD Y, operand2 ; Y points to operand 2
LD R0,X ; Get operand 1
LD R1,Y ; Get operand 2
ADD ;
ST R0 ; Store the result in R0

After you have done this a few times, it becomes clear that the only thing that changes from use to use is the address of the operands. As a result, assembly language programmers would typically define a macro. The exact syntax varies from assembler to assembler, but it might look something like this:

MACRO ADD_BYTES(P1, P2)
LD X, P1 ; X points to parameter 1
LD Y, P2 ; Y points to parameter 2
LD R0,X ; Get operand 1
LD R1,Y ; Get operand 2
ADD ;
ST R0 ; Store the result in R0
ENDM

Thereafter, whenever it is necessary to add two bytes together, one would simply enter the macro together with the names of the operands of interest. However, after you have invoked the macro a few dozen times, it probably dawns on you that you are chewing up memory unnecessarily and that you can save a lot by changing the macro to this:

MACRO ADD_BYTES(P1, P2)
LD X, P1 ; X points to parameter 1
LD Y, P2 ; Y points to parameter 2
CALL LDR0R1XY
ENDM

It is of course necessary to now define a subroutine 'LDR0R1XY' that looks like this:

LDR0R1XY:
LD R0,X ; Get operand 1
LD R1,Y ; Get operand 2
ADD ;
ST R0 ; Store the result in R0
RET

Clearly this approach starts to save a few bytes per invocation, such that once one has used ADD_BYTES several times one achieves a net saving in memory usage. If one uses ADD_BYTES dozens of times then the savings can be substantial.

So how does this help if you are programming in a HLL? Well, decent compilers will do exactly the same optimization when told to perform full size optimization. However, in this case, the optimizer looks at all the code sequences generated by the compiler and identifies those code sequences that can be placed in a subroutine. A really good compiler will do this recursively in the sense that it will replace a code sequence with a subroutine call, and that subroutine call will in turn call another subroutine and so on. The results can be a dramatic reduction in code size - albeit at a potentially big increase in demand on the call stack.

Now clearly in order to take maximal advantage of this compiler optimization, it's essential that the compiler see the same code sequences over and over again. You can maximize the likelihood of this occurring by being completely consistent in your coding style. Some examples:
  • When making function calls, keep the parameter orders consistent. For example if you call a lot of functions with two parameters such as a uint8_t and a uint16_t, then ensure that all your functions declare the parameters in the same order.
  • If most of your variables are 16 bit, with just a handful being 8 bit, then you may find you get a code size reduction if you convert all your variables to 16 bits.
  • Don't flip randomly between case statements and if-else-if chains.
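As a minimal sketch of the first point, the fragment below uses hypothetical function names (they are illustrations only, not taken from any real project). Every two-parameter function takes the uint8_t first and the uint16_t second, so the argument set-up at each call site has the same shape. Whether a given compiler then factors those sequences into a shared subroutine depends entirely on the tool and its optimization settings.

```c
#include <stdint.h>

/* Hypothetical helpers: each takes (uint8_t channel, uint16_t value),
   always in that order, so every call site loads its arguments the
   same way. */
uint16_t scale_reading(uint8_t channel, uint16_t value)
{
    /* Shift by 0..3 places depending on the channel number */
    return (uint16_t)(value >> (channel & 0x03u));
}

uint16_t offset_reading(uint8_t channel, uint16_t value)
{
    /* Add the channel number as a simple per-channel offset */
    return (uint16_t)(value + channel);
}

/* A caller that keeps the same argument order for both calls */
uint16_t process_reading(uint8_t channel, uint16_t raw)
{
    uint16_t scaled = scale_reading(channel, raw);

    return offset_reading(channel, scaled);
}
```

Had offset_reading() been declared as (uint16_t value, uint8_t channel), the two call sites would need different register or stack set-up, giving the optimizer one less repeated sequence to work with.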
Note that notwithstanding the fact that being completely consistent can save you a lot of code space, I also think that code that is extremely consistent in its style has other merits as well, not the least of which is readability. As a final note, does anyone know the formal name for this type of optimization?

Wednesday, August 26, 2009

Minimizing memory use in embedded systems Tip #1 - Eliminate unnecessary strings

I already have a series of tips on efficient C, another on effective C and a third on lowering the power consumption of embedded systems. Today I'm introducing a fourth series of tips related to minimizing memory usage in embedded systems.

Now back when I was a lad the single biggest issue in an embedded system was nearly always a lack of memory, and as a result one had to quickly learn how to husband this resource with great care. Fast forward 20 years and this notion probably seems quite quaint to those of you programming ARM systems with 16 Mbytes of Flash and 64 Mbytes of RAM.

So what's the motivation for this post then? Well, despite the presence of gigantic memory systems in many embedded systems, it's still surprisingly common for one to find oneself in a situation where memory is being gobbled up at an alarming rate. Anyone that has programmed an 8051 or an 8 bit PIC recently will know exactly what I'm talking about. So for those of you out there that find yourself in this situation, I hope that you'll find this series informative.

Enough preamble - on to business. The first tip is quite simple - eliminate unnecessary strings. Even if your reaction is 'well that's useless - I don't have any strings in my code', then I still suggest you read on.

In order to eliminate unnecessary strings, the first step is to determine the list of strings in your code. You can of course pore over your source code. However a far better approach is to scan the binary image looking for strings. Somewhat amazingly I actually use a utility called 'strings.exe' that is supplied by Microsoft. It's available here.

I like this program because you can search for ASCII and/or Unicode strings, while also controlling the minimum number of matching characters. (Please note that this utility is intended to scan a pure binary file. Intel Hex, S records etc don't cut it). If you do this, then you may of course find no strings - and I apologize for wasting your time. However, even if your program is supposed to be string free, you may well find things such as:
  • Copyright notices
  • Strings associated with assert statements.
  • Other compiler artifacts such as path names.
The latter two tend to arise if any code references the __FILE__ macro or its brethren. Of course working out how to eliminate these strings can be challenging - and in the case of copyright notices may violate the terms of a license agreement - so don't get too aggressive.

If your code does contain intentional strings, then you have several opportunities to reduce their footprint. The obvious method of making the strings more terse is of course an excellent thing to do. Less obvious is that you may find that you have multiple strings that are very similar - particularly if multiple people are working on a project. I've recently seen code that contained a dozen variations on the string "Malloc failed", such as:
  • Malloc failed
  • Malloc Failed
  • Malloc error
  • Etc
Now, the robust way to handle this is of course to ban inline strings and instead place them all in a string file, so that someone needing to use a string can simply reuse one that already exists. If this strikes you as too much work, then you may be interested to know that there are some linkers out there that will recognize duplicate strings and collapse them down to a single entry. However, to get this benefit, the strings need to be absolutely identical. Searching the binary image as I have described is a great way of identifying strings which will benefit from this manual optimization.
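One way a 'string file' might look is sketched below. The names msg_id_t, msg_table and msg_get() are hypothetical, chosen purely for illustration; a real project would pick its own layout. The point is only that each message is stored once and everybody refers to it by ID rather than typing their own variation of the text.

```c
/* Hypothetical central string table: modules refer to a message by
   its ID, so each distinct message appears in the image only once. */
typedef enum
{
    MSG_MALLOC_FAILED,
    MSG_FILE_NOT_FOUND,
    MSG_COUNT               /* Number of messages; not a message itself */
} msg_id_t;

static const char * const msg_table[MSG_COUNT] =
{
    "Malloc failed",
    "File not found"
};

/* Return the text for a message ID, or an empty string if the ID is
   out of range. */
const char *msg_get(msg_id_t id)
{
    if ((unsigned int)id >= (unsigned int)MSG_COUNT)
    {
        return "";
    }
    return msg_table[id];
}
```

A call such as puts(msg_get(MSG_MALLOC_FAILED)) then replaces every hand-typed variant of the message, and running a strings utility over the binary should show the text exactly once.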

Thursday, August 20, 2009

Consulting as a leading economic indicator

The IEEE has a rather depressing news release out that claims that EE unemployment more than doubled last quarter to a record high 8.6%. The previous quarterly record was a mere 7% in Q1 2003. Interestingly the unemployment rate for all engineers was a mere 5.5% which suggests that EEs are taking the brunt of engineering unemployment. If you are one of those unfortunate enough to be axed, then what's the employment outlook for you?

Well I'm no economist and I certainly don't have access to, or interest in, reams of economic data. What I can do is give you my micro-economic perspective. Over the 15 years I've been a consultant I've developed the notion that consultant activity is a leading economic indicator. That is, when companies need engineering help, but are unsure whether to take on employees, then they turn to consultants. Conversely when companies need to cut costs, the first to go are consultants and contractors. In short, consultants are the first to go in bad times and the first to be retained in good times. This hypothesis seems reasonable to me, and broadly reflects my experiences. So with this as a background, what can I tell you about the current economic state of affairs?

Well, firstly the current 'slowdown' came on so hard and so fast that my sense is that consultants and employees basically bit the dust simultaneously. OK, so what about the upside? Am I seeing an increase in demand for my services? In short - no. Having said that I almost never see an increase in demand for my services in July and August for the simple reason that too many people are on holiday. Notwithstanding this, my sense is that it is still very quiet.

So am I pessimistic? Actually - no. A large slice of the stimulus money has been funneled to organizations such as the NSF, which are only now getting around to doling out various grants. Thus I expect this to start having an effect on EE demand soon. I also have the sense that a lot of companies having weathered the financial storm are now looking ahead to see how they can best exploit the upturn when it comes. If I'm right, then the phone should start ringing again in September. I'll post an update around the end of September and let you know if I'm right!

Monday, August 17, 2009

Effective C Tip #5 - Use pre-masking rather than post-masking

This is the fifth in a series of tips on writing what I call effective C.

Today I'd like to offer a simple hint that can potentially make your buffer manipulation code a little more robust at essentially zero cost. I'd actually demonstrated the technique in this posting, but had not really emphasized its value.

Consider, for example, a receive buffer on a communications channel. The data are received a character at a time under interrupt and so the receive ISR needs to know where to place the next character. The question arises as to how best to do this? Now for performance reasons I usually make my buffer size a power of 2 such that I can use a simple mask operation. I then use an offset into the buffer to dictate where the next byte should be written. Code to do this typically looks something like this:
#include <stdint.h>

#define RX_BUF_SIZE (32)
#define RX_BUF_MASK (RX_BUF_SIZE - 1)

static uint8_t Rx_Buf[RX_BUF_SIZE]; /* Receive buffer */
static uint8_t RxHead = 0;          /* Offset into Rx_Buf[] where next character should be written */

__interrupt void RX_interrupt(void)
{
    uint8_t rx_char;

    rx_char = HW_REG;         /* Get the received character */

    Rx_Buf[RxHead] = rx_char; /* Store the received char */
    ++RxHead;                 /* Increment offset */
    RxHead &= RX_BUF_MASK;    /* Mask the offset into the buffer */
}
In the last couple of lines, I increment the value of RxHead and then mask it, with the intention of ensuring that the next write into Rx_Buf[] will be in the requisite range. The operative word here is 'intention'. To see what I mean, consider what would happen if RxHead gets corrupted in some way. Now if the corruption is caused by RFI or some other such phenomenon then you are probably out of luck. However, what if RxHead gets unintentionally manipulated by a bug elsewhere in your code? As written, the manipulation may cause a write to occur beyond the end of the buffer - with all the attendant chaos that would inevitably arise. You can prevent this by simply doing the masking before indexing into the array. That is, the code looks like this:
__interrupt void RX_interrupt(void)
{
    uint8_t rx_char;

    rx_char = HW_REG;         /* Get the received character */

    RxHead &= RX_BUF_MASK;    /* Mask the offset into the buffer */
    Rx_Buf[RxHead] = rx_char; /* Store the received char */
    ++RxHead;                 /* Increment offset */
}
What has this bought you? Well by coding it this way you guarantee that you will not index beyond the end of the array regardless of the value of RxHead when the ISR is invoked. Furthermore the guarantee comes at zero performance cost. Of course this hasn't solved your problem with some other piece of code stomping on RxHead. However it does make finding the problem a lot easier because your problem will now be highly localized (i.e. data are received out of order) versus the system crashes randomly. The former class of problem is considerably easier to locate than is the latter.
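To make the benefit concrete, here is a host-testable sketch of the same idea. It is not the ISR itself: rx_store(), rx_set_head() and rx_peek() are hypothetical names introduced so that the behaviour can be exercised off-target, with rx_set_head() standing in for 'some other piece of code stomping on RxHead'.

```c
#include <stdint.h>

#define RX_BUF_SIZE (32u)
#define RX_BUF_MASK (RX_BUF_SIZE - 1u)

static uint8_t rx_buf[RX_BUF_SIZE]; /* Receive buffer */
static uint8_t rx_head = 0u;        /* Offset of the next write */

/* Hypothetical test hook: simulates a bug elsewhere corrupting the offset */
void rx_set_head(uint8_t value)
{
    rx_head = value;
}

/* Store one character using pre-masking.
   Returns the index that was actually written. */
uint8_t rx_store(uint8_t rx_char)
{
    uint8_t written;

    rx_head = (uint8_t)(rx_head & RX_BUF_MASK); /* Mask BEFORE indexing */
    written = rx_head;
    rx_buf[written] = rx_char;                  /* Always inside the buffer */
    ++rx_head;                                  /* Masked on the next call */

    return written;
}

/* Read back a stored character (index is masked for safety) */
uint8_t rx_peek(uint8_t index)
{
    return rx_buf[index & RX_BUF_MASK];
}
```

If the offset is corrupted to 200, the next write lands at index 8 (200 & 31) instead of 168 bytes beyond the end of the array. The data end up in the wrong slot, but nothing outside the buffer is touched.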

So is this effective 'C'? I think so. It's a simple technique that adds a little robustness for free. I wouldn't mind finding a few more like it.

Wednesday, August 05, 2009

A tutorial on signed and unsigned integers

One of the interesting things about writing a blog is looking at the search terms that drive traffic to your blog. In my case, after I posted these thoughts on signed versus unsigned integers, I was amazed to see how many people were ending up here looking for basic information concerning signed and unsigned integers. In an effort to make these folks' visits more successful, I thought I'd put together some basic information on this topic. I've done it in a question and answer format.

All of these questions have been posed to a search engine which has driven traffic to this blog. For regular readers of this blog looking for something a bit more advanced, you will find the last section more satisfactory.

Are integers signed or unsigned?


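A standard C integer data type ('int') is signed. However, I strongly recommend that you do not use the standard 'int' data type and instead use the C99 data types. See here for an explanation. Incidentally, for ordinary variables 'int' and 'signed int' are two spellings of the same type. Thus given the following code:
int  a;
signed int  b;
unsigned int c;

Then a and b have the same data type, and naturally neither is the same as c. The two spellings can differ in one place: a bit-field declared as plain 'int' may be treated as signed or unsigned at the compiler's discretion, whereas a bit-field declared as 'signed int' is always signed. (The 'char' type is the other oddity: 'char', 'signed char' and 'unsigned char' are three distinct types.)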

How do I convert a signed integer to an unsigned integer?


This is in some ways a very elementary question and in other ways a very profound question. Let's consider the elementary issue first. To convert a signed integer to an unsigned integer, or to convert an unsigned integer to a signed integer you need only use a cast. For example:
int  a = 6;
unsigned int b;
int  c;

b = (unsigned int)a;

c = (int)b;

Actually in many cases you can dispense with the cast. However many compilers will complain, and Lint will most certainly complain. I recommend you always explicitly cast when converting between signed and unsigned types.

OK, well what about the profound part of the question? Well if you have a variable of type int, and it contains a negative value such as -9 then how do you convert this to an unsigned data type and what exactly happens if you perform a cast as shown above? Well on most machines the basic answer is - nothing. No bits are changed, the compiler just treats the bit representation as unsigned. For example, let us assume that the compiler represents signed integers using 2's complement notation (this is the norm - but is *not* mandated by the C language). If our signed integer is a 16 bit value, and has the value -9, then its binary representation will be 1111111111110111. If you now cast this to an unsigned integer, then the unsigned integer will have the value 0xFFF7 or 65527 decimal. Note that the result itself does not depend on how the compiler chooses to represent negative numbers. The C standard defines the conversion arithmetically: one more than the largest value the unsigned type can hold (here 65536) is added until the result is in range, so -9 always becomes 65527 in a 16 bit unsigned type. Only the 'no bits are changed' observation is specific to 2's complement. The conversion in the other direction is less forgiving: casting an unsigned value that is too large for the signed type gives an implementation-defined result.
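The conversion described above can be checked with a small sketch. The function names are hypothetical, and the fixed-width types from stdint.h are used so that the 16 bit example holds regardless of the native size of int.

```c
#include <stdint.h>

/* Signed to unsigned: the C standard defines the result as the value
   reduced modulo 65536 for a 16 bit unsigned type. */
uint16_t to_unsigned16(int16_t value)
{
    return (uint16_t)value;
}

/* Unsigned to signed is only portable when the value fits, so check
   first and hand back a caller-supplied fallback otherwise. */
int16_t to_signed16_checked(uint16_t value, int16_t fallback)
{
    return (value <= (uint16_t)INT16_MAX) ? (int16_t)value : fallback;
}
```

With these, to_unsigned16(-9) yields 0xFFF7 (65527 decimal), while to_signed16_checked(0xFFF7u, -1) refuses the conversion and returns the fallback value.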

What's more efficient - a signed integer or an unsigned integer?


The short answer - unsigned integers are more efficient. See here for a more detailed explanation.

When should I use an unsigned integer?


In my opinion, you should always use unsigned integers, except in the following cases:
  • When the entity you are representing with your variable is inherently a signed value.
  • When dealing with standard C library functions that require an int to be passed to them.
  • In certain weird cases such as I documented here.
Now be advised that many people strongly disagree with me on this topic. Naturally I don't find their arguments persuasive.

Why should I use an unsigned integer?

Here are my top reasons:
  • By using an unsigned integer, you are conveying important information to a reader of your code concerning the expected range of values that a variable may take on.
  • They are more efficient.
  • Modulus arithmetic is completely defined.
  • Overflowing an unsigned data type is defined, whereas overflowing a signed integer type could result in World War 3 starting.
  • You can safely perform shift operations.
  • You get a larger dynamic range.
  • Register values should nearly always be treated as unsigned entities - and embedded systems spend a lot of time dealing with register values.
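As a small illustration of why defined wrap-around matters, consider measuring elapsed time with a free-running 16 bit tick counter. This is a sketch only; ticks_elapsed() is a hypothetical helper and the counter width is an assumption.

```c
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 16 bit counter.
   The operands are promoted to int for the subtraction; converting the
   result back to uint16_t reduces it modulo 65536, so the answer is
   correct even if the counter rolled over once between the readings. */
uint16_t ticks_elapsed(uint16_t start, uint16_t now)
{
    return (uint16_t)(now - start);
}
```

For example, a start reading of 0xFFF0 and a later reading of 0x0010 gives 0x20 ticks, with no special-case code for the rollover.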

What happens when I mix signed and unsigned integers?

This is the real crux of the problem with having signed and unsigned data types. The C standard has an entire section on this topic that only a compiler writer could love - and that the rest of us read and wince at. Having said that, it is important to know that when a signed int and an unsigned int meet in an expression, the signed operand is converted to unsigned (the standard calls these the usual arithmetic conversions). If you think about it, this is the correct thing to happen. However, it can lead to some very interesting and unexpected results. A number of years ago I wrote an article "A ‘C’ Test: The 0x10 Best Questions for Would-be Embedded Programmers" that was published in Embedded Systems Programming magazine. You can get an updated and corrected copy at my web site. My favorite question from this test is question 12 which is reproduced below - together with its answer: What does the following code output and why?
#include <stdio.h>

void foo(void)
{
    unsigned int a = 6;
    int b = -20;
    (a+b > 6) ? puts("> 6") : puts("<= 6");
}
This question tests whether you understand the integer conversion rules in C - an area that I find is very poorly understood by many developers. Anyway, the answer is that this outputs "> 6". The reason for this is that in an expression mixing int and unsigned int, the int operand is converted to unsigned int. Thus -20 becomes a very large positive integer and the expression evaluates to greater than 6. This is a very important point in embedded systems where unsigned data types should be used frequently. If you get this one wrong, then you are perilously close to not being hired.

This is all well and good, but what should one do about this? Well you can pore over the C standard, run tests on your compiler to make sure it really does conform to the standard, and then write conforming code, or you can do the following: Never mix signed and unsigned integers in an expression. I do this by the use of intermediate variables.

To show how to do this, consider a function that takes an int 'a' and an unsigned int 'b'. Its job is to return true if b > a, otherwise it returns false. As you shall see, this is a surprisingly difficult problem... To solve this problem, we need to consider the following:
  • The signed integer a can be negative.
  • The unsigned integer b can be numerically larger than the largest possible value representable by a signed integer
  • The integer promotion rules can really screw things up if you are not careful.
With these points in mind, here's my stab at a robust solution
#include <stdbool.h>

bool foo(int a, unsigned int b)
{
    bool res;

    if (a < 0)
    {
        res = true; /* If a is negative, it must be less than b */
    }
    else
    {
        unsigned int c;
        c = (unsigned int) a; /* Since a is non-negative, this cast is safe */
        if (b > c)  /* Now I'm comparing the same data types */
        {
            res = true;
        }
        else
        {
            res = false;
        }
    }
    return res;
}

Is this a lot of work - yes. Could I come up with a more compact implementation that is guaranteed to work for all possible values of a and b - probably. Would it be as clear - I doubt it. Perhaps regular readers of this blog would like to take a stab at producing a better implementation?
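For what it's worth, here is one possible compact version, offered as a sketch rather than as the definitive answer. The function names are hypothetical. It relies on the same two facts as the long version: a negative a is less than any unsigned value, and a non-negative a converts to unsigned int without changing value. A naive version is included for contrast.

```c
#include <stdbool.h>

/* Return true if b > a, for any int a and any unsigned int b */
bool b_greater_than_a(int a, unsigned int b)
{
    /* The || short-circuits, so the cast is only reached when a is
       non-negative and therefore value-preserving. */
    return (a < 0) || (b > (unsigned int)a);
}

/* The naive comparison: a is converted to unsigned int unconditionally,
   so a negative a turns into a very large positive number. */
bool naive_b_greater_than_a(int a, unsigned int b)
{
    return b > (unsigned int)a;
}
```

With a = -20 and b = 6, b_greater_than_a() returns true whereas the naive version returns false, which is exactly the trap in question 12. Whether the one-liner is as clear as the long form is a matter of taste.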

Friday, July 31, 2009

Efficient C Tips #10 - Use unsigned integers

This is the tenth in a series of tips on writing efficient C for embedded systems. Today I consider the topic of whether one should use signed integers or unsigned integers in order to produce faster code. Well the short answer is that unsigned integers nearly always produce faster code. Why is this you ask? Well there are several reasons:

Lack of signed integer support at the op code level


Many low end microprocessors lack instruction set support (i.e. op codes) for signed integers. The 8051 is a major example. I believe low end PICs are also another example. The Rabbit processor is sort of an example in that my recollection is that it lacks support for signed 8 bit types, but does have support for signed 16 bit types! Furthermore some processors will have instructions for performing signed comparisons, but only directly support unsigned multiplication.

Anyway, so what's the implication of this? Well lacking direct instruction set support, use of a signed integer forces the compiler to use a library function or macro to perform the requisite operation. Clearly this is not very efficient. But what if you are programming a processor that does have instruction set support for signed integers? Well for most basic operations such as comparison and addition you should find no difference. However this is not the case for division...

Shift right is not the same as divide by two for signed integers


I doubt there is a compiler in existence that doesn't recognize that division by 2^N is equivalent to a right shift N places for unsigned integers. However this is simply not the case for signed integers, since the issue of what to do with the sign bit always arises: C99 division truncates towards zero, whereas an arithmetic right shift of a negative number rounds towards minus infinity (-7 / 2 is -3, but -7 arithmetically shifted right one place is -4). Thus when faced with performing a division by 2^N on a signed integer, the compiler cannot use a simple shift operation; it has to either wrap the shift in sign-correction code or invoke a signed divide routine. This holds true for every microprocessor I have ever looked at in detail.
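The difference can be seen in a short sketch. The function names are hypothetical, and sdiv4_by_shift() is just one well-defined way of expressing the extra work in C; it is not a claim about what any particular compiler emits.

```c
/* Unsigned division by 4: a plain right shift gives the same answer */
unsigned int udiv4(unsigned int x)
{
    return x >> 2;
}

/* Signed division by 4: C99 requires truncation towards zero */
int sdiv4(int x)
{
    return x / 4;
}

/* Signed division by 4 built from a shift. The sign has to be handled
   separately, which is the extra work the unsigned case avoids. The
   magnitude is formed in unsigned arithmetic so that no negative value
   is ever shifted. */
int sdiv4_by_shift(int x)
{
    unsigned int magnitude;

    if (x >= 0)
    {
        return (int)((unsigned int)x >> 2);
    }
    magnitude = 0u - (unsigned int)x; /* |x|, well defined even for INT_MIN */
    return -(int)(magnitude >> 2);
}
```

Here sdiv4(-7) and sdiv4_by_shift(-7) both give -1, whereas a bare arithmetic shift of -7 by two places would typically give -2 (the result of right shifting a negative value is implementation-defined).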

There is a third area where unsigned integers offer a speed improvement over signed integers - but it comes about by a different mechanism...

Unsigned integers can often save you a comparison


From time to time I find myself writing a function that takes as an argument an index into an array or a file. Naturally to protect against indexing beyond the bounds of the array or file, I add protection code. If I declare the function as taking a signed integer type, then the code looks like this:
void foo(int offset)
{
    if ((offset >= 0) && (offset < ARRAY_SIZE))
    {
        //Life is good
        ...
    }
}
However, if I declare the function as taking an unsigned integer type, then the code looks like this:
void foo(unsigned int offset)
{
    if (offset < ARRAY_SIZE)
    {
        //Life is good
        ...
    }
}
Clearly it's nonsensical to check whether an unsigned integer is >=0 and so I can dispense with a check. The above are examples of where unsigned integer types are significantly more efficient than signed integer types. In most other cases, there isn't usually any difference between the types. That's not to say that you should choose one over the other on a whim. See this for a discussion of some of the other good reasons to use an unsigned integer.

Before I leave this topic, it's worth asking whether there are situations in which a signed integer is more efficient than an unsigned integer? Off hand I can't think of any. There are situations where I could see the possibility of this occurring. For example when performing pointer arithmetic, the C standard requires that subtraction of two pointers return the data type ptrdiff_t. This is a signed integral type (since the result may be negative). Thus if after subtracting two pointers, you needed to add an offset to the result, it's likely that you'll get better code if the offset is a signed integral type. Of course this touches upon the nasty topic of mixing signed and unsigned integral types in an expression. I'll address this another day.
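Returning to the bounds check from earlier in this post, the two variants can be written as small testable functions. This is a sketch under stated assumptions: the function names are hypothetical and ARRAY_SIZE is given an arbitrary value of 16.

```c
#include <stdbool.h>

#define ARRAY_SIZE (16) /* Assumed size, for illustration only */

/* Signed offset: two comparisons are needed */
bool offset_ok_signed(int offset)
{
    return (offset >= 0) && (offset < ARRAY_SIZE);
}

/* Unsigned offset: one comparison is enough, because a 'negative'
   value passed in shows up as a very large positive one and fails
   the upper bound check. */
bool offset_ok_unsigned(unsigned int offset)
{
    return offset < (unsigned int)ARRAY_SIZE;
}
```

Passing (unsigned int)-1 to offset_ok_unsigned() is rejected by the single comparison, which is exactly the case the signed version needs its extra test for.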

Friday, July 17, 2009

Lowering power consumption tip #1 - Avoid zeros on the I2C bus

I already have a series of tips on efficient C and another on effective C. Today I'm introducing a third series of tips - this time centered on lowering the power consumption of embedded systems. As well as the environmental benefits of reducing the power consumption of an embedded system, there are also a plethora of other advantages including reduced stress on regulators, extended battery life (for portable systems) and also of course reduced EMI.

Notwithstanding these benefits, reducing power consumption is a topic that simply doesn't get enough coverage. Indeed when I first started working on portable systems twenty years ago there was almost nothing on this topic beyond 'use the microprocessor's power saving modes'. Unfortunately I can't say it has improved much beyond that!

So in an effort to remedy the situation I'll be sharing with you some of the things I've learned over the last twenty years concerning reducing power consumption. Hopefully you'll find it useful.

Anyway, enough preamble. Today's posting concerns the ubiquitous I2C bus. The I2C bus is found in a very large number of embedded systems for the simple reason that it's very good for solving certain types of problems. However, it's not exactly a low power consumption interface. The reason is that its open-drain architecture requires a fairly stiff pull up resistor on the clock (SCL) and data (SDA) lines. Typical values for these pull up resistors are 1K - 5K. As a result, every time SCL or SDA goes low, you'll be pulling several milliamps. Conversely when SCL or SDA is high you consume essentially nothing. Now you can't do much about the clock line (it has to go up and down in order to, well, clock the data) - but you can potentially do something about the data line. To illustrate my point(s) I'll use as an example the ubiquitous 24LC series of I2C EEPROMs such as the 24LC16, 24LC32, 24LC64 and so on. For the purposes of this exercise I'll use the 24LC64 from Microchip.

The first thing to note is that these EEPROMs have the most significant four I2C address bits (1010b) encoded in silicon - but the other three bits are set by strapping pins on the IC high or low. Now I must have seen dozens of designs that use these serial EEPROMs - and in every case the address lines were strapped low. Thus all of these devices were addressed at 1010000b. Simply strapping the 3 address lines high would change the device's address to 1010111b - thus minimizing the number of zeros needed every time the device is addressed.

The second thing to note is that the memory address space for these devices is 16 bits. That is after sending the I2C address, it is necessary to send 16 bits of information that specify the memory address to be accessed. Now in the case of the 24LC64, the three most significant address bits are 'don't care'. Again in every example I've ever looked at, people do the 'natural' thing, and set these bits to zero. Set them to 1 and you'll get an immediate power saving on every address that you send.

As easy as this is, there's still more that can be done in this area. In most applications I have ever looked at, the serial EEPROM is not completely used. Furthermore, the engineer again does the 'natural' thing, and allocates memory starting at the lowest address and works upwards. If instead you allocate memory from the top down, and particularly if you locate the most frequently accessed variables at the top of the memory, then you will immediately increase the average preponderance of '1s' in the address field, thus minimizing power. (Incidentally if you find accessing the correct location in EEPROM hard enough already, then I suggest you read this article I wrote a few years ago. It has a very nifty technique for accessing serial EEPROMs courtesy of the offsetof() macro).

Finally we come to the data itself that gets stored in the EEPROM. If you examine the data that are stored in the EEPROM and analyze the distribution of the number of zero bits in each byte, then I think you'll find that in many (most?) cases the results are heavily skewed towards the typical data byte having more zero bits than one bits. If this is the case for your data, then it points to a further power optimization - namely invert all bytes before writing them to EEPROM, and then invert them again when you read them back. With a little care you can build this into the low level driver such that the results are completely transparent to the higher levels of the application.
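As a rough sketch of what building the inversion into the low level driver might look like, the code below wraps a pair of raw byte accessors with inverting versions. Everything named here is hypothetical: eeprom_cells is a RAM array standing in for the 24LC64, and eeprom_raw_write_byte() / eeprom_raw_read_byte() stand in for whatever I2C routines your driver already has.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device: 8 KB of RAM playing the role of a 24LC64. */
#define EEPROM_SIZE 8192u
uint8_t eeprom_cells[EEPROM_SIZE];

/* Hypothetical raw accessors. In a real driver these would perform the I2C transfer.
   Only the low 13 address bits are used, mirroring the three 'don't care' bits. */
void eeprom_raw_write_byte(uint16_t addr, uint8_t value)
{
    eeprom_cells[addr & 0x1FFFu] = value;
}

uint8_t eeprom_raw_read_byte(uint16_t addr)
{
    return eeprom_cells[addr & 0x1FFFu];
}

/* Inverting wrappers: the application sees its own data, the bus sees the complement. */
void eeprom_write_byte(uint16_t addr, uint8_t value)
{
    eeprom_raw_write_byte(addr, (uint8_t)~value);
}

uint8_t eeprom_read_byte(uint16_t addr)
{
    return (uint8_t)~eeprom_raw_read_byte(addr);
}
```

Because the inversion lives entirely in the two wrappers, the rest of the application never needs to know about it. Whether it actually saves power depends on your data really being skewed towards zero bits, so it's worth measuring first.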

If you put all these tips together, then the power savings can be substantial. To drive home the point, consider writing zero to address 0 with the 24LC64 located at I2C address 1010000b. Using the 'normal' methodology, you would send the following bytes:
10100000 //I2C Address byte = 1010000 with R/W = 0
00000000 //Memory address MSB = 0x00
00000000 //Memory address LSB = 0x00
00000000 //Datum = 0x00

Using the amended methodology suggested herein, the 24LC64 would be addressed at 1010111b, the 3 most significant don't care bits of the address would be set to 111b, the datum would be located at some higher order address, such as xxx11011 11001100b, and the datum would be inverted. Thus the bytes written would be:
10101110 //I2C Address byte = 1010111 with R/W = 0
11111011 //Memory address MSB = 0xFB
11001100 //Memory address LSB = 0xCC
11111111 //Datum = 0xFF

Thus using this slightly extreme example, the proportion of zeros in the bit stream has been reduced from 30/32 to 8/32 - a dramatic reduction in power.
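If you want to check the arithmetic for your own traffic, a few lines of C will count the zero bits for you. This is only a sketch: count_zero_bits() and the two arrays are names I've made up for illustration, and it looks only at the four bytes shown above, ignoring start / stop conditions and acknowledge bits.

```c
#include <stdint.h>
#include <stddef.h>

/* Count the zero bits in a buffer, most significant bit first. */
unsigned count_zero_bits(const uint8_t *bytes, size_t len)
{
    unsigned zeros = 0;

    for (size_t i = 0; i < len; i++)
    {
        for (uint8_t mask = 0x80; mask != 0; mask >>= 1)
        {
            if ((bytes[i] & mask) == 0)
            {
                zeros++;
            }
        }
    }
    return zeros;
}

/* The two byte sequences from the example above. */
const uint8_t normal_write[4]  = { 0xA0, 0x00, 0x00, 0x00 }; /* 10100000 00000000 00000000 00000000 */
const uint8_t amended_write[4] = { 0xAE, 0xFB, 0xCC, 0xFF }; /* 10101110 11111011 11001100 11111111 */
```

Calling count_zero_bits() on normal_write gives 30 and on amended_write gives 8, matching the 30/32 and 8/32 figures.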

Obviously with other I2C devices such as an ADC you will not always have quite this much flexibility. Conversely if you are talking to another microprocessor you'll have even more flexibility in how you encode the data. The point is, with a little bit of thought you can almost certainly reduce the power consumption of your I2C interface.

As a final note. I mentioned that you can't do much about the clock line. Well that's not strictly correct. What you can do is run the clock at a different frequency. I'll leave it for another posting to consider the pros and cons of changing the clock frequency.


Saturday, July 11, 2009

Debugging with cell phones

If you walk in the door of a doctor's office here in the USA, the chances are there will be a sign admonishing you to turn off your phone. Most people probably assume this has something to do with common courtesy - and I'm sure that's part of it. However the larger issue is the fact that cell phone transmissions can play havoc with an EKG.

What's this got to do with embedded systems? Well yesterday I was trying to debug a piece of code - only to be faced with a debug environment that would just randomly crash, taking down the debugger with it. Naturally my first thought was that I had made a stupid coding error. However, after some serious head scratching I noticed that I had placed my Blackberry down next to the ribbon cable leading from the emulator to the target. If a cell phone can mess up an EKG being performed 10 m away, I'm sure it can really do a number on a high speed debugger interface when it's a mere 10 cm away. In short, not a smart idea. Removal of the cell phone solved the problem.

What's the lesson here? Well the obvious one is that cell phones have no business in a laboratory. However, upon reflection there is a larger issue. I take great effort to make my code as hygienic as possible. However, my workbench is usually a disaster area with extraneous stuff all over the place. Maybe it's time I literally cleaned my act up in this department. If I had I'd have noticed the phone a lot sooner.


Saturday, July 04, 2009

Effective C Tip #4 - Prototyping static functions

This is the fourth in a series of tips on writing effective C.

I have previously talked about the benefits of static functions. Today I'm addressing where to place static functions in a module. This posting is motivated by the fact that I've recently spent a considerable amount of time wading through code that locates its static functions at the top of the file. That is the code looks like this:
static void fna(void)
{
...
}

static void fnb(uint16_t a)
{
...
}

...

static uint16_t fnc(void)
{
...
}

void fn_public(void)
{
uint16_t t;

fna();
t = fnc();
fnb(t);
...
}
In this approach (which unfortunately seems to be the more common), all of the static functions are defined at the top of the module, and the public functions appear at the bottom. I've always strongly disliked this approach because it forces someone that is browsing the code to wade through all the minutiae of the implementation before they get to the big picture public functions. This can be very tedious in a file with a large number of static functions. The problem is compounded by the fact that it's very difficult to search for a non static function. Yes I'm sure I could put together a regular expression search to do it - but it requires what I consider to be unnecessary work.

A far better approach is as follows. Prototype (declare) all the static functions at the top of the module. Then follow the prototypes with the public functions (thus making them very easy to locate) and then place the static functions out of the way at the end of the file. If I do this, my code example now looks like this:

static void fna(void);
static void fnb(uint16_t a);
static uint16_t fnc(void);

void fn_public(void)
{
uint16_t t;

fna();
t = fnc();
fnb(t);
...
}

static void fna(void)
{
...
}

static void fnb(uint16_t a)
{
...
}

...

static uint16_t fnc(void)
{
...
}

If you subscribe to the belief that we only write source code so that someone else can read it then this simple change to your coding style can have immense benefits to the person that has to maintain your code (including a future version of yourself).

Update: There's a very interesting discussion in the comments section - I recommend taking a look.

Saturday, June 20, 2009

Thoughts on BCC's, LRC's, CRC's and being experienced

Those of us that have been working in this field for a long time are referred to as 'experienced'. Experienced is taken to mean that we have been doing this for long enough that we have experienced many of the problems common to embedded systems and thus know how to solve them. Although this is true for many things, I think there is a downside to it - namely that because we've successfully solved a particular problem a number of times that we fall into the trap of thinking that our solution is optimal. In order to guard against this it is essential to be proactive in seeking out new solutions to old problems. To illustrate my point, I'll take you on an abbreviated trip through the memory lane of my career when it comes to that most prosaic of problems - transmitting serial data between microcontrollers.

Back when I was a lad I was by definition naive and so I just transmitted the data without any thought to how to detect errors beyond the use of a parity bit on each byte. Well it didn't take me long to work out that a simple parity bit wasn't exactly a robust way of detecting errors, and so I started appending a simple additive checksum to the message.

Well that worked for a while until the day it dawned on me that an additive checksum without an initial seed value was vulnerable to a stuck channel (e.g. all zeros). From that day on I started seeding my checksum computations with initial values. I tended to favour 0x2B (with apologies to Hamlet).
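To make the stuck channel problem concrete, here's a minimal sketch of a seeded additive checksum. The function name and the 8 bit width are my assumptions for illustration; the only value taken from the text is the 0x2B seed.

```c
#include <stdint.h>
#include <stddef.h>

/* 8 bit additive checksum over a message, starting from a caller supplied seed. */
uint8_t additive_checksum(const uint8_t *msg, size_t len, uint8_t seed)
{
    uint8_t sum = seed;

    for (size_t i = 0; i < len; i++)
    {
        sum = (uint8_t)(sum + msg[i]);
    }
    return sum;
}
```

If the channel is stuck at zero, the receiver sees an all zero message followed by a zero checksum byte. With a seed of 0x00 the computed checksum is also zero, so the garbage is accepted. With a seed of 0x2B the computed checksum is 0x2B, which doesn't match the received zero, and the error is caught.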

Somewhere along the road I switched from performing an additive checksum to using an XOR operation. I can't remember why I did this - but it just seemed 'better'.

This approach served me well for many years until I started investigating cyclic redundancy checks (CRC). I'd known about CRC's for a long time of course. However all the ones I knew about used 16 or 32 bit values and had certain wondrous but rather unspecified properties for detecting certain classes of errors. To put it bluntly they seemed like complete overkill for sending a short message between two microprocessors - and so I didn't entertain them. However this all changed the day I came across an 8 bit CRC. This changed my perspective dramatically. An 8 bit CRC designed for protecting small messages - excellent! Thus henceforth I eschewed the use of an LRC and instead opted for an 8 bit CRC to protect my messages.

Well this continued for a number of years. I learned more about CRCs, I got older until one day I decided to ask myself the question - is the 8 bit CRC I am using optimal? For regular readers of this blog, you'll probably have noticed that 'optimal solutions' is a recurring theme. Anyway, with this thought in mind, I set off on a hunt to determine whether in fact the 8 bit CRC I was using to protect small messages was indeed optimal. That's when I came across this paper by Koopman and Chakravarty. It's entitled 'Cyclic Redundancy Code (CRC) Polynomial Selection for Embedded Networks'. It's a highly readable and informative paper. They essentially investigate what constitutes 'optimal' for a CRC polynomial and then exhaustively explore optimal polynomials for different data lengths and different polynomial lengths. Most interestingly they slay some sacred cows along the way, including the popular CRC-8 polynomial (x^8 + x^7 + x^6 + x^4 + x^2 + 1).

Having read the paper, I discovered that the CRC I was using (the so-called ATM-8 polynomial, x^8 + x^2 + x + 1) wasn't bad for my application - but it wasn't optimal. Upon reflection this was hardly surprising since I had essentially selected it on the basis that it was designed for a similar application to mine - and thus must be decent. However as Koopman shows - this can be a very foolhardy assumption. I just got lucky.
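For anyone who hasn't coded one, here's a minimal bit-at-a-time sketch of an 8 bit CRC using the ATM-8 polynomial, which is 0x07 once the x^8 term is dropped. The function name, the zero initial value and the absence of a final XOR are my assumptions for illustration; a production version would more likely be table driven.

```c
#include <stdint.h>
#include <stddef.h>

#define CRC8_ATM_POLY 0x07u /* x^8 + x^2 + x + 1, with the x^8 term implicit */

/* Bitwise CRC-8, most significant bit first, initial value 0, no final XOR. */
uint8_t crc8_atm(const uint8_t *msg, size_t len)
{
    uint8_t crc = 0x00;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= msg[i];
        for (uint8_t bit = 0; bit < 8; bit++)
        {
            if (crc & 0x80u)
            {
                crc = (uint8_t)((crc << 1) ^ CRC8_ATM_POLY);
            }
            else
            {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

Swapping in a different polynomial from Koopman's tables is just a matter of changing the constant, taking care that his paper uses a different hexadecimal notation for polynomials than the one used here.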

More importantly from my perspective is that using Koopman's paper I now have a logical methodology for determining the optimal CRC for any application. Thus after close to 30 years of doing this I think I'm finally homing in on the truly optimal solution to this problem.

Of course, the larger lesson to be learned here is that just because you have done something a certain way for many years means nothing unless you know that it is the optimal way of doing it. That's when you are truly 'experienced'.

Wednesday, June 17, 2009

Do I have the technical skills to be a consultant?

My previous post on being a consultant addressed the issue of how to market yourself. Today I'll look at something a little more prosaic - how can you tell if you have the necessary technical skills to be a consultant? This post was motivated by an email I received from Victor Johns who basically asked the aforementioned question.

Before I answer this question, I should note that while technical skills are essential to being a successful consultant, they are by no means sufficient. I'll leave it to another day to discuss the sales and business skills required to run a consulting business.

Anyway - on to the answer. Well my first and rather sardonic observation is that you don't need to be technically competent at all. Just about every engineer I have ever met has unfortunately experienced the case of the clueless consultant - that is someone that does more harm than good. While these individuals do of course exist, they are by no means 'successful' as they have to spend an inordinate amount of time winning new business as no one ever hires them a second time.

If we ignore the aforementioned clueless consultant, then I think my answer depends a bit on what sort of consultant you want to be. Some consultants are specialists and others are generalists. If you are a specialist, then essentially you are marketing yourself as the 'go to guy' in a narrow field. A good example might be Bluetooth. If you are promoting yourself as a Bluetooth expert then you had better know pretty much all there is to know about Bluetooth. However, what about the majority of consultants who are more generalists? In their case absolute knowledge is not as important as the ability to learn fast and to apply skills learned in one field to the field they are currently in. The reason I say this is because no sensible client will expect you to know 'everything' needed to do a particular job. Rather they expect that you have the fundamental skills upon which you can rapidly build in order to solve the problem. It's for this reason that my ideal project is one with 30% 'new stuff'. That is I know exactly how to do 70% of the project, whereas the remaining 30% will require me to learn new tools / skills.

This of course brings up the issue of how does one stay up to date? While there are many ways of doing this, I find textbooks to offer the best bang for the buck. Simply put, a $100 text book that saves me an hour on a project is a good investment. One that saves me a day is an outstanding investment. It's for this reason that I have a stellar technical library.

As a parting comment I'll note that we have all run into the occasional engineer who 'knows' they know it all - while actually being pedestrian. In my experience it's the engineers that have a lot of confidence in their ability - but still realize that they can't hope to 'know it all' that ultimately will succeed in this business. I'm talking about you Victor!


Wednesday, June 10, 2009

Three byte integers

One of the enduring myths about the C language is that it is good for use on embedded systems. I've always been puzzled by this. While it is true that many other languages are dreadful for use on embedded systems, this merely means that C is less dreadful rather than 'good'. While I have a host of issues with C, the one that constantly galls me is the lack of 3 byte integers. Using C99 notation these would be the uint24_t and int24_t data types. Now a quick web search indicates that there may be the odd compiler out there that supports 3 byte integers - but the vast majority do not.

So why exactly do I want a 3 byte integer? Well, there are two main reasons:

Intermediate results


When I look through my code, I find a huge number of instances where I am performing an arithmetic operation on a 16 bit value, where intermediate values overflow 16 bits, yet the final value is 16 bits. For example:
uint16_t a, b;

a = (b * 51) / 64;

In this case, the code will fail if (b * 51) overflows 16 bits. As a result, I am forced to write:
a = ((uint32_t)b * 51) / 64;

However, examination of this code shows that (b * 51) could never overflow 24 bits for all 16 bit b. Thus I'd much rather write:
a = ((uint24_t)b * 51) / 64;

Now obviously on a 32 bit processor there would be zero benefit to doing this (indeed there may be a penalty). However on an 8 bit (and probably a 16 bit) processor, there would be a dramatic benefit to such a construct.
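Since uint24_t isn't available on most compilers, the sketch below just compares a 16 bit intermediate with a 32 bit one and computes the largest possible product. The function names are mine, and the explicit cast to uint16_t is there to reproduce on any host what a compiler with a 16 bit int would do.

```c
#include <stdint.h>

/* Intermediate product held in 16 bits: wraps whenever b * 51 exceeds 65535. */
uint16_t scale_16bit_intermediate(uint16_t b)
{
    uint16_t product = (uint16_t)(b * 51u); /* explicit truncation to 16 bits */

    return (uint16_t)(product / 64u);
}

/* Intermediate product held in 32 bits: always correct, but wider than needed. */
uint16_t scale_32bit_intermediate(uint16_t b)
{
    return (uint16_t)(((uint32_t)b * 51u) / 64u);
}

/* Largest product that can occur for any 16 bit b. */
uint32_t largest_intermediate(void)
{
    return (uint32_t)UINT16_MAX * 51u;
}
```

For b = 2000 the 16 bit version returns 569 rather than the correct 1593. The largest product is 3342285, which is comfortably below 2^24 = 16777216, so 24 bits would indeed be enough.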

Real world values


I regularly find myself needing a static variable that requires more than 16 bits of range. However when I look at these variables they almost never require the staggering range of a 32 bit variable. Instead 24 bits would do very nicely. Needless to say I am forced to allocate 32 bits even though I know that the most significant byte will never take on anything other than zero. This is particularly galling when these variables are stored in EEPROM - with its associated cost and long write times.

Taking these two together across all the 8/16 bit embedded systems out there, the cost in wasted instruction cycles, memory, stack size and energy must be truly staggering. We could probably save a power plant or two world wide with all the energy being wasted!

So why don't most compiler vendors support a 24 bit integer? I don't know for sure, but I suspect it is some combination of:
  • No one has been asking for it.
  • They are more concerned with being C89 / C99 compliant than they are with being useful.
  • No one has ever implemented a compiler benchmark where support for a 3 byte integer would be useful.
If you happen to agree with me that a 3 byte integer would be very useful, then next time you see your friendly compiler vendor - complain (or at least point them to this blog). Who knows, change may yet come!


Friday, June 05, 2009

Division of integers by constants

An issue that comes up frequently in embedded systems is division of an integer by a constant. Of course most of the time we try and arrange things such that the divisor is a power of two such that the division may be performed by shift operations. However, all too often we have to divide an integer by some non power of two value. Divisors that seem to crop up a lot are 10 & 100 (for obvious reasons), 3 (for no good reason), 60 (when dealing with time) and of course various combinations of pi and root 2. In cases like these you can of course just code it 'normally' and let the compiler do the work for you. However, when you feel the need for speed, there are other techniques that are spectacularly good.

I learned about this subject in dribs and drabs over the years without ever coming across a good summary - until I located this paper by Douglas Jones (no relationship). It does a nice job of explaining most of what you need to know in order to perform division of an integer by a constant. I particularly like the fact that he has algorithms for CPUs that contain barrel shifters - and those that do not. I strongly recommend that you read the paper. One note of caution however - Jones like many academics is used to working on CPUs with 32 bit word lengths. As such, his code assumes that integers are 32 bits. If you use his code as is, then it will fail on 16 bit word length machines. It's for reasons such as this that I really recommend that everyone use the C99 data types.

For those of you too lazy to read the paper, its basic premise is based upon the fact that division by a constant is equivalent to multiplication by the reciprocal of that constant. There is nothing of course earth shattering about this observation. However, Jones then goes ahead and explains about binary points, rounding etc in order to achieve the desired result. Since I had to reduce his paper to practice, I thought I'd go ahead and share the 'recipe' with you. Before doing so I should note that I work mostly with 8 & 16 bit CPUs that do not contain barrel shifters. As a result I am most interested in the techniques that use multiplication. If you are working with a 32 bit processor with a barrel shifter and an instruction cache then you should seriously look at his other implementations.

Division of a uint16_t by a constant K


In the steps that follow, there is no requirement that K be an integer. It must however be greater than 1.
There are two recipes. The first works for many divisors - but not all and is the faster of the two. The second recipe will give better results for all inputs - but produces less efficient code. While I am sure that there is some analytical way of making the determination ahead of time, I've found it easier to use the first recipe and exhaustively test it. If it works - great. If not then switch to the second recipe.

In the following descriptions, Q is the quotient (i.e. the result) of dividing an unsigned integer A by the constant K.

Recipe #1


  1. Convert 1 / K into binary. There is a nice web based calculator here that will do the job.
  2. Take all the bits to the right of the binary point, and left shift them until the bit to the right of the binary point is 1. Record the required number of shifts S.
  3. Take the most significant 17 bits and add 1 and then truncate to 16 bits. This effectively rounds the result.
  4. Express the remaining 16 bits to the right of the binary point as a 4 digit hexadecimal number M of the form hhhh.
  5. Q = (((uint32_t)A * (uint32_t)M) >> 16) >> S
  6. Perform an exhaustive check for all A & Q. If necessary adjust M or try recipe #2.
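If you'd rather not do steps 1 to 4 by hand, they can be sketched in integer arithmetic. This is my own reading of the recipe rather than anything from Jones's paper: recip16_t and recipe1_constants() are hypothetical names, the helper only handles integer K from 2 to 65535, and it does not perform the exhaustive check or any adjustment of M from step 6.

```c
#include <stdint.h>

/* Result of recipe #1: multiplier M and shift count S. */
typedef struct
{
    uint16_t m;
    uint8_t  s;
} recip16_t;

/* Compute M and S for an integer divisor k, assuming 2 <= k <= 65535. */
recip16_t recipe1_constants(uint16_t k)
{
    recip16_t result;
    uint8_t   s = 0;

    /* Step 2: S is the number of left shifts needed so that 2^S / k lies in [0.5, 1). */
    while (((uint32_t)1 << (s + 1)) < k)
    {
        s++;
    }

    /* Step 3: take the top 17 fraction bits of 2^S / k, add 1, keep the top 16. */
    uint64_t top17 = ((uint64_t)1 << (17 + s)) / k;

    result.m = (uint16_t)((top17 + 1u) >> 1); /* Step 4: this is M */
    result.s = s;
    return result;
}
```

For K = 30 this yields M = 0x8889 and S = 4. Treat the output as a starting point only: for some divisors M still needs a nudge, or recipe #2, before the result is exact for all A.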

Incidentally, you may be wondering why I don't use the form espoused by Jones, namely:
Q = (((uint32_t)A * (uint32_t)M) >> (16 + S))
The answer is that this requires a right shift of 16 + S places on a 32 bit integer. By splitting the shift into two as shown and by making use of the C integer promotion rules, the expression becomes:

  1. Right shift a 32 bit integer 16 places and convert to a 16 bit integer. This effectively means just use the top half of the 32 bit integer.
  2. Right shift the 16 bit integer S places.

This is dramatically more efficient on an 8 or 16 bit processor. On a 32 bit processor it probably is not.

Recipe #2



  1. Convert 1 / K into binary.
  2. Take all the bits to the right of the binary point, and left shift them until the bit to the right of the binary point is 1. Record the required number of shifts S.
  3. Take the most significant 18 bits and add 1 and then truncate to 17 bits. This effectively rounds the result.
  4. Express the 17 bit result as 1hhhh. Denote the hhhh portion as M
  5. Q = (((((uint32_t)A * (uint32_t)M) >> 16) + A) >> 1) >> S;
  6. Perform an exhaustive check for all A & Q. If necessary adjust M.

Again I split the shifts up as shown for efficiency on an 8 / 16 bit machine.

Example 1 - Divide by 30


In this case I wish to divide a uint16_t by 30.

  1. Convert to binary. 1 / 30 = 0.000010001000100010001000100010001000100010001000100010001
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 4 shifts and we get 0.10001000100010001000100010001000100010001000100010001. S is thus 4.
  3. Take the most significant 17 bits: 1000 1000 1000 1000 1
  4. Add 1: giving 1000 1000 1000 1000 1 + 1 = 1000 1000 1000 1001 0
  5. Truncate to 16 bits: 1000 1000 1000 1001
  6. Express in hexadecimal: M = 0x8889
  7. Q = (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 4

An exhaustive check confirms that this expression does indeed do the job for all 16 bit values of A. It is also about 10 times faster than the compiler division routine on an AVR processor.

Example 2 - Divide by 100


In this case I wish to divide a uint16_t by 100. This is one of those cases where we need 17 bit resolution.

  1. Convert to binary. 1 / 100 = 0.00000010100011110101110000101000111101011100001010001111011
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 6 shifts and we get 0.10100011110101110000101000111101011100001010001111011. S is thus 6.
  3. Take the most significant 18 bits: 1 0100 0111 1010 1110 0
  4. Add 1: 1 0100 0111 1010 1110 0 + 1 = 1 0100 0111 1010 1110 1
  5. Truncate to 17 bits: 1 0100 0111 1010 1110
  6. Express in hexadecimal: M = 1 47AE
  7. Q = (((((uint32_t)A * (uint32_t)0x47AE) >> 16) + A) >> 1) >> 6;

An exhaustive check shows that the division is not exact for all A. I thus incremented M to 0x47AF and got exact results for all A. This code was about twice as fast as the compiler division routine on an AVR processor.
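The exhaustive check itself is only a few lines. Here's a sketch for the divide by 100 case that takes M as a parameter so both candidate values can be tried; the function names are mine.

```c
#include <stdint.h>

/* Recipe #2 divide by 100, with the 16 bit multiplier m supplied by the caller. */
uint16_t div100_with(uint16_t a, uint16_t m)
{
    return (uint16_t)((((((uint32_t)a * m) >> 16) + a) >> 1) >> 6);
}

/* Count the 16 bit inputs for which the fast version disagrees with a / 100. */
uint32_t count_div100_mismatches(uint16_t m)
{
    uint32_t mismatches = 0;

    /* A 32 bit loop counter avoids wrapping at 0xFFFF. */
    for (uint32_t a = 0; a <= 0xFFFFu; a++)
    {
        if (div100_with((uint16_t)a, m) != (uint16_t)(a / 100u))
        {
            mismatches++;
        }
    }
    return mismatches;
}
```

With M = 0x47AE the check fails as early as A = 100, which comes out as 0 instead of 1. With M = 0x47AF there are no mismatches.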

Example 3 - Divide by π


This is an example where the expression yields only an approximate result. The approximation is very good though, with a quotient that is off by at most 1 for all A.

  1. Convert to binary: 1 / π = 0.010100010111110011000001101101110010011100100010001001
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 1 shift and we get 0.10100010111110011000001101101110010011100100010001001. S is thus 1.
  3. Take the most significant 18 bits: 1 0100 0101 1111 0011 0
  4. Add 1: 1 0100 0101 1111 0011 0 + 1 = 1 0100 0101 1111 0011 1
  5. Truncate to 17 bits: 1 0100 0101 1111 0011
  6. Express in hexadecimal: M = 1 45F3
  7. Q = (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1;

An exhaustive check that compared the result of this expression to (float)A * 0.31830988618379067153776752674503f showed that the match was exact for all but 263 values in the range 0 - 0xFFFF. Where there was a mismatch it is off by at most 1. It's also 23 times faster than converting to floating point. Not a bad trade off.

Example 4 - Divide by 10 on an 8 bit value


This technique is obviously usable on 8 bit values. One just has to adjust the number of bits. Here's an example:

  1. Convert to binary. 1 / 10 = 0.0001100110011001100110011001100110011001100110011001101
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 3 shifts and we get 0.1100110011001100110011001100110011001100110011001101. S is thus 3.
  3. Take the most significant 9 bits: 1100 1100 1
  4. Add 1: giving 110011001 + 1 = 110011010
  5. Truncate to 8 bits: 1100 1101
  6. Express in hexadecimal: M = 0xCD
  7. Q = (((uint16_t)A * (uint16_t)0xCD) >> 8) >> 3

An exhaustive check confirms that this expression does indeed do the job for all 8 bit values of A. It is also about 8 times faster than the compiler division routine on an AVR processor.

Summary


Using the values generated by Jones, together with some of the values I have computed, here's a summary of some common divisors for unsigned 16 bit integers.

Divide by 3: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 1
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
Divide by 6: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 2
Divide by 7: (((((uint32_t)A * (uint32_t)0x2493) >> 16) + A) >> 1) >> 2
Divide by 9: (((uint32_t)A * (uint32_t)0xE38F) >> 16) >> 3
Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
Divide by 11: (((uint32_t)A * (uint32_t)0xBA2F) >> 16) >> 3
Divide by 12: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 3
Divide by 13: (((uint32_t)A * (uint32_t)0x9D8A) >> 16) >> 3
Divide by 14: (((((uint32_t)A * (uint32_t)0x2493) >> 16) + A) >> 1) >> 3
Divide by 15: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 3
Divide by 30: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 4
Divide by 60: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 5
Divide by 100: (((((uint32_t)A * (uint32_t)0x47AF) >> 16U) + A) >> 1) >> 6
Divide by PI: (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1
Divide by √2: (((uint32_t)A * (uint32_t)0xB505) >> 16) >> 0

Hopefully you have spotted the relationship between divisors that are multiples of two. For example compare the expressions for divide by 15, 30 & 60.
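If you want to convince yourself that the integer entries in the summary are exact, a table driven check is easy to put together. The sketch below is illustrative only: the structure, the names and the 'recipe' field are my own, and the two irrational divisors are left out because they have no exact integer quotient to compare against.

```c
#include <stdint.h>
#include <stddef.h>

/* One row of the summary: divisor, multiplier, final shift and recipe number. */
typedef struct
{
    uint16_t k;
    uint16_t m;
    uint8_t  s;
    uint8_t  recipe; /* 1 = 16 bit multiplier, 2 = 17 bit multiplier */
} divisor_entry_t;

const divisor_entry_t summary_table[] =
{
    {  3, 0xAAAB, 1, 1 }, {  5, 0xCCCD, 2, 1 }, {  6, 0xAAAB, 2, 1 },
    {  7, 0x2493, 2, 2 }, {  9, 0xE38F, 3, 1 }, { 10, 0xCCCD, 3, 1 },
    { 11, 0xBA2F, 3, 1 }, { 12, 0xAAAB, 3, 1 }, { 13, 0x9D8A, 3, 1 },
    { 14, 0x2493, 3, 2 }, { 15, 0x8889, 3, 1 }, { 30, 0x8889, 4, 1 },
    { 60, 0x8889, 5, 1 }, { 100, 0x47AF, 6, 2 }
};

/* Apply one table row to a using the split shift form. */
uint16_t divide_by_entry(uint16_t a, const divisor_entry_t *e)
{
    uint32_t t = ((uint32_t)a * e->m) >> 16;

    if (e->recipe == 2)
    {
        t = (t + a) >> 1; /* add back the implicit leading 1 of the 17 bit multiplier */
    }
    return (uint16_t)(t >> e->s);
}

/* Count disagreements with ordinary division across every row and every 16 bit input. */
uint32_t count_summary_mismatches(void)
{
    uint32_t mismatches = 0;

    for (size_t i = 0; i < sizeof(summary_table) / sizeof(summary_table[0]); i++)
    {
        for (uint32_t a = 0; a <= 0xFFFFu; a++)
        {
            if (divide_by_entry((uint16_t)a, &summary_table[i]) != (uint16_t)(a / summary_table[i].k))
            {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

A table like this throws away the speed advantage of compile time constants, so it's a test harness only. In real code you'd use the individual expressions.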

If someone has too much time on their hands and would care to write a program to compute the values for all integer divisors, then I'd be happy to post the results for everyone to use.

Update


Alan Bowens has risen to the challenge and has generated some nifty programs for generating coefficients for arbitrary 8 and 16 bit values. He's also generated header files for all 8 and 16 bit integer divisors that you can just include and use. You'll find it all at his blog. Nice work Alan.


Thursday, May 28, 2009

Efficient C Tips #9 - Use lookup tables

This is the ninth in a series of tips on how to make your C code more efficient.

(Note if you are looking for basic information on lookup tables, you should read this).

Typically the fastest way to compute something on a microcontroller is to not compute it at all - but to simply read the result from a lookup table. For example this is regularly done as part of CRC calculations. Despite this I've noticed over the years what I'll call the 'look up tables are boring' syndrome. What do I mean by this? Well when having to code a solution to a problem, it seems that most of us would rather code something that involves crunching numbers, rather than generate a table where we just look up the result. I'm sure that many of you are thinking that I'm dead wrong and that you use lookup tables all the time. Well I'm sure many of you do. However the question is whether you make full use of this capability?

What started me thinking about this is the person who ended up on this blog looking for an efficient algorithm for determining the day of the year. I have no idea if they were coding for an embedded system or not, nor whether they were looking for a fast solution, a minimal memory solution, or something in between. However, it did make me realize that it would be a simple albeit slightly contrived way of demonstrating my point about look up tables.

First off, I imposed some constraints
  • The Gregorian calendar is to be used.
  • Days, months and years are numbered from 1 and not zero, such that January 1 is day 1 and not day 0.
Here's my first solution, that makes use of a small lookup table:
#define JAN_DAYS (31)
#define FEB_DAYS (28)
#define LY_FEB_DAYS (29)
...
#define NOV_DAYS (30)

#define MONTHS_IN_A_YEAR (12+1)

uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
static const uint16_t days[2][MONTHS_IN_A_YEAR] = 
{
{
/* Non leap year table */
0,    /* Padding because first month is not zero */
0,    /* If month is january, then no days before it */
JAN_DAYS,  
JAN_DAYS + FEB_DAYS,
...
JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
},
{
/* Leap year lookup table */
0,    /* Padding because first month is not zero */
0,    /* If month is january, then no days before it */
JAN_DAYS,  
JAN_DAYS + LY_FEB_DAYS,  
...
JAN_DAYS + LY_FEB_DAYS + ... + NOV_DAYS
}
};

uint16_t day_of_year;

if (((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0))
{
/* Leap year */
day_of_year = days[1][month] + day;
}
else
{
/* Non leap year */
day_of_year = days[0][month] + day;
}
return day_of_year;
}
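For reference, here is a compact, compilable sketch of the same small table approach with the cumulative totals written out as numbers. The helper name is_leap_year() and the table name are mine, the month lengths are the usual Gregorian ones, and there is no range checking on day or month.

```c
#include <stdint.h>
#include <stdbool.h>

/* Days elapsed before the start of each month. Index 0 is padding so that
   month 1 is January. Row 0 is a non leap year, row 1 is a leap year. */
static const uint16_t days_before_month[2][13] =
{
    { 0, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 },
    { 0, 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 }
};

/* Gregorian leap year rule. */
static bool is_leap_year(uint16_t year)
{
    return ((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0);
}

/* Day of the year, counting January 1 as day 1. Assumes 1 <= month <= 12. */
uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
    return (uint16_t)(days_before_month[is_leap_year(year) ? 1 : 0][month] + day);
}
```

As a quick sanity check, December 31 comes out as day 365 in 2009 and day 366 in 2008.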
For most applications I think this is an optimal solution in that it handles a very wide range of dates, uses a small amount of storage for the lookup tables and requires minimal computational effort to achieve the result. (On an ARM7 it requires 128 bytes of code space, 64 bytes for the lookup table and executes in about 40 cycles). However, what about if the code had to run as fast as possible? I'd guess that most folks would work on optimizing the details of the implementation and leave it at that. I'm not sure that many people would consider a gigantic look up table so that the code looks like this:
#define LAST_YEAR (2400 + 1)/* Last year to worry about */
uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
static const uint16_t days[LAST_YEAR][MONTHS_IN_A_YEAR] = 
{
{ /* Padding because first year is not zero */
0, /* Padding because first month is not zero */
0, /* If month is january, then no days before it */
JAN_DAYS,  
JAN_DAYS + FEB_DAYS,
...
JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
},

{ /* Year 1 - non leap year */
0, /* Padding because first month is not zero */
0,  /* If month is january, then no days before it */
JAN_DAYS,  
JAN_DAYS + FEB_DAYS,
...
JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
},

... 

{ /* Last year - a leap year */
0, /* Padding because first month is not zero */
0,  /* If month is january, then no days before it */
JAN_DAYS,  
JAN_DAYS + LY_FEB_DAYS,
...
JAN_DAYS + LY_FEB_DAYS + ... + NOV_DAYS
},
};

return days[year][month] + day;
}
The lookup table would of course require at least 2401 * 13 * 2 = 62426 bytes. Evidently this would likely be unreasonable on an 8 bit processor. On a 32 bit processor with 8 Mbytes of Flash - not so unreasonable. I first learned this lesson many years ago in an application that required an 8051 processor to perform a complicated refresh of multiplexed LEDs at about 1 kHz (a significant load for the 8051). The initial implementation consisted of pure code. Over the next year or so the two of us working on it realized that we could speed it up by using lookup tables. We started off with a small look up table, and by the time we were done, the table was 48K (out of the 64K available to the 8051) while the execution time was a fraction of what it had been before. Thus next time you are faced with making something run faster consider using a look up table - even if it is huge. Sometimes it's just the best way to go.
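Nobody would type 2401 rows by hand, so a table this size would presumably be produced by a small program on the host and pasted (or #included) into the source. The following is a sketch of what such a generator could be built around. The helper names days_before_month and emit_table are hypothetical, and the output layout is only one of many that would work.

```c
#include <stdint.h>
#include <stdio.h>

/* Days elapsed before the first of 'month' (1..12) in 'year' */
static uint16_t days_before_month(uint16_t year, uint8_t month)
{
    static const uint8_t len[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    const int leap = ((year % 4U == 0U) && (year % 100U != 0U)) || (year % 400U == 0U);
    uint16_t total = 0U;

    for (uint8_t m = 1U; m < month; ++m)
    {
        total += len[m - 1U];
        if ((m == 2U) && leap)
        {
            total += 1U;    /* February 29 */
        }
    }
    return total;
}

/* Write one initializer row per year, for years 0..last_year, to 'out' */
static void emit_table(FILE *out, uint16_t last_year)
{
    /* A 32 bit counter so the loop terminates even if last_year is 65535 */
    for (uint32_t year = 0U; year <= last_year; ++year)
    {
        fprintf(out, "{ 0");    /* Padding because first month is not zero */
        for (uint8_t month = 1U; month <= 12U; ++month)
        {
            fprintf(out, ", %u", (unsigned)days_before_month((uint16_t)year, month));
        }
        fprintf(out, " },\n");
    }
}
```

Calling emit_table(stdout, 2400U) from a host-side main() would print the rows that go between the outer braces of the days array.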


Friday, May 15, 2009

Checking the fuse bits in an Atmel AVR at run time

In general I try and post on topics that have broad appeal in the embedded world. Today I'm going to partially break with that tradition to show how to check the fuse bits in an Atmel AVR class processor. However, before I do so, I'd like to discuss my motivations for wanting to do this.

The AVR processor family, together with the PIC and other processor families, contains fuse / configuration bits. These bits are settable only at program time and are used to configure the behavior of the processor at run time. Typical parameters that are configured are oscillator types, brown out voltage detect levels and memory partitioning. Now as I lamented in this post, there is no great way of communicating to the production staff how you want these fuse bits programmed. As a result I consider there to be a very high probability that a mistake will be made in production - and that all my efforts on crafting perfect code will thus be for naught. Thus while it is much better to prevent mistakes, if you can't do so, then the next best thing to do is to detect them. As a result on one of the products that I am working on, I have as one of the startup tests a check to ensure that the fuse bits are indeed what they are supposed to be. While I recognize that if the fuse settings are dreadfully wrong it is unlikely that my code will run, I'm actually more concerned with the case where the fuse bits are set mostly correctly - and thus that the code works most of the time.

So how do I do this on an AVR? Well if you are using an IAR compiler the work is mostly done for you. Here it is:
#include <intrinsics.h>
#include <stdint.h>

/* Macros to read the various fuse bytes */
#define _SPM_GET_LOW_FUSEBITS()  __AddrToZByteToSPMCR_LPM((void __flash*)0x0000U, 0x09U)
#define _SPM_GET_HIGH_FUSEBITS()  __AddrToZByteToSPMCR_LPM((void __flash*)0x0003U, 0x09U)
#define _SPM_GET_EXTENDED_FUSEBITS()  __AddrToZByteToSPMCR_LPM((void __flash*)0x0002U, 0x09U)

/* Structure to store the fuse bytes */
typedef struct
{
uint8_t  fuse_low; /* The low fuse setting */
uint8_t  fuse_high; /* The high fuse setting */
uint8_t  fuse_extended; /* The extended fuse setting */
uint8_t  lockbits; /* The lockbits */
} FUSE_SETTINGS;

/* Storage for the fuse settings will be in EEPROM */
static __eeprom __no_init FUSE_SETTINGS Fuse_Settings @ FUSE_VALUES; 

void fuses_Read(void)
{
FUSE_SETTINGS value;

value.fuse_low = _SPM_GET_LOW_FUSEBITS();
value.fuse_high = _SPM_GET_HIGH_FUSEBITS();
value.fuse_extended = _SPM_GET_EXTENDED_FUSEBITS();
value.lockbits = _SPM_GET_LOCKBITS();
__no_operation();

Fuse_Settings = value; 
}

The macro __AddrToZByteToSPMCR_LPM() is defined in intrinsics.h. Essentially it takes care of all the necessary finicky register usage required to read the fuse bits. You'll also notice that I have used a macro _SPM_GET_LOCKBITS() to read the lockbits. This macro is also found in intrinsics.h. The really observant reader may wonder why there isn't a macro in intrinsics.h for reading the fuse bits? Well there is - it's just for reading the low fuse byte - which is all the early AVR processors had. I've pointed this out to IAR and they have promised to address this in the next release (thanks Steve!).

Before I leave this topic, I'll also point out that I don't read the fuse settings directly into EEPROM. Instead I read them into RAM and then copy the entire structure to EEPROM. I do this because writing to EEPROM messes with the same registers used for reading the fuse bits - and thus bad things happen. This also explains the __no_operation() statement before the data are copied to EEPROM.
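The listing above only reads and stores the fuse bytes; the comparison against the intended settings is not shown. A minimal, compiler-independent sketch of that comparison might look like the following. The function name fuses_Match and the values in Expected_Fuses are assumptions made purely for illustration; the real values have to come from your own programming specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the structure in the listing above */
typedef struct
{
    uint8_t fuse_low;       /* The low fuse setting */
    uint8_t fuse_high;      /* The high fuse setting */
    uint8_t fuse_extended;  /* The extended fuse setting */
    uint8_t lockbits;       /* The lockbits */
} FUSE_SETTINGS;

/* Hypothetical expected values - replace with the ones for your product */
static const FUSE_SETTINGS Expected_Fuses = { 0xE2U, 0xD9U, 0xFFU, 0xFFU };

/* Returns true only if every byte read back matches what was intended */
bool fuses_Match(const FUSE_SETTINGS *actual, const FUSE_SETTINGS *expected)
{
    return (actual->fuse_low      == expected->fuse_low)      &&
           (actual->fuse_high     == expected->fuse_high)     &&
           (actual->fuse_extended == expected->fuse_extended) &&
           (actual->lockbits      == expected->lockbits);
}
```

A startup test could pass the RAM copy filled in by the read routine together with &Expected_Fuses, and flag a production error if the function returns false.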

Incidentally, I don't know of a way to read the configuration bits of a PIC at run time. Chalk this up as one more reason why an AVR is superior to a PIC!


Saturday, May 09, 2009

Signed versus unsigned integers

If you are looking for some basic information on signed versus unsigned integers, you may also find this post useful. That being said, on to the original post...

Jack Ganssle's latest newsletter arrived the other day. Within it is an extensive set of comments from John Carter, in which he talks about and quotes from a book by Derek Jones (no relation of mine). The topic is unsigned versus signed integers. I have to say I found it fascinating in the same way that watching a train wreck is fascinating. Here's the entire extract - I apologize for its length - but you really have to read it all to understand my horror.

"Suppose you have a "Real World (TM)" always and forever positive value. Should you represent it as unsigned?

"Well, that's actually a bit of a step that we tend to gloss over...

"As Jones points out in section 6.2.5 the real differences as far as C is concerned between unsigned and signed are...

" * unsigned has a larger range.

" * unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

" * mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don't quite understand.

"For example I have a bit of code that generates code ... and uses __LINE__ to tweak things so compiler error messages refer to the file and line of the source code, not the generated code.

"Thus I must do integer arithmetic with __LINE__ include subtraction of offsets and multiplication.

"* I do not care if my intermediate values go negative.

"* It's hard to debug (and frightening) if they suddenly go huge.

"* the constraint is the final values must be positive.

"Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int's and it all just works. I.e. Line numbers are constrained to be positive, but that has nothing to do representation. Use the most convenient representation.

"C's "unsigned" representation is useless as a "constrain this value to be positive" tool. E.g. A device that can only go faster or slower, never backwards:

unsigned int speed; // Must be positive.
unsigned int brake(void)
{
--speed;
}

"Was using "unsigned" above any help to creating robust error free code? NO! "speed" may now _always_ be positive... but not necessarily meaningful!

"The main decider in using "unsigned" is storage. Am I going to double my storage requirements by using int16_t's or pack them all in an array of uint8_t's?

"My recommendation is this...

" * For scalars use a large enough signed value. eg. int_fast32_t
" * Treat "unsigned" purely as a storage optimization.
" * Use typedef's (and splint (or C++)) for type safety and accessor functions to ensure constraints like strictly positive. E.g.

typedef int_fast32_t velocity; // Can be negative
typedef int_fast32_t speed; // Must be positive.
typedef uint8_t dopplerSpeedImage_t[MAX_X][MAX_Y]; // Storage optimization


I read this, and quite frankly my jaw dropped. Now the statements made by Carter / Jones concerning differences between signed and unsigned are correct - but to call them the real differences is completely wrong. To make my point, I'll first of all address his specific points - and then I'll show you where the real differences are:

Unsigned has a larger range


Yes it does. However, if this is the reason you are using an unsigned type you've probably got other problems.

Unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)


Yes it does, and au contraire - this is frequently what I want (see for example this). However, far more important is the question - what does a signed integer do on overflow? The answer is that it is undefined. That is if you overflow a signed integer, the generated code is at liberty to do anything - including deleting your program or starting world war 3. I found this out the hard way many years ago. I had some PC code written for Microsoft's Version 7 compiler. The code was inadvertently relying upon signed integer overflow to work a certain way. I then moved the code to Watcom's compiler (Version 10 I think) and the code failed. I was really ticked at Watcom until I realized what I had done and that Watcom was perfectly within their rights to do what they did.

Note that this was not a case of porting code to a different target. This was the same target - just a different compiler.

Now let's address his comment about modulo arithmetic. Consider the following code fragment:
uint16_t a,b,c, res;

a = 0xFFFF; //Max value for a uint16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow
res -= c; 

Does res end up with the expected value of 0xFFFE? Yes it does - courtesy of the modulo arithmetic. Furthermore it will do so on every conforming compiler.

Now if we repeat the exercise using signed data types.
int16_t a,b,c, res;

a = 32767; //Max value for a int16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow - WW3 starts
res -= c;

What happens now? Who knows? On your system you may or may not get the answer you expect.
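If signed arithmetic cannot be avoided, the usual defence is to test the operands before the operation rather than inspect the result afterwards. Here is a minimal sketch of that idea for the addition above; the function name add_int16_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Adds a and b. Returns true and stores the sum in *sum if it fits in an
 * int16_t; returns false and leaves *sum untouched otherwise.
 */
bool add_int16_checked(int16_t a, int16_t b, int16_t *sum)
{
    /* The tests are arranged so that no intermediate value can overflow */
    if (((b > 0) && (a > (INT16_MAX - b))) ||
        ((b < 0) && (a < (INT16_MIN - b))))
    {
        return false;
    }

    *sum = (int16_t)(a + b);
    return true;
}
```

With a = 32767 and b = 1 the function reports failure instead of performing the addition, which is rather more predictable than whatever the compiler would otherwise have done.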

Mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don't quite understand


Well whether I understand them or not is really between me and Lint. However, the key thing to know is that if you use signed integers by default, then it is really hard to avoid combining signed and unsigned operands. How is this you ask? Well consider the following partial list of standard 'functions' that return an unsigned integral type:
  • sizeof()
  • offsetof()
  • strcspn()
  • strlen()
  • strspn()
In addition memcpy(), memset(), strncpy() and others also use unsigned integral types in their parameter lists. Furthermore in embedded systems, most compiler vendors typedef IO registers as unsigned integral types. Thus any expression involving a register also includes unsigned quantities. Thus if you use any of these in your code, then you run a very real risk of running into signed / unsigned arithmetic conversions. Thus IMHO the usual arithmetic conversions issue is actually an argument for avoiding signed types - not the other way around! So what are the real reasons to use unsigned data types? I think these reasons are high on my list:
  • Modulus operator
  • Shifting
  • Masking

Modulus Operator

One of the relatively unknown but nasty corners of the C language concerns the modulus operator. In a nutshell, using the modulus operator on signed integers when one or both of the operands is negative produces an implementation defined result. Here's a great example in which they purport to show how to use the modulus operator to determine if a number is odd or even. The code is reproduced below:
int main(void)
{
int i;

printf("Enter a number: ");
scanf("%d", &i);

if( ( i % 2 ) == 0) 
printf("Even");
if( ( i % 2 ) ==1) 
printf("Odd");

return 0;
}
When I run it on one of my compilers, and enter -1 as the argument, nothing gets printed, because on my system -1 % 2 = -1. The bottom line - using the modulus operator with signed integral types is a disaster waiting to happen.
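For what it's worth, C99 and later pin the behaviour down (division truncates toward zero, so -1 % 2 is -1), but that does not rescue the code above, which still prints nothing for negative odd numbers. A sketch of a parity test that does not care about the sign of the remainder is shown below; the function name is_odd is mine.

```c
#include <stdbool.h>

/* True for odd values, negative ones included */
bool is_odd(int value)
{
    /*
     * For an odd value the remainder is either 1 or -1 depending on the
     * sign of value and the compiler, so compare against zero only.
     */
    return (value % 2) != 0;
}
```

The other way out, in keeping with the theme of this post, is to make the operand unsigned in the first place.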

Shifting

Performing a shift right on a negative signed integer is implementation defined. What this means is that when you shift right you have no idea whether the vacated high bits are filled with zeros or with copies of the sign bit. The implications of this are quite profound. For example, if foo is an unsigned integral type, then a shift right by one is equivalent to a divide by 2. However, if foo is a signed type, then a shift right is most certainly not the same as a divide by 2 - and will generate different code. It's for this reason that Lint, MISRA and most good coding standards will reject any attempt to right shift a signed integral type. BTW while left shifts on signed types are safer, I really don't recommend them either.
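A short sketch makes the difference concrete. On the common compilers that implement a signed right shift arithmetically, -3 >> 1 yields -2, whereas -3 / 2 is -1 under the C99 rule that division truncates toward zero. The function names below are hypothetical; the point is simply to shift unsigned values and divide signed ones.

```c
#include <stdint.h>

/* For unsigned types a right shift by one is exactly a divide by 2 */
uint16_t halve_unsigned(uint16_t value)
{
    return (uint16_t)(value >> 1);
}

/* For signed types say what you mean and let the compiler pick the code */
int16_t halve_signed(int16_t value)
{
    return (int16_t)(value / 2);
}
```

The signed version typically costs an instruction or two more than a bare shift, which is the 'different code' referred to above.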

Masking

A similar class of problems occurs if you attempt to perform masking operations on a signed data type.

Finally...

Before I leave this post, I just have to comment on this quote from Carter
"Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int's and it all just works".
Does anyone else find this scary? He seems to be advocating that rather than think about the problem at hand, he'd rather switch to a large signed data type - and trust that everything works out OK. He obviously thinks he's on safe ground. However consider the case where he has a 50,000 line file (actually 46342 to be exact). Is this an unreasonably large file - well yes for a human generated file. However for a machine generated file (e.g. an embedded image file), it is not unreasonable at all. Furthermore let's assume that his computations involve for some reason a squaring of the number of lines in the file: i.e. we get something like this:
int32_t lines, result;

lines = 46342;
result = lines * lines + some_other_expression;
Well 46342 * 46342 overflows a signed 32 bit type - and the result is undefined. The bottom line - using a larger signed data type to avoid thinking about the problem is not recommended. At least if you use an unsigned type you are guaranteed a consistent answer.
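If the result really must live in a signed 32 bit variable, then the overflow has to be detected rather than hoped away. One way to sketch this, assuming the compiler provides int64_t (not every small embedded compiler does), is to do the multiplication in a wider type and check the range before narrowing. The function name square_int32_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Squares 'lines'. Returns true and stores the square in *result if it fits
 * in an int32_t; returns false and leaves *result untouched otherwise.
 */
bool square_int32_checked(int32_t lines, int32_t *result)
{
    /* The square of any int32_t value fits comfortably in 64 bits */
    const int64_t wide = (int64_t)lines * (int64_t)lines;

    if (wide > INT32_MAX)
    {
        return false;
    }

    *result = (int32_t)wide;
    return true;
}
```

For 46342 the wide product is 2147580964, which exceeds INT32_MAX (2147483647), so the function reports failure; for 46340 it succeeds.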


Saturday, May 02, 2009

Doxygen

Today's post was inspired by a new version notice from Dimitri van Heesch concerning his great documentation generator tool doxygen. If you aren't aware of doxygen, then I strongly recommend reading about it and then using it.

So what is Doxygen exactly? Well it has a lot of capabilities, but in a nutshell it can parse your code (C, C++, Java and a host of others not usually used in the embedded space) and from it generate a very nice hyper-linked documentation set. It does this in part by looking for what I'll call control directives embedded in comments. Now what I particularly like about Doxygen is that it allows you to trade off between adding control directives while still making your comments readable. For example, at one extreme you can do nothing special to your code and still end up with a reasonable documentation set. On the other extreme, you can embed so many control directives into your comments that the only sane way to read the comments is via Doxygen; however the documentation will be truly impressive! In my case, I find control directives to be very distracting, and so I opt to use a minimal set that doesn't offend my sensibilities but still gives me very useful results.
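To give a flavor of what a light touch can look like, here is a sketch using only a handful of standard Doxygen commands (\brief, \param, \return and the /**< trailing comment form). This is not necessarily the exact set I use, and the macro and function names are hypothetical; the point is that the comments remain perfectly readable in the source.

```c
#include <stdint.h>

#define SAMPLE_BUF_SIZE (8U)   /**< Number of samples held in the averaging buffer */

/**
 * \brief Compute the mean of a buffer of samples.
 *
 * \param[in] samples  Pointer to the samples to average.
 * \param[in] count    Number of samples; must be non-zero.
 * \return The arithmetic mean, rounded toward zero.
 */
uint16_t buffer_mean(const uint16_t *samples, uint8_t count)
{
    uint32_t sum = 0U;

    for (uint8_t i = 0U; i < count; ++i)
    {
        sum += samples[i];
    }
    return (uint16_t)(sum / count);
}
```

Run through Doxygen, the macro and the function each get a description, and the parameters and return value are laid out in their own sections.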

So why do I do this? Well while this documentation set is very nice in its own right, I actually find it very useful in improving my code. As remarkable a claim as this is, it's easily substantiated. Here are a few examples:
Call Trees

One of the very nice add-ons to Doxygen is graphviz. Using graphviz, Doxygen will generate call trees for all of your functions. I often find this very illuminating - both at a macro level and also a micro level. At the macro level, if I see a call tree that looks like your average two-year-old's artwork, then it's a clear indication of muddled thinking - and impending doom. At the micro level it allows you to spot some errors. For example consider this code fragment, that is intended to update a parameter in an EEPROM data structure, together with its backup copy:
void params_NosChargesSet(uint16_t nos_charges)
{
Factory_Params1.n_charges = nos_charges;
update_factory1_crc();
Factory_Params2.n_charges = nos_charges;
update_factory1_crc(); 
}

I found the bug in this code not by testing it, but by simply browsing the Doxygen documentation and noticing that the call tree for this function was incorrect: the CRC for the first copy is updated twice, and the CRC for the second copy is never updated. What I liked about this is that this kind of bug is very difficult to detect through testing, and will not be noticed by static analysis. It was however clear as day by looking at its call tree.
Missing documentation

Sometimes when I'm anxious to solve 'the real problem', I find that I'm not as diligent as I should be about describing the use of manifest constants, variables etc. As a result I'll sometimes end up with code that looks like this:
#define SHORT_TERM_BUF_SIZE (8U) /**< meaningful comment */
#define LONG_TERM_BUF_SIZE (32U)
You'll notice that LONG_TERM_BUF_SIZE has no comment associated with it. However, it's "obvious" what its use is because of the comment associated with SHORT_TERM_BUF_SIZE that immediately precedes it. Well when you generate the Doxygen documentation, and you click on the hyperlink associated with LONG_TERM_BUF_SIZE, guess what - no description. While some may think that this is a weakness in Doxygen, I actually think it's a major strength. Here's why:
  • My coding standard requires me to provide a comment for all manifest constants. Thus it is reminding me of the error of my ways.
  • Someone new coming to the code will typically be overwhelmed by what they are faced with. Having an 'implicit comment' is just one more hurdle for them to overcome. Thus Doxygen is accurately reflecting what someone will see when they read your code.
Is Doxygen perfect? No it's not. It often hangs when I run it. However to be fair, that's usually because I haven't played by the rules. Despite this I find it a useful tool in my arsenal. I recommend you take a look at it.


Saturday, April 25, 2009

PIC stack overflow

For regular readers of this blog I apologize for turning once again to the topic of my Nom de Guerre. If you really don't want to read about stack overflow again, then just skip to the second section of this posting where I address the far more interesting topic of why anyone uses an 8-bit PIC in the first place.

Anyway, the motivation for this post is that the most common search term that drives folks to this blog is 'PIC stack overflow'. While I've expounded on the topic of stacks in general here and here, I've never explicitly addressed the problem with 8 bit PICs. So to make my PIC visitors happy, I thought I'd give them all they need to know to solve the problem of stack overflow on their 8 bit PIC processors.

The key thing to understand about the 8 bit PIC architecture is that the stack size is fixed. It varies from a depth of 2 for the really low end devices to 31 for the high end 8 bit devices. The most popular parts (such as the 16F877) have a stack size of 8. Every (r)call consumes a level, as does the interrupt handler. To add insult to injury, if you use the In Circuit Debugger (ICD) rather than a full blown ICE, then support for the ICD also consumes a level. So if you are using a 16 series part (for example) with an ICD and interrupts, then you have at most 6 levels available to you. What does this mean? Well if you are programming in assembly language (which when you get down to it was always the intention of the PIC designers) it means that you can nest function calls no more than six deep. If you are programming in C then depending on your compiler you may not even be able to nest functions this deep, particularly if you are using size optimization.

So on the assumption that you are overflowing the call stack, what can you do? Here's a checklist:
  • Switch from the ICD to an ICE. It's only a few thousand dollars difference...
  • If you don't really need interrupt support, then eliminate it.
  • If you need interrupt support then don't make any function calls from within the ISR (as this subtracts from your available levels).
  • Inline low level functions
  • Use speed optimization (which effectively inlines functions)
  • Examine your call tree and determine where the greatest call depth occurs. At this point either restructure the code to reduce the call depth, or disable interrupts during the deepest point.
  • Structure your code such that calls can be replaced with jumps. You do this by only making calls at the very end of the function, so that the compiler can simply jump to the new function. (Yes this is a really ugly technique).
  • Buy a much better compiler.
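The 'replace calls with jumps' item in the checklist is easier to see in code. The sketch below contrasts a function that does work after its call with one whose call is the very last thing it does. Whether a given compiler actually turns the second form into a jump depends entirely on the compiler and its optimization settings, so check the generated assembly. All of the names are hypothetical, and Last_Value merely stands in for an output register.

```c
#include <stdint.h>

static uint8_t Last_Value;   /* Stands in for an output register */

static void output_write(uint8_t value)
{
    Last_Value = value;
}

/* Work follows the call, so a return address must be pushed on the stack */
void update_nested(uint8_t value)
{
    output_write(value);
    Last_Value |= 0x80U;
}

/*
 * All the work is done first and the call is the final statement, so the
 * compiler is free to replace the call and return with a single jump.
 */
void update_tail(uint8_t value)
{
    value |= 0x80U;
    output_write(value);
}
```

Both functions leave the same value in Last_Value; only the second gives the compiler the chance to avoid consuming a stack level.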
If you are still stuck after trying all these, then you really are in a pickle. You could seek paid expert help (e.g. from me or some of the other folks that blog here at embeddedgurus) or you could change CPU architectures. Which leads me to:

So why are you using a PIC anyway?

The popularity of 8 bit PICs baffles me. Its architecture is awful - the limited call stack is just the first dreadful thing. Throw in the need for paging and banking together with the single interrupt vector and you have a nightmare of a programming model. It would be one thing if this was the norm for 8 bit devices - but it isn't. The AVR architecture blows the PIC away, while the HC05 / HC08 are also streets ahead of the PIC. Given the choice I think I'd even take an 8051 over the PIC. I don't see any cost advantages, packaging advantages (Atmel has just released a SOT23-6 AVR which is essentially instruction set compatible with their largest devices) or peripheral set advantages. In short, I don't get it! Incidentally, this isn't an indictment of Microchip - they are a great company and I really like a lot of their other products, their web site, tech support and so on (perhaps this is why the PIC is so widely used?). So to the (ir)regular readers of this blog - if you are using 8 bit PICs perhaps you could use the comment section to explain why. Let the debate begin!


Sunday, April 19, 2009

Unused interrupt vectors

With the exception of low end PIC microcontrollers, most microcontrollers have anywhere from quite a few to an enormous number of interrupt vectors. It's a rare application that uses every single interrupt vector, and so the question arises as to what, if anything, should one do with unused interrupt vectors? I have seen two approaches used - neither of which is particularly good.

Do nothing


I would say this is the most common approach. My guess is that when this approach is used, it's not via conscious choice, but rather the result of inaction. So what's the implication of this approach? Well if an interrupt occurs for which you have not installed an interrupt handler, then the microcontroller will vector to the appropriate address and start executing whatever code happens to be there. It's fair to say that this will ultimately cause a system crash - the only question is how much damage will be done in the process? Having said that, I don't necessarily consider that this approach is always awful. For example a reasonable argument might go something like this.
I know via design, code inspection, static analysis and testing that the probability of a coding error enabling the wrong interrupt is remote. Thus if it does happen it's probably either via severe RF interference, or because the code has crashed. In either case the system has bigger problems than vectoring to an unsupported interrupt.

Of course anybody that's put this much thought into it, will probably be conscientious enough to do something different.

Another valid argument on very memory constrained processors is that you need the unused interrupt vector space for the application. Indeed I have coded 8051 applications where this has been the case. Such is the price we sometimes have to pay on very small systems.

Install 'RETI' instructions at all unused vectors


In this approach, you arrange for there to be a 'Return From Interrupt' instruction at every unused interrupt vector. Indeed this approach is common enough that some compiler manufacturers offer it as a linker option. The concept with this approach is that if an unexpected interrupt occurs, then by executing a RETI instruction, the application will simply continue with very little harm done. All in all this isn't a bad approach. However it has several weaknesses.
  • The biggest problem with this approach is that it doesn't solve the problem of an interrupt source that keeps on interrupting. The most egregious example of this is a level triggered interrupt on a port pin. In this case, depending upon the CPU architecture, it is quite possible for the system to go into a mode whereby it essentially spends all its time vectoring to the interrupt and then returning. However this is by no means the only example. Others that spring to mind are 'Transmit buffer empty' interrupts, and timer overflow type interrupts. In the latter case, the system probably wouldn't spend all of its time interrupting; however a certain fraction of the CPU bandwidth would be wasted, which in a battery powered application for instance, would be a big deal.
  • If you do this at the start of a project, you lose the opportunity to discover errors in which an interrupt source has been erroneously enabled. In short this approach can mask problems, while what is really needed is an approach that can reveal problems.

Recommended Approach

What I do is the following.
  1. At the start of a project I create a file called vector.c. In vector.c I create an interrupt handler for every possible interrupt vector. Not only is this an essential first step in solving the problem, I also find it very illuminating as it forces me to read about and understand all the CPU's interrupt sources. This is always a useful step, as in many ways the interrupt sources for a CPU tell you a lot about its capabilities and the designers' intent.
  2. Within each interrupt handler, I explicitly mask the interrupt source. This will prevent the interrupt from reoccurring in all but the most extreme of cases.
  3. If necessary, I also clear the interrupt flag. (In some CPU architectures this occurs automatically by vectoring to the interrupt. In others you have to do it manually).
  4. After masking the interrupt source, I then make a call to my trap function. What this means is that while I'm debugging the code, if any unexpected interrupt occurs, then I'll know about it in a hurry. Conversely, of course, with a release build, the trap function compiles down to nothing, essentially removing it from the code.
Here's a code fragment that shows what I mean. In this case it's for an AVR processor and the IAR compiler. However it should be trivial to port this to other architectures / compilers. Note that for the AVR it is in general not necessary to clear the interrupt flag as it is cleared automatically upon vectoring to the ISR.
#include <stdbool.h>

static inline void trap(void); /* Defined at the end of this fragment */

#pragma vector=INT1_vect /* External Interrupt Request 1 */
__interrupt void int1_isr(void)
{
EIMSK_INT1 = 0;  /* Disable the interrupt */
/* Interrupt flag is cleared automatically */
trap(); 
}
#pragma vector=PCINT0_vect /* Pin Change Interrupt Request 0 */
__interrupt void pcint0_isr(void)
{
PCICR_PCIE0 = 0; /* Disable the interrupt */
/* Interrupt flag is cleared automatically */
trap(); 
}
...
#ifndef NDEBUG
/** Flag to allow us to exit the trap and see who caused the interrupt */
static volatile bool Exit_Trap = false; 
#endif
static inline void trap(void)
{
#ifndef NDEBUG
while (!Exit_Trap)
{
}
#endif
}

Tuesday, April 14, 2009

Effective C Tip #3 - Exiting an intentional trap

This is the third in a series of tips on writing what I call effective C. Today I'd like to give you a useful hint concerning traps. What exactly do I mean by a trap? Well while C++ has a 'built in' exception handler (try searching for 'catch' or 'throw'), C does not (thanks to Uhmmmm for pointing this out). Instead, what I like to do when debugging code is to simply spin in an infinite loop when something unexpected happens. For example consider this code fragment:

switch (foo)
{
case 0:
...
break;

case 1:
...
break;

...

default:
trap();
break;
}

My expectation is that the default case should never be taken. If it is, then I simply call the routine trap(). So what does trap() look like? Well the naive implementation looks something like this:

void trap(void)
{
for(;;)
{
}
}

The idea is that when the system stops responding, stopping the debugger will show that something unexpected happened. However, while this mostly works, it has a number of significant shortcomings. The most important is that leaving code like this in a production release is definitely not a good idea, and so the first modification that needs to be made is to arrange to remove the infinite loop for a release build. This is usually done by defining NDEBUG. The code thus becomes:

void trap(void)
{
#ifndef NDEBUG
for(;;)
{
}
#endif
}

The next problem with this trap function is that it would be ineffective in a system that executes most of its code under interrupt. As a result, it makes sense to disable interrupts when entering the trap. This is of course compiler / platform specific. However it will typically look something like this:

void trap(void)
{
#ifndef NDEBUG
__disable_interrupts();
for(;;)
{
}
#endif
}

The final major problem with this code is that it's hard to tell what caused the trap. While you can of course examine the call stack and work backwards, it's far easier if you instead do something like this:

#include <stdbool.h>

static volatile bool Exit_Trap = false;

void trap(void)
{
#ifndef NDEBUG
__disable_interrupts();
while (!Exit_Trap)
{
}
#endif
}

What I've done is declare a volatile variable called Exit_Trap and have initialized it to false. Thus when the trap occurs, the code spins in an infinite loop. However by setting Exit_Trap to true, I will cause the loop to be exited and I can then step the debugger and find out where the problem occurred.
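A possible extension, offered purely as a sketch and not something the code above does, is to record where the trap was raised before spinning, so that the culprit is visible in the debugger's watch window without any stepping. The names trap_at, TRAP, Trap_File and Trap_Line are all hypothetical, and the platform specific interrupt disable is left as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile bool Exit_Trap = false;        /* Set true from the debugger to leave the trap */
static const char * volatile Trap_File = 0;    /* File that raised the most recent trap */
static volatile uint16_t Trap_Line = 0U;       /* Line that raised the most recent trap */

static void trap_at(const char *file, uint16_t line)
{
#ifndef NDEBUG
    Trap_File = file;
    Trap_Line = line;
    /* A platform specific interrupt disable would go here */
    while (!Exit_Trap)
    {
    }
#else
    (void)file;     /* Release build: the trap compiles down to nothing */
    (void)line;
#endif
}

/* Use TRAP() wherever trap() would have been called */
#define TRAP() trap_at(__FILE__, (uint16_t)__LINE__)
```

The cost is the storage for the file name strings, which may or may not be acceptable on a small target.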

Regular readers will perhaps have noticed that this isn't the first time I've used volatile to achieve a useful result.

Incidentally I'm sure that many of you trap errors via the use of the assert macro. I do too - and I plan to write about how I do this at some point.

So does this meet the criteria for an effective C tip? I think so. It's a very effective aid in debugging embedded systems. It's highly portable and it's easy to understand. That's not a bad combination!


Saturday, April 11, 2009

On the use of correct grammer in code comments

Back when I was in college the engineering students were fond of dismissing the liberal arts majors by doing such witty things as writing next to the toilet paper dispenser "Liberal Arts degree - please take one". One of the better retaliatory pieces of graffiti that I really liked was: "Four years ago I couldn't spell Engineer - now I are one". I think this appealed to me because there was more than a smidgen of truth in its sentiment. If you don't believe me, just take a look at the comments found in most computer programs. I don't think I'm being exactly controversial by noting that most comments:

  • Lack basic punctuation.
  • Contain numerous spelling errors.
  • Liberally use non-standard abbreviations.
  • Regularly omit verbs and / or other basic components of a sentence.

As a result, many comments are nonsensical. In fact I've been in situations where the comments are so badly written that it's easier to read the code than it is the comments. Clearly this isn't a good thing! When I question programmers about this, I typically get a shrug of the shoulders and a 'what's the big deal' attitude. When pressed further, the honest ones will admit that they couldn't be bothered to use correct grammar or spelling because it's too much effort - and after all you can work out what they mean if you just try hard enough. They are of course correct. However taken to its logical conclusion, this is really an argument for not commenting at all - since with a bit (OK a lot) of effort it should be crystal clear what the code is doing (and why) simply by examining it.

I decided to write about this now since I recently heard from Brad Mosch concerning a pet peeve of his. He gave his permission for me to quote from his email:

I see all the time mixed occurrences of whether or not a space is used for things such as 3dB, 3 dB, 1MHz, 1 MHz, etc. I am hoping that someone in the embedded guru world propose that a space is ALWAYS used between the number and the unit of measure. That is the documented standard that was used in our technical writing at United Space Alliance and NASA. The funny thing is, even though that standard existed out there at Kennedy Space Center, not a whole lot of people knew about it because I saw the same problem in documents out there all the time. Anyway, my point is, isn't "1 Hz" a lot more readable than "1Hz"?

I'm sure many of you may think that Brad is being overly picky. However I don't. His real point (in the last sentence) concerns readability. If you are going to write a comment surely it should be as readable as possible? Now I consider myself a very conscientious commenter of code, and so as a matter of interest I did a quick search on my current project to see whether I was following Brad's advice. Well I was - about 90 % of the time. I found that my style depended a bit on the units. For example, I always appended the % sign without a space, whereas mV, mA etc just about always had a space between the value and the units. You'll be pleased to know Brad that I'll be mending my ways!

Anyway, I'll leave this topic for now. Next time I visit it I'll tell you how I spell check my comments. Hopefully having read this post you'll know why I do it.

Friday, April 03, 2009

Commuting is crazy!

A few posts back I suggested that (American) employers would benefit from giving their engineers a lot more time off. In the comments section, Brad opined that he would very much like to work four 10-hour days. One of the reasons he gave was to avoid the stress and hassle of his daily commute. I agree completely with him. However, I'd like to take this one step further. Why is it that (most) employers insist that their staff come to the office each day to work? This always strikes me as ludicrous. Of course there are days where one has to attend meetings, or where you need to use the specialized test equipment that your employer owns. In addition there are many of us who work for employers where secrecy demands that you be at work. However, for the vast majority of engineers there is absolutely no need to be in the office every day. Instead, with a decent home computer, a broadband connection and a VPN, you are pretty much all set to do exactly what you'd do if you went into the office for the day.

Now notwithstanding that allowing / encouraging / demanding that staff work from home whenever possible has great benefits to the engineer and the environment, the real key is the boost in productivity that is possible. Any engineer I know will tell you that the best way to get a lot of (hard) work done in a hurry is to shut the door, turn off the telephone and block your email. Maybe it's just me, but that's exactly what can happen when you work from home.

But what about the staff that will go home and slough off for the day? Well I'm sure they exist. I'm also sure that anyone that managed to get through an engineering degree program has enough brains to work out how to goof off at work without being caught if that's their inclination. In short I don't see being at work as evidence that you're actually doing anything useful.

What's maddening about this is when you consider the list of jobs that don't require you to come to the office each day. Examples that spring to mind include sales, truck drivers and home-care health workers. Apparently their employers somehow manage to come up with ways of determining whether they are productive or not.

So what to make of this? I think it's largely inertia. Twenty years ago, the cost of engineering tools was so high that you had to go to work to use them. Today you can set up a well equipped laboratory for $10K. Despite this, the notion of engineers having to go to work persists. If I'm correct, and there aren't any substantive reasons for most of us to go to the office every day, then ultimately logic should overcome the inertia - and working from home several days a week will become the norm. However it won't start changing until more of us start pressuring management to explain why we shouldn't do this.

Monday, March 30, 2009

Efficient C Tips #8 - Use const

One of the easiest ways to make your code more efficient is to use const wherever feasible. Just like declaring local functions as static, this is one of those changes that makes your code more robust, more maintainable and faster - a true win-win situation. So how does this work? Well you get the most benefit when passing pointers as parameters to functions. Here's an example of a function whose job it is to compute the sum of an array of integers. The naive implementation would look something like this:

uint32_t sum(uint16_t *ptr, uint16_t n_elements)
{
  uint16_t lpc;
  uint32_t sum = 0;

  for (lpc = 0; lpc < n_elements; lpc++)
  {
    sum += *ptr++;
  }
  return sum;
}

I'll ignore the issues of post increment and counting up (for now). Instead, consider the declaration of ptr. As it stands, the caller of this function has no idea whether sum() will modify the data or not, and hence must assume that it does. This has obvious implications for the compiler when it comes to optimization. To overcome this, it is necessary to declare ptr as pointing to const. The function prototype for sum() now becomes:

uint32_t sum(uint16_t const *ptr, uint16_t n_elements);

You'll notice that I prefer to use what I call Saks notation for where I place the const modifier. The more conventional, albeit less sensible way of writing the declaration is:

uint32_t sum(const uint16_t *ptr, uint16_t n_elements);

Regardless of the style, by doing this you are indicating to the compiler that you will not be modifying the data that ptr points to. As a result, the optimizer can make assumptions that will typically lead to tighter code.

Before I give you the final code, I'd like to make a few other observations.

  • As well as potentially making your code more efficient, use of const also makes your code more readable and maintainable. That is, someone examining your code will know something extra about the function simply by looking at the prototype. Personally I find this very useful.
  • If you examine the C standard library, you'll find very liberal use of the const modifier. You should take this as a strong hint that it's a good idea.
  • PC-Lint will very helpfully tell you if a pointer can be declared as pointing to const. Yet another reason for using Lint!

So what does my sum() function look like? Well, incorporating my previous hints on post increment and counting down, it looks something like this:

uint32_t sum(uint16_t const *ptr, uint16_t n_elements)
{
  uint32_t sum = 0;

  for (; n_elements != 0; --n_elements)
  {
    sum += *ptr;
    ++ptr;
  }
  return sum;
}
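To show the const-qualified prototype from the caller's side, here is a minimal sketch that sums a read-only table. The Calibration table and calibration_total() are hypothetical names invented for this example, and the local accumulator is called total here simply to keep it distinct from the function name.

```c
#include <stdint.h>

uint32_t sum(uint16_t const *ptr, uint16_t n_elements)
{
    uint32_t total = 0;

    for (; n_elements != 0; --n_elements)
    {
        total += *ptr;
        ++ptr;
    }
    return total;
}

/* Hypothetical read-only table. Because it is const, many embedded
   toolchains are free to place it in ROM / flash rather than RAM. */
static uint16_t const Calibration[] = { 100u, 200u, 300u, 400u };

uint32_t calibration_total(void)
{
    /* Legal only because sum() promises not to write through ptr. */
    return sum(Calibration,
               (uint16_t)(sizeof(Calibration) / sizeof(Calibration[0])));
}
```

Had sum() been declared with a plain uint16_t * parameter, passing Calibration would discard the const qualifier and a conforming compiler would issue a diagnostic. That is the readability and robustness benefit showing up in practice.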

Editorial Note

I've been following my own advice and have been on a short vacation. As a result I've been tardy in responding to some of your comments. I'll try and rectify this over the next few days.


Sunday, March 22, 2009

Demand more time off!

I've been posting on a lot of technical issues lately and so I thought I'd turn to a less cerebral topic - but one which I feel quite passionate about. First off - some background. I'm British by birth and was raised in Europe (UK & Germany) before moving to the USA in my early twenties. Upon arrival in the USA I was struck by many things; however professionally what amazed me was the number of hours the typical engineer works in the USA compared to their European counterparts. When I left the UK, the standard work week was 37.5 hours and the typical amount of paid time off was 4 weeks for new hires, quickly increasing to 6 weeks or more with length of service. To this was added 8 bank holidays. Perhaps more importantly, employers seemed to think that this was a good thing. For example, my employer at the time had the following policies in effect:

  • Employees were encouraged to work their 37.5 hours in such a way, that the work week ended at lunch time on Friday, effectively ensuring that employees had 2.5 day weekends.
  • Employees were strongly encouraged to take at least 2 weeks off as a block, thus ensuring that they got at least one long break from work every year.

By contrast, when I arrived in the USA, I discovered that the norm was quite different. Indeed the policies I encountered were as follows (and this from the American branch of the same firm as I had worked for in the UK):

  • Work week of 40 hours.
  • Engineers were routinely expected to put in unpaid overtime, with 10 hours being the norm.
  • Annual vacation of two weeks, which only started accruing after 6 months service.
  • Very long serving employees might get 3 weeks vacation a year.
  • Taking more than one week off at a time was actively discouraged.

So what to make of this? If you do the mathematics, a typical engineer in the USA would be working about 50 * 50 = 2500 hours a year (ignoring bank holidays - which are about the same), whereas a typical engineer in the UK would be working 37.5 * 48 = 1800 hours - a 39% difference. Now the question is, did I perceive the engineers in the USA to generate more output? I'd say yes, but only by a few percent, and certainly nowhere near the 39% more hours that they worked.
I'm sure other people's experience will differ. However it's clear to me why there isn't a big difference in productivity. I solve most of my toughest technical problems when I'm not at work. Indeed, there is nothing like taking a stroll, going for a bike ride, or even sitting down for a beer with friends for clearing the mind and allowing you to literally look at issues from a new perspective. I know this experience isn't unique to me, so why don't employers see the light and realize that everyone benefits from requiring engineers (and other professions - but that's outside my bailiwick) to take more time off?

Maybe it's just me, but a start in changing this situation could be for more engineers to start demanding more time off. Some companies are starting to see the light. For example Netrino offers its employees 5 weeks vacation. Let's make them the norm - not the exception!

As a final note, I know I have regular readers from other parts of the world - South America, Australasia, and the former Eastern Bloc. I'd be interested to hear what your working conditions are like.


Sunday, March 15, 2009

Sorting (in) embedded systems

Although countless PhDs have been awarded on sorting algorithms, it's not a topic that seems to come up much in embedded systems (or at least the kind of embedded systems that I work on). Thus it was with some surprise recently that I found myself needing to sort an array of integers. The array wasn't very large (about twenty entries) and I was eager to move on to the real problem at hand and so I just dropped in a call to the standard C routine qsort(). I didn't give it a great deal of thought because I 'knew' that a 'Quick Sort' algorithm is in general fast and well behaved and that with sorting so few entries I wasn't too concerned about it being 'optimal'. Anyway, with the main task at hand solved, on a whim I decided to take another look at qsort(), just to make sure that I wasn't being too cavalier in my approach. Boy did I get a shock! My call to qsort() was increasing my code size by 1500 bytes and it wasn't giving very good sort times either. For those of you programming big systems, this may seem acceptable. In my case, the target processor had 16K of memory and so 1500 bytes was a huge hit.

Surely there had to be a better solution? Well there's always a better solution, but in my case in particular, and for embedded systems in general, what is the optimal sorting algorithm?

Well, after thinking about it for a while, I think the optimal sorting algorithm for embedded systems has these characteristics:
  1. It must sort in place.
  2. The algorithm must not be recursive.
  3. Its best, average and worst case running times should be of similar magnitude.
  4. Its code size should be commensurate with the problem.
  5. Its running time should increase linearly or logarithmically with the number of elements to be sorted.
  6. Its implementation must be 'clean' - i.e. free of breaks and returns in the middle of a loop.
Sort In Place

This is an important criterion not just because it saves memory, but most importantly because it obviates the need for dynamic memory allocation. In general dynamic memory allocation should be avoided in embedded systems because of problems with heap fragmentation and allocation performance. If you aren't aware of this issue, then read this series of articles by Dan Saks on the issue.

Recursion

Recursion is beautiful and solves certain problems amazingly elegantly. However, it's not fast and it can easily lead to problems of stack overflow. As a result, it should never be used in embedded systems.

Running Time Variability

Even the softest of real time systems have some time constraints that need to be met. As a result a function whose execution time varies enormously with the input data can often be problematic. Thus I prefer code whose execution time is nicely bounded.

Code Size

This is often a concern. Suffice to say that the code size should be reasonable for the target system.

Data Size Dependence

Sorting algorithms are usually classified using 'Big O notation' to denote how sensitive they are to the amount of data to be sorted. If N is the number of elements to be sorted, then an algorithm whose running time is N log N is usually preferred to one whose running time is N^2. However, as you shall see, for small N the advantage of the more sophisticated algorithms can be lost by the overhead of the sophistication.

Clean Implementation

I'm a great proponent of 'clean' code. Thus code where one exits from the middle of a loop isn't as acceptable as code where everything proceeds in an orderly fashion. Although this is a personal preference of mine, it is also codified in for example the MISRA C requirements, to which many embedded systems are built.

Anyway, to determine the optimal sorting algorithm, I went to the Wikipedia page on sorting algorithms and initially selected the following for comparison to the built in qsort: Comb, Gnome, Selection, Insertion, Shell & Heap sorts. All of these are sort in place algorithms. I originally eschewed the Bubble & Cocktail sorts as they really have nothing to commend them. However, several people posted comments asking that I include them - so I did. As predicted they have nothing to commend them.

In all cases, I used the Wikipedia code pretty much as is, optimized for maximum speed. (I recognize that the implementations in Wikipedia may not be optimal - but they are the best I have). For each algorithm, I sorted arrays of 8, 32 & 128 signed integers. In every case I sorted the same random array, together with a sorted array and an inverse sorted array. First the code sizes in bytes:
Name          Size (bytes)
qsort()       1538
Gnome()       76
Selection()   130
Insertion()   104
Shell()       242
Comb()        190
Heap()        200
Bubble()      104
Cocktail()    140
Clearly, anything is a lot better than the built in qsort(). However, this is not an apples to apples comparison, because qsort() is a general purpose routine, whereas the others are designed explicitly to sort integers. Leaving aside qsort(), the Gnome, Insertion and Bubble sorts are clearly the code size leaders. Having said that, in most embedded systems, 100 bytes here or there is irrelevant and so we are free to choose based upon other criteria.
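To make the 'general purpose' point concrete, here is a minimal sketch of what a call to qsort() involves for an array of 16-bit signed integers. The element type and the names compare_int16() and sort_readings() are assumptions made for this example, not the code that was measured above.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort() knows nothing about the element type, so every comparison
   goes through a callback that receives untyped pointers. */
static int compare_int16(void const *lhs, void const *rhs)
{
    int16_t const a = *(int16_t const *)lhs;
    int16_t const b = *(int16_t const *)rhs;

    /* Yields -1, 0 or +1 without the overflow risk of (a - b). */
    return (a > b) - (a < b);
}

/* Hypothetical wrapper: sort n readings into ascending order. */
void sort_readings(int16_t *readings, size_t n)
{
    qsort(readings, n, sizeof(readings[0]), compare_int16);
}
```

The element size passed at run time and the indirect call per comparison are the price of generality. A routine written for one element type can compare and copy values directly.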

Execution times for the 8 element array

Name          Random    Sorted    Inverse Sorted
qsort()       3004      832       2765
Gnome()       1191      220       2047
Selection()   1120      1120      1120
Insertion()   544       287       756
Shell()       1233      1029      1425
Comb()        2460      1975      2480
Heap()        1265      1324      1153
Bubble()      875       208       1032
Cocktail()    1682      927       2056
In this case, the Insertion sort is the clear winner. Not only is it dramatically faster in almost all cases, it also has reasonable variability and it has almost the smallest code size. Notice that the bubble sort for all its vaunted simplicity consumes as much code and runs considerably slower. Notice that the Selection sort's running time is completely consistent - and not too bad when compared to other methods.
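For reference, a minimal sketch of an insertion sort that meets the criteria listed earlier (in place, non-recursive, no breaks or returns inside a loop). It assumes 16-bit signed elements and is not the Wikipedia code that was measured above.

```c
#include <stddef.h>
#include <stdint.h>

void insertion_sort(int16_t *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
    {
        int16_t const key = a[i];
        size_t j = i;

        /* Shift larger elements up one slot to open a gap for key. */
        while ((j > 0) && (a[j - 1] > key))
        {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}
```

On already sorted data the inner loop never executes, which is consistent with the low 'Sorted' times in the table. On inverse sorted data every element is shifted the maximum distance.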

Execution times for the 32 element array

Name          Random    Sorted    Inverse Sorted
qsort()       23004     3088      19853
Gnome()       17389     892       35395
Selection()   14392     14392     14392
Insertion()   5588      1179      10324
Shell()       6589      4675      6115
Comb()        10217     8638      10047
Heap()        8449      8607      7413
Bubble()      13664     784       16368
Cocktail()    17657     3807      27634
In this case, the winner isn't so clear cut. Although the insertion sort still performed well, it's showing a very large variation in running time now. By contrast the shell sort has got decent times with small variability. The Gnome, Bubble and Cocktail sorts are showing huge variability in execution times (with a very bad worst case), while the Selection sort shows consistent execution time. On balance, I'd go with the shell sort in most cases.

Execution times for the 128 element array

Name          Random    Sorted    Inverse Sorted
qsort()       120772    28411     77896
Gnome()       316550    3580      577747
Selection()   217420    217420    217420
Insertion()   88475     4731      158020
Shell()       41661     25611     34707
Comb()        50858     43523     48568
Heap()        46959     49215     43314
Bubble()      231294    3088      262032
Cocktail()    271821    15327     422266
In this case the winner is either the shell sort or the heap sort, depending on whether you care more about raw performance or about performance variability. The Gnome, Bubble and Cocktail sorts are hopelessly outclassed. So what to make of all this? Well in any comparison like this there are a myriad of variables that one should take into account, and so I don't believe these data should be treated as gospel. What is clear to me is that:
  1. Being a general purpose routine, qsort() is unlikely to be the optimal solution for an embedded system.
  2. For many embedded applications, a shell sort has a lot to commend it - decent code size, fast running time, well behaved and a clean implementation. Thus if you don't want to bother with this sort of investigation every time you need to sort an array, then a shell sort should be your starting point. It will be for me henceforth.
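As a starting point, here is a minimal sketch of a shell sort for 16-bit signed integers. The simple gap-halving sequence is an assumption made for brevity (other gap sequences exist and may perform differently), and this is not the exact code that produced the figures above.

```c
#include <stddef.h>
#include <stdint.h>

void shell_sort(int16_t *a, size_t n)
{
    /* Start with a large gap and halve it until it reaches 1, at which
       point the pass is an ordinary insertion sort on nearly sorted data. */
    for (size_t gap = n / 2; gap > 0; gap /= 2)
    {
        for (size_t i = gap; i < n; ++i)
        {
            int16_t const tmp = a[i];
            size_t j = i;

            /* Gapped insertion: shift larger elements up by gap slots. */
            while ((j >= gap) && (a[j - gap] > tmp))
            {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = tmp;
        }
    }
}
```

The routine sorts in place, uses no recursion and has no break or return inside a loop, so it lines up with the criteria listed at the top of this post.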

Thursday, March 05, 2009

Efficient C Tips #7 - Fast loops

Every program at some point requires some set of actions to be taken a fixed number of times. Indeed this is such a common occurrence that we typically code it without really giving it much thought. For example, if I asked you to call a function foo() ten times, I'm sure that most of you would write something like this:

for (uint8_t lpc = 0; lpc < 10; ++lpc)
{
foo();
}

While there is nothing wrong with this, per se, it is sub optimal on just about every processor. Instead you are better off using a construct which counts down to zero. Here are two alternative ways of doing this:

for(uint8_t lpc = 10; lpc != 0; --lpc)
{
foo();
}

uint8_t lpc = 10;
do
{
foo();
} while (--lpc);

Which one you think is more natural is entirely up to you.

So how does this efficiency arise? Well in the count up case, the assembly language generated by the compiler typically looks something like this:

INC lpc ; Increment loop counter
SUB lpc, #10 ; Compare loop counter to 10
BNZ loop ; Branch if loop counter not equal to 10

By contrast, in the count down case the assembly language looks something like this:

DEC lpc ; Decrement loop counter
BNZ loop ; Branch if non zero

Evidently, because of the 'specialness' of zero, more efficient code can be generated.

So why don't you see C programs littered with these count down constructs? Well counting down has a major limitation. If you need to use the loop variable as an index into an array then you have a problem. For example, let's say I wanted to zero the elements of an array. Using the count down technique you might be tempted to do this:

uint8_t bar[10];
uint8_t lpc = 10;

do
{
bar[lpc] = 0; // Error! First time through results in index beyond end of array
} while (--lpc);

Evidently it doesn't work. You can of course modify the code to make it work. However doing so typically loses you all the efficiency gains, such that you are better off with a standard up-counting for loop.
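One way the loop could be modified so that it works is sketched below. The function name zero_bar() and the BAR_LEN constant are invented for this example. Note the extra subtraction needed on every pass to form the index.

```c
#include <stdint.h>

#define BAR_LEN 10u     /* Hypothetical name for the array length */

static uint8_t bar[BAR_LEN];

void zero_bar(void)
{
    uint8_t lpc = BAR_LEN;

    do
    {
        /* lpc runs 10..1, so the index must be adjusted to 9..0. */
        bar[lpc - 1u] = 0;
    } while (--lpc);
}
```

Whether the adjustment costs anything depends on the compiler and target, so it is worth examining the generated assembly before preferring this over a plain up-counting loop.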

As a parting thought, concepts such as these are second nature to assembly language programmers - all of whom do this sort of thing instinctively. As a result, if you are really interested in getting the best out of your C compiler, you could do a lot worse than learning how to program your target processor in assembly language. Does this defeat one of the objectives in programming in a high level language - yes. However, for giving you insight in terms of what is going on under the hood it cannot be beaten.


Sunday, March 01, 2009

Computing your stack size

Many of the folks that come to this blog by way of search engines do so because they are having problems with stack overflow. I've already given my take on the likely causes of a stack overflow. Today I'd like to offer some hints on a related topic - how to set about computing the stack size for your application. This is an extremely difficult problem, which can be approached in one of three ways - experimentally, analytically or randomly. The latter is by far the most common technique, which consists essentially of choosing a number and seeing whether it works! In an effort to reduce the use of the random approach, I'll try and summarize the other two methods.

Experimentally


In the experimental method, a typically very large stack size is selected. The stack is then filled with an arbitrary bit pattern such as 0xFC. The code is then executed for a 'suitable' amount of time, and then the stack is examined to see how far it has grown. Most people will typically take this number, add a safety margin and call it the required stack size. The main advantage of this approach is that it's easy to do (indeed many good debuggers have this feature built in to them). It also has the advantage of being 'experimental data'. However, there are two big problems with this approach, which will catch the unwary.
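The fill-and-inspect technique can be sketched as follows. This is a host-testable illustration only: the stack is modelled as a plain byte array, it is assumed to grow downwards from the highest address, and the names stack_paint() and stack_high_water() are invented for the example. On a real target the region would come from the linker, and the painting is usually done in the startup code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xFCu    /* Arbitrary fill pattern */

/* Fill the whole stack region with the pattern before the code runs. */
void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* Return the number of bytes that appear to have been used.
   Assumes a descending stack: usage starts at stack[size - 1] and grows
   towards stack[0], so untouched bytes sit at the low addresses. */
size_t stack_high_water(uint8_t const *stack, size_t size)
{
    size_t untouched = 0;

    while ((untouched < size) && (stack[untouched] == STACK_FILL))
    {
        ++untouched;
    }
    return size - untouched;
}
```

Be aware that the scan can slightly under-report if the deepest bytes written to the stack happen to equal the fill pattern.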

The biggest single problem with the experimental approach is the implicit assumption that the experiment that is run is representative of the worst case conditions. What do I mean by the worst case conditions? Well, the maximum stack size occurs in an embedded system when an interrupt that uses the most stack size occurs at a point in the code that the foreground application is also using the maximum stack size. On the assumption that most interrupts are asynchronous to the foreground application, the problem should be clear. How exactly do you know after your testing whether or not the interrupt that uses the most stack size did indeed trigger at the worst (best?) possible moment? Thus even if your testing had 100% code coverage, it still isn't possible to know for sure whether you have covered all possible scenarios. If, as is the normal case, you don't even begin to approach full code coverage, it should be clear to you that testing tends to reveal the typical 'worst-case' condition, rather than the genuine worst case condition.

The second major issue with testing is that it tends to be done when the code is close to being completed, rather than when it is completed. The problem is that small changes in the source code can have a huge impact on the required stack size. For example, let's say that during testing it is discovered that an interrupt service routine is taking so long to complete that another interrupt is being occasionally missed. A 'quick fix' is to simply enable interrupts in the long interrupt handler, so that the other interrupts can do their thing. This one line change can lead to a dramatic increase in stack usage. (If you aren't cognizant of the stack usage of interrupt handlers, you should read this article I wrote).

Analytically


In the analytical approach, the idea is to examine the source code and from the analysis work out the maximum stack usage of the foreground application, and then to add to this the worst case interrupt handler usage. This is obviously a daunting task for anything but the simplest of applications. You will not be surprised to hear that computer programs have been written to perform this analysis. Indeed good quality linkers will now do this for you as a matter of course. Furthermore, my favorite third party tool, PC-Lint from Gimpel, will also now do this starting with version 9. However be warned that it takes a lot of work to set up PC-Lint to perform the analysis.

Although analysis can theoretically give an accurate answer, it does have several problems.

Recursion


It's almost impossible for an analytical approach to compute the stack usage of a program that uses recursion. Indeed it's because of the unbounded effect on stack size that recursion is a really bad idea in embedded systems. MISRA bans it, and I personally banned it about twenty years ago.

Indirect Function Calls


Pointers to functions are something that I use extensively and heartily recommend (for a discussion see this article I wrote). Although they don't have a deleterious effect on stack size, they do make it quite difficult for analysis programs to track what is going on. Indeed PC-Lint cannot handle pointers to functions when it comes to computing stack usage. Thus if you use an analytical approach and you use pointers to functions, then make absolutely sure that the analysis program can track all the indirect calls.

Optimizers


Code optimization can play havoc with the stack usage. Some optimizations reduce stack usage (by e.g. placing function parameters in registers), while others can increase stack usage. I should note that it's only third party tools that should be bamboozled by the optimizer. The linker that makes up part of the compilation package should be aware of everything that the compiler has done.

Complexity


Even if you have a linker that will compute stack usage, interpreting the output of the linker is always a daunting task. For example, the linker from IAR will compute your stack usage. However, it isn't nice enough to simply say: You need 279 bytes of stack space. Instead you have to study the linker output carefully to glean the requisite information.

A Practical Approach


It's clear from the above that it isn't easy to determine the stack size for an application. So how exactly does one set about this in practice? Well here's what I do.
  1. Locate the stack at the beginning / end of memory (depending upon how the stack grows) and place all variables at the other end of the memory. This essentially means that you are implicitly allowing the maximum amount of memory possible for the stack. Note that many good compilers / linkers will do this automatically for you.
  2. As a starting point, I allocate 10% of the available memory for stack use. If I know I will be using functions that are huge users of the stack (such as printf, scanf and their brethren), then I'll typically set it to 20% of available memory.
  3. I set up the debug environment from day 1 to monitor and report stack usage. This way as I progress through the development process I get a very good feel for the application's stack consumption. This also helps in spotting changes to the code that have big impacts on the required stack size.
  4. Once I have 'all the code written', I start to make use of the information in the linker report. The tighter I am on memory, the closer I examine the linker output. In particular, what I often find is that there is one and only one function call chain that leads to a stack usage that is much greater than all the other call chains. In which case, I look to see if I can restructure that call chain so as to bring the maximum stack usage more in line with the typical stack usage.
If you stumbled upon this blog courtesy of a search engine, then I hope you found the above useful. I invite you to check out some of my other posts, which you may find useful. If you are a regular reader, then as always, thanks for stopping by.


Sunday, February 22, 2009

Effective C Tips #2 - Defining buffer sizes

This is the second in a series of tips on writing what I call effective C. Today I'm addressing something that just about every embedded system has - a buffer whose length is a power of two.

In order to make many buffer operations more efficient, it is common practice to make the buffer size a power of two so that simple masking operations may be performed on them, rather than explicit length checks. This is particularly true of communications buffers where data are received under interrupt. As a result, it is common to see code that looks something like this:
#define RX_BUF_SIZE (32)
static uint8_t Rx_Buf[RX_BUF_SIZE]; /* Receive buffer */

__interrupt void RX_interrupt(void)
{
  static uint8_t RxHead = 0; /* Offset into Rx_Buf[] where next character should be written */
  uint8_t rx_char;

  rx_char = HW_REG;  /* Get the received character */

  RxHead &= RX_BUF_SIZE - 1; /* Mask the offset into the buffer */
  Rx_Buf[RxHead] = rx_char; /* Store the received char */
  ++RxHead;   /* Increment offset */
}
The first thing I do to make this code more flexible is to allow the size of the buffer to be overridden on the command line. Thus my declaration for the buffer size now looks like this:
#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif
This is a useful extension because it allows me to control the resources used by the code without having to edit the code per se. However, this flexibility comes at a cost. What happens if someone were to inadvertently pass a non power of 2 buffer size on the command line? Well as it stands - disaster. However, the fix is quite easy.
#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif
#define RX_BUF_MASK  (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
#error Rx buffer size is not a power of 2
#endif
What I've done is define another manifest constant, RX_BUF_MASK to be equal to one less than the buffer size. I then test using a bit-wise AND of the two manifest constants. If the result is non zero, then evidently the buffer size is not a power of two and compilation is halted by use of the #error statement. If you aren't familiar with the #error statement, you'll find this article I wrote a few years back to be helpful.

Although this is evidently a big improvement, it still isn't quite good enough. To see why, consider what happens if RX_BUF_SIZE is zero. Zero is not a power of two, but 0 & (0 - 1) evaluates to zero and so it will pass the check. Now most C90 compliant compilers will complain about declaring an array with zero length. However some compilers, GNU compilers in particular, accept zero-length arrays as an extension. Thus, we also need to protect against this case. Furthermore as Yevheniy was kind enough to point out in the comments, we also have to protect against a buffer size of 1 (as 1 & 0 = 0). So we now get:
#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif
#if RX_BUF_SIZE < 2
#error Rx buffer must be a minimum length of 2
#endif
#define RX_BUF_MASK  (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
#error Rx buffer size is not a power of 2
#endif
As a final comment, note that the definition of RX_BUF_MASK has an additional benefit in that it can be used in the mask operation in place of (RX_BUF_SIZE - 1), so that my interrupt handler now becomes:
__interrupt void RX_interrupt(void)
{
static uint8_t RxHead = 0; /* Offset into Rx_Buf[] where next character should be written */
uint8_t rx_char;

rx_char = HW_REG;  /* Get the received character */

RxHead &= RX_BUF_MASK;  /* Mask the offset into the buffer */
Rx_Buf[RxHead] = rx_char; /* Store the received char */
++RxHead;   /* Increment offset */
}
So is this effective C? I think so. It's efficient, it's flexible and it's robustly protected against the sorts of bone headed mistakes that we all make from time to time.
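If you'd like to try the masking logic on a PC before committing it to an interrupt handler, here is a minimal sketch. It assumes Rx_Buf is a plain uint8_t array, and rx_store() is a hypothetical stand-in for RX_interrupt() that takes the received character as a parameter rather than reading HW_REG.

```c
#include <stdint.h>

#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif
#if RX_BUF_SIZE < 2
#error Rx buffer must be a minimum length of 2
#endif
#define RX_BUF_MASK  (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
#error Rx buffer size is not a power of 2
#endif

static uint8_t Rx_Buf[RX_BUF_SIZE];

/* Hypothetical host-side stand-in for RX_interrupt(). Stores one
   character and returns the offset at which it was written. */
static uint8_t rx_store(uint8_t rx_char)
{
    static uint8_t RxHead = 0;  /* Offset where next character is written */
    uint8_t written_at;

    RxHead &= RX_BUF_MASK;      /* Wrap the offset back into the buffer */
    written_at = RxHead;
    Rx_Buf[RxHead] = rx_char;   /* Store the received char */
    ++RxHead;                   /* Increment offset */

    return written_at;
}
```

Compiling this with, say, -DRX_BUF_SIZE=24 should stop at the #error line rather than producing a buffer that silently misbehaves.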


Wednesday, February 18, 2009

Efficient C Tips #6 - Don't use the ternary operator

I have to confess that I like the ternary operator. K&R obviously liked it, as it is heavily featured in their seminal work. However after running experiments on a wide range of compilers I have concluded that with the optimizer turned on, you are better off with a simple if-else statement. Thus next time you write something like this:

y = (a > b) ? c : d;

be aware that as inelegant as it is in comparison, this will usually compile to better code:

if (a > b)
{
y = c;
}
else
{
y = d;
}

I find this frustrating, as I've consumed 8 lines doing what is more easily and elegantly performed in 1 line.

I can't say that I have any particular insight as to why the ternary operator performs so poorly. Perhaps if there is a compiler writer out there, they could throw some light on the matter?
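If you want to check this on your own compiler, one approach is to put both forms into small functions and compare the generated listings. The sketch below is just that: the function names are hypothetical, and whether the two compile differently will depend entirely on your compiler and optimization settings.

```c
/* Ternary form */
static int select_ternary(int a, int b, int c, int d)
{
    return (a > b) ? c : d;
}

/* if-else form: identical behaviour, written out long hand */
static int select_if_else(int a, int b, int c, int d)
{
    int y;

    if (a > b)
    {
        y = c;
    }
    else
    {
        y = d;
    }

    return y;
}
```

Both functions return the same value for any inputs, so any difference in the listing is purely down to how the compiler handles the two constructs.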


Sunday, February 15, 2009

Horner's rule addendum

A few weeks ago I wrote about using Horner's rule to evaluate polynomials. Well today I'm following up on this posting because I made a classic mistake when I implemented it. On the premise that one learns more from one's mistakes than one's successes, I thought I'd share it with you.

First, some background. I had some experimental data on the behavior of a sensor against temperature. I needed to be able to fit a regression curve through the data, and so after some experimentation I settled on a quadratic polynomial fit. This is what the data and the curve looked like:



On the face of it, everything looks OK. However, if you look carefully, you will notice two things:

  • The bulk of the experimental data cover the temperature range of 5 - 48 degrees.
  • There is a very slight hook on the right hand side of the graph

So where's the mistake? Well actually I made two mistakes:

  • I assumed that my experimental data covered the entire expected operating temperature range.
  • I failed to check at run time that the temperature was indeed bounded to the experimental input range.

Why is this important? Well, what happened, was that in some circumstances the sensor would experience temperatures somewhat higher than I expected when the experimental data was gathered, e.g. 55 degrees. Well that doesn't sound too bad - until you take the polynomial and extend it out a bit. This is what it looks like:

You can see that at 55 degrees, the polynomial generates a value which is about the same as at 25 degrees. Needless to say, things didn't work too well!

So what advice can I offer?

  • Ensure that when fitting a polynomial to experimental data, that the experimental data covers all the possible range of values that can be physically realized.
  • Always plot the polynomial to see how it performs outside your range of interest. In particular, if it 'takes off' in a strange manner, then treat it very warily.
  • At run time, ensure that the data that you are feeding into the polynomial is bounded to the range over which the polynomial is known to be valid.
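To make the last point concrete, here is a minimal sketch of bounding the input before evaluating the polynomial. The 5 to 48 degree limits come from the experimental range mentioned above; the coefficients and function names are hypothetical and are there purely for illustration.

```c
/* Range covered by the experimental data */
#define TEMP_MIN_VALID  (5.0f)
#define TEMP_MAX_VALID  (48.0f)

/* Hypothetical coefficients, for illustration only */
#define COEFF2  (-0.01f)
#define COEFF1  (0.8f)
#define COEFF0  (2.0f)

/* Bound the temperature to the range over which the fit is known to be valid */
static float clamp_temperature(float t)
{
    if (t < TEMP_MIN_VALID)
    {
        t = TEMP_MIN_VALID;
    }
    else if (t > TEMP_MAX_VALID)
    {
        t = TEMP_MAX_VALID;
    }

    return t;
}

/* Evaluate the quadratic (Horner form) on the bounded input only */
static float sensor_correction(float temperature)
{
    float t = clamp_temperature(temperature);
    float y = t * COEFF2;

    y += COEFF1;
    y *= t;
    y += COEFF0;

    return y;
}
```

Clamping is only one possible policy. Depending on the product, flagging an out of range temperature as a fault may be the better choice.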

The maddening thing about this for me, was that I 'learned' this lesson about polynomial fits many years ago. I just chose to ignore it this time.

Before I leave this topic, I'd like to offer one other insight. If you search for Horner's rule, you'll find a plethora of articles. The more detailed ones will opine on topics such as evaluation stability, numeric overflow issues and so on. However, it's rare that you'll find this sort of information on polynomial evaluation posted. I think it's because we tend to get wrapped up in the details of the algorithm while losing sight of the underlying mathematics of what is going on. The bottom line, the next time you find a neat algorithm posted on the web for 'solving' your problem, take a big step back and think hard about what is really going on and what are the inherent weaknesses in what you are doing.

Tuesday, February 10, 2009

Effective C Tips #1 - Using vsprintf()

I've been running a series of tips on Efficient C for a while now. I thought I'd broaden the scope by also offering a series of tips on what I call Effective C. These will be tips that while not necessarily allowing you to write tighter code, will allow you to write better code. I'm kicking the series off with the rarely used standard library function, vsprintf(). First, some preamble...

One of the perverse things I tend to do is look through the C standard library and examine functions that on the face of it seem, well, useless. I do this because I think the folks that worked on this stuff were in general very smart and thus had a very good reason for including some of these 'weird' functions. One of these is the function 'vsprintf'. If you go and look up the definition of this function, e.g. here , then you'll find a rather brain ache inducing description. Now back when I was a lad I'd look at descriptions such as this and simply shrug and walk away. However, about ten years ago I started to make a concerted effort to see if a function such as vsprintf has a real benefit in embedded systems. Here's what I discovered in this case:

If you are working on a product that contains a VFD or LCD, then you will almost certainly have code that contains a function for writing a string to the display at a specified position. For example:


static void display_Write(uint8_t row, uint8_t col, char const * buf)
{
/* Send formatted string to display - hardware dependent*/
}

Then you will also have a plethora of functions that essentially do the same thing. That is, they accept some data, allocate a buffer on the stack, use sprintf to write formatted data into the buffer, and then call the function that actually writes the buffer to the display at the required position. Here are some examples:

void display_Temperature(float ambient_temperature)
{
char buf[10];

sprintf(buf,"%5.2f", ambient_temperature);
display_Write(6, 8, buf);
}

...

void display_Time(int hours, int minutes, int seconds)
{
char buf [12];

sprintf(buf,"%02d:%02d:%02d", hours, minutes, seconds);
display_Write(3, 9, buf);
}

There's nothing really wrong with this approach. However, there is a better way, courtesy of vsprintf().

What one does is to modify display_Write() to take a variable length argument list. Then within display_Write() use vsprintf() to process the variable length argument list and to generate the requisite string. The basic structure for the function is as follows:

void display_Write(uint8_t row, uint8_t column, char const * format, ...)
{
va_list args;
char buf[MAX_STR_LEN];

va_start(args, format);
vsprintf(buf, format, args); //buf contains the formatted string

/* Send formatted string to display - hardware dependent*/

va_end(args); // Clean up. Do NOT omit
}
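For anyone who wants something they can compile and run on a PC, here is a minimal sketch of the same structure. Note the assumptions: I've swapped in vsnprintf() (C99) so that an over-long string is truncated rather than overflowing the buffer, MAX_STR_LEN is an assumed value, and display_Send() together with the last_* variables are hypothetical stand-ins for the hardware dependent part.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_STR_LEN (21)  /* Assumed: 20 character display line plus terminator */

/* Hypothetical stand-in for the display hardware: remembers the last
   string and position so the behaviour can be inspected on a PC. */
static char    last_text[MAX_STR_LEN];
static uint8_t last_row;
static uint8_t last_col;

static void display_Send(uint8_t row, uint8_t col, char const * text)
{
    last_row = row;
    last_col = col;
    (void)snprintf(last_text, sizeof last_text, "%s", text);
}

void display_Write(uint8_t row, uint8_t column, char const * format, ...)
{
    va_list args;
    char buf[MAX_STR_LEN];

    va_start(args, format);
    (void)vsnprintf(buf, sizeof buf, format, args); /* Truncates if too long */
    va_end(args);                                   /* Clean up. Do NOT omit */

    display_Send(row, column, buf);
}

void display_Time(int hours, int minutes, int seconds)
{
    display_Write(3, 9, "%02d:%02d:%02d", hours, minutes, seconds);
}
```

If your library doesn't provide vsnprintf(), then vsprintf() drops in the same way, but you must then size the buffer for the worst case yourself.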

My objective here is not to explain how to use variadic arguments or indeed how vsprintf() works - there are dozens of places on the web that will do that. Instead I'm interested in showing you the benefit of this approach. The display_Write() function has evidently become more complex; however the functions that call display_Write have become dramatically simplified, as they are now just:

void display_Temperature(float ambient_temperature)
{
display_Write(6, 8, "%5.2f", ambient_temperature);
}

void display_Time(int hours, int minutes, int seconds)
{
display_Write(3, 9, "%02d:%02d:%02d", hours, minutes, seconds);
}

Is this more Effective code? I think so, for the following reasons.

  • The higher level functions are now much cleaner and easier to follow.
  • All the heavy lifting is localized in one place, which typically dramatically reduces the probability of errors.

Finally, you'll typically end up with a nice reduction in code size (even though this wasn't my objective). All in all, not bad for one obscure function.


Friday, February 06, 2009

Electrical Engineers versus Computer Scientists

Looking back at my various blog postings, I've noticed that although I may be controversial on technical topics, I haven't to date written anything that is controversial on a, shall I say, human side. Well no more Mr. Nice Guy, since today I intend to wade in on the topic of whether Embedded Systems should be programmed by Electrical Engineers or Computer Scientists. Regular readers will know I'm an EE (actually my degree is in EE & ME - but that's another story) and so you won't be surprised to hear that my usual preference is for Electrical Engineers. Although I am a (very) opinionated person, I'd like to think that most of my opinions have some basis in reality, and so here's my opinion and its supporting observations...

The more embedded a product is, the better off you are with an EE, the less embedded it is, the better off you are with a CS.

So what's the basis for this overblown, sweeping generalization and what exactly do I mean by 'more embedded'?

Well, I consider a product to be highly embedded if it meets one or more of the following criteria:

  • It has no or very simple user interfaces.
  • It performs a lot of hardware type functions in software. For example a DSP that performs a lot of signal processing is essentially doing in software what was once done in hardware.
  • It contains a lot of complicated hardware that needs extensive configuration and software support (For example a PowerQUICC processor).

By contrast, I consider a product to be lightly embedded if it meets either of the following criteria:

  • It has a sophisticated user interface (especially if the interface is web based)
  • It is database centric.

Evidently there exist products that meet the criteria for both sides of the dichotomy. For example, my new flat screen TV has a very sophisticated user interface, but I'm sure it does an extensive amount of signal processing.

If you accept this dichotomy, then it is evident that folks working on highly embedded systems really need to understand the hardware (since that's what the product is about) whereas those working on lightly embedded systems need a good understanding of how to build large software systems. Having said this, my experience is that whereas EE's (OK some EE's) are able to quickly learn the principles of building large software systems, I've never yet met a CS major that had anything beyond a casual understanding of what's really happening at the hardware level. I've seen this lack of knowledge (interest?) manifest itself in many ways. Examples include:

  • Not knowing / understanding the Nyquist Sampling theorem
  • Failure to realize that EEPROM / Flash have extraordinarily long write times
  • Not realizing that sampling jitter can destroy the performance of a digital filter

What about the other way? Have I seen EE's write 1000 line functions, and be completely clueless about principles such as data encapsulation? Absolutely! However, I have also seen EE's successfully craft very large systems. As a result I've come to two basic observations:

  • A deeply embedded system written entirely by a CS major will have major problems.
  • A lightly embedded system written entirely by an EE major may have major problems.

On this basis, I prefer (slightly) to have EE's work on embedded systems.

It doesn't take a rocket scientist to conclude that perhaps the best approach is to have a team where the EE's handle the hardware centric stuff and the CS's handle the computer centric stuff. Indeed, this is the approach I see taken in most organizations.

As a final thought, although it is common to find EE majors that have gone back to college to get a Masters in Computer Science, I haven't yet met a CS major that has gone back to college to get a Masters in Electrical Engineering.


Monday, February 02, 2009

First do no harm ...

One of the pleasures of working for myself is that it allows me to experiment with some rather non-traditional approaches to the whole concept of 'work'. In fact, looking back at some of my postings, here, here and here it's clear that this is a recurring theme in my writing. I mention this because a number of years ago I instituted the policy of
Two idiotic mistakes and I quit.
What exactly is this you ask? Well over the years I have noticed that I have days in which rather than progressing on problems, I actually regress, often by huge amounts. I do stupid things such as apply power with the wrong polarity to a board, or I design a circuit that will evidently never work. If I make two of these bone headed mistakes in quick succession, I take it as a clear indicator that my head really isn't where it needs to be - and I quit for the day.

Now, back when I was an employee, I simply had no choice other than to continue 'working', even though I knew full well that I'd be doing my employer a favor if I did nothing more than sit in the corner for the rest of the day. Today, I simply walk away and return to the problem the next day.

It would be an unusual manager who recognized that these days occur - and encouraged his staff to 'quit' when they did. I'm sure for many managers, this concept is too radical. However, if Engineers are indeed professionals, then we could do worse than adopt the abbreviated form of the Hippocratic oath given in the title to this posting.


Tuesday, January 20, 2009

Common programming errors and presidential inaugurations

I don't normally link politics and embedded systems, but something happened today at the inauguration of Barack Obama that struck me as an obvious error, but which my family and I suspect 99.999% of the rest of the viewers accepted without question. I'm referring to the third paragraph of Rick Warren's invocation where he stated:
Now, today, we rejoice not only in America’s peaceful transfer of power for the 44th time. We celebrate a ...

Well it seems to me that if Barack Obama is the 44th president of the USA, then there can only have ever been 43 transitions of power. I suppose that one could claim that when Washington became president, it was a transition of power. However no one could possibly claim it was peaceful!

What's my point? Well Rick Warren had just made a classic programming blunder. I'm guessing that his invocation was scrutinized by an army of political hacks, many with advanced degrees from top universities - yet despite this the error was not caught. I guess next time you make this mistake in your code, you can console yourself with this information.

BTW, you will not be surprised to know that my wife and kids just think that this confirms their belief that I'm a complete Nerd who is in desperate need of a life!


Sunday, January 18, 2009

Using Espresso to simplify embedded systems

In this case, Espresso does not refer to the highly caffeinated drink, but rather to the public domain logic minimization tool. What does this have to do with embedded systems? Well, several months back I was faced with an interesting problem. A product I was working on had nine different alarm outputs (some of which are contradictory), which together were dependent upon about thirty different inputs. Furthermore, the interaction between the various inputs leads to situations where the desired alarm outputs are non obvious, and certainly difficult to determine algorithmically. At this point I realized that what was needed was essentially a giant truth table, where the outputs for any given set of inputs were determined by an expert who could look at the various inputs and determine the optimal alarm strategy.

Thus the question was, how to tackle this problem? This is what we ultimately ended up doing.

First of all the truth table was entered in a database. This was done simply so that we could easily run queries, such as "show me all cases where output 3 is asserted when inputs 6 12 and 13 are negated". This essentially then was the environment in which the human expert worked.

Once the expert was happy with the truth table, it was exported in CSV format. The CSV file was then pre-processed by a Perl script (thanks Don) and fed to the Espresso logic minimization program. The output of Espresso was then post-processed by the Perl script and converted into compilable C code.

To give you a feel for what the output looks like, here's an excerpt (with the comments removed):

if(((!(inputs[0] & 0x20)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10) && (!(inputs[3] & 0xa0))) ||
((!(inputs[0] & 0x20)) && (!(inputs[1] & 0x60)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && ((inputs[2] & 0x4) == 0x4) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && ((inputs[1] & 0x1) == 0x1) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && ((inputs[2] & 0x2) == 0x2) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x24)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x28)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x30)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && (!(inputs[1] & 0x4)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)))
{
out |= 2048;
}

Evidently, it's enough to make your head spin!

For me, the real benefits of such an approach are as follows:

  • I was able to completely divorce the code from the desired functionality. That is, the functionality of the product was completely driven by the client and was in no way dependent upon me doing anything. Thus, when the client asks me "what does it do when the following occurs", I can honestly answer "it's whatever you told it to do".
  • By setting this up using a database and a Perl script we recognized that changes to the truth table would inevitably occur, and thus made the process as painless as possible. Now, when a change in functionality is desired, the client simply makes the changes in the database, presses a button to output a new file and I then run a 'make'.
  • The approach is rigorous. We have considered every possible combination of inputs - no matter how unlikely they are to occur. In my experience, this is something that firmware is not very good at.


Although I think this is neat in its own right, I think there are several larger points worth making:

  • Just because a tool was designed ostensibly for one environment (in this case Espresso was really designed for logic minimization in electrical circuits), don't be afraid to use it in other ways.
  • Recognize that certain elements in your design are highly prone to change - and design them with this in mind.
  • Either learn a scripting language or have a scripting expert at your disposal to help build your tool sets. (In my case, I do the latter).
  • If you can divorce your code from the required functionality (i.e. data driven coding), then seriously consider it.
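As a small illustration of the data driven idea, the product terms in the excerpt above could equally be held in a table rather than in one giant if statement. This is only a sketch and not what our Perl script generated: the type and function names are hypothetical, and only two of the nine terms are transcribed. Each term says which bits of each input byte it cares about, and what those bits must be.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_INPUT_BYTES (4u)

/* One product term: (inputs[i] & care[i]) must equal value[i] for every i */
typedef struct
{
    uint8_t care[NUM_INPUT_BYTES];
    uint8_t value[NUM_INPUT_BYTES];
} product_term_t;

/* Two of the terms from the excerpt (the ones testing 0x24 and 0x28) */
static const product_term_t alarm_terms[] =
{
    { {0x24, 0x00, 0x30, 0x10}, {0x00, 0x00, 0x00, 0x10} },
    { {0x28, 0x00, 0x30, 0x10}, {0x00, 0x00, 0x00, 0x10} },
};

static bool term_matches(product_term_t const * term, uint8_t const * inputs)
{
    size_t i;

    for (i = 0; i < NUM_INPUT_BYTES; ++i)
    {
        if ((inputs[i] & term->care[i]) != term->value[i])
        {
            return false;
        }
    }

    return true;
}

/* Returns 2048 (the output bit used in the excerpt) if any term matches */
static uint16_t evaluate_alarm(uint8_t const * inputs)
{
    uint16_t out = 0;
    size_t t;

    for (t = 0; t < (sizeof alarm_terms / sizeof alarm_terms[0]); ++t)
    {
        if (term_matches(&alarm_terms[t], inputs))
        {
            out |= 2048;
            break;
        }
    }

    return out;
}
```

Whether a table or generated if statements is the better choice will depend on code size, speed and how comfortable you are reading either form. The point is simply that the logic lives in data that a tool can regenerate.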


An apology


For all of you that subscribe via RSS, I apologize for the recent blitz of data. I decided to go back through all my postings and add links where appropriate, which seems to have forced the posts to be regenerated.


Sunday, January 11, 2009

Using volatile to achieve persistence!

Once in a while the real world and the arcane world of language standards collide, resulting in surprising results. To see what I mean, read on ...

Many of the products I design incorporate a Bootstrap Loader, so that the application firmware may be updated in the field. In most cases, the bootstrap loader is a completely different program to the main application. Despite this, I find it useful for the main application to pass information to the bootstrap loader and vice versa. Thus the question arises, how best to do this? Well in the processor family I am using, although it is technically possible to store information in Flash, EEPROM or RAM, by far the easiest and most secure way of doing it is to place the information into EEPROM. Furthermore, in order to enter the bootstrap loader it is highly desirable to force a reset of the processor by allowing the watchdog timer to time out.

Thus, the code to enter the bootstrap loader looks something like this:

__eeprom uint8_t msg_for_bootloader;

...

msg_for_bootloader = 0x42;

...

for(;;)
{
/* Wait for watchdog to generate a reset and force entry into the bootstrap loader */
}


Well, on the face of it, there is not much wrong with this code. However, if one turns on the optimizer, then the compiler examines the code, decides that no code may be executed beyond the infinite loop and thus concludes that the write to msg_for_bootloader is pointless, and promptly optimizes it away. (For a discussion on this topic, see my posting here)

Now you will note that msg_for_bootloader was qualified with __eeprom. This is a compiler extension that allows one to inform the compiler that the variable msg_for_bootloader resides in a special memory space and to be treated accordingly. Now I know that the compiler knows enough about the EEPROM space to generate the correct coding sequences such that reads and writes are performed correctly. However, in my naivete, I also assumed that the compiler knew something about the properties of EEPROM, such that it would realize writing to EEPROM without ostensibly reading it again is intrinsically useful in many applications.

Well it does not. Furthermore, on balance I think the compiler writers got it right and the error was completely mine.

So what to do? Well, declaring msg_for_bootloader as volatile fixes the problem. Thus my code now looks like this:

__eeprom volatile uint8_t msg_for_bootloader;

...

msg_for_bootloader = 0x42;

...

for(;;)
{
/* Wait for watchdog to generate a reset and force entry into the bootstrap loader */
}


Thus I ended up in the rather bizarre situation of having to declare a variable as volatile in order to make it persistent!

Although I can appreciate the wry irony of this situation, I think it points to a larger problem. The fact is that we are all (ok, most of us) programming in a language (C) that was not designed for use in embedded systems. Indeed, when C was written, I'm not sure EEPROM even existed. As a result, the compiler vendors have added extensions to the C standard in an effort to overcome its shortcomings for embedded systems, while still desperately striving to achieve "full compliance with the standard". Despite this, I find myself all too frequently falling into traps such as this one. What we really need is a language explicitly designed for embedded systems. It isn't going to happen, but it doesn't stop me wishing for it.


Monday, January 05, 2009

Horner's rule and related thoughts

Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors' performance. Using the regression tools built into Excel, I was looking at the various models, from a 'goodness of fit' perspective. After playing around for a while, I came to the conclusion that a quadratic polynomial really was the best fit, and should be the model to adopt. At this point, I turned to the issue of computational efficiency.

Now, it turns out that there is a relatively well known algorithm for evaluating polynomials, called Horner's rule. I say relatively well known, because I'd say about half the time I see a polynomial evaluated, it doesn't use Horner's rule, but instead evaluates the polynomial directly. Thus in an effort to increase the use of Horner's rule, I thought I'd mention it here.

OK, so what is it? Well it's based on simply refactoring a polynomial expression:

a_n x^n + a_{n-1} x^{n-1} + ... + a_0 = ((a_n x + a_{n-1})x + ...)x + a_0


Thus a polynomial of order n requires exactly n multiplications and n additions.

For example:

23.1x^2 - 45.6x + 12.3 = (23.1x - 45.6)x + 12.3

In this case, a quadratic polynomial of order 2, Horner's rule requires 2 multiplications and 2 additions to evaluate the polynomial, versus the direct approach, which requires 3 multiplications (one to form x * x, plus one for each of the two non-constant coefficients) and 2 additions.

For those of you who are looking for code to just use, this snippet will work. This is for a cubic polynomial. COEFFN is the coefficient of x^N.

y = x * COEFF3;
y += COEFF2;
y *= x;
y += COEFF1;
y *= x;
y += COEFF0;

The recurrence relationship for higher order polynomials should be obvious. Note that unlike most implementations, I perform the code in line, rather than using a loop.
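To tie this back to the worked example, here is a minimal sketch that evaluates 23.1x^2 - 45.6x + 12.3 both ways. The function names are hypothetical, and I've used double purely for illustration; use whatever type suits your target.

```c
/* Coefficients of the example polynomial 23.1x^2 - 45.6x + 12.3 */
#define COEFF2  (23.1)
#define COEFF1  (-45.6)
#define COEFF0  (12.3)

/* Horner's rule, in line: (COEFF2 * x + COEFF1) * x + COEFF0 */
static double poly_horner(double x)
{
    double y = x * COEFF2;

    y += COEFF1;
    y *= x;
    y += COEFF0;

    return y;
}

/* Direct evaluation, for comparison */
static double poly_direct(double x)
{
    return (COEFF2 * x * x) + (COEFF1 * x) + COEFF0;
}
```

The two functions should agree to within floating point rounding, so swapping one for the other is a low risk change.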

It should be noted that as well as being more computationally efficient, Horner's rule is also more accurate. This comes about in two ways:

  • The very act of using fewer floating point operations leads to fewer rounding errors
  • Higher order polynomials generate very large numbers in a hurry. Horner's method significantly reduces the magnitude of the intermediate values, thus minimizing problems associated with adding / subtracting floating point numbers that differ in magnitude

Although Horner's rule is a nice tool to have at one's disposal, I think there is a larger point to be made here. Whenever you need to perform any sort of calculation, there is nearly always a superior method to the obvious direct method of evaluation. Sometimes it requires algebraic manipulation such as for Horner's rule. Other times, it's an approximation method, and other times it's just a flat out really neat algorithm (see for example my posting on Crenshaw's square root code). The bottom line: next time you write code to perform some sort of numerical calculation, take a step back and investigate possibilities other than direct computation. You'll probably be glad you did.

Update

There is a highly relevant addendum to this posting here.


Saturday, December 20, 2008

So you want to be a consultant...

In the lede to this blog, I stated that I'd from time to time be commenting on the trials and tribulations of being a consultant in the embedded systems world. Well, today is my first post on this topic, so I thought I'd address the question I get asked most often: 'How do you market your business?'

Well, the trite answer is that in general I don't! The bulk of my work comes from repeat clients. I have one client that I've been doing work for for nearly twenty years, another for about seventeen years, and a third for nearly ten years. In short, I'm a very big believer in keeping my existing clients rather than developing new ones all the time. Obviously this isn't very helpful for someone that is thinking about striking out on their own and is wondering how to sign up a client or three.

My main suggestion if this describes you, is to approach previous employers / managers. If you are really good (and it helps a lot if you are) then previous managers will be extremely interested to hear that you are available for consulting work. Why do I say this? Well look at it from their perspective - here is a talented person that knows their products / procedures / tools who is available to come in and help out in overloaded situations. Thus the next time senior management is demanding that something gets done faster, it's an easy sell for your ex-manager to suggest bringing you in to help meet the deadline.

Incidentally, this especially applies to companies that have just had layoffs (even if you were one of those that got cut). When companies have a layoff, they typically overdo it. As a result, important projects grind to a halt and only get moving again when more help is brought in. Now typically for political / legal reasons a company cannot lay off people and then hire different ones. It can however hire 'temporary help' - and that's where you the consultant come in. Thus if you have just been laid off and think it's time to strike out on your own, I strongly suggest that the first person you call to offer your services is the person that laid you off.

Incidentally, I cannot stress enough the importance of face-to-face or at least voice-to-voice contact. Sending a card or an email will almost certainly result in the approach going nowhere. If the thought of 'warm calling' makes you break out in a sweat, then the chances are you just aren't cut out for having your own business.

What about other techniques such as advertising? I have never gone this route but I know people that have with some success. Be warned however that advertising can be expensive and can be too successful. I say this because the only thing worse than not having enough work is having too much!

How important is a good website? Well I used to think it was largely irrelevant (and my website reflects this attitude. I've been promising myself for a year to get it updated). However, I know of several cases where it has been extremely important in bringing in new business. I would caution you though that spending your time and money on a website is no substitute for making the telephone calls.

What about the social networking sites, such as 'Linked In' or 'Plaxo'? These can be helpful if you want to track down all those folks you used to work with who might want to hire you. They are easy to use and low cost / free. Incidentally, don't feel awkward about contacting someone you have lost touch with. Although it might be a little strange socially, it's well worth it to both of you if a fruitful business relationship develops.

Finally, what about the myriad of technical recruiting agencies out there? I have never done any work through them. I have interacted with them, and have found a huge variability in their ethics. Personally, I'd avoid the big companies (which are nothing but key word matchers) and work with the smaller, one man companies. Notwithstanding this, if you're relying on these folks to bring you work then you are being passive rather than proactive. Not recommended!

Next time I post on consulting, I'll address some other important issues. But for now, just remember that a consultant without clients is like a (fill in your own analogy here). Thus the first step in becoming a consultant is getting a client. Only then is the other stuff important.

Follow up to my last post


Thank you to all of you that encouraged others to come and read this blog. I saw a very nice uptick in my readership last week for which I am most grateful.


Saturday, December 13, 2008

Efficient C Tips #5 - Make 'local' functions 'static'

In my humble opinion, one of the biggest mistakes the designers of the 'C' language made, was to make the scope of all functions global by default. In other words, whenever you write a function in 'C', by default any other function in the entire application may call it. To prevent this from happening, you can declare a function as static, thus limiting its scope to typically the module it resides in. Thus a typical declaration looks like this:

static void function_foo(int a)
{
}

Now I'd like to think that the benefits of doing this to code stability are so obvious that everyone would do it as a matter of course. Alas, my experience is that those of us that do this are in a minority. Thus in an effort to persuade more of you to do this, I'd like to give you another reason - it can lead to much more efficient code. To illustrate how this comes about, let's consider a module called adc.c. This module contains a number of public functions (i.e. functions designed to be called by the outside world), together with a number of functions that are intended to be called only by functions within adc.c. Our module might look something like this:

void adc_Process(void)
{
  ...
  fna();
  ...
  fnb(3);
}

...

void fna(void)
{
  ...
}

void fnb(uint8_t foo)
{
  ...
}


At compile time, the compiler will treat fna() and fnb() like any other function. Furthermore, the linker may link them 'miles' away from adc_Process(). However, if you declare fna() and fnb() as 'static', then something magical happens. The code would now look like this:


static void fna(void);
static void fnb(uint8_t foo);

void adc_Process(void)
{
  ...
  fna();
  ...
  fnb(3);
}

...

static void fna(void)
{
  ...
}

static void fnb(uint8_t foo)
{
  ...
}


In this case, the compiler will know all the possible callers of fna() and fnb(). With this information to hand, the compiler / linker will potentially do all of the following:

  • Inline the functions, thus avoiding the overhead of a function call.
  • Locate the static functions close to the callers such that a 'short' call or jump may be performed rather than a 'long' call or jump.
  • Look at registers used by the local functions and thus only stack the required scratch registers rather than stacking all of the registers required by the compiler's calling convention.

Together these can add up to a significant reduction in code size and a commensurate increase in execution speed.
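
As a rough, compilable illustration of the layout described above, here is a minimal sketch of what adc.c might look like. Only adc_Process() comes from the example above; the helper names read_raw() and scale(), the accessor adc_GetResult(), and the 10-bit / 5000 mV scaling are all made up for the purpose of the sketch.

```c
/* adc.c - sketch only; everything except adc_Process() is hypothetical */
#include <stdint.h>

/* Private helpers. 'static' gives them internal linkage, so the compiler
   can see every caller and is free to inline them. */
static uint16_t read_raw(void);
static uint16_t scale(uint16_t raw);

static uint16_t last_result_mv;

/* Public interface */
void adc_Process(void)
{
    last_result_mv = scale(read_raw());
}

uint16_t adc_GetResult(void)
{
    return last_result_mv;
}

static uint16_t read_raw(void)
{
    return 512u;    /* stand-in for reading a 10-bit converter */
}

static uint16_t scale(uint16_t raw)
{
    /* counts -> millivolts, assuming a 5000 mV reference and 1024 counts */
    return (uint16_t)(((uint32_t)raw * 5000u) / 1024u);
}
```

Whether the helpers actually get inlined depends on the compiler and its settings, so the way to find out is to compare the listing or map file with and without the static keyword.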

Thus making all non-public functions static not only makes for better code quality, it also leads to more compact and faster code. A true win-win situation! Thus if you are not already doing this religiously, I suggest you go through your code and do it now. I guarantee you'll be very pleased with the results.


A Request ...


If I'm to believe the statistics for this blog, it appears that I'm gradually building a decent sized readership. Furthermore many of you choose to come back and read the latest postings which tells me that I'm doing something of value. Anyway, if this describes you, I'd be obliged if you'd encourage your colleagues to read the blog and also to post comments / questions. Why do I ask this? Well, an increased readership has several benefits, for both me and you the readers.

  • I believe quite passionately in improving the quality of embedded systems. Those of us that are working in this field collectively have an enormous impact on the world. Thus anything that helps improve the quality of embedded systems in turn helps improve the world. (I appreciate that this is a little melodramatic. It is, however, true).
  • Writing about something is the best way I know to find out if I truly understand it. Thus, the very act of publishing a blog causes me to improve my skills and knowledge.
  • Some of the (too few) comments I get are quite profound and often instructive. Thus I also learn in this way.
  • The bigger the readership I have, the more inclined I am to publish. If I'm publishing things of value, then presumably the readers benefit.

Anyway, if you concur, then please encourage your colleagues. If you don't, then that's OK as well.

Thanks for reading.


Saturday, December 06, 2008

Knowing my weaknesses

A few weeks ago I published what appears to have been quite a popular blog on what I called the 'Bug Cluster Phenomenon'. Today, I'm going to extend that concept somewhat by way of a mea culpa.

Earlier this week I had to eat some very humble pie. For the last six weeks or so I had received complaints that a temperature measurement wasn't giving accurate results. The sensor in question is measuring approximately ambient temperature, and was returning values in the 18 - 26 Celsius range, which seemed reasonable to me. I just wrote off the complaints as being due to the fact that humans have a very poor perception of absolute temperature. Well finally, at my urging, someone dragged the device out into the Winter cold, where it promptly read 18 Celsius. Thus I was faced with proof that something was wrong.

I proceeded to investigate the code, and discovered that based on the current inputs to the code, the code was generating an output with an error of about 2 degrees. How was this possible, since it was nothing more than a series of multiplies, adds and shifts - not typically fodder for a 2 degree error?

Well, further investigation showed that at a certain point I was getting numeric overflow when two numbers were being multiplied together. Now typically, when this occurs, one gets answers that have huge 'errors'. In my case I had the misfortune that the arithmetic worked out such that the error at room temperature was barely noticeable.

Anyway, I duly fixed the code. However, before moving on I took the time to reflect on this particular bug. Was this just one of those stupid coding errors that we all make from time to time, or was there more to it? I came to the conclusion that this was not just "one of those things". Rather I realized that this was at least the third time this year that I had written code that suffered from a numeric overflow problem. In short, I have a problem or a blind spot if you will, for a particular class of problem.

Well I'm told that recognizing one's problems is the first step in solving them. So I proceeded to do a little bit more investigating and discovered that my numeric overflow bugs always occurred when I combined multiple operators on a line. For example:

y = a * a + c;

Thus the solution seems obvious to me - only one numeric operator per line. Thus in future, I will always code like this:

y = a * a;
y += c;
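
The post doesn't say what types a, c and y have, so the following is only a sketch under assumed types (16-bit inputs and output). It shows how the one-operator-per-line style gives each intermediate result its own line, where it can be widened and range checked. The function name square_plus_c() is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes y = a * a + c one operator at a time.
   Returns false (leaving *y untouched) if the result does not fit. */
bool square_plus_c(int16_t a, int16_t c, int16_t *y)
{
    int32_t wide;

    wide = (int32_t)a * a;   /* widen BEFORE multiplying; at most 2^30 */
    wide += c;               /* still comfortably inside int32_t */

    if ((wide > INT16_MAX) || (wide < INT16_MIN))
    {
        return false;        /* would have overflowed a 16-bit result */
    }

    *y = (int16_t)wide;
    return true;
}
```

For instance, a = 100 and c = 5 gives 10005, whereas a = 200 is rejected because 40000 does not fit in an int16_t.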


The bottom line. When you encounter a bug, as well as looking for other bugs nearby (as described in the bug cluster phenomenon post), also take the time to reflect on what caused the bug in the first place, and see if you can recognize any systemic problems in your approach to coding. When it comes down to it, this is nothing more than a process of 'continuous quality improvement'. If it works for Toyota then it might just work in the embedded systems arena.


Sunday, November 30, 2008

Modulo Means (reprised)

In my previous post I had asked for some input on how to compute the mean of a phase comparator. Bruno Santiago suggested converting the phase readings to their Cartesian co-ordinates, averaging the resulting (X, Y) data, and then converting the means of X and Y back into a phase angle. Well kudos to Bruno because this is exactly what I ended up doing. However, as Bruno observed, it's not exactly an efficient process. It is however robust, and in my application, the robustness counts for a lot.

The suggestion that I average the inputs to the phase comparator has its merits. However for reasons that would take too long to explain, I'm not really able to do this in my application.

Finally, I'd like to mention the second solution that Kyle had proposed. First a caveat. I haven't fully thought through this solution, and I most certainly have not implemented and tested it. With that in mind, here's another approach to contemplate.

You'll remember that we can compute the average of the phase angle by using the simple arithmetic mean, provided that we do not cross back and forth across the zero phase line. Well Kyle's insight was that as well as computing the arithmetic mean of the phase angle, we also do the same for the quadrature angle. The idea is that while it is possible that the phase could alternate across the zero degree line, it would not simultaneously alternate across the 90 degree line (or indeed the 180 degree line). Thus, the method then becomes one of computing two means and choosing the correct one. If I get the time I'll develop this into a fully fledged algorithm and publish it for you all to, ahem, enjoy. I'm fairly sure that this method is not as robust as the Cartesian method. However, it is dramatically more efficient and thus is deserving of greater investigation. Bruno - perhaps you'd care to do the analysis in your CFT (Copious Free Time)?
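
To be clear, what follows is not Kyle's algorithm, which has not been written down here. It is just one possible reading of the "two means" idea, sketched so that there is something concrete to poke at. The rule used to choose between the two means (pick the set of readings with the smaller spread) is my own assumption, as is the function name.

```c
#include <stddef.h>
#include <stdint.h>

#define PHASE_MOD   16u
#define QUADRATURE  (PHASE_MOD / 4u)    /* a 90 degree rotation */

/* Mean of n phase readings (each 0..15), returned in the range [0, 16).
   Two simple means are formed: one of the raw readings and one of the
   readings rotated by 90 degrees. The set with the smaller spread is
   assumed to be the one that does not straddle the wrap-around point. */
double phase_mean_two_means(const uint8_t *samples, size_t n)
{
    uint32_t sum_d = 0u, sum_r = 0u;
    uint8_t min_d = 15u, max_d = 0u, min_r = 15u, max_r = 0u;
    double mean;
    size_t i;

    if (0u == n)
    {
        return 0.0;
    }

    for (i = 0; i < n; ++i)
    {
        uint8_t d = (uint8_t)(samples[i] & 0x0Fu);
        uint8_t r = (uint8_t)((d + QUADRATURE) & 0x0Fu);

        sum_d += d;
        sum_r += r;
        if (d < min_d) { min_d = d; }
        if (d > max_d) { max_d = d; }
        if (r < min_r) { min_r = r; }
        if (r > max_r) { max_r = r; }
    }

    if ((max_d - min_d) <= (max_r - min_r))
    {
        return (double)sum_d / (double)n;       /* raw mean is usable */
    }

    /* Use the rotated mean, then undo the rotation */
    mean = ((double)sum_r / (double)n) - (double)QUADRATURE;
    if (mean < 0.0)
    {
        mean += (double)PHASE_MOD;
    }
    return mean;
}
```

For the input 0, F, 0, F the rotated readings are 4, 3, 4, 3, whose mean of 3.5 becomes 15.5 once the rotation is removed. Readings that are spread over more than a quarter of the circle will defeat the selection rule, which is consistent with the suspicion that this approach is less robust than the Cartesian one.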


Friday, November 21, 2008

Modulo means

Normally on this blog I'm either giving my opinions on embedded matters, or offering tips on how to do things better. Well today I'm turning the tables, as I'd like your help. Yesterday I ran into a rather perplexing problem, which I'd be interested to see if any of my readers can solve.

In a product I am working on, there is a phase comparator generating difference readings in the range 0 - 0xF. The phase comparator is somewhat noisy and so I want to obtain a moving average of the phase differences. Now typically to perform a moving average filter, one sums the elements in a buffer and divides by the number of elements to obtain the arithmetic mean. Indeed we can do this here, provided that we don't flip back and forth across the zero line. If we do cross the zero line then the method breaks down. For example, if successive phase differences are 0, F, 0, F, 0, F .... 0, F, then the simple arithmetic mean of these numbers will be about 8 (7.5 to be exact) instead of some value between F and 0.

You may think that the answer is to switch to signed arithmetic and operate over the range -8 ... +7. However, a little thought will show that you have now merely shifted the problem as to what happens when the system is close to -8 such that the values alternate between -8, 7, -8, 7 ... -8, 7.

Thus, can you come up with a robust, efficient solution to compute the mean of an array of modulo numbers?

The problem is solvable, as one of the engineers that I'm working with hit upon not one, but two possible solutions (nice work Kyle). However, I'd be interested in other possible approaches.

I'll publish Kyle's method(s) next week.


Tuesday, November 04, 2008

Dogging your watchdog

Most embedded systems employ watchdog timers. It's not my intention today to talk about why to use watchdog timers, or indeed how to use them. Rather I assume you know the answers to these questions. Instead, I'll pass on some tips for how to track down those unexpected watchdog resets that can occur during the development process.

To help find these problems, it is essential to find out where the watchdog reset is occurring. Unfortunately, this isn't easy, since by definition a watchdog reset will reset the processor, typically destroying all state information that could be used to debug the problem. To get around this problem, here are a few things you can try.
  1. Place a break point on the (watchdog) reset vector. Although this will typically not stop the processor from being reset, it will ensure that none of your variables get initialized by your start up code. As a result, you should be able to use your debugger to examine these variables - which may give you an insight into what is going wrong.
  2. Certain processor architectures allow the action of the watchdog timer to be changed between a classic watchdog (when the timer times out, the processor is reset), to a special form of timer, complete with its own interrupt vector. Although I rarely use this mode of operation in release code, it is very useful for debugging. Simply reconfigure the watchdog to generate an interrupt upon timeout, and place a break point in the watchdog's ISR. Then when the watchdog times out, your debugger will stop at the break point. It's then just a simple matter of stepping out of the ISR to return to the exact point in your code where the watchdog timeout occurred.
  3. If neither of the above methods is available to you, and you are genuinely clueless as to where to start looking, then a painful but workable solution is to 'instrument' entry into each function. This essentially consists of some code that is placed at the start of every function. The code's job is to record the ID of the function into some form of storage that will not be affected by a watchdog reset, such that you can identify the offending function after a watchdog reset has occurred. This isn't quite as bad as it sounds, provided you are good with macros, a scripting language such as Perl and are aware of common compiler vendor extensions such as the macro __FUNCTION__. Of course if you are that good the chances are you won't be clueless as to why you are taking a watchdog reset!
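
The third technique can be sketched as follows. This is a minimal illustration rather than production code: the names TRACE_ENTRY(), trace_last_function() and process_packet() are invented, and the essential step of placing the variable in memory that survives a reset is compiler-specific and is only described in a comment.

```c
/* Sketch only. On a real target this variable must be placed in RAM that
   the start up code does not initialise; the mechanism for doing that is
   compiler-specific and is not shown here. */
static const char *volatile last_function_entered;

/* __func__ is the standard C99 spelling of the __FUNCTION__ extension.
   The string it names lives in the program image, so the stored pointer
   is still meaningful after the reset. */
#define TRACE_ENTRY()  do { last_function_entered = __func__; } while (0)

void process_packet(void)    /* hypothetical application function */
{
    TRACE_ENTRY();
    /* ... work that might hang and trip the watchdog ... */
}

/* Called early after a reset to report where execution was last seen. */
const char *trace_last_function(void)
{
    return last_function_entered;
}
```

A script can then insert TRACE_ENTRY() at the top of every function, which is where the Perl comes in.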
I'll leave it to another post to talk about the sort of code that often causes watchdog timeouts.


Wednesday, October 08, 2008

Bug cluster phenomenon

I was debugging a piece of code recently when I realized that there was a scenario, albeit unlikely, in which a divide by zero could occur. Rather than just fix the bug and move on, I invoked what I call the "bug cluster phenomenon" rule. What you may ask is this rule? Well it has two variants. The first is as follows:

"Where there is one bug, there is usually another". I've observed this phenomenon over many years. What seems to happen is that when I (or anyone else for that matter) am generating a block of code, I get interrupted, or I'm tired or my focus is elsewhere. As a result, when I create one bug, I usually create several others while I am at it. Thus when I find a bug in a function, I always assume that it has company nearby. In short, finding a bug in a function always triggers a top to bottom review of that function and its neighbors. This has dramatically reduced my debugging time over the years - and I strongly recommend you adopt it.

The second variant of the rule is as follows:

"Logical errors normally have company". I've also observed this phenomenon over many years. In this case, it seems that if you have made a particular error in logic in one place in the code, the chances are you have made the same error elsewhere. In the case of the divide by zero issue mentioned in the introduction, this prompted me to wonder if I had any other possible divide by zero errors lurking in my code. As a result, I performed a search through the entire project - and sure enough I found a few other cases where there existed the possibility of a divide by zero error. Thus finding one bug caused me to fix several. That's efficient debugging!

Incidentally, I was able to quickly find all the divisions in my code because I am absolutely anal about having a space on either side of an operator. Thus, I needed to search for only two strings - " / " and " /= ". I've observed that many people are lackadaisical about this, such that you'll often see expressions such as "y=a/b". These people have no option other than to search either for just "/" - which of course returns every line with a comment, or they have to construct a more sophisticated regular expression search - which again takes time and is error prone.

Thus I have three pieces of advice to pass on:
1. When you find a bug, look nearby for more.
2. If the bug was of a particular class of bug, then search your code to see if you have made the same mistake elsewhere.
3. Write your code so that it is trivial to search for certain constructs. It will save you time in the long run.


Monday, September 08, 2008

Efficient C Tips #4 - Use Speed Optimization

Back in July 2008 I promised that the next blog post would be on why you should use speed optimization instead of size optimization. Well four other posts somehow got in the way - for which I apologize. Anyway, onto the post!

In "Efficient C Tips #2" I made the case for always using full optimization on your released code. Back when I was a lad, the conventional wisdom when it came to optimization was to use the following algorithm:

1. Use size optimization by default
2. For those few pieces of code that get executed the most, use speed optimization.

This algorithm was based on the common observation that most code is executed infrequently and so in the grand scheme of things its execution time is irrelevant. Furthermore since memory is constrained and expensive, this code that is rarely executed should consume as little resource (i.e. memory) as possible. At first blush, this approach seems reasonable. However IMHO it was flawed back then and is definitely flawed now. Here is why:

1. In an embedded system, you typically are not sharing memory with other applications (unlike on a general purpose computer). Thus there are no prizes for using less than the available memory. Of course, if by using size optimization you can fit the application into a smaller memory device then use size optimization and use the smaller and cheaper part. However in my experience this rarely happens. Instead typically you have a system that comes with say 32K, 64K or 128K of Flash. If your application consumes 50K with speed optimization and 40K with size optimization, then you'll still be using the 64K part and so size optimization has bought you nothing. Conversely, speed optimization will also cost you nothing - but your code will presumably run faster, and consume less power.

2. In an interesting quirk of optimization technology, it turns out that in some cases speed optimization can result in a smaller image than size optimization! It is almost never the case that the converse is true. See however this article that I wrote which discusses one possible exception. Thus even if you are memory constrained, try speed optimization.

3. Size optimization comes with a potentially very big downside. After a compiler has done all the usual optimizations (constant folding, strength reduction etc), a compiler that is set up to do size optimization will usually go on to factor out repeated code. This technique is often called "procedural abstraction" or "code factoring"; it is sometimes loosely described as "common sub-expression elimination", although that name properly belongs to a different optimization. What it consists of is looking at the object code and identifying small blocks of assembly language that are used repeatedly throughout the application. These repeated blocks are converted into subroutines. This process can be repeated ad nauseam such that one subroutine calls another which calls another and so on. As a result an innocuous looking piece of C code can be translated into a call tree that nests many levels deep - and there is the rub. Although this technique can dramatically reduce code size it comes at the price of increasing the call stack depth. Thus code that runs fine in debug mode may well suffer from a call stack overflow when you turn on size optimization. Speed optimization will not do this to you!

4. As I mentioned in "Efficient C Tips #2" one downside of optimization is that it can rearrange instruction sequences such that the special access requirements often needed by watchdogs, EEPROM etc are violated. In my experience, this only happens when one uses size optimization - and never with speed optimization. Note that I don't advocate relying on this; it is however a bonus if you have forgotten to follow the advice I give in "Efficient C Tips #2" for these cases.

The bottom line - speed optimization is superior to size optimization. Now I just have to get the compiler vendors to select speed optimization by default!


Thursday, September 04, 2008

Low cost tools

Like many of you, I subscribe to Jack Ganssle's newsletter (If you don't then you should - go to http://ganssle.com/). In his latest newsletter #164 (alas not yet posted to the web) there is a thread on tools for monitoring serial protocols such as I2C. I was quite interested in this because it so happens I use some of the tools mentioned. What really struck me though was the fact that someone was looking for low cost tools.

I'm always baffled when I see this. If I believe the salary surveys, most engineers in the USA are earning well over $100K. Throw in benefits and your average engineer costs his / her employer about $200K a year, or close to $100 per working hour. Why then do employers balk at spending a few thousand dollars on a decent tool? I've seen people spend days on compiler problems because they are using a "free" tool; I've had people tell me that they don't use Lint because it's too expensive (<$200!); I've seen people struggle for days simply because their oscilloscope isn't up to the job. In all these cases, the cost in terms of their time dwarfs the equipment / tool cost.

What I want are great tools. I want tools that are intuitive to use, that work really well, are tolerant of my occasional ham-fistedness and that I trust. For example, I have a Fluke 87 multimeter sitting next to me. It costs quadruple what a Radio Shack special costs. It's worth every penny.

Here's an ending thought. You are going in for open heart surgery. The surgeon comes out and says "don't worry - I've got some great low cost tools to use on you". And we wonder why engineers don't get the respect that doctors do.


Tuesday, August 12, 2008

Have you looked at your linker output file recently?

Of all the myriad of files involved in a typical embedded firmware project, probably the two most feared (and yes I do mean feared) are the linker control file (which tells the linker how to link your application) and the linker output file. Today it's the latter which I'll be talking about.

The linker output file tells you a myriad of information about the way your application has been put together. Unfortunately, much of it is in such a cryptic format that examination of the file is a painful process. Indeed, for this reason, I suspect that most projects are completed with nothing more than a cursory look at this file.

This is a shame, because examination of the linker output file can significantly reduce your debugging time. To show you what I mean, consider my typical action sequence when I first start coding up a project.

1. Write a module.
2. Compile module and correct all errors and warnings.
3. Lint module and correct all complaints from Lint.
4. Repeat steps 1, 2 & 3 until I have sufficient modules to be able to generate a linkable image.
5. Link image and repeat steps 1-4 until the linker has no warnings or errors.
6. Examine the linker output file.

I'd wager that most developers out there would be reaching for the debugger in step 6. The reason I do not, is because I can typically find some bugs simply by looking at the linker output. For example, consider this code sequence:


if (0 == var)
{
 function_a();
}
else if (1 == var)
{
 function_b();
}
else if (2 == var)
{
 function_b();
}
else
{
 function_d();
}


I make this sort of copy and paste error all the time. In this case, when var is 2, I meant to call function_c but inadvertently I ended up calling function_b again. Since function_b exists, the compiler is happy and so there are typically no warnings.

So how does looking at the linker output file help me in this case? Well, if you have a decent linker it will give you a list of all the functions that aren't called and that consequently have been stripped out of the final image. If in perusing this list I see that function_c() is listed as uncalled, then I immediately know I've got a bug somewhere. Typically tracking it down is very easy.

I'll leave for another day the other ways I use the linker output file to debug code.


Thursday, August 07, 2008

Improvements versus Features

I'm taking a slight detour from my usual topics to blather about what I see as an unfortunate trend that is making its way from the PC world to the embedded world. My perception is that as more embedded systems get sophisticated user interfaces, the desire to add features seems inescapable. While I don't see adding features as bad, per se, doing so instead of improving the product is a bad thing. What do I mean by improving the product? Well, typically those things that most users don't understand, for example noise floors, power consumption, SNR, software reliability and so on.

In the days before user interfaces, pretty much the only way to improve a product was to work on the "invisible" parameters. Today, it's often far easier to add a new feature than it is to labor at, for example, wringing a few more dB of performance out of that digital filter while keeping the number of clock cycles unchanged.

Am I tilting at windmills? I don't think so. Is my plea pointless? Probably. However the next time someone comes along asking for a YANF (Yet Another New Feature), do them and yourself a favor and ask how time spent on the YANF compares to time spent on improving the product.


Friday, August 01, 2008

Efficient C Tips #3 - Avoiding post increment / decrement

It always seems counter intuitive to me, but post increment / decrement operations in C / C++ often result in inefficient code, particularly when de-referencing pointers. For example:


for (i = 0, ptr = buffer; i < 8; i++)
{
  *ptr++ = i;
}


This code snippet contains two post increment operations. With most compilers, you'll get better code quality by re-writing it like this:


for (i = 0, ptr = buffer; i < 8; ++i)
{
  *ptr = i;
  ++ptr;
}


Why is this you ask? Well, the best explanation I've come across to date is this one on the IAR website:

Certainly taking the time to understand what's going on is worthwhile. However, if it makes your head hurt then just remember to avoid post increment / decrement operations.

Incidentally, you may find that on your particular target it makes no difference. However, this is purely a result of the fact that your target processor directly supports the required addressing modes to make post increments efficient. If you are interested in writing code that is universally efficient, then avoid the use of post increment / decrement.
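
If you'd like to see what your own compiler makes of this, the two loops can be wrapped in functions and compiled to assembly for comparison. The sketch below does just that; the function names and the choice of uint8_t for the index and buffer are assumptions, since the snippets above don't declare their variables.

```c
#include <stdint.h>

#define BUF_LEN 8u

/* Post-increment on both the index and the pointer */
void fill_post(uint8_t *buffer)
{
    uint8_t i;
    uint8_t *ptr;

    for (i = 0, ptr = buffer; i < BUF_LEN; i++)
    {
        *ptr++ = i;
    }
}

/* Pre-increment, with the pointer advanced in its own statement */
void fill_pre(uint8_t *buffer)
{
    uint8_t i;
    uint8_t *ptr;

    for (i = 0, ptr = buffer; i < BUF_LEN; ++i)
    {
        *ptr = i;
        ++ptr;
    }
}
```

Both functions fill the buffer with the values 0 to 7, so any difference in the generated code is purely down to how the compiler handles the two styles.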

You may also wonder just how much this saves you. I've run some tests on various compilers / targets and have found that this coding style cuts the object code size by anywhere from zero to several percent. I've never seen it increase the code size. More to the point, in loops, using a pre-increment can save you a load / store operation per increment per loop iteration. These can add up to some serious time savings.


Saturday, July 05, 2008

Efficient C Tips #2 - Using the optimizer

In my first post on "Efficient C" I talked about how to use the optimal integer data type to achieve the best possible performance. In this post, I'll talk about using the code optimization settings in your compiler to achieve further performance gains.

I assume that if you are reading this, then you are aware that compilers have optimization settings or switches. Invoking these settings usually has a dramatic effect on the size and speed of the compiled image. Typical results that I have observed over the years are a 40% reduction in code size and a halving of execution time for fully optimized versus non-optimized code. Despite these amazing numbers, I'd say about half of the code that I see (and I see a lot) is released to the field without full optimization turned on. When I ask developers about this, I typically get one of the following explanations:

1. I forgot to turn the optimizer on.
2. The code works fine as is, so why bother optimizing it?
3. When I turned the optimizer on, the code stopped working.

The first answer is symptomatic of a developer that is just careless. I can guarantee that the released code will have a lot of problems!

The second answer on the face of it has some merit. It's the classic "if it ain't broke don't fix it" argument. However, notwithstanding that it means that your code will take longer to execute and thus almost certainly consume more energy (see my previous post on "Embedded Systems and the Environment"), it also means that there are potential problems lurking in your code. I address this issue below.

The third answer is of course the most interesting. You have a "perfectly good" piece of code that is functioning just fine, yet when you turn the optimizer on, the code stops working. Whenever this happens, the developer blames the "stupid compiler" and moves on. Well, after having this happen to me a fair number of times over my career, I'd say that the chances that the compiler is to blame are less than 1 in 10. The real culprit is normally the developer's poor understanding of the rules of the programming language and how compilers work.

Typically when a compiler is set up to do no optimization, it generates object code for each line of source code in the order in which the code is encountered and then simply stitches the result together (for the compiler aficionados out there I know it's more involved than this - but it serves my point). As a result, code is executed in the order in which you write it, constants are tested to see if they have changed, variables are stored to memory and then immediately loaded back into registers, invariant code is repeatedly executed within loops, all the registers in the CPU are stacked in an ISR and so on.

Now, when the optimizer is turned on, the optimizer rearranges code execution order, looks for constant expressions, redundant stores, common sub-expressions, unused registers and so on and eliminates everything that it perceives to be unnecessary. And therein dear reader lies the source of most of the problems. What the compiler perceives as unnecessary, the coder thinks is essential - and indeed is relying upon the "unnecessary" code to be executed.

So what's to be done about this? Firstly, you have to understand what the key word volatile means and does. Even if you think you understand volatile, go and read this article I wrote a number of years back for Embedded Systems Programming magazine. I'd say that well over half of the optimization problems out there relate to failure to use volatile correctly.
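
As a reminder of the most common case, here is a minimal sketch of a flag and a data byte shared between an interrupt handler and the main loop. The names are invented, and the "ISR" takes the received byte as a parameter purely so that the example can be exercised on a PC; a real handler would read it from a hardware register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the interrupt handler and the main loop. Without
   volatile, an optimizer may keep rx_ready in a register and never
   notice that the handler has changed it. */
static volatile bool    rx_ready;
static volatile uint8_t rx_byte;

/* Stand-in for a UART receive interrupt handler (hypothetical) */
void uart_rx_isr(uint8_t received)
{
    rx_byte  = received;
    rx_ready = true;
}

/* Main loop side: returns true and stores the byte if one has arrived */
bool uart_poll(uint8_t *out)
{
    if (!rx_ready)
    {
        return false;
    }

    *out = rx_byte;
    rx_ready = false;
    return true;
}
```

Note that volatile only stops the compiler from optimizing the accesses away. It does not make them atomic, so on a real system the usual care with interrupts that fire between the read and the clear is still required.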

The second problematic area concerns specialized protective hardware such as watchdogs. In an effort to make inadvertent modification of certain registers less likely, the CPU manufacturers insist upon a certain set of instructions being executed in order within a certain time. An optimizer can often break these specialized sequences. In which case, the best bet is to put the specialized sequences into their own function and then use the appropriate #pragma directive to disable optimization of that function.
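
The directive needed differs from compiler to compiler, so treat the following purely as a sketch. It uses the GCC spelling (#pragma GCC optimize, guarded so that other compilers simply skip it), and the two "registers" are ordinary variables with made-up names and key values standing in for real memory-mapped hardware. Check your compiler manual for the equivalent directive and your data sheet for the real sequence.

```c
#include <stdint.h>

/* Stand-ins for memory-mapped watchdog registers (hypothetical) */
static volatile uint8_t WDT_KEY;
static volatile uint8_t WDT_CTRL;

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")     /* no optimization for what follows */
#endif

/* The timed sequence lives in its own function so that the optimizer
   can be kept away from it without affecting the rest of the module. */
void wdt_disable(void)
{
    WDT_KEY  = 0x55u;   /* unlock step 1 (made-up value) */
    WDT_KEY  = 0xAAu;   /* unlock step 2 (made-up value) */
    WDT_CTRL = 0x00u;   /* must follow the unlock sequence directly */
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options         /* restore the command line settings */
#endif

/* Accessors so the stand-in registers can be inspected on a PC */
uint8_t wdt_ctrl_value(void) { return WDT_CTRL; }
uint8_t wdt_key_value(void)  { return WDT_KEY; }
```

Keeping the protected sequence this small also makes it easy to inspect the generated assembly and confirm that the instructions appear in the order the hardware expects.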

Now what to do if you are absolutely sure that you are using volatile appropriately and correctly and that specialized coding sequences have been protected as suggested, yet your code still does not work when the optimizer is turned on? The next thing to look for are software timing sequences, either explicit or implicit. The explicit timing sequences are things such as software delay loops, and are easy to spot. The implicit ones are a bit tougher and typically arise when you are doing something like bit-banging a peripheral, where the instruction cycle time implicitly acts as a setup or hold time for the hardware being addressed.

OK, what if you've checked for software timing and things still don't work? In my experience you are now into what I'll call the "Suspect Code / Suspect Compiler (SCSC)" environment. With an SCSC problem, the chances are you've written some very complex, convoluted code. With this type of code, two things can happen:

1. You are working in a grey area of the language (i.e. an area where the behavior is not well specified by the standard). Your best defense against this is to use Lint from Gimpel. Lint will find all your questionable coding constructs. Once you have fixed them, you'll probably find your optimization problems have gone away.
2. The optimizer is genuinely getting confused. Although this is regrettable, the real blame may lie with you for writing gnarly code. The bottom line in my experience is that optimizers work best on simple code. Of course, if you have written simple code and the optimizer is getting it wrong, then do everyone a favor and report it to the compiler vendor.

In my next post I'll take on the size / speed dichotomy and make the case for using speed rather than size as the "usual" optimization method.

Friday, June 20, 2008

Embedded Systems and the Environment

With the recent run up in the price of oil, it seems as if everyone is talking about energy and how to conserve it. For most people, the only impact they can have on the environment is through their own individual actions and choices. Engineers, however, are in a different position because at a professional level, the design choices we make can have a profound effect on the environment. If we believe the figures about the number of embedded processors shipped each year (billions) and we make the very conservative estimate that each processor is in a system that consumes 1 Wh per day, then the annual energy consumption of new embedded systems runs to at least 1E9 * 1 * 365 = 365 Gigawatt hours, with an average power consumption of around 41 Megawatts. If we assume that the average life of an embedded system is 5 years, then the embedded systems out there are burning about 200 Megawatts. That's a lot of power folks.
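The estimate is easy to check. The sketch below just restates the arithmetic under the same assumptions (one billion units shipped per year, 1 Wh per unit per day, a 5 year life); the function names are mine. One year's shipments come to 3.65E11 Wh per year, which is 365 GWh, and draw about 41.7 MW on average; five years' worth of shipments draw about 208 MW.

```c
#include <stdint.h>

#define HOURS_PER_DAY  24u
#define DAYS_PER_YEAR  365u

/* Energy consumed over one year by the given number of units, in Wh. */
uint64_t annual_energy_wh(uint64_t units, uint64_t wh_per_unit_per_day)
{
    return units * wh_per_unit_per_day * DAYS_PER_YEAR;
}

/* Average power drawn by the given number of units, in W (rounded down). */
uint64_t average_power_w(uint64_t units, uint64_t wh_per_unit_per_day)
{
    return (units * wh_per_unit_per_day) / HOURS_PER_DAY;
}
```

For example, average_power_w(5000000000, 1) gives the installed-base figure for a 5 year product life.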

Now here's the interesting thing. Most embedded projects are for products that are made in the thousands. Individually, these products' power consumption is irrelevant. Collectively it is huge. Thus if as an industry we made a concerted effort to reduce the power consumption of our products, the benefits to society would be substantial. So how exactly do we do this? Although a lot of the power consumption comes from the hardware design, the firmware design can also have a dramatic impact on the overall power consumption of the system. In my next posting I'll look at some of the ways you can design your system firmware so as to minimize power consumption.

Sunday, June 15, 2008

Efficient C Tips #1 - Choosing the correct integer size

From time to time I write articles for Embedded Systems Design magazine. A number of these articles have concentrated on how to write efficient C for an embedded target. Whenever I write these articles I always get emails from people asking me two questions:

1. How did you learn this stuff?
2. Is there somewhere I can go to learn more?

The answer to the first question is a bit long winded and consists of:
1. I read compiler manuals (yes, I do need a life).
2. I experiment.
3. Whenever I see a strange coding construct, I ask the author why they are doing it that way. From time to time I pick up some gems.
4. I think hard about what the compiler has to do in order to satisfy a particular coding construct. It's really helpful if you know assembly language for this stage.

The answer to the second question is short: No!

To help rectify this, in my copious free time I'll consider putting together a one day course on how to write efficient C for embedded systems. If this is of interest to you then please contact me via my website soon.

In the interim, I'd like to offer up my first tip on how to choose the correct integer size.

In my experience in writing programs for both embedded systems and computers, I'd say that greater than 95% of all the integers used by those programs could fit into an 8 bit variable. The question is, what sort of integer should one use in order to make the code the most efficient? Most computer programmers who use C will be puzzled by this question. After all the data type 'int' is supposed to be an integer type that is at least 16 bits that represents the natural word length of the target system. Thus, one should simply use the 'int' data type.

In the embedded world, however, such a trite answer will quickly get you into trouble - for at least three reasons.
1. For 8 bit microcontrollers, the natural word length is 8 bits. However you can't represent an 'int' data type in 8 bits and remain C99 compliant. Some compiler manufacturers eschew C99 compliance and make the 'int' type 8 bits (at least one PIC compiler does this), while others simply say we are compliant and if you are stupid enough to use an 'int' when another data type makes more sense then that's your problem.
2. For some processors there is a difference between the natural word length of the CPU and the natural word length of the (external) memory bus. Thus the optimal integer type can actually depend upon where it is stored.
3. The 'int' data type is signed. Much, indeed most, of the embedded world is unsigned, and those of us that have worked in it for a long time have found that working with unsigned integers is a lot faster and a lot safer than working with signed integers, or even worse a mix of signed and unsigned integers. (I'll make this the subject of another blog post).

Thus the bottom line is that using the 'int' data type can get you into a world of trouble. Most embedded programmers are aware of this, which is why when you look at embedded code, you'll see a veritable maelstrom of user defined data types such as UINT8, INT32, WORD, DWORD etc. Although these should ensure that there is no ambiguity about the data type being used for a particular construct, it still doesn't solve the problem about whether the data type is optimal or not. For example, consider the following simple code fragment for doing something 100 times:

TBD_DATATYPE i;

for (i = 0; i < 100; i++)
{
  // Do something 100 times
}

Please ignore all issues other than this one: what data type should the loop variable 'i' be? Evidently it needs to be at least 8 bits wide, and so we would appear to have a choice of 8, 16, 32 or even 64 bits as our underlying data type. Now if you are writing code for a particular CPU then you should know whether it is an 8, 16, 32 or 64 bit CPU and thus you could make your choice based on this factor alone. However, is a 16 bit integer always the best choice for a particular 16 bit CPU? And what if you are trying to write portable code that is supposed to be used on a plethora of targets? Finally, what exactly do we mean by 'optimal' or 'efficient' code?

I wrestled with these problems for many years before finally realizing that the C99 standards committee has solved this problem for us. Quite a few people now know that the C99 standard standardized the naming conventions for specific integer types (int8_t, uint8_t, int16_t etc). What isn't so well known is that it also defined data types which are "minimum width" and also "fastest width". To see if your compiler is C99 compliant, open up stdint.h. If it is compliant, as well as the uint8_t etc data types, you'll also see at least two other sections - minimum width types and fastest minimum width types. An example will help clarify the situation:

Fixed width unsigned 8 bit integer: uint8_t
Minimum width unsigned 8 bit integer: uint_least8_t
Fastest minimum width unsigned 8 bit integer: uint_fast8_t

Thus a uint8_t is guaranteed to be exactly 8 bits wide. A uint_least8_t is the smallest integer guaranteed to be at least 8 bits wide. A uint_fast8_t is the fastest integer guaranteed to be at least 8 bits wide. So we can now finally answer our question. If we are trying to consume the minimum amount of data memory, then our TBD_DATATYPE should be uint_least8_t. If we are trying to make our code run as fast as possible then we should use uint_fast8_t.

Thus the bottom line is this. If you want to start writing efficient, portable embedded code, the first step you should take is to start using the C99 data types 'least' and 'fast'. If your compiler isn't C99 compliant then complain until it is - or change vendors. If you make this change I think you'll be pleasantly surprised at the improvements in code size and speed that you'll achieve.
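Applied to the loop above, the choice might look like the sketch below. The function and array names are mine, and the "do something" body is replaced by a counter so there is something to observe. The two size helpers are only there to let you see what your own toolchain picked; the widths differ from target to target, which is the whole point.

```c
#include <stdint.h>

/* The counter only has to reach 100, so ask for the fastest type holding at
   least 8 bits and let the compiler decide how wide that is. */
uint16_t do_something_100_times(void)
{
    uint16_t calls = 0;
    uint_fast8_t i;

    for (i = 0; i < 100; i++)
    {
        calls++;   /* stand-in for "do something" */
    }
    return calls;
}

/* A table kept in scarce RAM is the opposite case: favor size over speed. */
uint_least8_t g_samples[16];

/* Widths chosen by this toolchain, in bytes. */
unsigned fast8_size(void)  { return (unsigned)sizeof(uint_fast8_t); }
unsigned least8_size(void) { return (unsigned)sizeof(uint_least8_t); }
```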

Friday, June 06, 2008

Thoughts on the optimal time to test code

Today I'd like to take on one of the sacred cows of the embedded industry, namely the temporal relationship between coding and testing of the aforementioned code. The conventional wisdom seems to be as follows.

"Write a small piece of code. As soon as possible test the code. Repeat until the task is complete"


I know that for many of you, my merely having the temerity to suggest this might be sub-optimal will put me firmly into the category of hopeless heretic. Well, before you write me off as a lunatic, let me tell you about an alternative approach, how I stumbled upon it and why I think it has much to commend it.

Being in the consulting business I'm typically working on multiple projects at once. Often a given project will be put on hold for any number of reasons which aren't germane to this post. As a result, it's not uncommon for me to write some code, compile it and then not touch it again for several months. I then find myself in the position of having to test / debug code that I wrote months ago. Having now done this many times, I've come to the conclusion that rather than this being a problem, it is instead the optimal temporal relationship between coding and testing.

How can this be you ask? Surely after a multi-month hiatus, the code is no longer fresh in your mind and so it must make it that much more difficult to test and debug? Well the answer is of course yes - the code is no longer fresh in my mind, and yes it does make it a little harder to test and debug in the short term. The qualifier "in the short term" is the point of my argument.

Why do we write code? Most people would claim we write code in order to make a functional product. I disagree with this assertion. I think we write code so that people coming after us can understand it and modify it. This rather strange claim is based upon those studies that show that companies spend far more money maintaining code than they do writing it. Thus the smart way to write code is to do so in a manner that gives preeminent importance to the long term maintenance of that code. So how does one do this? Well that's a topic for another post. What I can tell you, is that having to test and debug code that you wrote several months ago is a terrific way for the developer of the code to see the code as someone who'll be maintaining it will see it. You'll see the inadequate or plain wrong comments. You'll see the copy and paste errors. You'll see where you got tired and took a short cut, and you'll see those stupid mistakes caused by the telephone ringing at the wrong time.

Indeed because you don't expect the code to work (after all it's never been tested) I find you cast a very jaundiced eye over the code - and in the process find a plethora of the mistakes that one typically finds by sitting in front of a debugger. Maybe it's just me, but I'd rather find bugs via code inspection than by fighting the debug environments common to most embedded systems.

So in a nutshell, I think the optimal way to write and test code is as follows:

1. Write the code. Make sure it compiles and is Lint free.
2. Wait a few months.
3. Reread the code looking for the usual suspects of bad / wrong comments, copy and paste errors, sloppy coding etc.
4. Test it.

The person that maintains your code (quite likely a future version of you) will thank you for doing it this way.

Tuesday, May 20, 2008

visualSTATE

I have been writing this blog now for about 18 months and in reviewing my posts I've noticed that they are often critical of technologies, manufacturers and/or products. Well today is a first for me, because I'd like to offer my first product endorsement. The endorsement goes to visualSTATE from IAR. I've been using this product for about the same length of time I've had this blog and have concluded that it represents the biggest step forward in productivity for me since I made the move from assembly language to C. (Yes folks, the move from C to C++ was a virtual non-event for me, as I found almost no improvement in my productivity, mainly I suspect because I have written for years in object oriented C).

Anyway, back to the topic of visualSTATE. If you aren't familiar with it, then you should be. It allows you to design complex, hierarchical state machines with ease and to push a button and obtain code that just seems to work. I have now completed three projects using this tool and am well on the way to finishing a fourth. In all cases, the boost to my productivity has been astonishing. I find that I spend most of my time on the functional design and almost no time on debugging the high level application.

visualSTATE's main strengths seem to be in the following areas:

1. Products that are highly modal - i.e. a product can be in one of N operating modes depending upon circumstances.
2. User interfaces. I've had great success with products that contain bespoke LCD and membrane keypads.
3. Products that contain complex sequencing requirements, particularly when coupled with a plethora of failure modes that have to be handled.

I've found the learning curve on visualSTATE to be quite long - but definitely worth it. Although you can certainly be up and running in a day or so, I found that it took me a lot longer to work out how best to partition a problem between visualSTATE and traditional code. However, with experience I'm now finding that I rarely get it wrong anymore.

I've also found some very nice and unexpected benefits from visualSTATE. To wit:

1. Code reuse. visualSTATE does of course require some code support. However, I've found that a lot of this code can be reused. As a result, I can now bring up a new board with a visualSTATE processing engine running on it in a matter of hours. Try doing that with your average RTOS.
2. Although we all know that lots of small functions are "better" than a few big functions, human nature being what it is, we tend to just expand an existing function rather than decomposing it into its constituent parts. Well when using visualSTATE I find that it almost forces one into writing lots of small (less than 5 lines) functions. I suspect that these small functions are part of the reason that my visualSTATE projects just seem to work with almost no debugging time.
3. Documentation. As well as the documentation benefits associated with small functions (i.e. the comments actually match the code!), visualSTATE comes with a terrific documentation tool. Many of my clients quite rightly demand excellent documentation on the designs I do for them. The documentation engine in visualSTATE makes this a breeze!
4. Communication. My clients often ask questions such as "what does the code do if ...". In a traditional project this usually means poring over complex code trying to ascertain the answer. With visualSTATE projects I find that most of the time I simply look at the state charts. Since the state charts are effectively the code (the two are tied together), I can give an answer quickly and authoritatively - which makes my clients happy and helps assure me of future business.

All in all, kudos to IAR for such a great tool.

Sunday, May 11, 2008

Integer Log functions

A few months ago I wrote about a very nifty square root function in Jack Crenshaw's book "Math Toolkit for Real-time Programming". As elegant as the square root function is, it pales in comparison to what Crenshaw calls his 'bitlog' function. This is some code that computes the log (to base 2 of course) of an integer - and does it in amazingly few cycles and with amazing accuracy. The code in the book is for a 32 bit integer; the code I present here is for a 16 bit integer. Although you are of course free to use this code as is, I strongly suggest you buy Crenshaw's book and read about this function. You'll see it truly is a work of art. BTW, one of the things I really like about Crenshaw is that he takes great pains to note that he didn't invent this algorithm. Rather he credits Tom Lehman. Kudos to Lehman.


#include <stdint.h>

/**
FUNCTION: bitlog

DESCRIPTION:
Computes 8 * (log(base 2)(x) - 1).

PARAMETERS:
- x: the uint16_t value whose log we desire

RETURNS:
- An approximation to 8 * (log(base 2)(x) - 1)

NOTES:
- For x <= 8 the function simply returns 2 * x.

**/
uint16_t bitlog(uint16_t x)
{
    uint8_t b;
    uint16_t res;

    if (x <= 8) /* Shorten computation for small numbers */
    {
        res = 2 * x;
    }
    else
    {
        b = 15; /* Find the highest non zero bit in the input argument */
        while ((b > 2) && ((int16_t)x > 0))
        {
            --b;
            x <<= 1;
        }
        x &= 0x7000;
        x >>= 12;

        res = x + 8 * (b - 1);
    }

    return res;
}
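To get a feel for the output, it helps to push a few values through by hand. The fragment below repeats the algorithm from the listing above in compact form, purely so that it builds on its own, and adds a small helper of my own (log2_eighths, not from Crenshaw's book) that undoes the "minus 1" so the result reads directly as log2(x) in units of 1/8. Because of the small-number shortcut, the helper is only meaningful for x of 4 or more.

```c
#include <stdint.h>

/* Same algorithm as the listing above, repeated so this fragment builds alone. */
uint16_t bitlog(uint16_t x)
{
    uint8_t b = 15;

    if (x <= 8)
    {
        return (uint16_t)(2 * x);   /* shortcut for small numbers */
    }
    while ((b > 2) && ((int16_t)x > 0))
    {
        --b;                        /* b tracks the highest set bit */
        x <<= 1;
    }
    return (uint16_t)(((x & 0x7000) >> 12) + 8 * (b - 1));
}

/* Hypothetical helper: log2(x) in units of 1/8, valid for x >= 4. */
uint16_t log2_eighths(uint16_t x)
{
    return (uint16_t)(bitlog(x) + 8u);
}
```

Working through x = 1000 by hand: the highest set bit is bit 9, the next three bits are all ones, so bitlog returns 7 + 8 * 8 = 71 and log2_eighths returns 79, i.e. 9.875 against a true log2 of about 9.97.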


Saturday, April 12, 2008

IEC60730

Atmel has a very interesting application note on IEC60730 Class B compliance. If you aren't aware of IEC60730, there is a nice introduction here. In a nutshell IEC60730 Class B compliance is a safety standard related to household appliances. Part of IEC60730 requires that one actively monitor that a microcontroller (if one is used) is functioning correctly. This seems to be a reasonable thing to do. However, as the Atmel application note shows, meeting this requirement requires one to constantly do things such as test memory, confirm that timers are operating at the correct frequencies and so on. Again conceptually this doesn't seem unreasonable. However, my concern with this is that the very act of confirming that the hardware is functioning could result in a system failure at a critical point, thus creating the very problem the standard is designed to prevent.

For example, it's hard to argue with the contention that the stack is the most used portion of memory in most microcontrollers. I think most engineers would agree that if the memory used for the stack malfunctioned then disastrous things would most likely occur. On this basis, a regular check of the stack memory would seem to be in order. Maybe it's just me, but the thought of running a memory test on the stack area of a processor while simultaneously trying to respond to interrupts etc seems like a very tall order. Indeed, I can easily envisage a piece of code that is designed to test the stack area malfunctioning, causing a system crash and potentially causing the very thing it's designed to avoid.

I think what it comes down to is this. The reliability of hardware seems to me to be several orders of magnitude better than the reliability of software. Thus using software to validate hardware seems problematic. I'll be very interested to see what happens the first time someone gets hurt as a result of a malfunction in software written to conform to IEC60730. If you don't think this is likely, take a look at the size of the object code produced by Atmel's suggested tests. Then consider that many household appliances use microcontrollers that contain just a few kbytes of object code - and that the IEC60730 code will thus make up a very large fraction of the delivered code. On a simplistic statistical basis, we can assume that if 30% of the code in a product is related to IEC60730 compliance, then 30% of the bugs will be in that code. Given what the code has to do, my money is that the IEC60730 compliance code will have a much higher bug rate than the general application. Thus the probability of a failure occurring in the IEC60730 code is high - and someone will get hurt when the code fails.

As a parting thought, how exactly does one set about testing code that is designed to detect hardware failures internal to an integrated circuit? Although I'm sure I could come up with some test protocols for some hardware, I suspect that the Heisenberg uncertainty principle will ensure that the very act of testing the test will result in a flawed test.


Monday, February 04, 2008

The perils of overloading

This post is coming to you from Sweden - a very fine country that I heartily recommend visiting if you get the chance. (If you're wondering why I'm in Sweden - I'm here on business as one of my clients is located in Gothenburg). Anyway, the fact that I'm in Sweden is relevant to this post, as to get here I had to put myself at the mercies of United Airlines. Now the fact that the flight over here was less than perfect wouldn't be news to any of you that travel regularly. However, the reason that the flight was a disaster is relevant, as I'll now try and explain...

Upon arrival at the United check in desk at Dulles airport, I was greeted by an array of self check in kiosks, with a total of one real live human being to take care of baggage check in. Thinking myself to be computer savvy, I negotiated the check in kiosk with ease, only to be told that:
  1. I had to see the human in order to check my bags in, and
  2. The system was unable to assign me a seat and that seat assignment would be done at the gate.
The first instruction was par for the course, while the second instruction I found to be very strange. Anyway, I shrugged my shoulders and went over to the sole person working the desk. There was one gentleman in front of me. This gentleman, not unreasonably asked if he could use some of his frequent flier miles to upgrade to business class. No problem said the United employee, who proceeded to rattle the keys. After 5 minutes, he announced that although the system was showing that seats were available in business class, the computer system refused to allow him to assign a seat. This was the second clue that things were heading south in a hurry. It then took the clerk another 10 minutes to wait list the gentleman (giving a total processing time of 15 minutes). Although it's possible the clerk was incompetent, I got the impression that he really knew what he was doing, and was just being stymied by the system.

Anyway, I checked my bag in and proceeded to the gate. When I got to the gate, I found another 100+ passengers that also had no seat assignments. When eventually I got called to the counter, I found a harried woman with a sea of boarding passes printed out in front of her. She was manually searching through them trying to find my name. Eventually she found it and handed it over. My nature being what it is, I politely inquired as to the reason for this astonishingly strange system of assigning seats and issuing boarding passes. Apparently this was the opportunity that the clerk had been waiting for to vent her frustration, as she gladly explained to me that the powers that be had overbooked the flight.

And so my gentle reader, we come to the point of this post. It was apparent that the United system was unable to handle an overbooked flight correctly, and rather than degrade gracefully, had all but collapsed. At which point I started making some snarky comments to myself about database programmers and how surely all database programmers worked in that field because they couldn't handle the rigors of the embedded / real time world and that any half decent embedded systems person would never make such an elementary mistake.

It was then that I had my epiphany. We make the same mistake in the embedded world all the time. When was the last time you used RMA (Rate Monotonic Analysis) to guarantee that all your tasks would meet their scheduling deadlines? How many failures of embedded systems are caused by overloading (or over scheduling) and the failure to correctly assign task priorities? How many times do weird things happen in your code that you just shrug off as "one of those things"? In short, I found myself cutting a break to the poor sod that wrote United's code. I was still ticked off though!
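For anyone who has never run the RMA check mentioned above, the simplest form is the Liu and Layland utilization test: n independent periodic tasks with deadlines equal to their periods are guaranteed schedulable under rate monotonic priorities if their total utilization does not exceed n * (2^(1/n) - 1). The sketch below is one way to code it in integer arithmetic; the type and function names are mine, and the test is sufficient but not necessary, so a "false" result means "go and do a more careful analysis", not "the system will fail".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t wcet_us;    /* worst-case execution time, microseconds */
    uint32_t period_us;  /* period (taken as the deadline), microseconds */
} rma_task_t;

/* Bound n * (2^(1/n) - 1) for n = 1..5, in parts per 10000, rounded down.
   For larger n the limit ln(2) is used, which is slightly pessimistic and
   therefore safe. */
static const uint16_t rma_bound[] = { 10000u, 8284u, 7797u, 7568u, 7434u };
#define RMA_BOUND_LIMIT 6931u

/* Returns true if total utilization is within the bound. */
bool rma_within_bound(const rma_task_t *tasks, size_t n)
{
    uint64_t total = 0;   /* utilization in parts per 10000 */
    size_t i;

    if (n == 0)
    {
        return true;
    }
    for (i = 0; i < n; i++)
    {
        if (tasks[i].period_us == 0)
        {
            return false;   /* malformed task description */
        }
        /* Round each term up so the check errs on the safe side. */
        total += ((uint64_t)tasks[i].wcet_us * 10000u
                  + tasks[i].period_us - 1u) / tasks[i].period_us;
    }
    return total <= ((n <= 5) ? rma_bound[n - 1] : RMA_BOUND_LIMIT);
}
```

Three tasks using 25%, 20% and 20% of the CPU pass comfortably (65% against a bound of about 78%); two tasks using 50% and 40% do not.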

Sunday, January 27, 2008

A new way to tell if something is an embedded system

Periodically someone tries to come up with a definition of an embedded system. For example there is an excellent and oft cited definition here. What got me thinking about this topic is the latest gadget I love to hate - my Verizon Treo phone running Windows Mobile. A few years ago, there would have been no doubt that a cell phone was an embedded system. Today, the Treo, the iPhone etc are all running versions of traditional computer operating systems, and are much more like computers than embedded systems. So the question is what are they - embedded systems or computers?

Well today I offer a new simple test to tell if these devices are fish or fowl (foul is perhaps more appropriate), to wit:

"Is the device a pain in the neck to use?" If the answer is "yes", then it's a computer. My Treo is a computer. Enough said!

Friday, January 18, 2008

Electronic Component Footprints

As well as writing code and designing hardware, I also do PCB layout. I started doing this after I discovered it was often faster for me to layout a board myself than to try and convey all my requirements to a board layout person. If you've ever done PCB layout, you'll know that getting information about a device's footprint is a real pain. What you may not know is that this is a major source of errors on printed circuit boards, resulting in costly board re-spins and project delays. These errors come about for several reasons.
  1. Getting the information. Many manufacturers include packaging information directly in the part's data sheet. Other manufacturers (TI being a principal offender) instead just cite a packaging part number and say something trite like "See our website for the latest information". One is then forced into searching a gigantic web site to discover that packaging style WP8 is what the rest of the world calls SO8. I don't mind them decoupling the packaging information from the part data sheet. I just wish they'd get with the program and discover something called Hyper-linking (it's only been around since the 1960s).
  2. Footprints are usually dimensioned as if they were a mechanical part. By this I mean that the drawing is usually rendered like most mechanical parts. Unfortunately, the layout package I use (and I suspect most of the others) treats a footprint as an electrical component. This results in all the pads being on an X-Y grid, with pin 1 usually being at (0,0). What this usually means is that one has to spend time performing a series of elementary trigonometric calculations in order to work out where to place the pads exactly. As you may imagine, this is a major source of error in footprint creation. The frustrating thing for me is that for the mechanical person providing the footprint information, it would be trivial to have their CAD system generate the information in a way that is directly usable.
  3. Many suppliers of mechanical components now offer solid models of their parts on their websites. Typically the models are offered in a number of formats (ProEngineer, Solid Works etc). Thus, if I'm using say a valve from this supplier, I don't have to create the model. I just download it and incorporate it into my working drawing. Why then do suppliers of electronic components not do the same thing for part footprints? I suspect the answer is that no one ever selected a part to use in a design because it made the layout person's job easier.
  4. Lastly, you may be unaware that the footprint for a surface mount part differs depending on whether it is to be reflow soldered or wave-soldered. Some companies (mainly in Europe) supply both footprints. Too many however simply supply the reflow footprint and leave it up to the lowly layout person to try and work out what the footprint should be for wave soldering.
So what's the point of this screed? Well, our industry is all about getting products to market as soon as possible at the lowest possible cost. Component manufacturers could help their customers (which in turn would help them) achieve this goal by simply providing information that removed the footprint bottleneck.

Sunday, January 13, 2008

Omniscient Code Generation

Hi Tech Software has recently been making a lot of noise about its "Omniscient Code Generation". In a nutshell, the technology appears to defer code generation until the entire program has been compiled, and then look at everything before generating the final object code. The end result is a dramatically more compact (and presumably faster running) program image. I haven't had a chance to play with the compiler yet (in part because it's still in beta testing). If they have done what they claim, then Hi Tech should be commended. On my list of things to check out about the technology will be:
  • Is the technology smart enough to track function calls via function pointers? If it is, then this is truly a neat piece of technology. If instead, it's one of the limitations of the product, then its usefulness to me has just plummeted.
  • Does the technology also track function calls from within interrupts? My experience is that interrupt handling is still the poor relation of compiler technology. If Hi Tech does this, then I'll be impressed.
Also of interest to me is how other compiler manufacturers will respond. Keil has performed global register coloring on its 8051 compiler for years. I suspect that the Hi Tech approach is a step beyond this, so there's a chance that Keil will be finally knocked from their #1 position in 8051 code generation. IAR offers a multi unit compilation option with some of its compilers. However, this option isn't integrated into its Embedded Workbench, so it's practically useless. With Hi Tech offering compilers for ARM, PIC & MSP430 I can see this really creating a burst of competition in the industry. Excellent!

Wednesday, August 29, 2007

An unfortunate consequence of a 32-bit world

Back in the bad old days when I was a lad, one learned about microprocessors by programming 8 bit devices in assembly language. In fact I can still remember my first lab assignment - namely to multiply two 8 bit unsigned quantities together to get a 16 bit result (without the use of a hardware multiplier of course). One of the indelible lessons that comes from doing an exercise such as this, is that it can take many instructions to perform even the most innocuous of high level language statements.
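If you never had to do that exercise, the classic answer is shift-and-add. Here it is sketched in C rather than assembly (the function name is mine); each pass through the loop corresponds to several instructions on an 8 bit machine, which is exactly the lesson.

```c
#include <stdint.h>

/* Multiply two unsigned 8 bit values without using a multiply operator. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend  = a;   /* widened so it can be shifted left */
    uint8_t  bit;

    for (bit = 0; bit < 8; bit++)
    {
        if (b & 1u)
        {
            product += addend;   /* this bit of b contributes a << bit */
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}
```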

I mention this, because today I was looking at some code written by a young engineer who was recommended to me. In examining some of his code, I noticed the following construct:

int ivar;

void some_function(void)
{
    ...
    ++ivar;
    ...
}

interrupt void isr_handler(void)
{
    ...
    --ivar;
    ...
}


Notwithstanding the fact that ivar should have been declared volatile, the most egregious mistake here was the assumption that the statement ++ivar is an atomic operation. Now if one is used to working on 32 bit machines, the concept of incrementing an integer being anything other than an atomic operation is of course ludicrous. However, in the 8 or 16 bit world where many of us labor in the embedded space, the idea of incrementing an integer being an atomic operation is equally ridiculous. The trouble with bugs like this is that they are difficult to spot, and will only rear their heads after months or even years of operation.
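One common repair is to make the variable volatile and to guard the foreground access with a short critical section. The sketch below assumes that approach. The irq_lock() and irq_unlock() functions are hypothetical stand-ins for whatever your compiler provides to disable and restore interrupts; here they only count nesting so the fragment builds on a PC, and the handler is written as an ordinary function for the same reason.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt control. */
static unsigned g_irq_lock_depth = 0;

static void irq_lock(void)   { g_irq_lock_depth++; }
static void irq_unlock(void) { g_irq_lock_depth--; }

/* Shared with the interrupt handler, so volatile. */
static volatile int ivar = 0;

void some_function(void)
{
    irq_lock();     /* the read-modify-write can no longer be interrupted */
    ++ivar;
    irq_unlock();
}

/* Shown as a plain function; assumed not to be preempted by another
   interrupt that also touches ivar. */
void isr_handler(void)
{
    --ivar;
}

/* Reading a multi-byte variable can be torn too, so guard that as well. */
int ivar_snapshot(void)
{
    int copy;

    irq_lock();
    copy = ivar;
    irq_unlock();
    return copy;
}
```

Even on a 32 bit load/store machine the increment is typically a load, an add and a store, so the same guard is a sensible habit there as well.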

So, is this a case of an incompetent individual? Although nominally yes, I suspect that the real problem is that he was raised on a diet of big CPUs. Perhaps the universities could do these engineers a favor, and throw away the ARM based evaluation boards and replace them with an 8051 based system.

Thursday, August 02, 2007

Application notes code quality

All manufacturers of microcontrollers publish application notes. Some of these application notes are of course nothing more than gussied up advertising drivel. However, many of these application notes contain useful information that can cut days, and sometimes weeks off a project.
Having read hundreds of these application notes over the 25 years I've been doing this, I've come to the conclusion that whereas the application notes usually get the algorithms correct, the same can't be said for the code. Too often the code is sloppy, with bugs that are apparent merely by code inspection. Maybe it's just me, but whenever I see a sloppy piece of code, it makes me wonder about the underlying quality of the IC design.

I think this is unfortunate, since the manufacturers could do much to improve things in the industry by setting a great example. To this end, I think they should:
  1. Adopt a set of coding standards that all their code adheres to.
  2. Have the code reviewed, such that egregious bugs are caught.
  3. Make the code Lint free.
  4. If they are aiming the product at the automotive industry, ensure it is MISRA C compliant.
The advantages to the IC manufacturer are legion:
  1. They look good (never a bad thing).
  2. All their application note code has the same "look and feel". This encourages engineers to use their application notes, and hence their products.
  3. The code in the application note is usable "as is", speeding time to market and generally giving the perception that their product is easy to use.
  4. Less experienced engineers are taught how to do things correctly - which presumably leads to higher quality products - which presumably translates into more sales.
I guess the thing that I find maddening about this is that the manufacturers probably spend weeks or months developing the application note, and then let themselves down by presenting their solution in such a poor way. When I talk to the marketing folks for the CPU manufacturers, I make a point of bringing some of the more egregious errors to their attention. Perhaps if all of us did this, we could get a bit of a sea change in the industry.


Friday, July 13, 2007

Comments on code comments

People's opinion on code commenting is a bit like their opinion on speeding (you know the adage - anyone that drives faster than you is a maniac, anyone that drives slower than you is a doddering old fool). With this in mind, I recently got into a bit of a disagreement with a faculty member of one of America's finer engineering schools. Here's a summary of our positions.

Me
I've looked at this 750 SLOC file. It contains no header, no comments, nor any other explanation as to what it does. The code itself is non-trivial, involving a large amount of recursion, dynamic memory allocation etc., and thus what the code does, how it does it, and indeed why it exists are not obvious to me.

Faculty
Based upon the file name it should be obvious what the code does. If you don't understand the theory of this entity, then you have no business looking at the code. P.S. The code is documented.



Wednesday, June 13, 2007

Size matters

Periodically I get printed propaganda from the semiconductor manufacturers touting their latest and greatest ICs. Evidently the marketing folks are convinced that size matters because the size of the IC is almost the first thing they tell you now. A recent example from Maxim has the headline: "Smallest, Most Efficient and Flexible Notebook Fuel-Gauging Solution".

Well, size does matter. However, it seems to me that the industry has gone too far. More and more devices are being offered only in chip scale packaging (CSP). As a result, it is all but impossible to hand build a prototype, let alone cobble together a breadboard. The upshot is that in many cases it doesn't make economic sense to use the part, simply because CSP requires the prototype board to be machine built at a cost of thousands of dollars.

I think the manufacturers are aware of this problem and are trying to address it by offering evaluation boards. While these are OK for the breadboarding phase, they don't solve the prototyping problem. Furthermore, even if the project can justify the cost of machine built prototypes, probing the part or (heaven forbid) making modifications to the board is virtually impossible. The bottom line, IC manufacturers: offer all your parts in a package that can be handled by people. Please.


Monday, June 04, 2007

Understanding Stack Overflow

I suspect that many, if not all, bloggers are somewhat narcissistic. In my case it shows through in that I use one of the free services that keeps track of how many visitors I get and what brought them to this blog. Well, it turns out that many of the visitors to this blog get here not because of the brilliance of my writing, but because they did a Google search on "stack overflow", often qualified by PIC, or MSP430 etc. I suspect that many of these visitors leave empty handed. Thus in an attempt to make these visits less pointless, let me give you my take on what causes a stack overflow in an embedded system.

First of all, go read the Wikipedia description of stack overflow. There's nothing wrong with the description - it's just incomplete from an embedded systems perspective.

If you are having problems with 8 bit PICs, then you should read this. For other architectures, read on...

On the assumption that you are getting a stack overflow and that you aren't performing recursion or attempting to allocate a large amount of storage on the stack, what can be going wrong? Here's a check list.
  1. What's your stack size set to? If you don't understand the question then you need an introductory course to embedded systems programming. If you do understand the question - but don't know the answer - then this is the most likely source of your problem. How can this be you ask? Well, most embedded systems compilers are designed to work with a particular family of processors. The low end of the family may have a tiny amount of memory (e.g. 128 bytes). As such setting the default stack size to 16 bytes may be a sensible thing to do. Thus, your first step is to ensure that the stack size is set to something reasonable for your system. Click here for advice on how to do this.
  2. Which stack is overflowing? Many processors / compilers support / implement multiple stacks. A typical dichotomy is a call stack (upon which the return addresses of functions are stored) and a data or parameter stack (upon which automatic variables are stored). If you are using an RTOS, then typically there will be a shared call stack while each thread will have its own data stack. Thus, is it the shared call stack that is overflowing, or is it the parameter stack associated with a particular task? Once you've determined which stack is overflowing, finding out exactly what gets placed on that stack will help lead you to the solution to your problem. If you can see no obvious high level language construct that is causing the problem, then the single most likely cause of your misery is an interrupt service routine...
  3. An interrupt service routine can use up an extraordinary amount of space on the stack. For a discussion of how this arises and its impact on performance, see this article. This problem is compounded if your system allows interrupts to be nested (that is, it allows an ISR to itself be interrupted).
  4. Certain library functions (printf() and its brethren are prime offenders) can use an enormous amount of stack space.
  5. If you are writing partially in assembly language, are you failing to pop every register that you pushed? This often occurs if you have more than one exit point from a function or ISR.
  6. If you are writing entirely in assembly language, did you set up the stack pointer correctly and do you know which way the stack grows?
  7. Have you made the mistake of programming a microcontroller that you don't understand? For example, low end PIC processors have a tiny call stack which is easily overflowed. If you are programming a PIC and don't know about this limitation, then quite frankly, I'm not surprised you are having problems.
  8. If none of the above solves your problem, then I'm afraid you are most likely into a stack over-write problem. That is, a pointer is being de-referenced that results in the stack being overwritten. This can often arise when you allocate an array on the stack and then access an element beyond the end of the array. Lint will find a lot of these problems for you. If you don't know what Lint is, see this article. If you do know what Lint is and aren't using it then you deserve to be faced with these sorts of problems.
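
For the first two items on the checklist, a widely used way of finding out how much of a given stack actually gets used is to fill it with a known pattern at start-up and later count how much of the pattern has survived. The following is a minimal sketch of that idea and does not come from any of the articles linked above. It operates on an ordinary array standing in for the stack region, it assumes a stack that grows downwards, and the names STACK_FILL, stack_paint and stack_unused_bytes are hypothetical. On a real target the bounds of the region would come from your linker file.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u   /* hypothetical fill pattern */

/* Fill the whole region with the pattern. Do this at start-up,
   before the region is in use as a stack. */
void stack_paint(uint8_t *base, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++)
    {
        base[i] = STACK_FILL;
    }
}

/* Count the pattern bytes still intact at the low-address end.
   Assumes a descending stack, so the low addresses are used last. */
size_t stack_unused_bytes(const uint8_t *base, size_t size)
{
    size_t n = 0;

    while ((n < size) && (base[n] == STACK_FILL))
    {
        n++;
    }
    return n;
}
```

If the unused count is at or near zero after the system has been thoroughly exercised, that stack is too small (or something is scribbling on it).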

I have also written a related article on setting your stack size that you may find useful.


Saturday, May 19, 2007

Continued Fractions

Once in a while something happens that makes me realize that techniques that I routinely use are simply not widely known in the embedded world. I had such an epiphany recently concerning continued fractions. If you don't know what these are, then check out this link.

As entertaining as the link is, let me cut to the chase as to why you need to know this technique. In a nutshell, in the embedded world we often need to perform fixed point arithmetic for cost / performance reasons. Although this is not a problem in many cases, what happens when you need to multiply something by say 1.2764? The naive way to do this might be:

uint16_t scale(uint8_t x)
{
  uint16_t y;

  y = (x * 12764) / 10000;  /* x * 12764 overflows a 16 bit int */

  return y;
}

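As written, this will fail on any compiler where int is 16 bits wide (the norm for 8 and 16 bit targets) because of numeric overflow in the expression (x * 12764). Thus it's necessary to throw in a cast that forces some very expensive 32 bit arithmetic. E.g.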

uint16_t scale(uint8_t x)
{
  uint16_t y;

  y = ((uint32_t)x * 12764) / 10000;  /* 32 bit multiply and divide */

  return y;
}

Our speedy integer arithmetic isn't looking so good now is it?

What we really want to do is to use a fraction (a/b) that is a close approximation to 1.2764 - but (in this case) has a numerator that doesn't exceed 255 (so that we can do the calculation in 16 bit arithmetic).

Enter continued fractions. One of the many uses for this technique is finding fractions (a/b) that are approximations to real numbers. In this case using the calculator here, we get the following results:

Convergents (partial quotient: convergent = decimal value):
1: 1/1 = 1
3: 4/3 = 1.3333333333333333
1: 5/4 = 1.25
1: 9/7 = 1.2857142857142858
1: 14/11 = 1.2727272727272727
1: 23/18 = 1.2777777777777777
1: 37/29 = 1.2758620689655173
1: 60/47 = 1.2765957446808511
1: 97/76 = 1.2763157894736843
1: 157/123 = 1.2764227642276422
2: 411/322 = 1.2763975155279503
3: 1390/1089 = 1.2764003673094582
1: 1801/1411 = 1.2763997165131113
1: 3191/2500 = 1.2764


We get higher accuracy as we go down the list. In this case, I chose the approximation (157 / 123) because it's the most accurate fraction in the list whose numerator does not exceed 255 (so that the largest product, 255 * 157 = 40035, still fits in 16 bits). Thus my code now becomes:

uint16_t scale(uint8_t x)
{
  uint16_t y;

  y = ((uint16_t)x * 157) / 123;

  return y;
}

The error in the scale factor is less than 0.002% - but the calculation speed is dramatically improved because I don't need to resort to 32 bit arithmetic. [On an ATmega88 processor, calling scale() for every value from 0-255 took 148,677 cycles for the naive approach and 53,300 cycles for the continued fraction approach.]
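
If you want to check the accuracy for yourself, something along the following lines compares the two versions over all 256 inputs. This is a sketch for running on a PC rather than the benchmark code used for the cycle counts, and the names scale_exact, scale_cf and scale_max_diff are hypothetical.

```c
#include <stdint.h>

/* Reference version: 32 bit arithmetic, multiplying by 12764/10000. */
uint16_t scale_exact(uint8_t x)
{
    return (uint16_t)(((uint32_t)x * 12764u) / 10000u);
}

/* Continued fraction version: 255 * 157 = 40035 fits in 16 bits. */
uint16_t scale_cf(uint8_t x)
{
    return (uint16_t)(((uint16_t)x * 157u) / 123u);
}

/* Largest difference between the two results over every input. */
uint16_t scale_max_diff(void)
{
    uint16_t x;
    uint16_t worst = 0;

    for (x = 0; x <= 255u; x++)
    {
        uint16_t a = scale_exact((uint8_t)x);
        uint16_t b = scale_cf((uint8_t)x);
        uint16_t diff = (uint16_t)((a > b) ? (a - b) : (b - a));

        if (diff > worst)
        {
            worst = diff;
        }
    }
    return worst;
}
```

Because both versions truncate, the results can differ by one count where 157/123 lands exactly on an integer. For x = 123 the continued fraction version returns 157, whereas the 32 bit version returns 156 (123 * 1.2764 = 156.9972).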

Incidentally, you might be wondering if there are other fractions that give better results than the ones generated by this technique. The mathematicians tell us no: each convergent is closer to the target than any other fraction whose denominator is no larger.
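
The search for the best fraction can also be automated. The sketch below runs Euclid's algorithm on the ratio 12764/10000 to obtain each partial quotient a, and builds successive convergents h/k with the standard recurrence h = a * h1 + h2, k = a * k1 + k2. The function name best_fraction and its interface are hypothetical, it assumes the target is supplied as a ratio of two integers, and it makes no attempt to guard against 32 bit overflow for very large inputs.

```c
#include <stdint.h>

/* Find the last continued fraction convergent of num/den whose
   numerator does not exceed max_num. The result is written to
   *best_h / *best_k (0/1 if there is no such convergent or den is 0). */
void best_fraction(uint32_t num, uint32_t den, uint32_t max_num,
                   uint32_t *best_h, uint32_t *best_k)
{
    uint32_t h2 = 0, h1 = 1;   /* numerators of the previous two convergents   */
    uint32_t k2 = 1, k1 = 0;   /* denominators of the previous two convergents */

    *best_h = 0;
    *best_k = 1;

    while (den != 0)
    {
        uint32_t a = num / den;      /* next partial quotient */
        uint32_t h = a * h1 + h2;    /* next convergent is h/k */
        uint32_t k = a * k1 + k2;
        uint32_t rem = num % den;

        if (h > max_num)
        {
            break;                   /* numerator too large: stop here */
        }
        *best_h = h;
        *best_k = k;

        h2 = h1;
        h1 = h;
        k2 = k1;
        k1 = k;
        num = den;                   /* Euclid's step */
        den = rem;
    }
}
```

Called as best_fraction(12764, 10000, 255, &h, &k), this yields h = 157 and k = 123, the fraction chosen above.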

So there you have it. A nifty technique that once you know about it will make you wonder how you got along without it for all these years.




Tuesday, May 01, 2007

H1-b visas and Economics 101

USA Today has a story today about how 123,000 applications were received within 48 hours of this year's H1-b visa lottery being opened on April 1. Given that there are 65,000 visas granted a year, there seems to be a large mismatch between supply and demand. Although the USA Today story talks about some of the sexy positions (Supermodels! Complete with alluring photograph!), the reality is that most of these applications are for the fields of electronics and computing, including embedded systems.

This topic interests me, in part because I came to the USA on a similar visa program (actually an E2 - but that's another story).

Anyway, whenever this topic comes up, there's normally some quote from a high tech industry executive explaining that they simply can't get enough talented folks - and hence the need for the program. Whenever I see this argument advanced, I'm always struck by the failure of the journalist to ask a basic question - namely "What would you do if the program was eliminated?" I suspect that the honest executive would answer:
  1. Lobby like mad to get it reinstated.
  2. Pay what I had to in order to get the talent I needed.
  3. Look to put the work where the talent is (i.e. ship it overseas).
Whereas I could probably discourse for a long time on answer 1, it's the other two that intrigue me.

The reality today is that enrollment in engineering is dropping. If one were to look at non first / second generation immigrant enrollment, I'd hazard a guess that it has all but collapsed. This is despite the fact that engineering in general (and electrical engineering in particular) is always one of the highest paying jobs upon graduation, with recent graduates earning about $65K, versus the $30K earned by your typical liberal arts major. So, what would happen if these salaries doubled? Would this be enough to attract more home grown talent into the industry? Economics 101 would suggest that if you raise the salaries high enough then supply will rise to meet the demand. The question is, by how much would salaries have to rise?

Economics 101 also suggests that as the price of a good / service rises, it is highly likely that the consumer will look for a substitute. At present this works by bringing folks in on the H1-b program. If the program was eliminated, then I assume that this would be done by shipping more work overseas.

I guess this leads me to the point of my post. The USA prides itself on its capitalist approach - and the belief that the free market is inherently the best way to solve all (OK, most) problems. As a result, Americans normally abhor government interference in the market place. But isn't that exactly what is being done here?

If we genuinely believe in the free market, then the H1-b visa program should be abolished. Salaries would rise for engineers, more students would study engineering - and more work would go overseas. I have no idea whether the end result would be beneficial to engineers or not. It would however be ideologically consistent.

The economic purists might argue that the H1-b visa should be scrapped in the sense that anyone who wished to work here should be allowed to do so. I agree that this is also ideologically consistent. However, the reality is that the USA limits immigration in all fields. Thus to be truly consistent this would require the USA to do the same for all jobs - which is tantamount to saying there are no limits on immigration - something which isn't going to happen.


Saturday, April 21, 2007

Crest factor, Square roots & neat algorithms

I've been programming microcontrollers for about 25 years now - and can count on one hand the number of times I've needed to compute the square root of an integer. This curious drought came to an end recently when I needed to compute the Crest Factor of the line voltage being used to power a product I was designing. (For the uninitiated / rusty out there, Crest Factor is the ratio of the Peak : RMS of a waveform. For example, a sine wave has a CF of 1.414, whereas a square wave has a CF of 1.000).

Why, you might ask, do I need to compute the CF? Well, the product uses triacs to control a number of AC loads. If the system is inadvertently powered from a square wave inverter, or just a really lousy generator, then the triacs will not self-commutate - and I could never turn off the loads. Thus to prevent this unfortunate scenario, I need to know how good (i.e. sinusoidal) the line voltage is. The CF is a direct figure of merit that allows me to make this decision.

Evidently, the computation of CF requires one to compute an RMS voltage, which in turn requires one to calculate the square root of a number. For various reasons, I need to compute the CF on a mains cycle by cycle basis - and I'm using a 7.37 MHz ATmega CPU. Thus, the computational efficiency of the algorithm is important.
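
To make the arithmetic concrete, here is a minimal sketch of a crest factor calculation over one cycle's worth of samples. It is an illustration only, not the code from the product: the function names are hypothetical, the samples are assumed to have had any DC offset removed already, the sum of squares is assumed to fit in 32 bits, and isqrt32() is a deliberately slow placeholder for a proper square root routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder integer square root: obvious, but far too slow for
   real use. Returns the square root of v rounded down. */
static uint16_t isqrt32(uint32_t v)
{
    uint32_t r = 0;

    while ((r < 65535u) && ((r + 1u) * (r + 1u) <= v))
    {
        r++;
    }
    return (uint16_t)r;
}

/* Crest factor (peak / RMS) scaled by 1000, so that a square wave
   gives 1000. Returns 0 if there are no samples or the RMS is zero. */
uint32_t crest_factor_x1000(const int16_t *samples, size_t n)
{
    uint32_t sum_sq = 0;
    uint32_t peak = 0;
    uint32_t rms;
    size_t i;

    if (n == 0)
    {
        return 0;
    }
    for (i = 0; i < n; i++)
    {
        int32_t s = samples[i];
        uint32_t mag = (uint32_t)((s < 0) ? -s : s);

        if (mag > peak)
        {
            peak = mag;
        }
        sum_sq += mag * mag;   /* assumed not to overflow 32 bits */
    }
    rms = isqrt32(sum_sq / (uint32_t)n);   /* root of the mean of the squares */
    if (rms == 0)
    {
        return 0;
    }
    return (peak * 1000u) / rms;
}
```

A sine wave should give a figure close to 1414, with the exact value depending on how coarsely the samples and the integer square root quantise the waveform.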

Now IAR has a nifty little algorithm that computes an approximate square root. See http://supp.iar.com/Support/?note=18180&from=search+result

However, this gets blown away by the algorithm described by Crenshaw in his wonderful book: Math Toolkit for Real-Time Programming, CMP Books. ISBN 1-929629-09-5.

The code in his book is for computing the square root of a 32 bit unsigned integer. I adapted it to give the square root of a 16 bit integer. Here's the code:

static inline uint8_t friden_sqrt16(uint16_t val)
{
  uint16_t rem = 0;
  uint16_t root = 0;
  uint8_t i;

  for (i = 0; i < 8; i++)   /* one result bit per pair of input bits */
  {
    root <<= 1;
    rem = ((rem << 2) + (val >> 14));   /* bring down the next two bits */
    val <<= 2;
    root++;
    if (root <= rem)
    {
      rem -= root;
      root++;
    }
    else
    {
      root--;
    }
  }
  return (uint8_t)(root >> 1);
}


This will compute the exact integer square root (that is, the true square root rounded down) of a 16 bit integer in about 268 clock cycles on an AVR - i.e. in about 33 microseconds on an 8 MHz AVR processor.
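
With only 65536 possible inputs, the routine can be checked exhaustively on a PC. The sketch below repeats the routine so that the fragment compiles on its own, and then counts the inputs for which the result r fails the test r * r <= val < (r + 1) * (r + 1). The checking function friden_sqrt16_count_errors is a hypothetical name of mine.

```c
#include <stdint.h>

/* Copy of the routine above, so that this fragment stands alone. */
static uint8_t friden_sqrt16(uint16_t val)
{
    uint16_t rem = 0;
    uint16_t root = 0;
    uint8_t i;

    for (i = 0; i < 8; i++)
    {
        root <<= 1;
        rem = (uint16_t)((rem << 2) + (val >> 14));
        val <<= 2;
        root++;
        if (root <= rem)
        {
            rem -= root;
            root++;
        }
        else
        {
            root--;
        }
    }
    return (uint8_t)(root >> 1);
}

/* Number of 16 bit inputs for which the result is not the
   square root rounded down. */
uint32_t friden_sqrt16_count_errors(void)
{
    uint32_t v;
    uint32_t errors = 0;

    for (v = 0; v <= 0xFFFFu; v++)
    {
        uint32_t r = friden_sqrt16((uint16_t)v);

        if ((r * r > v) || ((r + 1u) * (r + 1u) <= v))
        {
            errors++;
        }
    }
    return errors;
}
```

A return value of zero means the routine is correct for every 16 bit input.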

To Crenshaw's point - don't just blindly use the code, but endeavor to understand how it works. Only then will you see it for what it truly is - a work of art. Thanks Jack.


Saturday, March 31, 2007

Tool Upgrades

As a consultant who does hardware, firmware & software work for my clients, I use a large array of software tools - half a dozen compilers, schematic capture and PCB layout tools, analysis tools as well as the usual gaggle of productivity tools that non-engineers also use. Throw in the tools for running a business and my PC is a regular treasure trove of applications.

With all these tools, the number of upgrades / updates is starting to get out of hand. Every week, it seems I'm updating a major application. The most common scenario seems to be:
  1. I haven't used a tool in a month or so.
  2. I invoke it - and it tells me that an update is available. Often the update is 'mandatory' or at least 'recommended'.
  3. I accept the update.
  4. The download proceeds. Some of them are simply enormous (Ever downloaded the Xilinx Webpack IDE?)
  5. The patch then proceeds. The time to execute the patch is often considerable.
  6. Finally - the dreaded 'You must restart your computer' directive. I've a dozen applications open, web pages marked, manuals at strategic places - and now I have to close them all down.
Having gone through all this rigmarole, I can finally start using the tool. Of course by now, I just want to 'get on with it', and so the release notes often get cursory attention. Inevitably, if I do read the release notes then I find the upgrade is completely useless to me (e.g. support for a new device that I'm not using). If I don't read the release notes then of course there's this really neat feature that's been added that really makes life easier - and I don't find out about it until weeks later.

Well - enough complaining. Do I have any suggestions? I think so. I'd like tool vendors to realize that their tool isn't the only one in the box - and that many of us use it on a less than daily basis. With this perspective, I'd like the tool vendors to do the following:
  1. Download upgrades in the background. A lot of applications already do this - they all should.
  2. Inform me there is an update available when I close the tool rather than open it. That way I can allow the update to occur while I'm off doing productive work elsewhere.
  3. Do everything you can to avoid requiring the user to re-boot their computer.
  4. Limit updates to one or two a year. I know product managers want folks on support contracts to feel they are getting their money's worth - but this only works if my life revolves around that tool - and it doesn't!



Thursday, December 14, 2006

Wanted - a new performance metric

In the bad old days, the two major performance concerns in CPU selection were whether a CPU had enough processing power and memory to get the job done. Although these are still issues, it's a rare problem that requires more bandwidth and memory than can be provided by the CPU vendors.

By contrast, today, well over half of the systems I work on are battery powered, and so I find the major question I have when designing an embedded system is 'how long will the battery last?' If you can work this out from studying the data sheets of the various CPU vendors then you're a better engineer than me.

Thus to solve this problem, I propose that we introduce a new performance metric - namely how much energy (Joules) does it take to perform a set of standard tasks. Rather than the usual bunch of quasi meaningful benchmarks, I'd like to see benchmarks such as:

  1. How much energy does it take to receive and transmit one thousand characters through an asynchronous serial port running at 38400 baud?
  2. How much energy does it take to perform a task switch using a standard RTOS such as uCOS-II?
  3. How much energy does it take to perform one thousand A2D conversions?
  4. How much energy does it take to execute a 64 tap FIR filter?

With metrics such as these, the task of choosing the best CPU (and compiler for that matter) would be made much easier. I'm quite prepared to let off the hook those vendors that aren't selling CPUs aimed at the portable market. For the other guys (TI, Atmel, ARM etc) it's time to step up to the wattmeter and be measured.


Friday, December 08, 2006

Wanted - .TEC password

It's time for my first rant - you have been warned!

I recently bought a new computer, complete with a gorgeous 24" flat panel display. The flat panel supports a speaker bar - which I also bought. The installation instructions for the speaker bar are quite straightforward - align the tabs on the bar with the holes in the display, and push until the bar clicks into place.

Well, on my system, there's no click. The display seems to lack the spring loaded latch necessary for this to work.

I have now had four email exchanges with 'technical support'. The first didn't read what I wrote, the second told me that this was a big issue and would take several days to resolve, the third did a keyword search on 'speaker bar' and sent me a bunch of useless links, and the fourth decided that my problem was that I didn't understand the installation instructions - and so sent me another copy of them.

In short, I've been treated like a moron.

I suspect that some / many / most people that contact technical support lack, ahem, technical acumen. Well, if you are reading this blog, the chances are you are not such a person. I also suspect that you've had a similar experience - which got me to thinking. What I need is a .TEC password. Just as Microsoft's .NET password lets you manage your net identities, a .TEC password would tell the recipient that they are dealing with someone who really can, at the very least, align two tabs with their mating holes and push - and so should be treated accordingly.

Thanks for listening.


Monday, December 04, 2006

RIP VOIP

As someone that has worked in telecomms, I was excited by the arrival of VOIP. However, after two years of variable quality, extended outages and just plain weird behaviour I've had it. It's clear to me that VOIP just isn't ready for prime time and so I have decided to pull the plug. The latest frustration - an inability to receive incoming calls for the last four days - with no resolution in sight. The technical support department informs me that it's a 'router programming error'. Whether they really mean a router configuration error, or a bug in the router firmware is unclear. Regardless, it's presumably a tough enough problem that it can't be fixed in four days.

The really bad news here is my experience when I tried to get Verizon to provide me with a POTS line. One of my prime reasons for jumping on VOIP as soon as I could was my feeling that Verizon was a dreadful company - one with questionable ethics and really awful customer service. Today, despite calling the number on the Verizon website for 'add a new line', I had to endure a voice prompted menu system and three different people before I could do the most mundane thing Verizon has to offer - order telephone service. For this privilege, Verizon is charging me a $44 start up fee (to plug a few numbers into a computer) and a cost double that offered by my VOIP provider. Apparently Verizon has not had its business suffer enough - yet.

So what's the relevance of this tale of woe to embedded systems? Not much really, other than to note that when the latest and greatest doesn't live up to its billing, one ends up with very annoyed customers. So next time marketing wants to over-hype what you can deliver, rein them in hard and fast. Your customers will thank you.
