Saturday, June 20, 2009

Thoughts on BCCs, LRCs, CRCs and being experienced

Those of us who have been working in this field for a long time are referred to as 'experienced'. Experienced is taken to mean that we have been doing this long enough to have encountered many of the problems common to embedded systems, and thus know how to solve them. Although this is true for many things, I think there is a downside to it: because we've successfully solved a particular problem a number of times, we fall into the trap of thinking that our solution is optimal. To guard against this, it is essential to be proactive in seeking out new solutions to old problems. To illustrate my point, I'll take you on an abbreviated trip down memory lane, looking at how I have handled that most prosaic of problems - transmitting serial data between microcontrollers.

Back when I was a lad I was by definition naive, and so I just transmitted the data without any thought to error detection beyond a parity bit on each byte. Well, it didn't take me long to work out that a simple parity bit wasn't exactly a robust way of detecting errors, and so I started appending a simple additive checksum to the message.

Well, that worked for a while, until the day it dawned on me that an additive checksum without an initial seed value was vulnerable to a stuck channel (e.g. all zeros, which produce a perfectly valid all-zero checksum). From that day on I started seeding my checksum computations with an initial value. I tended to favour 0x2B (with apologies to Hamlet).

Somewhere along the road I switched from performing an additive checksum to using an XOR operation. I can't remember why I did this - it just seemed 'better'.
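To make the two checksums concrete, here is a minimal C sketch of a seeded additive checksum and a seeded XOR checksum over a byte buffer. The function names and the CHECKSUM_SEED macro are hypothetical; the 0x2B seed is the one mentioned above, and both functions assume an 8 bit checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical seed macro; 0x2B is the value mentioned in the text. */
#define CHECKSUM_SEED 0x2BU

/* Additive checksum: sum of all bytes modulo 256, starting from a seed so
   that an all-zero (stuck) message does not yield an all-zero checksum. */
uint8_t checksum_add(const uint8_t *msg, size_t len)
{
    uint8_t sum = CHECKSUM_SEED;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + msg[i]);
    }
    return sum;
}

/* XOR checksum (a longitudinal redundancy check, or LRC), also seeded. */
uint8_t checksum_xor(const uint8_t *msg, size_t len)
{
    uint8_t lrc = CHECKSUM_SEED;
    for (size_t i = 0; i < len; i++) {
        lrc ^= msg[i];
    }
    return lrc;
}
```

Note that the seed only defends against the all-zero case; neither checksum detects, for example, two bytes being swapped.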

This approach served me well for many years until I started investigating cyclic redundancy checks (CRCs). I'd known about CRCs for a long time, of course. However, all the ones I knew about used 16 or 32 bit values and had certain wondrous but rather unspecified properties for detecting certain classes of errors. To put it bluntly, they seemed like complete overkill for sending a short message between two microprocessors, and so I didn't entertain them. However, this all changed the day I came across an 8 bit CRC. An 8 bit CRC designed for protecting small messages - excellent! Henceforth I eschewed the use of an LRC (the XOR checksum) and instead opted for an 8 bit CRC to protect my messages.

Well, this continued for a number of years. I learned more about CRCs and I got older, until one day I asked myself the question: is the 8 bit CRC I am using optimal? Regular readers of this blog will probably have noticed that 'optimal solutions' is a recurring theme. Anyway, with this thought in mind, I set off to determine whether the 8 bit CRC I was using to protect small messages was indeed optimal. That's when I came across this paper by Koopman and Chakravarty, entitled 'Cyclic Redundancy Code (CRC) Polynomial Selection for Embedded Networks'. It's a highly readable and informative paper. The authors essentially investigate what constitutes 'optimal' for a CRC polynomial and then exhaustively explore optimal polynomials for different data lengths and different polynomial lengths. Most interestingly, they slay some sacred cows along the way, including the popular CRC-8 polynomial (x^8+x^7+x^6+x^4+x+1).

Having read the paper, I discovered that the CRC I was using (the so-called ATM-8 polynomial, x^8+x^2+x+1) wasn't bad for my application, but it wasn't optimal. Upon reflection this was hardly surprising, since I had essentially selected it because it was designed for an application similar to mine, and thus assumed it must be decent. However, as Koopman shows, this can be a very foolhardy assumption. I just got lucky.
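For reference, a bitwise CRC using the ATM-8 polynomial (x^8+x^2+x+1, written 0x07 with the x^8 term implicit) can be sketched as follows. This is a minimal, table-free sketch: the function name is hypothetical, and the initial value of 0x00, MSB-first bit order and absence of a final XOR are assumptions (this parameter set is the one commonly catalogued as CRC-8/SMBUS), not necessarily the configuration used in my applications.

```c
#include <stddef.h>
#include <stdint.h>

/* ATM-8 polynomial x^8 + x^2 + x + 1, with the x^8 term implicit. */
#define CRC8_ATM_POLY 0x07U

/* Bitwise, MSB-first CRC-8. Initial value 0x00 and no final XOR are
   assumptions; a non-zero initial value could be used instead to guard
   against a stuck all-zero channel. */
uint8_t crc8_atm(const uint8_t *msg, size_t len)
{
    uint8_t crc = 0x00U;
    for (size_t i = 0; i < len; i++) {
        crc ^= msg[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80U) {
                crc = (uint8_t)((crc << 1) ^ CRC8_ATM_POLY);
            } else {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

A lookup table of 256 precomputed values is the usual way to trade code for speed here; the bitwise form is shown because it makes the polynomial division explicit.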

More important, from my perspective, is that with Koopman's paper I now have a logical methodology for determining the optimal CRC for any application. Thus after close to 30 years of doing this, I think I'm finally homing in on the truly optimal solution to this problem.

Of course, the larger lesson to be learned here is that having done something a certain way for many years means nothing unless you know that it is the optimal way of doing it. That's when you are truly 'experienced'.

Wednesday, June 17, 2009

Do I have the technical skills to be a consultant?

My previous post on being a consultant addressed the issue of how to market yourself. Today I'll look at something a little more prosaic: how can you tell if you have the necessary technical skills to be a consultant? This post was motivated by an email from Victor Johns, who asked essentially this question.

Before I answer this question, I should note that while technical skills are essential to being a successful consultant, they are by no means sufficient. I'll leave it to another day to discuss the sales and business skills required to run a consulting business.

Anyway, on to the answer. My first and rather sardonic observation is that you don't need to be technically competent at all. Just about every engineer I have ever met has unfortunately encountered the clueless consultant, that is, someone who does more harm than good. While such individuals do of course exist, they are by no means 'successful', since they have to spend an inordinate amount of time winning new business because no one ever hires them a second time.

If we ignore the aforementioned clueless consultant, then I think my answer depends a bit on what sort of consultant you want to be. Some consultants are specialists and others are generalists. If you are a specialist, then essentially you are marketing yourself as the 'go to guy' in a narrow field. A good example might be Bluetooth. If you are promoting yourself as a Bluetooth expert, then you had better know pretty much all there is to know about Bluetooth. However, what about the majority of consultants, who are generalists? In their case absolute knowledge is not as important as the ability to learn fast and to apply skills learned in one field to the field they are currently in. I say this because no sensible client will expect you to know 'everything' needed to do a particular job. Rather, they expect that you have the fundamental skills upon which you can rapidly build in order to solve the problem. It's for this reason that my ideal project is one with 30% 'new stuff'. That is, I know exactly how to do 70% of the project, whereas the remaining 30% will require me to learn new tools / skills.

This of course raises the issue of how one stays up to date. While there are many ways of doing this, I find textbooks offer the best bang for the buck. Simply put, a $100 textbook that saves me an hour on a project is a good investment. One that saves me a day is an outstanding investment. It's for this reason that I have a stellar technical library.

As a parting comment, I'll note that we have all run into the occasional engineer who 'knows' they know it all, while actually being pedestrian. In my experience it's the engineers who have a lot of confidence in their ability, but who still realize that they can't hope to 'know it all', who ultimately succeed in this business. I'm talking about you Victor!

Wednesday, June 10, 2009

Three byte integers

One of the enduring myths about the C language is that it is good for use on embedded systems. I've always been puzzled by this. While it is true that many other languages are dreadful for use on embedded systems, this merely means that C is less dreadful rather than 'good'. While I have a host of issues with C, the one that constantly galls me is the lack of 3 byte integers. Using C99 notation these would be the uint24_t and int24_t data types. Now a quick web search indicates that there may be the odd compiler out there that supports 3 byte integers, but the vast majority do not.

So why exactly do I want a 3 byte integer? Well, there are two main reasons:

Intermediate results


When I look through my code, I find a huge number of instances where I am performing an arithmetic operation on a 16 bit value, where intermediate values overflow 16 bits, yet the final value fits in 16 bits. For example:

uint16_t a, b;

a = (b * 51) / 64;

On a machine with 16 bit ints (i.e. most 8 and 16 bit targets), the code will fail if (b * 51) overflows 16 bits. As a result, I am forced to write:

a = ((uint32_t)b * 51) / 64;

However, examination of this code shows that (b * 51) can never overflow 24 bits for any 16 bit b, since the worst case is 0xFFFF * 51 = 0x32FFCD. Thus I'd much rather write:
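The 24 bit claim is easy to confirm by brute force. This sketch (function names hypothetical) checks every 16 bit b and wraps the 32 bit workaround in a function; the worst case, 0x32FFCD, is comfortably below 0xFFFFFF.

```c
#include <stdint.h>

/* Returns 1 if b * 51 fits in 24 bits for every 16 bit b, i.e. a 24 bit
   intermediate would have been sufficient. */
int product_fits_24_bits(void)
{
    for (uint32_t b = 0; b <= 0xFFFFU; b++) {
        if (b * 51U > 0xFFFFFFU) {
            return 0;
        }
    }
    return 1;
}

/* The 32 bit workaround from the text, wrapped in a function. */
uint16_t scale_51_64(uint16_t b)
{
    return (uint16_t)(((uint32_t)b * 51U) / 64U);
}
```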

a = ((uint24_t)b * 51) / 64;

Now obviously on a 32 bit processor there would be zero benefit to doing this (indeed there may be a penalty). However on an 8 bit (and probably a 16 bit) processor, there would be a dramatic benefit to such a construct.

Real world values


I regularly find myself needing a static variable that requires more than 16 bits of range. However, when I look at these variables they almost never require the staggering range of a 32 bit variable; 24 bits would do very nicely. Needless to say, I am forced to allocate 32 bits even though I know that the most significant byte will never take on anything other than zero. This is particularly galling when these variables are stored in EEPROM, with its associated cost and long write times.
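Absent a native uint24_t, one partial workaround for the storage half of the problem is to keep such variables as three bytes and convert at the boundary. A minimal sketch, assuming little-endian byte order and hypothetical function names u24_pack() / u24_unpack(). The arithmetic still happens in 32 bits, so this only recovers the memory (or EEPROM) cost, not the wasted instruction cycles.

```c
#include <stdint.h>

/* Store the low 24 bits of value in three bytes, least significant first.
   Bits 24..31 are discarded, so the caller must know they are zero. */
void u24_pack(uint8_t dst[3], uint32_t value)
{
    dst[0] = (uint8_t)(value & 0xFFU);
    dst[1] = (uint8_t)((value >> 8) & 0xFFU);
    dst[2] = (uint8_t)((value >> 16) & 0xFFU);
}

/* Rebuild a 32 bit value from three little-endian bytes. */
uint32_t u24_unpack(const uint8_t src[3])
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16);
}
```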

Taking these two together, across all the 8/16 bit embedded systems out there, the cost in wasted instruction cycles, memory, stack size and energy must be truly staggering. We could probably save a power plant or two worldwide with all the energy being wasted!

So why don't most compiler vendors support a 24 bit integer? I don't know for sure, but I suspect it is some combination of:

  • No one has been asking for it.
  • They are more concerned with being C89 / C99 compliant than they are with being useful.
  • No one has ever implemented a compiler benchmark where support for a 3 byte integer would be useful.

If you happen to agree with me that a 3 byte integer would be very useful, then next time you see your friendly compiler vendor - complain (or at least point them to this blog). Who knows, change may yet come!

Friday, June 05, 2009

Division of integers by constants

An issue that comes up frequently in embedded systems is division of an integer by a constant. Of course, most of the time we try to arrange things so that the divisor is a power of two, allowing the division to be performed with shift operations. However, all too often we have to divide an integer by some non power of two value. Divisors that seem to crop up a lot are 10 & 100 (for obvious reasons), 3 (for no good reason), 60 (when dealing with time) and of course various combinations of pi and root 2. In cases like these you can of course just code it 'normally' and let the compiler do the work for you. However, when you feel the need for speed, there are other techniques that are spectacularly good.

I learned about this subject in dribs and drabs over the years without ever coming across a good summary, until I located this paper by Douglas Jones (no relation). It does a nice job of explaining most of what you need to know in order to divide an integer by a constant. I particularly like the fact that he has algorithms for CPUs that contain barrel shifters, and for those that do not. I strongly recommend that you read the paper. One note of caution, however: Jones, like many academics, is used to working on CPUs with 32 bit word lengths. As such, his code assumes that integers are 32 bits. If you use his code as is, it will fail on 16 bit word length machines. It's for reasons such as this that I recommend everyone use the C99 data types.

For those of you too lazy to read the paper, its basic premise is that division by a constant is equivalent to multiplication by the reciprocal of that constant. There is of course nothing earth shattering about this observation. However, Jones then goes on to explain binary points, rounding etc. in order to achieve the desired result. Since I had to reduce his paper to practice, I thought I'd share the 'recipe' with you. Before doing so I should note that I work mostly with 8 & 16 bit CPUs that do not contain barrel shifters. As a result I am most interested in the techniques that use multiplication. If you are working with a 32 bit processor with a barrel shifter and an instruction cache, then you should seriously look at his other implementations.

Division of a uint16_t by a constant K


In the steps that follow, there is no requirement that K be an integer. It must, however, be greater than 1.
There are two recipes. The first works for many divisors (but not all) and is the faster of the two. The second recipe uses a 17 bit multiplier and gives better results, but produces less efficient code. While I am sure that there is some analytical way of making the determination ahead of time, I've found it easier to use the first recipe and exhaustively test it. If it works - great. If not, then switch to the second recipe.

In the following descriptions, Q is the quotient (i.e. the result) of dividing an unsigned integer A by the constant K.

Recipe #1


  1. Convert 1 / K into binary. There is a nice web based calculator here that will do the job.
  2. Take all the bits to the right of the binary point, and left shift them until the first bit to the right of the binary point is 1. Record the required number of shifts S.
  3. Take the most significant 17 bits and add 1 and then truncate to 16 bits. This effectively rounds the result.
  4. Express the remaining 16 bits to the right of the binary point as a 4 digit hexadecimal number M of the form hhhh.
  5. Q = (((uint32_t)A * (uint32_t)M) >> 16) >> S
  6. Perform an exhaustive check for all A & Q. If necessary adjust M or try recipe #2.
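When K is an integer, the recipe lends itself to automation. The sketch below is one way to do it, assuming K is an integer greater than 1 and not a power of two; the names recipe1_coeffs() and recipe1_exact() are hypothetical. S follows from the fact that, after shifting, 2^S / K lies in [0.5, 1); the top 17 bits of the shifted fraction are then floor(2^(17+S) / K), and adding 1 before dropping the lowest bit implements the rounding of step 3.

```c
#include <stdint.h>

/* Recipe #1 coefficients for an integer divisor k (k > 1, not a power of 2). */
void recipe1_coeffs(uint32_t k, uint16_t *m, unsigned *s)
{
    unsigned shift = 0;
    /* Step 2: smallest S with 2^(S+1) >= k, so that 2^S / k is in [0.5, 1). */
    while (((uint32_t)2 << shift) < k) {
        shift++;
    }
    /* Step 3: top 17 bits of the shifted fraction are floor(2^(17+S) / k). */
    uint64_t top17 = ((uint64_t)1 << (17 + shift)) / k;
    /* Add 1, then discard the least significant bit to leave 16 bits. */
    *m = (uint16_t)((top17 + 1) >> 1);
    *s = shift;
}

/* Step 6: returns 1 if Q = ((A * M) >> 16) >> S equals A / k for every
   16 bit A, else 0. */
int recipe1_exact(uint32_t k, uint16_t m, unsigned s)
{
    for (uint32_t a = 0; a <= 0xFFFFU; a++) {
        uint16_t q = (uint16_t)((((uint32_t)a * m) >> 16) >> s);
        if (q != a / k) {
            return 0;
        }
    }
    return 1;
}
```

For K = 30 this yields M = 0x8889 and S = 4, matching Example 1. For K = 7 it yields M = 0x9249 and S = 2, which fails the check (7 * 0x9249 = 0x3FFFF, so A = 7 gives a quotient of 0): a case for recipe #2.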

Incidentally, you may be wondering why I don't use the form espoused by Jones, namely: Q = (((uint32_t)A * (uint32_t)M) >> (16 + S))
The answer is that this requires a right shift of a 32 bit integer by 16 + S places. By splitting the shift into two as shown, and by making use of the C integer promotion rules, the expression becomes:

  1. Right shift a 32 bit integer 16 places and convert to a 16 bit integer. This effectively means just use the top half of the 32 bit integer.
  2. Right shift the 16 bit integer S places.

This is dramatically more efficient on an 8 or 16 bit processor. On a 32 bit processor it probably is not.

Recipe #2



  1. Convert 1 / K into binary.
  2. Take all the bits to the right of the binary point, and left shift them until the first bit to the right of the binary point is 1. Record the required number of shifts S.
  3. Take the most significant 18 bits and add 1 and then truncate to 17 bits. This effectively rounds the result.
  4. Express the 17 bit result as 1hhhh. Denote the hhhh portion as M
  5. Q = (((((uint32_t)A * (uint32_t)M) >> 16) + A) >> 1) >> S;
  6. Perform an exhaustive check for all A & Q. If necessary adjust M.
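When K is an integer, recipe #2 can be automated too. In this sketch (assuming an integer K greater than 1 that is not a power of two; function names hypothetical) the top 18 bits are floor(2^(18+S) / K), rounding gives the 17 bit multiplier 1hhhh, and only the low 16 bits are kept as M. The leading 1 is restored by the "+ A" term, because A * (0x10000 + M) / 2^17 = ((A * M / 2^16) + A) / 2.

```c
#include <stdint.h>

/* Recipe #2 coefficients for an integer divisor k (k > 1, not a power of 2).
   M receives only the hhhh part of the 17 bit multiplier 1hhhh. */
void recipe2_coeffs(uint32_t k, uint16_t *m, unsigned *s)
{
    unsigned shift = 0;
    while (((uint32_t)2 << shift) < k) {                 /* Step 2 */
        shift++;
    }
    uint64_t top18 = ((uint64_t)1 << (18 + shift)) / k;  /* Step 3: top 18 bits */
    uint32_t m17 = (uint32_t)((top18 + 1) >> 1);         /* round to 17 bits */
    *m = (uint16_t)(m17 & 0xFFFFU);                      /* Step 4: drop leading 1 */
    *s = shift;
}

/* Step 5: Q = (((((A * M) >> 16) + A) >> 1) >> S */
uint16_t recipe2_div(uint16_t a, uint16_t m, unsigned s)
{
    return (uint16_t)((((((uint32_t)a * m) >> 16) + a) >> 1) >> s);
}

/* Step 6: returns 1 if recipe2_div() equals A / k for every 16 bit A. */
int recipe2_exact(uint32_t k, uint16_t m, unsigned s)
{
    for (uint32_t a = 0; a <= 0xFFFFU; a++) {
        if (recipe2_div((uint16_t)a, m, s) != a / k) {
            return 0;
        }
    }
    return 1;
}
```

For K = 100 this produces M = 0x47AE and S = 6, which fails the exhaustive check, while 0x47AF passes, matching the adjustment made in Example 2. Likewise K = 7 produces 0x2492, while the summary uses the incremented value 0x2493.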

Again I split the shifts up as shown for efficiency on an 8 / 16 bit machine.

Example 1 - Divide by 30


In this case I wish to divide a uint16_t by 30.

  1. Convert to binary. 1 / 30 = 0.000010001000100010001000100010001000100010001000100010001
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 4 shifts and we get 0.10001000100010001000100010001000100010001000100010001. S is thus 4.
  3. Take the most significant 17 bits: 1000 1000 1000 1000 1
  4. Add 1: giving 1000 1000 1000 1000 1 + 1 = 1000 1000 1000 1001 0
  5. Truncate to 16 bits: 1000 1000 1000 1001
  6. Express in hexadecimal: M = 0x8889
  7. Q = (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 4

An exhaustive check confirms that this expression does indeed do the job for all 16 bit values of A. It is also about 10 times faster than the compiler division routine on an AVR processor.
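Packaged as a function (the name div30_u16() is hypothetical), Example 1 looks like this, with the exhaustive check included so it can be rerun on a new compiler or target.

```c
#include <stdint.h>

/* Divide a 16 bit unsigned value by 30 using M = 0x8889 and S = 4. */
static inline uint16_t div30_u16(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * (uint32_t)0x8889) >> 16) >> 4);
}

/* Exhaustive check against ordinary division; returns 1 if exact for all A. */
int div30_u16_exact(void)
{
    for (uint32_t a = 0; a <= 0xFFFFU; a++) {
        if (div30_u16((uint16_t)a) != a / 30U) {
            return 0;
        }
    }
    return 1;
}
```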

Example 2 - Divide by 100


In this case I wish to divide a uint16_t by 100. This is one of those cases where we need 17 bit resolution, so recipe #2 is used.

  1. Convert to binary. 1 / 100 = 0.00000010100011110101110000101000111101011100001010001111011
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 6 shifts and we get 0.10100011110101110000101000111101011100001010001111011. S is thus 6.
  3. Take the most significant 18 bits: 1 0100 0111 1010 1110 0
  4. Add 1: 1 0100 0111 1010 1110 0 + 1 = 1 0100 0111 1010 1110 1
  5. Truncate to 17 bits: 1 0100 0111 1010 1110
  6. Express in hexadecimal: 1 47AE, so M = 0x47AE
  7. Q = (((((uint32_t)A * (uint32_t)0x47AE) >> 16) + A) >> 1) >> 6;

An exhaustive check shows that the division is not exact for all A. I thus incremented M to 0x47AF and got exact results for all A. This code was about twice as fast as the compiler division routine on an AVR processor.
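Example 2 as a function (name hypothetical), again with its exhaustive check. Note that it uses the adjusted multiplier 0x47AF.

```c
#include <stdint.h>

/* Divide a 16 bit unsigned value by 100. The 17 bit multiplier is 0x147AF:
   the low 16 bits are multiplied, and the leading 1 is accounted for by
   adding A before the final shifts. */
static inline uint16_t div100_u16(uint16_t a)
{
    return (uint16_t)((((((uint32_t)a * (uint32_t)0x47AF) >> 16) + a) >> 1) >> 6);
}

/* Exhaustive check against ordinary division; returns 1 if exact for all A. */
int div100_u16_exact(void)
{
    for (uint32_t a = 0; a <= 0xFFFFU; a++) {
        if (div100_u16((uint16_t)a) != a / 100U) {
            return 0;
        }
    }
    return 1;
}
```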

Example 3 - Divide by π


This is an example where the expression gives only an approximate result. The approximation is very good though, with a quotient that is off by at most 1 for all A.

  1. Convert to binary: 1 / π = 0.010100010111110011000001101101110010011100100010001001
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 1 shift and we get 0.10100010111110011000001101101110010011100100010001001. S is thus 1.
  3. Take the most significant 18 bits: 1 0100 0101 1111 0011 0
  4. Add 1: 1 0100 0101 1111 0011 0 + 1 = 1 0100 0101 1111 0011 1
  5. Truncate to 17 bits: 1 0100 0101 1111 0011
  6. Express in hexadecimal: 1 45F3, so M = 0x45F3
  7. Q = (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1;

An exhaustive check that compared the result of this expression to (float)A * 0.31830988618379067153776752674503f showed that the match was exact for all but 263 values in the range 0 - 0xFFFF. Where there was a mismatch, the result was off by at most 1. It's also 23 times faster than converting to floating point. Not a bad trade-off.
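Example 3 as a function (name hypothetical), together with a routine that measures the worst-case error over all 16 bit inputs. Unlike the check described above, this sketch truncates a double rather than a float reference, so its mismatch count is not guaranteed to be 263; it only reports the largest absolute difference, which should not exceed 1.

```c
#include <stdint.h>

/* Approximate A / pi using the 17 bit multiplier 0x145F3 and S = 1. */
static inline uint16_t div_pi_u16(uint16_t a)
{
    return (uint16_t)((((((uint32_t)a * (uint32_t)0x45F3) >> 16) + a) >> 1) >> 1);
}

/* Largest absolute difference between div_pi_u16() and a truncated
   double precision reference, over all 16 bit inputs. */
unsigned div_pi_max_error(void)
{
    unsigned worst = 0;
    for (uint32_t a = 0; a <= 0xFFFFU; a++) {
        uint16_t ref = (uint16_t)((double)a * 0.31830988618379067);
        uint16_t q = div_pi_u16((uint16_t)a);
        unsigned diff = (q > ref) ? (unsigned)(q - ref) : (unsigned)(ref - q);
        if (diff > worst) {
            worst = diff;
        }
    }
    return worst;
}
```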

Example 4 - Divide by 10 on an 8 bit value


This technique is obviously usable on 8 bit values; one just has to adjust the number of bits. Here's an example:

  1. Convert to binary. 1 / 10 = 0.0001100110011001100110011001100110011001100110011001101
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 3 shifts and we get 0.1100110011001100110011001100110011001100110011001101. S is thus 3.
  3. Take the most significant 9 bits: 1100 1100 1
  4. Add 1: giving 110011001 + 1 = 110011010
  5. Truncate to 8 bits: 1100 1101
  6. Express in hexadecimal: M = 0xCD
  7. Q = (((uint16_t)A * (uint16_t)0xCD) >> 8) >> 3

An exhaustive check confirms that this expression does indeed do the job for all 8 bit values of A. It is also about 8 times faster than the compiler division routine on an AVR processor.
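Example 4 as a function (name hypothetical). Even on a machine with 16 bit ints the product 255 * 0xCD = 52275 still fits in a uint16_t, so no wider type is needed.

```c
#include <stdint.h>

/* Divide an 8 bit unsigned value by 10 using M = 0xCD and S = 3. */
static inline uint8_t div10_u8(uint8_t a)
{
    return (uint8_t)((((uint16_t)a * (uint16_t)0xCD) >> 8) >> 3);
}

/* Exhaustive check against ordinary division; returns 1 if exact for all A. */
int div10_u8_exact(void)
{
    for (unsigned a = 0; a <= 0xFFU; a++) {
        if (div10_u8((uint8_t)a) != a / 10U) {
            return 0;
        }
    }
    return 1;
}
```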

Summary


Using the values generated by Jones, together with some of the values I have computed, here's a summary of some common divisors for unsigned 16 bit integers.

Divide by 3: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 1
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
Divide by 6: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 2
Divide by 7: (((((uint32_t)A * (uint32_t)0x2493) >> 16) + A) >> 1) >> 2
Divide by 9: (((uint32_t)A * (uint32_t)0xE38F) >> 16) >> 3
Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
Divide by 11: (((uint32_t)A * (uint32_t)0xBA2F) >> 16) >> 3
Divide by 12: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 3
Divide by 13: (((uint32_t)A * (uint32_t)0x9D8A) >> 16) >> 3
Divide by 14: (((((uint32_t)A * (uint32_t)0x2493) >> 16) + A) >> 1) >> 3
Divide by 15: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 3
Divide by 30: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 4
Divide by 60: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 5
Divide by 100: (((((uint32_t)A * (uint32_t)0x47AF) >> 16U) + A) >> 1) >> 6
Divide by PI: (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1
Divide by √2: (((uint32_t)A * (uint32_t)0xB505) >> 16) >> 0
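Every integer entry in the list is meant to be exact (π and √2 are approximations by nature), so a table-driven exhaustive check is a cheap guard against transcription errors when copying these into a project. A sketch with hypothetical type and function names; the recipe2 flag selects the "+ A" form.

```c
#include <stdint.h>

/* One integer divisor from the summary; recipe2 selects the "+ A" form. */
typedef struct {
    uint16_t divisor;
    uint16_t m;
    unsigned s;
    int recipe2;
} div_coeff;

static const div_coeff integer_divisors[] = {
    {3, 0xAAAB, 1, 0},  {5, 0xCCCD, 2, 0},  {6, 0xAAAB, 2, 0},
    {7, 0x2493, 2, 1},  {9, 0xE38F, 3, 0},  {10, 0xCCCD, 3, 0},
    {11, 0xBA2F, 3, 0}, {12, 0xAAAB, 3, 0}, {13, 0x9D8A, 3, 0},
    {14, 0x2493, 3, 1}, {15, 0x8889, 3, 0}, {30, 0x8889, 4, 0},
    {60, 0x8889, 5, 0}, {100, 0x47AF, 6, 1},
};

static uint16_t apply_coeff(const div_coeff *c, uint16_t a)
{
    uint32_t t = ((uint32_t)a * c->m) >> 16;
    if (c->recipe2) {
        t = (t + a) >> 1;   /* restores the implicit leading 1 of 1hhhh */
    }
    return (uint16_t)(t >> c->s);
}

/* Returns how many table entries are exact for every 16 bit A. */
unsigned count_exact_entries(void)
{
    unsigned n = sizeof integer_divisors / sizeof integer_divisors[0];
    unsigned exact = 0;
    for (unsigned i = 0; i < n; i++) {
        int ok = 1;
        for (uint32_t a = 0; a <= 0xFFFFU && ok; a++) {
            ok = (apply_coeff(&integer_divisors[i], (uint16_t)a)
                  == a / integer_divisors[i].divisor);
        }
        exact += (unsigned)ok;
    }
    return exact;
}
```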

Hopefully you have spotted the relationship between divisors that differ by a factor of two. For example, compare the expressions for divide by 15, 30 & 60: M is the same and only S changes.

If someone has too much time on their hands and would care to write a program to compute the values for all integer divisors, then I'd be happy to post the results for everyone to use.

Update


Alan Bowens has risen to the challenge and has generated some nifty programs for generating coefficients for arbitrary 8 and 16 bit values. He's also generated header files for all 8 and 16 bit integer divisors that you can just include and use. You'll find it all at his blog. Nice work Alan.

Thursday, May 28, 2009

Efficient C Tips #9 - Use lookup tables

This is the ninth in a series of tips on how to make your C code more efficient.

Typically the fastest way to compute something on a microcontroller is not to compute it at all, but to simply read the result from a lookup table. For example, this is regularly done as part of CRC calculations. Despite this, I've noticed over the years what I'll call the 'lookup tables are boring' syndrome. What do I mean by this? Well, when having to code a solution to a problem, it seems that most of us would rather code something that involves crunching numbers than generate a table where we just look up the result. I'm sure that many of you are thinking that I'm dead wrong and that you use lookup tables all the time. Well, I'm sure many of you do. However, the question is whether you make full use of this capability.

What started me thinking about this is the person who ended up on this blog looking for an efficient algorithm for determining the day of the year. I have no idea whether they were coding for an embedded system, nor whether they were looking for a fast solution, a minimal memory solution, or something in between. However, it did make me realize that it would make a simple, albeit slightly contrived, way of demonstrating my point about lookup tables.

First off, I imposed some constraints:

  • The Gregorian calendar is to be used.
  • Days, months and years are numbered from 1 and not zero, such that January 1 is day 1 and not day 0.

Here's my first solution, that makes use of a small lookup table:

#define JAN_DAYS (31)
#define FEB_DAYS (28)
#define LY_FEB_DAYS (29)
...
#define NOV_DAYS (30)

#define MONTHS_IN_A_YEAR (12+1)

uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
    static const uint16_t days[2][MONTHS_IN_A_YEAR] =
    {
        {
            /* Non leap year table */
            0, /* Padding because first month is not zero */
            0, /* If month is January, then no days before it */
            JAN_DAYS,
            JAN_DAYS + FEB_DAYS,
            ...
            JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
        },
        {
            /* Leap year lookup table */
            0, /* Padding because first month is not zero */
            0, /* If month is January, then no days before it */
            JAN_DAYS,
            JAN_DAYS + LY_FEB_DAYS,
            ...
            JAN_DAYS + LY_FEB_DAYS + ... + NOV_DAYS
        }
    };

    uint16_t result;

    if (((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0))
    {
        /* Leap year */
        result = days[1][month] + day;
    }
    else
    {
        /* Non leap year */
        result = days[0][month] + day;
    }
    return result;
}

For most applications I think this is an optimal solution in that it handles a very wide range of dates, uses a small amount of storage for the lookup tables and requires minimal computational effort to achieve the result. (On an ARM7 it requires 128 bytes of code space, 64 bytes for the lookup table and executes in about 40 cycles).
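For readers who want to run the first version, here is a self-contained sketch with the elided table entries written out as cumulative day counts. The table and helper names are hypothetical, and, like the original, it assumes month is already in the range 1 to 12 and does no range checking.

```c
#include <stdint.h>

/* Days before the first of each month, indexed by month (1..12); index 0 is
   padding because months are numbered from 1.
   Row 0: non leap year. Row 1: leap year. */
static const uint16_t days_before_month[2][13] =
{
    { 0, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 },
    { 0, 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 }
};

/* Gregorian rule: every 4th year, except centuries not divisible by 400. */
static int is_leap_year(uint16_t year)
{
    return ((year % 4U == 0U) && (year % 100U != 0U)) || (year % 400U == 0U);
}

/* Assumes 1 <= month <= 12 and a valid day of that month. */
uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
    return (uint16_t)(days_before_month[is_leap_year(year)][month] + day);
}
```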

However, what if the code had to run as fast as possible? I'd guess that most folks would work on optimizing the details of the implementation and leave it at that. I'm not sure that many people would consider a gigantic lookup table, so that the code looks like this:

#define LAST_YEAR (2400 + 1) /* Last year to worry about */

uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
    static const uint16_t days[LAST_YEAR][MONTHS_IN_A_YEAR] =
    {
        { /* Padding because first year is not zero */
            0, /* Padding because first month is not zero */
            0, /* If month is January, then no days before it */
            JAN_DAYS,
            JAN_DAYS + FEB_DAYS,
            ...
            JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
        },

        { /* Year 1 - non leap year */
            0, /* Padding because first month is not zero */
            0, /* If month is January, then no days before it */
            JAN_DAYS,
            JAN_DAYS + FEB_DAYS,
            ...
            JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
        },

        ...

        { /* Last year - a leap year */
            0, /* Padding because first month is not zero */
            0, /* If month is January, then no days before it */
            JAN_DAYS,
            JAN_DAYS + LY_FEB_DAYS,
            ...
            JAN_DAYS + LY_FEB_DAYS + ... + NOV_DAYS
        },
    };

    return days[year][month] + day;
}

The lookup table would of course require at least 2401 * 13 * 2 = 62426 bytes. Evidently this would be unreasonable on most 8 bit processors. On a 32 bit processor with 8 Mbytes of Flash - not so unreasonable.

I first learned this lesson many years ago in an application that required an 8051 processor to perform a complicated refresh of multiplexed LEDs at about 1 kHz (a significant load for the 8051). The initial implementation consisted of pure code. Over the next year or so the two of us working on it realized that we could speed it up by using lookup tables. We started off with a small look up table, and by the time we were done, the table was 48K (out of the 64K available to the 8051) while the execution time was a fraction of what it had been before.

Thus next time you are faced with making something run faster consider using a look up table - even if it is huge. Sometimes it's just the best way to go.

Friday, May 15, 2009

Checking the fuse bits in an Atmel AVR at run time

In general I try to post on topics that have broad appeal in the embedded world. Today I'm going to partially break with that tradition to show how to check the fuse bits in an Atmel AVR class processor. However, before I do so, I'd like to discuss my motivations for wanting to do this.

The AVR processor family, together with the PIC and other processor families, contains fuse / configuration bits. These bits are settable only at program time and are used to configure the behavior of the processor at run time. Typical parameters that are configured are oscillator types, brown out voltage detect levels and memory partitioning. Now as I lamented in this post, there is no great way of communicating to the production staff how you want these fuse bits programmed. As a result, I consider there to be a very high probability that a mistake will be made in production, and that all my efforts on crafting perfect code will thus be for naught. While it is much better to prevent mistakes, if you can't do so, then the next best thing is to detect them. So on one of the products that I am working on, one of the startup tests is a check to ensure that the fuse bits are indeed what they are supposed to be. While I recognize that if the fuse settings are dreadfully wrong it is unlikely that my code will run at all, I'm actually more concerned with the case where the fuse bits are set mostly correctly, so that the code works most of the time.

So how do I do this on an AVR? Well if you are using an IAR compiler the work is mostly done for you. Here it is:

#include <stdint.h>
#include <intrinsics.h>

/* Macros to read the various fuse bytes */
#define _SPM_GET_LOW_FUSEBITS() __AddrToZByteToSPMCR_LPM((void __flash*)0x0000U, 0x09U)
#define _SPM_GET_HIGH_FUSEBITS() __AddrToZByteToSPMCR_LPM((void __flash*)0x0003U, 0x09U)
#define _SPM_GET_EXTENDED_FUSEBITS() __AddrToZByteToSPMCR_LPM((void __flash*)0x0002U, 0x09U)

/* Structure to store the fuse bytes */
typedef struct
{
    uint8_t fuse_low;      /* The low fuse setting */
    uint8_t fuse_high;     /* The high fuse setting */
    uint8_t fuse_extended; /* The extended fuse setting */
    uint8_t lockbits;      /* The lockbits */
} FUSE_SETTINGS;

/* Storage for the fuse settings will be in EEPROM */
static __eeprom __no_init FUSE_SETTINGS Fuse_Settings @ FUSE_VALUES;

void fuses_Read(void)
{
    FUSE_SETTINGS value;

    value.fuse_low = _SPM_GET_LOW_FUSEBITS();
    value.fuse_high = _SPM_GET_HIGH_FUSEBITS();
    value.fuse_extended = _SPM_GET_EXTENDED_FUSEBITS();
    value.lockbits = _SPM_GET_LOCKBITS();
    __no_operation();

    Fuse_Settings = value;
}

The macro __AddrToZByteToSPMCR_LPM() is defined in intrinsics.h. Essentially it takes care of all the finicky register usage required to read the fuse bits. You'll also notice that I have used a macro _SPM_GET_LOCKBITS() to read the lockbits. This macro is also found in intrinsics.h. The really observant reader may wonder why there isn't a macro in intrinsics.h for reading the fuse bits. Well, there is, but it only reads the low fuse byte, which is all the early AVR processors had. I've pointed this out to IAR and they have promised to address this in the next release (thanks Steve!).

Before I leave this topic, I'll also point out that I don't read the fuse settings directly into EEPROM. Instead I read them into RAM and then copy the entire structure to EEPROM. I do this because writing to EEPROM messes with the same registers used for reading the fuse bits - and thus bad things happen. This also explains the __no_operation() statement before the data are copied to EEPROM.
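As a sketch of the startup check itself (the comparison step, not the IAR-specific read), the function below compares the values read back against the expected ones under a mask, so that unused or "don't care" bits do not trigger false alarms. The type and function names are hypothetical, and the expected and mask values would come from the product's own fuse specification.

```c
#include <stdint.h>

/* Same layout as the FUSE_SETTINGS structure above (hypothetical name). */
typedef struct {
    uint8_t fuse_low;
    uint8_t fuse_high;
    uint8_t fuse_extended;
    uint8_t lockbits;
} fuse_settings_t;

/* Returns 1 if every bit selected by mask matches expected, else 0.
   A mask bit of 0 means "don't care" (e.g. unused or reserved bits). */
int fuses_ok(const fuse_settings_t *actual,
             const fuse_settings_t *expected,
             const fuse_settings_t *mask)
{
    return ((actual->fuse_low ^ expected->fuse_low) & mask->fuse_low) == 0
        && ((actual->fuse_high ^ expected->fuse_high) & mask->fuse_high) == 0
        && ((actual->fuse_extended ^ expected->fuse_extended)
            & mask->fuse_extended) == 0
        && ((actual->lockbits ^ expected->lockbits) & mask->lockbits) == 0;
}
```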

Incidentally, I don't know of a way to read the configuration bits of a PIC at run time. Chalk this up as one more reason why an AVR is superior to a PIC!

Saturday, May 09, 2009

Signed versus unsigned integers

Jack Ganssle's latest newsletter arrived the other day. Within it is an extensive set of comments from John Carter, in which he talks about and quotes from a book by Derek Jones (no relation of mine). The topic is unsigned versus signed integers. I have to say I found it fascinating in the same way that watching a train wreck is fascinating. Here's the entire extract - I apologize for its length - but you really have to read it all to understand my horror.

"Suppose you have a "Real World (TM)" always and forever positive value. Should you represent it as unsigned?

"Well, that's actually a bit of a step that we tend to gloss over...

"As Jones points out in section 6.2.5 the real differences as far as C is concerned between unsigned and signed are...

" * unsigned has a larger range.

" * unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

" * mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don't quite understand.

"For example I have a bit of code that generates code ... and uses __LINE__ to tweak things so compiler error messages refer to the file and line of the source code, not the generated code.

"Thus I must do integer arithmetic with __LINE__ include subtraction of offsets and multiplication.

"* I do not care if my intermediate values go negative.

"* It's hard to debug (and frightening) if they suddenly go huge.

"* the constraint is the final values must be positive.

"Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int's and it all just works. I.e. Line numbers are constrained to be positive, but that has nothing to do representation. Use the most convenient representation.

"C's "unsigned" representation is useless as a "constrain this value to be positive" tool. E.g. A device that can only go faster or slower, never backwards:

unsigned int speed; // Must be positive.
unsigned int brake(void)
{
--speed;
}

"Was using "unsigned" above any help to creating robust error free code? NO! "speed" may now _always_ be positive... but not necessarily meaningful!

"The main decider in using "unsigned" is storage. Am I going to double my storage requirements by using int16_t's or pack them all in an array of uint8_t's?

"My recommendation is this...

" * For scalars use a large enough signed value. eg. int_fast32_t
" * Treat "unsigned" purely as a storage optimization.
" * Use typedef's (and splint (or C++)) for type safety and accessor functions to ensure constraints like strictly positive. E.g.

typedef int_fast32_t velocity; // Can be negative
typedef int_fast32_t speed; // Must be positive.
typedef uint8_t dopplerSpeedImage_t[MAX_X][MAX_Y]; // Storage optimization


I read this, and quite frankly my jaw dropped. Now the statements made by Carter / Jones concerning differences between signed and unsigned are correct - but to call them the real differences is completely wrong. To make my point, I'll first of all address his specific points - and then I'll show you where the real differences are:

Unsigned has a larger range


Yes it does. However, if this is the reason you are using an unsigned type you've probably got other problems.

Unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)


Yes it does, and au contraire - this is frequently what I want. However, far more important is the question: what does a signed integer do on overflow? The answer is that the behavior is undefined. That is, if you overflow a signed integer, the generated code is at liberty to do anything - including deleting your program or starting World War 3. I found this out the hard way many years ago. I had some PC code written for Microsoft's Version 7 compiler. The code was inadvertently relying upon signed integer overflow to work a certain way. I then moved the code to Watcom's compiler (Version 10 I think) and the code failed. I was really ticked at Watcom until I realized what I had done and that Watcom was perfectly within their rights to do what they did.

Note that this was not a case of porting code to a different target. This was the same target - just a different compiler.

Now let's address his comment about modulo arithmetic. Consider the following code fragment:

uint16_t a,b,c, res;

a = 0xFFFF; //Max value for a uint16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow
res -= c;

Does res end up with the expected value of 0xFFFE? Yes it does - courtesy of the modulo arithmetic. Furthermore it will do so on every conforming compiler.

Now if we repeat the exercise using signed data types.

int16_t a,b,c, res;

a = 32767; //Max value for a int16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow - WW3 starts
res -= c;

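What happens now? Who knows? On a target with a 16-bit int (common in embedded systems) the addition overflows an int, and the behavior is undefined. On a target with a wider int the addition itself is fine, but converting 32768 back to an int16_t produces an implementation-defined result. Either way, on your system you may or may not get the answer you expect.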

Mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don't quite understand


Well whether I understand them or not is really between me and Lint. However, the key thing to know is that if you use signed integers by default, then it is really hard to avoid combining signed and unsigned operands. How is this you ask? Well consider the following partial list of standard 'functions' that return an unsigned integral type:

  • sizeof()
  • offsetof()
  • strcspn()
  • strlen()
  • strspn()

In addition memcpy(), memset(), strncpy() and others also use unsigned integral types in their parameter lists. Furthermore in embedded systems, most compiler vendors typedef IO registers as unsigned integral types. Thus any expression involving a register also includes unsigned quantities.

If you use any of these in your code, then you run a very real risk of running into signed / unsigned arithmetic conversions. Thus, IMHO, the usual arithmetic conversions issue is actually an argument for avoiding signed types - not the other way around!

So what are the real reasons to use unsigned data types? I think these reasons are high on my list:

  • Modulus operator
  • Shifting
  • Masking

Modulus Operator


One of the relatively unknown but nasty corners of the C language concerns the modulus operator. In a nutshell, when one or both of the operands is negative, C90 leaves the sign of the result implementation defined. C99 removed the ambiguity by requiring division to truncate toward zero, so -1 % 2 is now -1 everywhere; either way the result can be negative, which is very easy to forget. Here's a great example I came across that purports to show how to use the modulus operator to determine if a number is odd or even. The code is reproduced below:

#include <stdio.h>

int main(void)
{
    int i;

    printf("Enter a number: ");
    scanf("%d", &i);

    if ((i % 2) == 0)
        printf("Even");
    if ((i % 2) == 1)
        printf("Odd");

    return 0;
}

When I run it on one of my compilers, and enter -1 as the argument, nothing gets printed, because on my system -1 % 2 = -1. The bottom line - using the modulus operator with signed integral types is a disaster waiting to happen.

Shifting


Performing a shift right on a negative signed integer produces an implementation-defined result. What this means is that when you shift right you have no idea whether the vacated high bits are filled with copies of the sign bit (an arithmetic shift) or with zeros (a logical shift). The implications of this are quite profound. For example, if foo is an unsigned integral type, then foo >> 1 is equivalent to foo / 2. However, if foo is a signed type, then a shift right is most certainly not the same as a divide by 2 (even with an arithmetic shift, -3 >> 1 gives -2, whereas in C99 -3 / 2 gives -1) - and will generate different code. It's for this reason that Lint, MISRA and most good coding standards will reject any attempt to right shift a signed integral type. BTW while left shifts on signed types avoid this particular problem, I really don't recommend them either: left shifting a negative value is undefined behavior.

Masking


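A similar class of problems occurs if you attempt to perform masking operations on a signed data type. For example, when a negative signed char is promoted to int before being masked or combined with other bits, it is sign extended, so every bit above bit 7 is set. Furthermore, before C23 the bit pattern of a negative value depended on whether the implementation used two's complement, ones' complement or sign and magnitude representation.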

Finally...


Before I leave this post, I just have to comment on this quote from Carter:
"Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int's and it all just works".

Does anyone else find this scary? He seems to be advocating that rather than think about the problem at hand, he'd rather switch to a large signed data type - and trust that everything works out OK. He obviously thinks he's on safe ground. However, consider the case where he has a 50,000 line file (in fact anything over 46,340 lines will do). Is this an unreasonably large file? Well, yes for a human generated file. However, for a machine generated file (e.g. an embedded image file), it is not unreasonable at all. Furthermore, let's assume that for some reason his computations involve squaring the number of lines in the file, i.e. we get something like this:

int32_t lines, result;

lines = 46342;
result = lines * lines + some_other_expression;

Well 46342 * 46342 overflows a signed 32 bit type - and the result is undefined. The bottom line - using a larger signed data type to avoid thinking about the problem is not recommended. At least if you use an unsigned type you are guaranteed a consistent answer.

Saturday, May 02, 2009

Doxygen

Today's post was inspired by a new version notice from Dimitri van Heesch concerning his great documentation generator tool doxygen. If you aren't aware of doxygen, then I strongly recommend reading about it and then using it.

So what is Doxygen exactly? Well it has a lot of capabilities, but in a nutshell it can parse your code (C, C++, Java and a host of others not usually used in the embedded space) and from it generate a very nice hyper-linked documentation set. It does this in part by looking for what I'll call control directives embedded in comments. Now what I particularly like about Doxygen is that it allows you to trade off the number of control directives you embed against the readability of your comments. For example, at one extreme you can do nothing special to your code and still end up with a reasonable documentation set. At the other extreme, you can embed so many control directives into your comments that the only sane way to read the comments is via Doxygen; however the documentation will be truly impressive! In my case, I find control directives to be very distracting, and so I opt to use a minimal set that doesn't offend my sensibilities but still gives me very useful results.

So why do I do this? Well while this documentation set is very nice in its own right, I actually find it very useful in improving my code. As remarkable a claim as this is, it's easily substantiated. Here are a few examples:
Call Trees

One of the very nice add-ons to Doxygen is Graphviz. Using Graphviz, Doxygen will generate call trees for all of your functions. I often find this very illuminating - both at a macro level and also a micro level. At the macro level, if I see a call tree that looks like your average two-year-old's artwork, then it's a clear indication of muddled thinking - and impending doom. At the micro level it allows you to spot some errors. For example consider this code fragment, which is intended to update a parameter in an EEPROM data structure, together with its backup copy:

void params_NosChargesSet(uint16_t nos_charges)
{
    Factory_Params1.n_charges = nos_charges;
    update_factory1_crc();
    Factory_Params2.n_charges = nos_charges;
    update_factory1_crc();
}

I found the bug in this code not by testing it, but by simply browsing the Doxygen documentation and noticing that the call tree for this function was incorrect: it showed update_factory1_crc(), but no call to update the CRC of the backup copy (Factory_Params2). What I liked about this is that this kind of bug is very difficult to detect through testing, and will not be noticed by static analysis. It was however clear as day by looking at its call tree.
Missing documentation

Sometimes when I'm anxious to solve 'the real problem', I find that I'm not as diligent as I should be about describing the use of manifest constants, variables etc. As a result I'll sometimes end up with code that looks like this:

#define SHORT_TERM_BUF_SIZE (8U) /**< meaningful comment */
#define LONG_TERM_BUF_SIZE (32U)

You'll notice that LONG_TERM_BUF_SIZE has no comment associated with it. However, it's "obvious" what its use is because of the comment associated with SHORT_TERM_BUF_SIZE that immediately precedes it. Well when you generate the Doxygen documentation, and you click on the hyperlink associated with LONG_TERM_BUF_SIZE, guess what - no description. While some may think that this is a weakness in Doxygen, I actually think it's a major strength. Here's why:

  • My coding standard requires me to provide a comment for all manifest constants. Thus it is reminding me of the error of my ways.
  • Someone new coming to the code will typically be overwhelmed by what they are faced with. Having an 'implicit comment' is just one more hurdle for them to overcome. Thus Doxygen is accurately reflecting what someone will see when they read your code.


Is Doxygen perfect? No it's not. It often hangs when I run it. However to be fair, that's usually because I haven't played by the rules. Despite this I find it a useful tool in my arsenal. I recommend you take a look at it.

Saturday, April 25, 2009

PIC stack overflow

For regular readers of this blog I apologize for turning once again to the topic of my Nom de Guerre. If you really don't want to read about stack overflow again, then just skip to the second section of this posting where I address the far more interesting topic of why anyone uses an 8-bit PIC in the first place.

Anyway, the motivation for this post is that the most common search term that drives folks to this blog is 'PIC stack overflow'. While I've expounded on the topic of stacks in general in previous posts, I've never explicitly addressed the problem with 8 bit PICs. So to make my PIC visitors happy, I thought I'd give them all they need to know to solve the problem of stack overflow on their 8 bit PIC processors.

The key thing to understand about the 8 bit PIC architecture is that the stack size is fixed. It varies from a depth of 2 for the really low end devices to 31 for the high end 8 bit devices. The most popular parts (such as the 16F877) have a stack size of 8. Every (r)call consumes a level, as does the interrupt handler. To add insult to injury, if you use the In Circuit Debugger (ICD) rather than a full blown ICE, then support for the ICD also consumes a level. So if you are using a 16 series part (for example) with an ICD and interrupts, then you have at most 6 levels available to you. What does this mean? Well if you are programming in assembly language (which when you get down to it was always the intention of the PIC designers) it means that you can nest function calls no more than six deep. If you are programming in C then depending on your compiler you may not even be able to nest functions this deep, particularly if you are using size optimization.

So on the assumption that you are overflowing the call stack, what can you do? Here's a checklist:

  • Switch from the ICD to an ICE. It's only a few thousand dollars difference...
  • If you don't really need interrupt support, then eliminate it.
  • If you need interrupt support then don't make any function calls from within the ISR (as this subtracts from your available levels).
  • Inline low level functions
  • Use speed optimization (which effectively inlines functions)
  • Examine your call tree and determine where the greatest call depth occurs. At this point either restructure the code to reduce the call depth, or disable interrupts during the deepest point.
  • Structure your code such that calls can be replaced with jumps. You do this by only making calls at the very end of the function, so that the compiler can simply jump to the new function. (Yes this is a really ugly technique).
  • Buy a much better compiler.

If you are still stuck after trying all these, then you really are in a pickle. You could seek paid expert help (e.g. from me or some of the other folks that blog here at embeddedgurus) or you could change CPU architectures. Which leads me to:

So why are you using a PIC anyway?


The popularity of 8 bit PICs baffles me. Its architecture is awful - the limited call stack is just the first dreadful thing. Throw in the need for paging and banking together with the single interrupt vector and you have a nightmare of a programming model. It would be one thing if this was the norm for 8 bit devices - but it isn't. The AVR architecture blows the PIC away, while the HC05 / HC08 are also streets ahead of the PIC. Given the choice I think I'd even take an 8051 over the PIC. I don't see any cost advantages, packaging advantages (Atmel has just released a SOT23-6 AVR which is essentially instruction set compatible with their largest devices) or peripheral set advantages. In short, I don't get it!

Incidentally, this isn't an indictment of Microchip - they are a great company and I really like a lot of their other products, their web site, tech support and so on (perhaps this is why the PIC is so widely used?).

So to the (ir)regular readers of this blog - if you are using 8 bit PICs perhaps you could use the comment section to explain why. Let the debate begin!

Sunday, April 19, 2009

Unused interrupt vectors

With the exception of low end PIC microcontrollers, most microcontrollers have anywhere from quite a few to an enormous number of interrupt vectors. It's a rare application that uses every single interrupt vector, and so the question arises as to what, if anything, should one do with unused interrupt vectors? I have seen two approaches used - neither of which is particularly good.

Do nothing


I would say this is the most common approach. My guess is that when this approach is used, it's not via conscious choice, but rather the result of inaction. So what's the implication of this approach? Well if an interrupt occurs for which you have not installed an interrupt handler, then the microcontroller will vector to the appropriate address and start executing whatever code happens to be there. It's fair to say that this will ultimately cause a system crash - the only question is how much damage will be done in the process. Having said that, I don't necessarily consider that this approach is always awful. For example a reasonable argument might go something like this:
I know via design, code inspection, static analysis and testing that the probability of a coding error enabling the wrong interrupt is remote. Thus if it does happen it's probably either via severe RF interference, or because the code has crashed. In either case the system has bigger problems than vectoring to an unsupported interrupt.

Of course anybody who's put this much thought into it will probably be conscientious enough to do something different.

Another valid argument on very memory constrained processors is that you need the unused interrupt vector space for the application. Indeed I have coded 8051 applications where this has been the case. Such is the price we sometimes have to pay on very small systems.

Install 'RETI' instructions at all unused vectors


In this approach, you arrange for there to be a 'Return From Interrupt' instruction at every unused interrupt vector. Indeed this approach is common enough that some compiler manufacturers offer it as a linker option. The concept with this approach is that if an unexpected interrupt occurs, then by executing a RETI instruction, the application will simply continue with very little harm done. All in all this isn't a bad approach. However it has several weaknesses.

  • The biggest problem with this approach is that it doesn't solve the problem of an interrupt source that keeps on interrupting. The most egregious example of this is a level triggered interrupt on a port pin. In this case, depending upon the CPU architecture, it is quite possible for the system to go into a mode whereby it essentially spends all its time vectoring to the interrupt and then returning. However this is by no means the only example. Others that spring to mind are 'Transmit buffer empty' interrupts, and timer overflow type interrupts. In the latter case, the system probably wouldn't spend all of its time interrupting; however a certain fraction of the CPU bandwidth would be wasted, which in a battery powered application for instance, would be a big deal.
  • If you do this at the start of a project, you lose the opportunity to discover errors in which an interrupt source has been erroneously enabled. In short this approach can mask problems, while what is really needed is an approach that can reveal problems.

Recommended Approach


What I do is the following.

  1. At the start of a project I create a file called vector.c. In vector.c I create an interrupt handler for every possible interrupt vector. Not only is this an essential first step in solving the problem, I also find it very illuminating as it forces me to read about and understand all the CPU's interrupt sources. This is always a useful step, as in many ways the interrupt sources for a CPU tell you a lot about its capabilities and the designers' intent.
  2. Within each interrupt handler, I explicitly mask the interrupt source. This will prevent the interrupt from reoccurring in all but the most extreme of cases.
  3. If necessary, I also clear the interrupt flag. (In some CPU architectures this occurs automatically by vectoring to the interrupt. In others you have to do it manually).
  4. After masking the interrupt source, I then make a call to my trap function. What this means is that while I'm debugging the code, if any unexpected interrupt occurs, then I'll know about it in a hurry. Conversely, of course, with a release build, the trap function compiles down to nothing, essentially removing it from the code.

Here's a code fragment that shows what I mean. In this case it's for an AVR processor and the IAR compiler. However it should be trivial to port this to other architectures / compilers. Note that for the AVR it is in general not necessary to clear the interrupt flag as it is cleared automatically upon vectoring to the ISR.

#include <stdbool.h>

#ifndef NDEBUG
/** Flag to allow us to exit the trap and see who caused the interrupt */
static volatile bool Exit_Trap = false;
#endif

/* trap() is defined before the handlers so that it is declared where they call it */
static inline void trap(void)
{
#ifndef NDEBUG
    while (!Exit_Trap)
    {
    }
#endif
}

#pragma vector=INT1_vect /* External Interrupt Request 1 */
__interrupt void int1_isr(void)
{
    EIMSK_INT1 = 0; /* Disable the interrupt */
    /* Interrupt flag is cleared automatically */
    trap();
}

#pragma vector=PCINT0_vect /* Pin Change Interrupt Request 0 */
__interrupt void pcint0_isr(void)
{
    PCICR_PCIE0 = 0; /* Disable the interrupt */
    /* Interrupt flag is cleared automatically */
    trap();
}
...


Tuesday, April 14, 2009

Effective C Tip #3 - Exiting an intentional trap

This is the third in a series of tips on writing what I call effective C. Today I'd like to give you a useful hint concerning traps. What exactly do I mean by a trap? Well while C++ has a 'built in' exception handler (try searching for 'catch' or 'throw'), C does not (thanks to Uhmmmm for pointing this out). Instead, what I like to do when debugging code is to simply spin in an infinite loop when something unexpected happens. For example consider this code fragment:

switch (foo)
{
    case 0:
        ...
        break;

    case 1:
        ...
        break;

    ...

    default:
        trap();
        break;
}

My expectation is that the default case should never be taken. If it is, then I simply call the routine trap(). So what does trap() look like? Well the naive implementation looks something like this:

void trap(void)
{
    for (;;)
    {
    }
}

The idea is that when the system stops responding, stopping the debugger will show that something unexpected happened. However, while this mostly works, it has a number of significant shortcomings. The most important is that leaving code like this in a production release is definitely not a good idea, and so the first modification that needs to be made is to arrange to remove the infinite loop for a release build. This is usually done by defining NDEBUG. The code thus becomes:

void trap(void)
{
#ifndef NDEBUG
    for (;;)
    {
    }
#endif
}

The next problem with this trap function is that it would be ineffective in a system that executes most of its code under interrupt. As a result, it makes sense to disable interrupts when entering the trap. This is of course compiler / platform specific. However it will typically look something like this:

void trap(void)
{
#ifndef NDEBUG
    __disable_interrupts();
    for (;;)
    {
    }
#endif
}

The final major problem with this code is that it's hard to tell what caused the trap. While you can of course examine the call stack and work backwards, it's far easier if you instead do something like this:

#include <stdbool.h>

static volatile bool Exit_Trap = false;

void trap(void)
{
#ifndef NDEBUG
    __disable_interrupts();
    while (!Exit_Trap)
    {
    }
#endif
}

What I've done is declare a volatile variable called Exit_Trap and initialize it to false. Thus when the trap occurs, the code spins in an infinite loop. However, by using the debugger to set Exit_Trap to true, I cause the loop to be exited, and I can then step the debugger and find out where the problem occurred.

Regular readers will perhaps have noticed that this isn't the first time I've used volatile to achieve a useful result.

Incidentally I'm sure that many of you trap errors via the use of the assert macro. I do too - and I plan to write about how I do this at some point.

So does this meet the criteria for an effective C tip? I think so. It's a very effective aid in debugging embedded systems. It's highly portable and it's easy to understand. That's not a bad combination!


Saturday, April 11, 2009

On the use of correct grammar in code comments

Back when I was in college the engineering students were fond of dismissing the liberal arts majors by doing such witty things as writing next to the toilet paper dispenser "Liberal Arts degree - please take one". One of the better retaliatory pieces of graffiti that I really liked was: "Four years ago I couldn't spell Engineer - now I are one". I think this appealed to me because there was more than a smidgen of truth in its sentiment. If you don't believe me, just take a look at the comments found in most computer programs. I don't think I'm being exactly controversial by noting that most comments:

  • Lack basic punctuation.
  • Contain numerous spelling errors.
  • Liberally use non-standard abbreviations.
  • Regularly omit verbs and / or other basic components of a sentence.

As a result, many comments are nonsensical. In fact I've been in situations where the comments are so badly written that it's easier to read the code than it is the comments. Clearly this isn't a good thing! When I question programmers about this, I typically get a shrug of the shoulders and a 'what's the big deal' attitude. When pressed further, the honest ones will admit that they couldn't be bothered to use correct grammar or spelling because it's too much effort - and after all you can work out what they mean if you just try hard enough. They are of course correct. However taken to its logical conclusion, this is really an argument for not commenting at all - since with a bit (OK a lot) of effort it should be crystal clear what the code is doing (and why) simply by examining it.

I decided to write about this now since I recently heard from Brad Mosch concerning a pet peeve of his. He gave his permission for me to quote from his email:

I see all the time mixed occurrences of whether or not a space is used for things such as 3dB, 3 dB, 1MHz, 1 MHz, etc. I am hoping that someone in the embedded guru world propose that a space is ALWAYS used between the number and the unit of measure. That is the documented standard that was used in our technical writing at United Space Alliance and NASA. The funny thing is, even though that standard existed out there at Kennedy Space Center, not a whole lot of people knew about it because I saw the same problem in documents out there all the time. Anyway, my point is, isn't "1 Hz" a lot more readable than "1Hz"?

I'm sure many of you may think that Brad is being overly picky. However I don't. His real point (in the last sentence) concerns readability. If you are going to write a comment surely it should be as readable as possible? Now I consider myself a very conscientious commenter of code, and so as a matter of interest I did a quick search on my current project to see whether I was following Brad's advice. Well I was - about 90 % of the time. I found that my style depended a bit on the units. For example, I always appended the % sign without a space, whereas mV, mA etc just about always had a space between the value and the units. You'll be pleased to know Brad that I'll be mending my ways!

Anyway, I'll leave this topic for now. Next time I visit it I'll tell you how I spell check my comments. Hopefully having read this post you'll know why I do it.

Friday, April 03, 2009

Commuting is crazy!

A few posts back I suggested that (American) employers would benefit from giving their engineers a lot more time off. In the comments section, Brad opined that he would very much like to work four 10-hour days. One of the reasons he gave was to avoid the stress and hassle of his daily commute. I agree completely with him. However, I'd like to take this one step further. Why is it that (most) employers insist that their staff come to the office each day to work? This always strikes me as ludicrous. Of course there are days where one has to attend meetings, or where you need to use the specialized test equipment that your employer owns. In addition there are many of us who work for employers where secrecy demands that you be at work. However, for the vast majority of engineers there is absolutely no need to be in the office every day. Instead, with a decent home computer, a broadband connection and a VPN, you are pretty much all set to do exactly what you'd do if you went into the office for the day.

Now notwithstanding that allowing / encouraging / demanding that staff work from home whenever possible has great benefits for the engineer and the environment, the real key is the boost in productivity that is possible. Any engineer I know will tell you that the best way to get a lot of (hard) work done in a hurry is to shut the door, turn off the telephone and block your email. Maybe it's just me, but that's exactly what can happen when you work from home.

But what about the staff that will go home and slough off for the day? Well I'm sure they exist. I'm also sure that anyone that managed to get through an engineering degree program has enough brains to work out how to goof off at work without being caught if that's their inclination. In short I don't see being at work as evidence that you're actually doing anything useful.

What's maddening about this is the list of jobs that don't require you to come to the office each day. Examples that spring to mind include salespeople, truck drivers and home-care health workers. Apparently their employers somehow manage to come up with ways of determining whether they are productive or not.

So what to make of this? I think it's largely inertia. Twenty years ago, the cost of engineering tools was so high that you had to go to work to use them. Today you can set up a well equipped laboratory for $10K. Despite this, the notion of engineers having to go to work persists. If I'm correct, and there aren't any substantive reasons for most of us to go to the office every day, then ultimately logic should overcome the inertia - and working from home several days a week will become the norm. However it won't start changing until more of us start pressuring management to explain why we shouldn't do this.

Monday, March 30, 2009

Efficient C Tips #8 - Use const

One of the easiest ways to make your code more efficient is to use const wherever feasible. Just like declaring local functions as static, this is one of those changes that makes your code more robust, more maintainable and faster - a true win-win situation. So how does this work? Well you get the most benefit when passing pointers as parameters to functions. Here's an example of a function whose job it is to compute the sum of an array of integers. The naive implementation would look something like this:

uint32_t sum(uint16_t *ptr, uint16_t n_elements)
{
    uint16_t lpc;
    uint32_t sum = 0;

    for (lpc = 0; lpc < n_elements; lpc++)
    {
        sum += *ptr++;
    }
    return sum;
}

I'll ignore the issues of post increment and counting up (for now). Instead, consider the declaration of ptr. As it stands, the caller of this function has no idea whether sum() will modify the data or not, and hence must assume that it does. This has obvious implications for the compiler when it comes to optimization. To overcome this, it is necessary to declare ptr as pointing to const. The function prototype for sum() now becomes:

uint32_t sum(uint16_t const *ptr, uint16_t n_elements);

You'll notice that I prefer to use what I call Saks notation for where I place the const modifier. The more conventional, albeit less sensible way of writing the declaration is:

uint32_t sum(const uint16_t *ptr, uint16_t n_elements);

Regardless of the style, by doing this you are indicating to the compiler that you will not be modifying the data that ptr points to. As a result, the optimizer can make assumptions that will typically lead to tighter code.

Before I give you the final code, I'd like to make a few other observations.

  • As well as potentially making your code more efficient, use of const also makes your code more readable and maintainable. That is, someone examining your code will know something extra about the function simply by looking at the prototype. Personally I find this very useful.
  • If you examine the C standard library, you'll find very liberal use of the const modifier. You should take this as a strong hint that it's a good idea.
  • PC-Lint will very helpfully tell you if a pointer can be declared as pointing to const. Yet another reason for using Lint!

So what does my sum() function look like? Well, incorporating my previous hints on post increment and counting down, it looks something like this:

uint32_t sum(uint16_t const *ptr, uint16_t n_elements)
{
uint32_t sum = 0;

for (; n_elements != 0; --n_elements)
{
sum += *ptr;
++ptr;
}
return sum;
}
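To see the const qualifier pay off at the call site, here is a minimal self-contained sketch built around the final sum(). The table name calibration_table and its contents are hypothetical, for illustration only, and so is the precondition assert. Because the parameter points to const, a const table can be passed directly. With the original non-const prototype, the same call would discard the const qualifier and the compiler would issue a diagnostic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n_elements 16-bit values; const promises sum() won't modify them. */
uint32_t sum(uint16_t const *ptr, uint16_t n_elements)
{
    uint32_t sum = 0;

    assert((ptr != NULL) || (n_elements == 0u)); /* precondition */
    for (; n_elements != 0; --n_elements)
    {
        sum += *ptr;
        ++ptr;
    }
    return sum;
}

/* Hypothetical read-only table; many toolchains place such data in ROM. */
static uint16_t const calibration_table[] = { 100u, 200u, 300u, 65535u };

uint32_t calibration_total(void)
{
    /* Legal only because sum() takes a pointer to const. */
    return sum(calibration_table,
               (uint16_t)(sizeof calibration_table / sizeof calibration_table[0]));
}
```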

Editorial Note

I've been following my own advice and have been on a short vacation. As a result I've been tardy in responding to some of your comments. I'll try and rectify this over the next few days.


Sunday, March 22, 2009

Demand more time off!

I've been posting on a lot of technical issues lately and so I thought I'd turn to a less cerebral topic - but one which I feel quite passionate about. First off - some background. I'm British by birth and was raised in Europe (UK & Germany) before moving to the USA in my early twenties. Upon arrival in the USA I was struck by many things; however professionally what amazed me was the number of hours the typical engineer works in the USA compared to their European counterparts. When I left the UK, the standard work week was 37.5 hours and the typical amount of paid time off was 4 weeks for new hires, quickly increasing to 6 weeks or more with length of service. To this was added 8 bank holidays. Perhaps more importantly, employers seemed to think that this was a good thing. For example, my employer at the time had the following policies in effect:

  • Employees were encouraged to work their 37.5 hours in such a way that the work week ended at lunch time on Friday, effectively ensuring that employees had 2.5 day weekends.
  • Employees were strongly encouraged to take at least 2 weeks off as a block, thus ensuring that they got at least one long break from work every year.

By contrast, when I arrived in the USA, I discovered that the norm was quite different. Indeed the policies I encountered were as follows (and this from the American branch of the same firm as I had worked for in the UK):

  • Work week of 40 hours.
  • Engineers were routinely expected to put in unpaid overtime, with 10 hours being the norm.
  • Annual vacation of two weeks, which only started accruing after 6 months service.
  • Very long serving employees might get 3 weeks vacation a year.
  • Taking more than one week off at a time was actively discouraged.

So what to make of this? If you do the mathematics, a typical engineer in the USA would be working about 50 * 50 = 2500 hours a year (ignoring bank holidays - which are about the same), whereas a typical engineer in the UK would be working 37.5 * 48 = 1800 hours - a 39% difference. Now the question is, did I perceive the engineers in the USA to generate more output? I'd say yes, but only by a few percent, and certainly nowhere near the 39% more hours that they worked.
I'm sure other people's experience will differ. However it's clear to me why there isn't a big difference in productivity. I solve most of my toughest technical problems when I'm not at work. Indeed, there is nothing like taking a stroll, going for a bike ride, or even sitting down for a beer with friends for clearing the mind and allowing you to literally look at issues from a new perspective. I know this experience isn't unique to me, so why don't employers see the light and realize that everyone benefits from requiring engineers (and other professions - but that's outside my bailiwick) to take more time off?

Maybe it's just me, but a start in changing this situation could be for more engineers to start demanding more time off. Some companies are starting to see the light. For example Netrino offers its employees 5 weeks vacation. Let's make them the norm - not the exception!

As a final note, I know I have regular readers from other parts of the world - South America, Australasia, and the former Eastern Bloc. I'd be interested to hear what your working conditions are like.


Sunday, March 15, 2009

Sorting (in) embedded systems

Although countless PhDs have been awarded on sorting algorithms, it's not a topic that seems to come up much in embedded systems (or at least the kind of embedded systems that I work on). Thus it was with some surprise recently that I found myself needing to sort an array of integers. The array wasn't very large (about twenty entries) and I was eager to move on to the real problem at hand, and so I just dropped in a call to the standard C routine qsort(). I didn't give it a great deal of thought because I 'knew' that a 'Quick Sort' algorithm is in general fast and well behaved, and that with so few entries to sort I needn't be too concerned about it being 'optimal'. Anyway, with the main task at hand solved, on a whim I decided to take another look at qsort(), just to make sure that I wasn't being too cavalier in my approach. Boy did I get a shock! My call to qsort() was increasing my code size by 1500 bytes and it wasn't giving very good sort times either. For those of you programming big systems, this may seem acceptable. In my case, the target processor had 16K of memory and so 1500 bytes was a huge hit.

Surely there had to be a better solution? Well there's always a better solution, but in my case in particular, and for embedded systems in general, what is the optimal sorting algorithm?

Well, after thinking about it for a while, I think the optimal sorting algorithm for embedded systems has these characteristics:

  1. It must sort in place.
  2. The algorithm must not be recursive.
  3. Its best, average and worst case running times should be of similar magnitude.
  4. Its code size should be commensurate with the problem.
  5. Its running time should increase linearly or logarithmically with the number of elements to be sorted.
  6. Its implementation must be 'clean' - i.e. free of breaks and returns in the middle of a loop.

Sort In Place
This is an important criterion not just because it saves memory, but most importantly because it obviates the need for dynamic memory allocation. In general dynamic memory allocation should be avoided in embedded systems because of problems with heap fragmentation and allocation performance. If you aren't aware of this issue, then read this series of articles by Dan Saks on the issue.
Recursion
Recursion is beautiful and solves certain problems amazingly elegantly. However, it's not fast and it can easily lead to problems of stack overflow. As a result, it should never be used in embedded systems.
Running Time Variability
Even the softest of real time systems have some time constraints that need to be met. As a result a function whose execution time varies enormously with the input data can often be problematic. Thus I prefer code whose execution time is nicely bounded.
Code Size
This is often a concern. Suffice to say that the code size should be reasonable for the target system.
Data Size Dependence
Sorting algorithms are usually classified using 'Big O notation' to denote how sensitive they are to the amount of data to be sorted. If N is the number of elements to be sorted, then an algorithm whose running time grows as N log N is usually preferred to one whose running time grows as N². However, as you shall see, for small N the advantage of the more sophisticated algorithms can be lost to the overhead of the sophistication.
Clean Implementation
I'm a great proponent of 'clean' code. Thus code where one exits from the middle of a loop isn't as acceptable as code where everything proceeds in an orderly fashion. Although this is a personal preference of mine, it is also codified in, for example, the MISRA C requirements, to which many embedded systems are built.

Anyway, to determine the optimal sorting algorithm, I went to the Wikipedia page on sorting algorithms and initially selected the following for comparison to the built-in qsort(): Comb, Gnome, Selection, Insertion, Shell & Heap sorts. All of these are sort in place algorithms. I originally eschewed the Bubble & Cocktail sorts as they really have nothing to commend them. However, several people posted comments asking that I include them - so I did. As predicted, they have nothing to commend them. In all cases, I used the Wikipedia code pretty much as is, optimized for maximum speed. (I recognize that the implementations in Wikipedia may not be optimal - but they are the best I have.) For each algorithm, I sorted arrays of 8, 32 & 128 signed integers. In every case I sorted the same random array, together with a sorted array and an inverse sorted array.

First the code sizes in bytes:

Function      Code size (bytes)
qsort()       1538
Gnome()         76
Selection()    130
Insertion()    104
Shell()        242
Comb()         190
Heap()         200
Bubble()       104
Cocktail()     140

Clearly, anything is a lot better than the built-in qsort(). However, this isn't a like-for-like comparison, because qsort() is a general purpose routine, whereas the others are designed explicitly to sort integers. Leaving aside qsort(), the Gnome, Insertion and Bubble sorts are clearly the code size leaders. Having said that, in most embedded systems 100 bytes here or there is irrelevant, and so we are free to choose based upon other criteria.

Execution times for the 8-element array

Name          Random   Sorted   Inverse sorted
qsort()         3004      832             2765
Gnome()         1191      220             2047
Selection()     1120     1120             1120
Insertion()      544      287              756
Shell()         1233     1029             1425
Comb()          2460     1975             2480
Heap()          1265     1324             1153
Bubble()         875      208             1032
Cocktail()      1682      927             2056

In this case, the Insertion sort is the clear winner. Not only is it dramatically faster in almost all cases, it also has reasonable variability and almost the smallest code size. Notice that the Bubble sort, for all its vaunted simplicity, consumes as much code and runs considerably slower. Note also that the Selection sort's running time is completely consistent - and not too bad when compared to the other methods.
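For reference, here is a minimal insertion sort of the kind measured above. It is a sketch of the textbook algorithm, not the Wikipedia code the timings came from, and it assumes plain int elements; the function name is hypothetical. It meets the criteria listed earlier: it sorts in place, it isn't recursive, and its loops end only through their own conditions.

```c
#include <assert.h>
#include <stddef.h>

/* Sort n ints into ascending order, in place, without recursion. */
void insertion_sort(int a[], size_t n)
{
    assert((a != NULL) || (n == 0u));

    for (size_t i = 1u; i < n; ++i)
    {
        int const key = a[i];   /* element to insert into sorted prefix a[0..i-1] */
        size_t j = i;

        /* Shift larger elements one slot right until key's slot is found. */
        while ((j > 0u) && (a[j - 1u] > key))
        {
            a[j] = a[j - 1u];
            --j;
        }
        a[j] = key;
    }
}
```

On already sorted input the inner loop never executes, which is consistent with the short 'Sorted' times above; on inverse sorted input every element is shifted all the way down, which is where the variability comes from.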

Execution times for the 32-element array

Name          Random   Sorted   Inverse sorted
qsort()        23004     3088            19853
Gnome()        17389      892            35395
Selection()    14392    14392            14392
Insertion()     5588     1179            10324
Shell()         6589     4675             6115
Comb()         10217     8638            10047
Heap()          8449     8607             7413
Bubble()       13664      784            16368
Cocktail()     17657     3807            27634

In this case, the winner isn't so clear-cut. Although the Insertion sort still performed well, it's now showing a very large variation in running time. By contrast, the Shell sort has decent times with small variability. The Gnome, Bubble and Cocktail sorts are showing huge variability in execution times (with a very bad worst case), while the Selection sort shows consistent execution time. On balance, I'd go with the Shell sort in most cases.

Execution times for the 128-element array

Name          Random   Sorted   Inverse sorted
qsort()       120772    28411            77896
Gnome()       316550     3580           577747
Selection()   217420   217420           217420
Insertion()    88475     4731           158020
Shell()        41661    25611            34707
Comb()         50858    43523            48568
Heap()         46959    49215            43314
Bubble()      231294     3088           262032
Cocktail()    271821    15327           422266

In this case the winner is either the Shell sort or the Heap sort, depending on whether you care more about raw performance or about performance variability: the Shell sort is faster in all three cases, while the Heap sort's running time varies much less with the input. The Gnome, Bubble and Cocktail sorts are hopelessly outclassed.

So what to make of all this? Well in any comparison like this there are a myriad of variables that one should take into account, and so I don't believe these data should be treated as gospel. What is clear to me is that:

  1. Being a general purpose routine, qsort() is unlikely to be the optimal solution for an embedded system.
  2. For many embedded applications, a shell sort has a lot to commend it - decent code size, fast running time, well behaved and a clean implementation. Thus if you don't want to bother with this sort of investigation every time you need to sort an array, then a shell sort should be your starting point. It will be for me henceforth.
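As a starting point, here is a minimal Shell sort sketch. It is not the code that was benchmarked: for simplicity it uses Shell's original gap sequence (n/2, n/4, ..., 1), and other gap sequences are known to perform better, so treat the timings above as applying to the benchmarked version only. It assumes int elements and a hypothetical function name; it sorts in place, uses no recursion or dynamic memory, and exits each loop only through its condition.

```c
#include <assert.h>
#include <stddef.h>

/* Shell sort: gapped insertion sorts with the gap shrinking down to 1. */
void shell_sort(int a[], size_t n)
{
    assert((a != NULL) || (n == 0u));

    for (size_t gap = n / 2u; gap > 0u; gap /= 2u)
    {
        /* Insert each a[i] among a[i - gap], a[i - 2*gap], ... */
        for (size_t i = gap; i < n; ++i)
        {
            int const key = a[i];
            size_t j = i;

            while ((j >= gap) && (a[j - gap] > key))
            {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = key;
        }
    }
}
```

The final pass (gap of 1) is an ordinary insertion sort, but by then the earlier passes have moved most elements close to their final positions, which is what keeps the worst case in check.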


Thursday, March 05, 2009

Efficient C Tips #7 - Fast loops

Every program at some point requires some set of actions to be taken a fixed number of times. Indeed this is such a common occurrence that we typically code it without really giving it much thought. For example, if I asked you to call a function foo() ten times, I'm sure that most of you would write something like this:

for (uint8_t lpc = 0; lpc < 10; ++lpc)
{
foo();
}

While there is nothing wrong with this, per se, it is suboptimal on just about every processor. Instead you are better off using a construct which counts down to zero. Here are two alternative ways of doing this:

for(uint8_t lpc = 10; lpc != 0; --lpc)
{
foo();
}

uint8_t lpc = 10;
do
{
foo();
} while (--lpc);

Which one you think is more natural is entirely up to you.

So how does this efficiency arise? Well in the count up case, the assembly language generated by the compiler typically looks something like this:

INC lpc ; Increment loop counter
CMP lpc, #10 ; Compare loop counter to 10
BNZ loop ; Branch if loop counter not equal to 10

By contrast, in the count down case the assembly language looks something like this:

DEC lpc ; Decrement loop counter
BNZ loop ; Branch if non zero

Evidently, because of the 'specialness' of zero, more efficient code can be generated.

So why don't you see C programs littered with these count down constructs? Well counting down has a major limitation. If you need to use the loop variable as an index into an array then you have a problem. For example, let's say I wanted to zero the elements of an array. Using the count down technique you might be tempted to do this:

uint8_t bar[10];
uint8_t lpc = 10;

do
{
bar[lpc] = 0; // Error! First time through results in index beyond end of array
} while (--lpc);

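Evidently it doesn't work: with lpc starting at 10, the first pass writes to bar[10], one element beyond the end of the array, and because the loop exits as soon as lpc reaches zero, bar[0] is never cleared. You can of course modify the code to make it work. However, doing so typically loses you all the efficiency gains, such that you are better off with a standard up-counting for loop.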
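One way to make the count-down version correct is to index with lpc - 1, so the loop touches bar[9] down to bar[0]. A minimal sketch follows; the function name zero_bar() and the BAR_LEN macro are hypothetical. Whether the extra subtraction cancels the gain from testing against zero depends on your compiler and target, so inspect the generated code before choosing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAR_LEN (10u)   /* must be non-zero: the do-while body always runs once */

/* Zero bar[] counting down: lpc runs 10..1, so lpc - 1 indexes 9..0. */
void zero_bar(uint8_t bar[BAR_LEN])
{
    uint8_t lpc = BAR_LEN;

    assert(bar != NULL);
    do
    {
        bar[lpc - 1u] = 0u;
    } while (--lpc);
}
```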

As a parting thought, concepts such as these are second nature to assembly language programmers - all of whom do this sort of thing instinctively. As a result, if you are really interested in getting the best out of your C compiler, you could do a lot worse than learning how to program your target processor in assembly language. Does this defeat one of the objectives of programming in a high level language? Yes. However, for giving you insight into what is going on under the hood it cannot be beaten.


Sunday, March 01, 2009

Computing your stack size

Many of the folks that come to this blog by way of search engines do so because they are having problems with stack overflow. I've already given my take on the likely causes of a stack overflow. Today I'd like to offer some hints on a related topic - how to set about computing the stack size for your application. This is an extremely difficult problem, which can be approached in one of three ways - experimentally, analytically or randomly. The latter is by far the most common technique, which consists essentially of choosing a number and seeing whether it works! In an effort to reduce the use of the random approach, I'll try and summarize the other two methods.

Experimentally


In the experimental method, a very large stack size is typically selected. The stack is then filled with an arbitrary bit pattern such as 0xFC. The code is then executed for a 'suitable' amount of time, after which the stack is examined to see how far it has grown. Most people will then take this number, add a safety margin and call it the required stack size. The main advantage of this approach is that it's easy to do (indeed many good debuggers have this feature built into them). It also has the advantage of being based on 'experimental data'. However, there are two big problems with this approach, which will catch the unwary.
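The fill-and-inspect technique can be sketched in C as below. This is an illustration only: it works on an ordinary array standing in for the stack region, and it assumes a descending stack (one that grows from high addresses toward low), so any untouched fill pattern is found at the low end. The function names are hypothetical; on a real target you would paint the linker-defined stack region at start-up, and the symbols for that region are toolchain specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL (0xFCu)  /* arbitrary fill pattern, as in the text */

/* Fill the whole region with the fill pattern before it is used. */
void stack_paint(uint8_t *region, size_t size)
{
    assert(region != NULL);
    for (size_t i = 0u; i < size; ++i)
    {
        region[i] = STACK_FILL;
    }
}

/* Bytes used so far, assuming a descending stack: count untouched fill
   bytes from the low end; everything above them has been overwritten. */
size_t stack_used(uint8_t const *region, size_t size)
{
    size_t untouched = 0u;

    assert(region != NULL);
    while ((untouched < size) && (region[untouched] == STACK_FILL))
    {
        ++untouched;
    }
    return size - untouched;
}
```

If the deepest byte ever pushed happens to equal 0xFC it will be counted as unused, so the figure is slightly approximate, and it is only as good as the test run that produced it.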

The biggest single problem with the experimental approach is the implicit assumption that the experiment is representative of worst-case conditions. What do I mean by worst-case conditions? Well, the maximum stack usage in an embedded system occurs when the interrupt that uses the most stack fires at the point where the foreground application is also using its maximum stack. On the assumption that most interrupts are asynchronous to the foreground application, the problem should be clear. How exactly do you know after your testing whether or not the interrupt that uses the most stack did indeed trigger at the worst (best?) possible moment? Thus even if your testing had 100% code coverage, it still isn't possible to know for sure whether you have covered all possible scenarios. If, as is normally the case, you don't even begin to approach full code coverage, it should be clear that testing tends to reveal the typical 'worst-case' condition, rather than the genuine worst case.

The second major issue with testing is that it tends to be done when the code is close to being completed, rather than when it is completed. The problem is that small changes in the source code can have a huge impact on the required stack size. For example, let's say that during testing it is discovered that an interrupt service routine is taking so long to complete that another interrupt is being occasionally missed. A 'quick fix' is to simply enable interrupts in the long interrupt handler, so that the other interrupts can do their thing. This one line change can lead to a dramatic increase in stack usage. (If you aren't cognizant of the stack usage of interrupt handlers, you should read this article I wrote).

Analytically


In the analytical approach, the idea is to examine the source code and from the analysis work out the maximum stack usage of the foreground application, and then to add to this the worst case interrupt handler usage. This is obviously a daunting task for anything but the simplest of applications. You will not be surprised to hear that computer programs have been written to perform this analysis. Indeed good quality linkers will now do this for you as a matter of course. Furthermore, my favorite third party tool, PC-Lint from Gimpel, will also now do this starting with version 9. However be warned that it takes a lot of work to set up PC-Lint to perform the analysis.
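The core of what such tools compute can be illustrated with a toy example. The sketch below uses a hypothetical call graph with made-up per-function frame sizes, finds the deepest foreground call chain, and adds the worst single interrupt handler. It assumes no recursion and no calls through function pointers (the problems discussed below), and that functions are numbered so every callee has a higher index than its caller, which lets the analysis run as one backward pass without recursion. It also assumes interrupts do not nest; if they can, the handlers' figures would need to be added together rather than taking the largest.

```c
#include <assert.h>
#include <stddef.h>

#define N_FUNCS   (5u)
#define MAX_CALLS (3u)
#define NO_CALL   (N_FUNCS)   /* marks an unused call slot */

/* Hypothetical frame sizes (bytes). Index 0 is main(). */
static unsigned const frame_bytes[N_FUNCS] = { 16u, 24u, 40u, 8u, 32u };

/* Hypothetical call graph: callees always have a higher index. */
static size_t const calls[N_FUNCS][MAX_CALLS] = {
    { 1u, 2u, NO_CALL },             /* main -> f1, f2 */
    { 3u, NO_CALL, NO_CALL },        /* f1   -> f3     */
    { 3u, 4u, NO_CALL },             /* f2   -> f3, f4 */
    { NO_CALL, NO_CALL, NO_CALL },   /* f3 is a leaf   */
    { NO_CALL, NO_CALL, NO_CALL },   /* f4 is a leaf   */
};

/* Hypothetical stack usage of each (non-nesting) interrupt handler. */
static unsigned const isr_bytes[] = { 20u, 48u };

unsigned worst_case_stack(void)
{
    unsigned deepest[N_FUNCS];  /* worst depth of any chain starting here */
    unsigned worst_isr = 0u;

    /* Walk backwards so each callee is finished before its callers. */
    for (size_t f = N_FUNCS; f-- > 0u; )
    {
        unsigned worst_callee = 0u;

        for (size_t c = 0u; c < MAX_CALLS; ++c)
        {
            size_t const callee = calls[f][c];

            if (callee != NO_CALL)
            {
                assert(callee > f);  /* ordering assumption */
                if (deepest[callee] > worst_callee)
                {
                    worst_callee = deepest[callee];
                }
            }
        }
        deepest[f] = frame_bytes[f] + worst_callee;
    }

    for (size_t i = 0u; i < sizeof isr_bytes / sizeof isr_bytes[0]; ++i)
    {
        if (isr_bytes[i] > worst_isr)
        {
            worst_isr = isr_bytes[i];
        }
    }
    return deepest[0] + worst_isr;  /* foreground worst case + worst ISR */
}
```

Here the deepest chain is main -> f2 -> f4 (16 + 40 + 32 = 88 bytes), and adding the 48-byte handler gives 136 bytes.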

Although analysis can theoretically give an accurate answer, it does have several problems.

Recursion


It's almost impossible for an analytical approach to compute the stack usage of a program that uses recursion, since the depth of recursion usually depends on run-time data. It's precisely because of this unbounded effect on stack size that recursion is a really bad idea in embedded systems. MISRA bans it, and I personally banned it about twenty years ago.

Indirect Function Calls


Pointers to functions are something that I use extensively and heartily recommend (for a discussion see this article I wrote). Although they don't have a deleterious effect on stack size, they do make it quite difficult for analysis programs to track what is going on. Indeed PC-Lint cannot handle pointers to functions when it comes to computing stack usage. Thus if you use an analytical approach and you use pointers to functions, then make absolutely sure that the analysis program can track all the indirect calls.

Optimizers


Code optimization can play havoc with the stack usage. Some optimizations reduce stack usage (by e.g. placing function parameters in registers), while others can increase stack usage. I should note that it's only third party tools that should be bamboozled by the optimizer. The linker that makes up part of the compilation package should be aware of everything that the compiler has done.

Complexity


Even if you have a linker that will compute stack usage, interpreting the output of the linker is always a daunting task. For example, the linker from IAR will compute your stack usage. However, it isn't nice enough to simply say: You need 279 bytes of stack space. Instead you have to study the linker output carefully to glean the requisite information.

A Practical Approach


It's clear from the above that it isn't easy to determine the stack size for an application. So how exactly does one set about this in practice? Well here's what I do.

  1. Locate the stack at the beginning / end of memory (depending upon how the stack grows) and place all variables at the other end of the memory. This essentially means that you are implicitly allowing the maximum amount of memory possible for the stack. Note that many good compilers / linkers will do this automatically for you.
  2. As a starting point, I allocate 10% of the available memory for stack use. If I know I will be using functions that are huge users of the stack (such as printf, scanf and their brethren), then I'll typically set it to 20% of available memory.
  3. I set up the debug environment from day 1 to monitor and report stack usage. This way as I progress through the development process I get a very good feel for the application's stack consumption. This also helps in spotting changes to the code that have big impacts on the required stack size.
  4. Once I have 'all the code written', I start to make use of the information in the linker report. The tighter I am on memory, the more closely I examine the linker output. In particular, what I often find is that there is one and only one function call chain that leads to a stack usage that is much greater than all the other call chains. In that case, I look to see if I can restructure that call chain so as to bring the maximum stack usage more in line with the typical stack usage.


If you stumbled upon this blog courtesy of a search engine, then I hope you found the above useful. I invite you to check out some of my other posts, which you may find useful. If you are a regular reader, then as always, thanks for stopping by.


Sunday, February 22, 2009

Effective C Tips #2 - Defining buffer sizes

This is the second in a series of tips on writing what I call effective C. Today I'm addressing something that just about every embedded system has - a buffer whose length is a power of two.

In order to make many buffer operations more efficient, it is common practice to make the buffer size a power of two so that simple masking operations may be performed on them, rather than explicit length checks. This is particularly true of communications buffers where data are received under interrupt. As a result, it is common to see code that looks something like this:

#define RX_BUF_SIZE (32)
static uint8_t Rx_Buf[RX_BUF_SIZE]; /* Receive buffer */

__interrupt void RX_interrupt(void)
{
static uint8_t RxHead = 0; /* Offset into Rx_Buf[] where next character should be written */
uint8_t rx_char;

rx_char = HW_REG; /* Get the received character */

RxHead &= RX_BUF_SIZE - 1; /* Mask the offset into the buffer */
Rx_Buf[RxHead] = rx_char; /* Store the received char */
++RxHead; /* Increment offset */
}

The first thing I do to make this code more flexible is to allow the size of the buffer to be overridden on the command line. Thus my declaration for the buffer size now looks like this:

#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif

This is a useful extension because it allows me to control the resources used by the code without having to edit the code per se. However, this flexibility comes at a cost. What happens if someone were to inadvertently pass a buffer size that isn't a power of 2 on the command line? Well, as it stands - disaster: masking with RX_BUF_SIZE - 1 no longer wraps the offset at the end of the buffer, so the offset jumps back early and part of the buffer is never used. However, the fix is quite easy.

#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif
#define RX_BUF_MASK (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
#error Rx buffer size is not a power of 2
#endif

What I've done is define another manifest constant, RX_BUF_MASK to be equal to one less than the buffer size. I then test using a bit-wise AND of the two manifest constants. If the result is non zero, then evidently the buffer size is not a power of two and compilation is halted by use of the #error statement. If you aren't familiar with the #error statement, you'll find this article I wrote a few years back to be helpful.

Although this is evidently a big improvement, it still isn't quite good enough. To see why, consider what happens if RX_BUF_SIZE is zero. Zero is not a power of two, but it passes the check anyway, because 0 & (0 - 1) evaluates to 0. Standard C (C90 and C99 alike) requires an array's size to be greater than zero, so most compilers will complain about a zero-length array; however, GNU compilers accept zero-length arrays as an extension. Thus, we also need to protect against this case. Furthermore, as Yevheniy was kind enough to point out in the comments, we also have to protect against a buffer size of 1, which passes the check (as 1 & 0 = 0) but yields a mask of zero, so every character would be written to the same location. So we now get:
 
#ifndef RX_BUF_SIZE
#define RX_BUF_SIZE (32)
#endif
#if RX_BUF_SIZE < 2
#error Rx buffer must be a minimum length of 2
#endif
#define RX_BUF_MASK (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
#error Rx buffer size is not a power of 2
#endif
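To see concretely what these checks guard against, the sketch below mirrors them as an ordinary C function and simulates the mask-then-increment sequence from RX_interrupt() for a given size. Both functions are hypothetical helpers intended to run on a host, not on the target. With a power-of-two size every slot is used; with a size of 24, the mask of 23 (binary 10111) turns an offset of 8 back into 0, so only 8 slots ever receive data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same conditions as the #if checks: at least 2, and a power of two. */
bool rx_buf_size_is_valid(uint32_t size)
{
    return (size >= 2u) && ((size & (size - 1u)) == 0u);
}

/* Simulate the ISR's "mask, store, increment" sequence and count how many
   distinct buffer slots actually receive data. Sizes 1..256 only. */
uint32_t rx_slots_used(uint32_t size)
{
    bool used[256] = { false };
    uint32_t const mask = size - 1u;
    uint32_t head = 0u;
    uint32_t count = 0u;

    assert((size >= 1u) && (size <= 256u));
    for (uint32_t i = 0u; i < 4u * size; ++i)
    {
        head &= mask;          /* as in RX_interrupt() */
        if (!used[head])
        {
            used[head] = true;
            ++count;
        }
        ++head;
    }
    return count;
}
```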

As a final comment, note that the definition of RX_BUF_MASK has an additional benefit in that it can be used in the mask operation in place of (RX_BUF_SIZE - 1), so that my interrupt handler now becomes:

__interrupt void RX_interrupt(void)
{
static uint8_t RxHead = 0; /* Offset into Rx_Buf[] where next character should be written */
uint8_t rx_char;

rx_char = HW_REG; /* Get the received character */

RxHead &= RX_BUF_MASK; /* Mask the offset into the buffer */
Rx_Buf[RxHead] = rx_char; /* Store the received char */
++RxHead; /* Increment offset */
}

So is this effective C? I think so. It's efficient, it's flexible and it's robustly protected against the sorts of bone-headed mistakes that we all make from time to time.


Wednesday, February 18, 2009

Efficient C Tips #6 - Don't use the ternary operator

I have to confess that I like the ternary operator. K&R obviously liked it, as it is heavily featured in their seminal work. However after running experiments on a wide range of compilers I have concluded that with the optimizer turned on, you are better off with a simple if-else statement. Thus next time you write something like this:

y = (a > b) ? c : d;

be aware that as inelegant as it is in comparison, this will usually compile to better code:

if (a > b)
{
y = c;
}
else
{
y = d;
}

I find this frustrating, as I've consumed 8 lines doing what is more easily and elegantly performed in 1 line.

I can't say that I have any particular insight as to why the ternary operator performs so poorly. Perhaps if there is a compiler writer out there, they could throw some light on the matter?


Sunday, February 15, 2009

Horner's rule addendum

A few weeks ago I wrote about using Horner's rule to evaluate polynomials. Well today I'm following up on this posting because I made a classic mistake when I implemented it. On the premise that one learns more from one's mistakes than one's successes, I thought I'd share it with you.

First, some background. I had some experimental data on the behavior of a sensor against temperature. I needed to be able to fit a regression curve through the data, and so after some experimentation I settled on a quadratic polynomial fit. This is what the data and the curve looked like:



On the face of it, everything looks OK. However, if you look carefully, you will notice two things:

  • The bulk of the experimental data cover the temperature range of 5 - 48 degrees.
  • There is a very slight hook on the right hand side of the graph

So where's the mistake? Well actually I made two mistakes:

  • I assumed that my experimental data covered the entire expected operating temperature range.
  • I failed to check at run time that the temperature was indeed bounded to the experimental input range.

Why is this important? Well, in some circumstances the sensor would experience temperatures somewhat higher than any covered when the experimental data were gathered, e.g. 55 degrees. That doesn't sound too bad - until you take the polynomial and extend it out a bit. This is what it looks like:

You can see that at 55 degrees, the polynomial generates a value which is about the same as at 25 degrees. Needless to say, things didn't work too well!

So what advice can I offer?

  • When fitting a polynomial to experimental data, ensure that the data cover the entire range of values that can physically be realized.
  • Always plot the polynomial to see how it performs outside your range of interest. In particular, if it 'takes off' in a strange manner, then treat it very warily.
  • At run time, ensure that the data that you are feeding into the polynomial is bounded to the range over which the polynomial is known to be valid.
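The run-time advice can be sketched as follows: bound the input to the range the fit was derived from, then evaluate the quadratic with Horner's rule. The coefficients c0, c1, c2 are placeholders, not a real sensor fit, and the 5 to 48 degree limits simply reuse the bulk of the experimental range mentioned above; substitute your own. Clamping is only one possible policy; flagging the reading as invalid may suit your application better.

```c
#include <assert.h>

/* Range over which the fit is known to be valid (degrees). Placeholders. */
#define FIT_T_MIN (5.0f)
#define FIT_T_MAX (48.0f)

/* Placeholder coefficients for y = c0 + c1*t + c2*t^2. */
static float const c0 = 1.0f;
static float const c1 = 0.5f;
static float const c2 = -0.01f;

/* Evaluate the fit with Horner's rule after bounding t to the fitted range. */
float sensor_correction(float t)
{
    assert(t == t);  /* reject NaN: it would pass both range checks unclamped */

    if (t < FIT_T_MIN)
    {
        t = FIT_T_MIN;
    }
    else if (t > FIT_T_MAX)
    {
        t = FIT_T_MAX;
    }
    return (c2 * t + c1) * t + c0;  /* Horner form of c0 + c1*t + c2*t^2 */
}
```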

The maddening thing about this for me, was that I 'learned' this lesson about polynomial fits many years ago. I just chose to ignore it this time.

Before I leave this topic, I'd like to offer one other insight. If you search for Horner's rule, you'll find a plethora of articles. The more detailed ones will opine on topics such as evaluation stability, numeric overflow issues and so on. However, it's rare that you'll find this sort of information on polynomial evaluation posted. I think it's because we tend to get wrapped up in the details of the algorithm while losing sight of the underlying mathematics of what is going on. The bottom line: the next time you find a neat algorithm posted on the web for 'solving' your problem, take a big step back and think hard about what is really going on and what the inherent weaknesses are in what you are doing.

Tuesday, February 10, 2009

Effective C Tips #1 - Using vsprintf()

I've been running a series of tips on Efficient C for a while now. I thought I'd broaden the scope by also offering a series of tips on what I call Effective C. These will be tips that, while not necessarily allowing you to write tighter code, will allow you to write better code. I'm kicking the series off with the rarely used standard library function, vsprintf(). First, some preamble...

One of the perverse things I tend to do is look through the C standard library and examine functions that on the face of it seem, well, useless. I do this because I think the folks who worked on this stuff were in general very smart and thus had a very good reason for including some of these 'weird' functions. One of these is the function vsprintf(). If you go and look up the definition of this function, e.g. here, then you'll find a rather brain-ache-inducing description. Now back when I was a lad I'd look at descriptions such as this and simply shrug and walk away. However, about ten years ago I started to make a concerted effort to see if a function such as vsprintf() has a real benefit in embedded systems. Here's what I discovered in this case:

If you are working on a product that contains a VFD or LCD, then you will almost certainly have code that contains a function for writing a string to the display at a specified position. For example:


static void display_Write(uint8_t row, uint8_t col, char const * buf)
{
/* Send formatted string to display - hardware dependent*/
}

Then you will also have a plethora of functions that essentially do the same thing. That is, they accept some data, allocate a buffer on the stack, use sprintf() to write formatted data into the buffer, and then call the function that actually writes the buffer to the display at the required position. Here are some examples:

void display_Temperature(float ambient_temperature)
{
char buf[10];

sprintf(buf,"%5.2f", ambient_temperature);
display_Write(6, 8, buf);
}

...

void display_Time(int hours, int minutes, int seconds)
{
char buf [12];

sprintf(buf,"%02d:%02d:%02d", hours, minutes, seconds);
display_Write(3, 9, buf);
}

There's nothing really wrong with this approach. However, there is a better way, courtesy of vsprintf().

What one does is to modify display_Write() to take a variable length argument list. Then within display_Write() use vsprintf() to process the variable length argument list and to generate the requisite string. The basic structure for the function is as follows:

void display_Write(uint8_t row, uint8_t column, char const * format, ...)
{
va_list args;
char buf[MAX_STR_LEN];

va_start(args, format);
vsprintf(buf, format, args); //buf contains the formatted string

/* Send formatted string to display - hardware dependent*/

va_end(args); // Clean up. Do NOT omit
}

My objective here is not to explain how to use variadic arguments or indeed how vsprintf() works - there are dozens of places on the web that will do that. Instead I'm interested in showing you the benefit of this approach. The display_Write() function has evidently become more complex; however the functions that call display_Write have become dramatically simplified, as they are now just:

void display_Temperature(float ambient_temperature)
{
display_Write(6, 8, "%5.2f", ambient_temperature);
}

void display_Time(int hours, int minutes, int seconds)
{
display_Write(3, 9, "%02d:%02d:%02d", hours, minutes, seconds);
}

Is this more Effective code? I think so, for the following reasons.

  • The higher level functions are now much cleaner and easier to follow.
  • All the heavy lifting is localized in one place, which typically reduces the probability of errors dramatically.

Finally, you'll typically end up with a nice reduction in code size (even though this wasn't my objective). All in all, not bad for one obscure function.
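
The sketch below makes the pattern concrete and testable on a PC. It makes a few assumptions. display_Hw_Write() is a hypothetical stand-in for the hardware-dependent part; it simply records what would have been sent. MAX_STR_LEN is assumed to be one 20-character display line plus the terminating NUL. The code also uses vsnprintf() (C99) rather than vsprintf(), so that an over-long format result is truncated instead of overrunning buf. If your library predates C99, vsprintf() works the same way but relies on you sizing the buffer correctly.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_STR_LEN 21u  /* assumption: 20 display characters plus NUL */

/* Hypothetical host stand-in for the display driver: records the last write. */
static char last_text[MAX_STR_LEN];
static uint8_t last_row;
static uint8_t last_col;

static void display_Hw_Write(uint8_t row, uint8_t col, char const *text)
{
    last_row = row;
    last_col = col;
    strncpy(last_text, text, sizeof last_text - 1u);
    last_text[sizeof last_text - 1u] = '\0';
}

/* Format once, in one place, then hand the string to the driver. */
void display_Write(uint8_t row, uint8_t column, char const *format, ...)
{
    char buf[MAX_STR_LEN];
    va_list args;

    va_start(args, format);
    vsnprintf(buf, sizeof buf, format, args);  /* truncates, never overruns */
    va_end(args);                              /* clean up; do NOT omit */

    display_Hw_Write(row, column, buf);
}

/* A caller shrinks to a single line. */
void display_Time(int hours, int minutes, int seconds)
{
    display_Write(3, 9, "%02d:%02d:%02d", hours, minutes, seconds);
}
```

For example, display_Time(12, 5, 9) ends up sending "12:05:09" to row 3, column 9.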


Friday, February 06, 2009

Electrical Engineers versus Computer Scientists

Looking back at my various blog postings, I've noticed that although I may be controversial on technical topics, I haven't to date written anything that is controversial on a, shall I say, human side. Well no more Mr. Nice Guy, since today I intend to wade in on the topic of whether Embedded Systems should be programmed by Electrical Engineers or Computer Scientists. Regular readers will know I'm an EE (actually my degree is in EE & ME - but that's another story) and so you won't be surprised to hear that my usual preference is for Electrical Engineers. Although I am a (very) opinionated person, I'd like to think that most of my opinions have some basis in reality, and so here's my opinion and its supporting observations...

The more embedded a product is, the better off you are with an EE, the less embedded it is, the better off you are with a CS.

So what's the basis for this overblown, sweeping generalization and what exactly do I mean by 'more embedded'?

Well, I consider a product to be highly embedded if it meets one or more of the following criteria:

  • It has no or very simple user interfaces.
  • It performs a lot of hardware type functions in software. For example a DSP that performs a lot of signal processing is essentially doing in software what was once done in hardware.
  • It contains a lot of complicated hardware that needs extensive configuration and software support (for example, a PowerQUICC processor).

By contrast, I consider a product to be lightly embedded if it meets either of the following criteria:

  • It has a sophisticated user interface (especially if the interface is web based)
  • It is database centric.

Of course, there exist products that meet the criteria for both sides of the dichotomy. For example, my new flat screen TV has a very sophisticated user interface, but I'm sure it does an extensive amount of signal processing.

If you accept this dichotomy, then it is evident that folks working on highly embedded systems really need to understand the hardware (since that's what the product is about) whereas those working on lightly embedded systems need a good understanding of how to build large software systems. Having said this, my experience is that whereas EE's (OK some EE's) are able to quickly learn the principles of building large software systems, I've never yet met a CS major that had anything beyond a casual understanding of what's really happening at the hardware level. I've seen this lack of knowledge (interest?) manifest itself in many ways. Examples include:

  • Not knowing / understanding the Nyquist sampling theorem
  • Failure to realize that EEPROM / Flash have extraordinarily long write times
  • Not realizing that sampling jitter can destroy the performance of a digital filter

What about the other way? Have I seen EE's write 1000 line functions, and be completely clueless about principles such as data encapsulation? Absolutely! However, I have also seen EE's successfully craft very large systems. As a result I've come to two basic observations:

  • A deeply embedded system written entirely by a CS major will have major problems.
  • A lightly embedded system written entirely by an EE major may have major problems.

On this basis, I prefer (slightly) to have EE's work on embedded systems.

It doesn't take a rocket scientist to conclude that perhaps the best approach is to have a team where the EE's handle the hardware centric stuff and the CS's handle the computer centric stuff. Indeed, this is the approach I see taken in most organizations.

As a final thought, although it is common to find EE majors that have gone back to college to get a Masters in Computer Science, I haven't yet met a CS major that has gone back to college to get a Masters in Electrical Engineering.


Monday, February 02, 2009

First do no harm ...

One of the pleasures of working for myself is that it allows me to experiment with some rather non-traditional approaches to the whole concept of 'work'. In fact, looking back at some of my postings, here, here and here it's clear that this is a recurring theme in my writing. I mention this because a number of years ago I instituted the policy of
Two idiotic mistakes and I quit.
What exactly is this you ask? Well over the years I have noticed that I have days in which rather than progressing on problems, I actually regress, often by huge amounts. I do stupid things such as apply power with the wrong polarity to a board, or I design a circuit that will evidently never work. If I make two of these bone headed mistakes in quick succession, I take it as a clear indicator that my head really isn't where it needs to be - and I quit for the day.

Now, back when I was an employee, I simply had no choice other than to continue 'working', even though I knew full well that I'd be doing my employer a favor if I did nothing more than sit in the corner for the rest of the day. Today, I simply walk away and return to the problem the next day.

It would be an unusual manager who recognized that these days occur - and encouraged his staff to 'quit' when they did. I'm sure for many managers, this concept is too radical. However, if Engineers are indeed professionals, then we could do worse than adopt the abbreviated form of the Hippocratic oath given in the title to this posting.


Tuesday, January 20, 2009

Common programming errors and presidential inaugurations

I don't normally link politics and embedded systems, but something happened today at the inauguration of Barack Obama that struck me as an obvious error, but which my family and I suspect 99.999% of the rest of the viewers accepted without question. I'm referring to the third paragraph of Rick Warren's invocation where he stated:
Now, today, we rejoice not only in America’s peaceful transfer of power for the 44th time. We celebrate a ...

Well it seems to me that if Barack Obama is the 44th president of the USA, then there can only have ever been 43 transitions of power. I suppose that one could claim that when Washington became president, it was a transition of power. However no one could possibly claim it was peaceful!

What's my point? Well, Rick Warren had just made a classic programming blunder: the off-by-one, or fencepost, error. I'm guessing that his invocation was scrutinized by an army of political hacks, many with advanced degrees from top universities - yet despite this the error was not caught. I guess next time you make this mistake in your code, you can console yourself with this information.
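
The rule behind the blunder is that n fence posts have only n - 1 gaps between them. Here's a minimal C sketch of it, applied to a common embedded case: the time spanned by a run of evenly spaced samples. The helper names are hypothetical.

```c
#include <stddef.h>

/* Fencepost rule: n posts (presidencies, samples...) have n - 1 gaps. */
static size_t gaps_between(size_t posts)
{
    return (posts > 0u) ? posts - 1u : 0u;
}

/* Time spanned by 'samples' readings taken every 'period_ms' ms.
   Multiplying by 'samples' instead of 'samples - 1' is the classic bug. */
static unsigned long span_ms(size_t samples, unsigned long period_ms)
{
    return (unsigned long)gaps_between(samples) * period_ms;
}
```

So 44 presidencies give 43 hand-overs. Likewise, 10 samples taken 100 ms apart span 900 ms, not 1000 ms.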

BTW, you will not be surprised to know that my wife and kids just think that this confirms their belief that I'm a complete Nerd who is in desperate need of a life!


Sunday, January 18, 2009

Using Espresso to simplify embedded systems

In this case, Espresso does not refer to the highly caffeinated drink, but rather to the public domain logic minimization tool. What does this have to do with embedded systems? Well, several months back I was faced with an interesting problem. A product I was working on had nine different alarm outputs (some of which are contradictory), which together were dependent upon about thirty different inputs. Furthermore, the interaction between the various inputs led to situations where the desired alarm outputs were non-obvious, and certainly difficult to determine algorithmically. At this point I realized that what was needed was essentially a giant truth table, where the outputs for any given set of inputs were determined by an expert who could look at the various inputs and determine the optimal alarm strategy.

Thus the question was, how to tackle this problem? This is what we ultimately ended up doing.

First of all the truth table was entered in a database. This was done simply so that we could easily run queries, such as "show me all cases where output 3 is asserted when inputs 6, 12 and 13 are negated". This essentially then was the environment in which the human expert worked.

Once the expert was happy with the truth table, it was exported in CSV format. The CSV file was then pre-processed by a Perl script (thanks Don) and fed to the Espresso logic minimization program. The output of Espresso was then post-processed by the Perl script and converted into compilable C code.

To give you a feel for what the output looks like, here's an excerpt (with the comments removed):

if(((!(inputs[0] & 0x20)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10) && (!(inputs[3] & 0xa0))) ||
((!(inputs[0] & 0x20)) && (!(inputs[1] & 0x60)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && ((inputs[2] & 0x4) == 0x4) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && ((inputs[1] & 0x1) == 0x1) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && ((inputs[2] & 0x2) == 0x2) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x24)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x28)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x30)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)) ||
((!(inputs[0] & 0x20)) && (!(inputs[1] & 0x4)) && (!(inputs[2] & 0x30)) && ((inputs[3] & 0x10) == 0x10)))
{
out |= 2048;
}

Evidently, it's enough to make your head spin!
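
It helps to see what the generated expression actually encodes. It is a sum of products: each parenthesized line is one product term (a "cube"), and bit 11 of out (the 2048) is set if any term is true. The sketch below is a hypothetical table-driven way of writing the same logic, and it is not what the Perl script produced. Each term is stored as a mask/value pair per input byte. Only the first and sixth of the nine terms in the excerpt are transcribed, and the type and function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INPUT_BYTES 4u

/* One product term: for every input byte, the bits selected by mask must
   equal value. A 0 bit in mask means "don't care" for that input. */
typedef struct {
    uint8_t mask[INPUT_BYTES];
    uint8_t value[INPUT_BYTES];
} cube_t;

static bool cube_matches(const cube_t *cube, const uint8_t inputs[INPUT_BYTES])
{
    for (size_t i = 0u; i < INPUT_BYTES; i++) {
        if ((inputs[i] & cube->mask[i]) != cube->value[i]) {
            return false;
        }
    }
    return true;
}

/* Sum of products: the output is asserted if ANY term matches. */
static bool output_asserted(const cube_t *cubes, size_t n_cubes,
                            const uint8_t inputs[INPUT_BYTES])
{
    for (size_t k = 0u; k < n_cubes; k++) {
        if (cube_matches(&cubes[k], inputs)) {
            return true;
        }
    }
    return false;
}

/* Two terms for bit 11 (2048), transcribed by hand from the excerpt. */
static const cube_t out11_cubes[] = {
    /* !(in0&0x20) && !(in2&0x30) && (in3&0x10)==0x10 && !(in3&0xa0) */
    { { 0x20u, 0x00u, 0x30u, 0xB0u }, { 0x00u, 0x00u, 0x00u, 0x10u } },
    /* !(in0&0x24) && !(in2&0x30) && (in3&0x10)==0x10 */
    { { 0x24u, 0x00u, 0x30u, 0x10u }, { 0x00u, 0x00u, 0x00u, 0x10u } },
};
```

The generated if statement is faster. A table like this, however, keeps the logic as data, which fits the data-driven point made below.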

For me, the real benefits of such an approach are as follows:

  • I was able to completely divorce the code from the desired functionality. That is, the functionality of the product was completely driven by the client and was in no way dependent upon me doing anything. Thus, when the client asks me "what does it do when the following occurs?", I can honestly answer "it's whatever you told it to do".
  • By setting this up using a database and a Perl script we recognized that changes to the truth table would inevitably occur, and thus made the process as painless as possible. Now, when a change in functionality is desired, the client simply makes the changes in the database, presses a button to output a new file and I then run a 'make'.
  • The approach is rigorous. We have considered every possible combination of inputs - no matter how unlikely they are to occur. In my experience, this is something that hand-written firmware is not very good at.


Although I think this is neat in its own right, I think there are several larger points worth making:

  • Just because a tool was designed ostensibly for one environment (in this case Espresso was really designed for logic minimization in electrical circuits), don't be afraid to use it in other ways.
  • Recognize that certain elements in your design are highly prone to change - and design them with this in mind.
  • Either learn a scripting language or have a scripting expert at your disposal to help build your tool sets. (In my case, I do the latter).
  • If you can divorce your code from the required functionality (i.e. data driven coding), then seriously consider it.


An apology


For all of you that subscribe via RSS, I apologize for the recent blitz of data. I decided to go back through all my postings and add links where appropriate, which seems to have forced the posts to be regenerated.


Sunday, January 11, 2009

Using volatile to achieve persistence!

Once in a while the real world and the arcane world of language standards collide, resulting in surprising results. To see what I mean, read on ...

Many of the products I design incorporate a Bootstrap Loader, so that the application firmware may be updated in the field. In most cases, the bootstrap loader is a completely different program to the main application. Despite this, I find it useful for the main application to pass information to the bootstrap loader and vice versa. Thus the question arises, how best to do this? Well in the processor family I am using, although it is technically possible to store information in Flash, EEPROM or RAM, by far the easiest and most secure way of doing it is to place the information into EEPROM. Furthermore, in order to enter the bootstrap loader it is highly desirable to force a reset of the processor by allowing the watchdog timer to time out.

Thus, the code to enter the bootstrap loader looks something like this:

__eeprom uint8_t msg_for_bootloader;

...

msg_for_bootloader = 0x42;

...

for(;;)
{
/* Wait for watchdog to generate a reset and force entry into the bootstrap loader */
}


Well, on the face of it, there is not much wrong with this code. However, if one turns on the optimizer, then the compiler examines the code, decides that no code may be executed beyond the infinite loop and thus concludes that the write to msg_for_bootloader is pointless, and promptly optimizes it away. (For a discussion on this topic, see my posting here)

Now you will note that msg_for_bootloader was qualified with __eeprom. This is a compiler extension that allows one to inform the compiler that the variable msg_for_bootloader resides in a special memory space and to be treated accordingly. Now I know that the compiler knows enough about the EEPROM space to generate the correct coding sequences such that reads and writes are performed correctly. However, in my naivete, I also assumed that the compiler knew something about the properties of EEPROM, such that it would realize writing to EEPROM without ostensibly reading it again is intrinsically useful in many applications.

Well it does not. Furthermore, on balance I think the compiler writers got it right and the error was completely mine.

So what to do? Well, declaring msg_for_bootloader as volatile fixes the problem. Thus my code now looks like this:

__eeprom volatile uint8_t msg_for_bootloader;

...

msg_for_bootloader = 0x42;

...

for(;;)
{
/* Wait for watchdog to generate a reset and force entry into the bootstrap loader */
}


Thus I ended up in the rather bizarre situation of having to declare a variable as volatile in order to make it persistent!

Although I can appreciate the wry irony of this situation, I think it points to a larger problem. The fact is that we are all (ok, most of us) programming in a language (C) that was not designed for use in embedded systems. Indeed, when C was written, I'm not sure EEPROM even existed. As a result, the compiler vendors have added extensions to the C standard in an effort to overcome its shortcomings for embedded systems, while still desperately striving to achieve "full compliance with the standard". Despite this, I find myself all too frequently falling into traps such as this one. What we really need is a language explicitly designed for embedded systems. It isn't going to happen, but it doesn't stop me wishing for it.


Monday, January 05, 2009

Horner's rule and related thoughts

Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors' performance. Using the regression tools built into Excel, I was looking at the various models, from a 'goodness of fit' perspective. After playing around for a while, I came to the conclusion that a quadratic polynomial really was the best fit, and should be the model to adopt. At this point, I turned to the issue of computational efficiency.

Now, it turns out that there is a relatively well known algorithm for evaluating polynomials, called Horner's rule. I say relatively well known, because I'd say about half the time I see a polynomial evaluated, it doesn't use Horner's rule, but instead evaluates the polynomial directly. Thus in an effort to increase the use of Horner's rule, I thought I'd mention it here.

OK, so what is it? Well it's based on simply refactoring a polynomial expression:

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = \Big(\cdots\big((a_n x + a_{n-1})x + a_{n-2}\big)x + \cdots + a_1\Big)x + a_0$$


Thus a polynomial of order n requires exactly n multiplications and n additions.

For example:

$$23.1x^2 - 45.6x + 12.3 = (23.1x - 45.6)x + 12.3$$

In this case, evaluating a quadratic (order 2) polynomial with Horner's rule requires 2 multiplications and 2 additions, versus 3 multiplications (23.1·x·x and 45.6·x) and 2 additions for the direct approach.

For those of you that are looking for code to just use, this snippet will work. It is for a cubic polynomial; COEFFN is the coefficient of x^N.

y = x * COEFF3;
y += COEFF2;
y *= x;
y += COEFF1;
y *= x;
y += COEFF0;

The recurrence relationship for higher order polynomials should be obvious. Note that unlike most implementations, I write the code inline rather than using a loop.
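
For completeness, here is the loop form that most implementations use. It is a minimal sketch for a polynomial of arbitrary degree, with the coefficients held in an array (coeffs[k] is the coefficient of x^k). The function name and array layout are my own choices for illustration. It trades a little speed for generality compared with the inline version above.

```c
#include <stddef.h>

/* Evaluate coeffs[degree]*x^degree + ... + coeffs[1]*x + coeffs[0]
   by Horner's rule: exactly 'degree' multiplies and 'degree' adds. */
static double horner(const double coeffs[], size_t degree, double x)
{
    double y = coeffs[degree];
    for (size_t k = degree; k > 0u; k--) {
        y = y * x + coeffs[k - 1u];  /* one multiply, one add per step */
    }
    return y;
}
```

With coeffs = { 12.3, -45.6, 23.1 } and degree 2, this evaluates the quadratic example above. At x = 2 it computes (23.1 * 2 - 45.6) * 2 + 12.3, which is 13.5 to within floating point rounding.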

It should be noted that as well as being more computationally efficient, Horner's rule is also more accurate. This comes about in two ways:

  • The very act of using fewer floating point operations leads to fewer rounding errors
  • Higher order polynomials generate very large numbers in a hurry. Horner's method significantly reduces the magnitude of the intermediate values, thus minimizing problems associated with adding / subtracting floating point numbers that differ in magnitude

Although Horner's rule is a nice tool to have at one's disposal, I think there is a larger point to be made here. Whenever you need to perform any sort of calculation, there is nearly always a method superior to the obvious direct evaluation. Sometimes it requires algebraic manipulation, as with Horner's rule. Other times, it's an approximation method, and other times it's just a flat out really neat algorithm (see for example my posting on Crenshaw's square root code). The bottom line: next time you write code to perform some sort of numerical calculation, take a step back and investigate possibilities other than direct computation. You'll probably be glad you did.

Update

There is a highly relevant addendum to this posting here.


Saturday, December 20, 2008

So you want to be a consultant...

In the lede to this blog, I stated that I'd from time to time be commenting on the trials and tribulations of being a consultant in the embedded systems world. Well, today is my first post on this topic, so I thought I'd address the question I get asked most often: 'How do you market your business?'

Well, the trite answer is that in general I don't! The bulk of my work comes from repeat clients. I have one client that I've been doing work for for nearly twenty years, another for about seventeen years, and a third for nearly ten years. In short, I'm a very big believer in keeping my existing clients rather than developing new ones all the time. Obviously this isn't very helpful for someone that is thinking about striking out on their own and is wondering how to sign up a client or three.

My main suggestion if this describes you, is to approach previous employers / managers. If you are really good (and it helps a lot if you are) then previous managers will be extremely interested to hear that you are available for consulting work. Why do I say this? Well look at it from their perspective - here is a talented person that knows their products / procedures / tools who is available to come in and help out in overloaded situations. Thus the next time senior management is demanding that something gets done faster, it's an easy sell for your ex-manager to suggest bringing you in to help meet the deadline.

Incidentally, this especially applies to companies that have just had layoffs (even if you were one of those that got cut). When companies have a layoff, they typically overdo it. As a result, important projects grind to a halt and only get moving again when more help is brought in. Now typically for political / legal reasons a company cannot lay off people and then hire different ones. It can however hire 'temporary help' - and that's where you the consultant come in. Thus if you have just been laid off and think it's time to strike out on your own, I strongly suggest that the first person you call to offer your services is the person that laid you off.

Incidentally, I cannot stress enough the importance of face-to-face or at least voice-to-voice contact. Sending a card or an email will almost certainly result in the approach going nowhere. If the thought of 'warm calling' makes you break out in a sweat, then the chances are you just aren't cut out for having your own business.

What about other techniques such as advertising? I have never gone this route but I know people that have with some success. Be warned however that advertising can be expensive and can be too successful. I say this because the only thing worse than not having enough work is having too much!

How important is a good website? Well I used to think it was largely irrelevant (and my website reflects this attitude. I've been promising myself for a year to get it updated). However, I know of several cases where it has been extremely important in bringing in new business. I would caution you though that spending your time and money on a website is no substitute for making the telephone calls.

What about the social networking sites, such as 'LinkedIn' or 'Plaxo'? These can be helpful if you want to track down all those folks you used to work with who might want to hire you. They are easy to use and low cost / free. Incidentally, don't feel awkward about contacting someone you have lost touch with. Although it might be a little strange socially, it's well worth it to both of you if a fruitful business relationship develops.

Finally, what about the myriad of technical recruiting agencies out there? I have never done any work through them. I have interacted with them, and have found a huge variability in their ethics. Personally, I'd avoid the big companies (which are nothing but key word matchers) and work with the smaller, one man companies. Notwithstanding this, if you're relying on these folks to bring you work then you are being passive rather than proactive. Not recommended!

Next time I post on consulting, I'll address some other important issues. But for now, just remember that a consultant without clients is like a (fill in your own analogy here). Thus the first step in becoming a consultant is getting a client. Only then is the other stuff important.

Follow up to my last post


Thank you to all of you that encouraged others to come and read this blog. I saw a very nice uptick in my readership last week for which I am most grateful.


Saturday, December 13, 2008

Efficient C Tips #5 - Make 'local' functions 'static'

In my humble opinion, one of the biggest mistakes the designers of the 'C' language made was to make the scope of all functions global by default. In other words, whenever you write a function in 'C', by default any other function in the entire application may call it. To prevent this from happening, you can declare a function as static, thus limiting its scope to the source file (module) it resides in. Thus a typical declaration looks like this:

static void function_foo(int a)
{
}

Now I'd like to think that the benefits of doing this to code stability are so obvious that everyone would do it as a matter of course. Alas, my experience is that those of us that do this are in a minority. Thus in an effort to persuade more of you to do this, I'd like to give you another reason - it can lead to much more efficient code. To illustrate how this comes about, let's consider a module called adc.c. This module contains a number of public functions (i.e. functions designed to be called by the outside world), together with a number of functions that are intended to be called only by functions within adc.c. Our module might look something like this:

void adc_Process(void)
{
  ...
  fna();
  ...
  fnb(3);
}

...

void fna(void)
{
  ...
}

void fnb(uint8_t foo)
{
  ...
}


At compile time, the compiler will treat fna() and fnb() like any other function. Furthermore, the linker may link them 'miles' away from adc_Process(). However, if you declare fna() and fnb() as 'static', then something magical happens. The code would now look like this:


static void fna(void);
static void fnb(uint8_t foo);

void adc_Process(void)
{
  ...
  fna();
  ...
  fnb(3);
}

...

static void fna(void)
{
  ...
}

static void fnb(uint8_t foo)
{
  ...
}


In this case, the compiler will know all the possible callers of fna() and fnb(). With this information to hand, the compiler / linker will potentially do all of the following:

  • Inline the functions, thus avoiding the overhead of a function call.
  • Locate the static functions close to the callers such that a 'short' call or jump may be performed rather than a 'long' call or jump.
  • Look at registers used by the local functions and thus only stack the required scratch registers rather than stacking all of the registers required by the compiler's calling convention

Together these can add up to a significant reduction in code size and a commensurate increase in execution speed.

Thus making all non-public functions static not only makes for better code quality, it also leads to more compact and faster code. A true win-win situation! Thus if you are not already doing this religiously, I suggest you go through your code and do it now. I guarantee you'll be very pleased with the results.
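
Here is a compilable version of the adc.c example, filled in with hypothetical bodies so that it builds and runs on a host. fna() stands in for reading a conversion register and returns a fixed value. fnb() scales the stored result, and adc_Result() is an invented accessor. Only adc_Process() and adc_Result() have external linkage. How much the compiler gains from this (inlining, short calls, fewer stacked registers) depends on the toolchain, so check the listing or map file to see what yours actually does.

```c
#include <stdint.h>

/* Module-private state and helpers: invisible outside this file. */
static uint16_t adc_result;

static uint16_t fna(void);
static void fnb(uint8_t shift);

/* Public interface. */
void adc_Process(void)
{
    adc_result = fna();
    fnb(3);
}

uint16_t adc_Result(void)
{
    return adc_result;
}

/* Hypothetical stand-in for reading the ADC conversion register. */
static uint16_t fna(void)
{
    return 0x0800u;
}

/* Hypothetical scaling step: divide the raw reading by 2^shift. */
static void fnb(uint8_t shift)
{
    adc_result >>= shift;
}
```

After adc_Process() runs, adc_Result() returns 0x0100 (0x0800 shifted right by 3). Any attempt to call fna() or fnb() from another file now fails at link time.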


A Request ...


If I'm to believe the statistics for this blog, it appears that I'm gradually building a decent sized readership. Furthermore many of you choose to come back and read the latest postings which tells me that I'm doing something of value. Anyway, if this describes you, I'd be obliged if you'd encourage your colleagues to read the blog and also to post comments / questions. Why do I ask this? Well, an increased readership has several benefits, for both me and you the readers.

  • I feel quite passionately about improving the quality of embedded systems. Those of us that are working in this field collectively have an enormous impact on the world. Thus anything that helps improve the quality of embedded systems in turn helps improve the world. (I appreciate that this is a little melodramatic. It is, however, true).
  • Writing about something is the best way I know to find out if I truly understand it. Thus, the very act of publishing a blog causes me to improve my skills and knowledge.
  • Some of the (too few) comments I get are quite profound and often instructive. Thus I also learn in this way.
  • The bigger the readership I have, the more inclined I am to publish. If I'm publishing things of value, then presumably the readers benefit.

Anyway, if you concur, then please encourage your colleagues. If you don't, then that's OK as well.

Thanks for reading.


Saturday, December 06, 2008

Knowing my weaknesses

A few weeks ago I published what appears to have been quite a popular blog on what I called the 'Bug Cluster Phenomenon'. Today, I'm going to extend that concept somewhat by way of a mea culpa.

Earlier this week I had to eat some very humble pie. For the last six weeks or so I had received complaints that a temperature measurement wasn't giving accurate results. The sensor in question is measuring approximately ambient temperature, and was returning values in the 18 - 26 Celsius range, which seemed reasonable to me. I just wrote off the complaints as being due to the fact that humans have a very poor perception of absolute temperature. Well finally, at my urging, someone dragged the device out into the winter cold, where it promptly read 18 Celsius. Thus I was faced with proof that something was wrong.

I proceeded to investigate the code, and discovered that based on the current inputs to the code, the code was generating an output with an error of about 2 degrees. How was this possible, since it was nothing more than a series of multiplies, adds and shifts - not typically fodder for a 2 degree error?

Well, further investigation showed that at a certain point I was getting numeric overflow when two numbers were being multiplied together. Now typically, when this occurs, one gets answers that have huge 'errors'. In my case I had the misfortune that the arithmetic worked out such that the error at room temperature was barely noticeable.

Anyway, I duly fixed the code. However, before moving on I took the time to reflect on this particular bug. Was this just one of those stupid coding errors that we all make from time to time, or was there more to it? I came to the conclusion that this was not just "one of those things". Rather I realized that this was at least the third time this year that I had written code that suffered from a numeric overflow problem. In short, I have a problem or a blind spot if you will, for a particular class of problem.

Well I'm told that recognizing one's problems is the first step in solving them. So I proceeded to do a little bit more investigating and discovered that my numeric overflow bugs always occurred when I combined multiple operators on a line. For example:

y = a * a + c;

Thus the solution seems obvious to me - only one numeric operator per line. Thus in future, I will always code like this:

y = a * a;
y += c;
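
Splitting the expression does not by itself prevent overflow. What it does is give each intermediate a named variable whose width you must choose, which makes the overflow visible. The sketch below is a hypothetical illustration, since the actual sensor code isn't shown. It models a 16-bit intermediate explicitly with uint16_t, because the target's int width isn't stated. It contrasts that with a version that widens an operand before multiplying.

```c
#include <stdint.h>

/* Narrow: the product is stored in 16 bits and silently wraps
   (modulo 65536) whenever a * a exceeds 65535. */
static uint16_t square_plus_narrow(uint16_t a, uint16_t c)
{
    uint16_t y = (uint16_t)((uint32_t)a * a);  /* truncated to 16 bits */
    y += c;
    return y;
}

/* Wide: one operand is widened first, so the product is computed and
   stored in 32 bits. One operator per line keeps each step inspectable. */
static uint32_t square_plus_wide(uint16_t a, uint16_t c)
{
    uint32_t y = (uint32_t)a * a;
    y += c;
    return y;
}
```

For a = 200 both versions agree. For a = 300 the narrow version returns 24564 instead of 90100. The failure only appears once the inputs grow large enough to overflow the intermediate, which is just how a bug can go unnoticed at room temperature.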


The bottom line. When you encounter a bug, as well as looking for other bugs nearby (as described in the bug cluster phenomenon post), also take the time to reflect on what caused the bug in the first place, and see if you can recognize any systemic problems in your approach to coding. When it comes down to it, this is nothing more than a process of 'continuous quality improvement'. If it works for Toyota then it might just work in the embedded systems arena.


Sunday, November 30, 2008

Modulo Means (reprised)

In my previous post I had asked for some input on how to compute the mean of a phase comparator. Bruno Santiago suggested converting the phase readings to their Cartesian coordinates and averaging the resulting (X, Y) data, and then converting the means of X & Y back into a phase angle. Well kudos to Bruno because this is exactly what I ended up doing. However, as Bruno observed, it's not exactly an efficient process. It is however robust, and in my application, the robustness counts for a lot.

The suggestion that I average the inputs to the phase comparator has its merits. However for reasons that would take too long to explain, I'm not really able to do this in my application.

Finally, I'd like to mention the second solution that Kyle had proposed. First a caveat. I haven't fully thought through this solution, and I most certainly have not implemented and tested it. With that in mind, here's another approach to contemplate.

You'll remember that we can compute the average of the phase angle using the simple arithmetic mean, provided that we do not cross back and fore across the zero phase line. Well, Kyle's insight was that as well as computing the arithmetic mean of the phase angle, we also do the same for the quadrature angle. The idea is that while the phase could alternate across the zero degree line, it would not simultaneously alternate across the 90 degree line (or indeed the 180 degree line). The method then becomes one of computing two means and choosing the correct one. If I get the time I'll develop this into a fully fledged algorithm and publish it for you all to, ahem, enjoy. I'm fairly sure that this method is not as robust as the Cartesian method. However, it is dramatically more efficient and so deserves further investigation. Bruno, perhaps you'd care to do the analysis in your CFT (Copious Free Time)?
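Here is one way the two-means idea might be fleshed out. It is only a sketch under my own assumptions, not Kyle's actual method: it rotates every reading by 180 degrees (8 steps) rather than 90, computes both means with integer arithmetic, and keeps the one whose data has the smaller spread (variance), on the reasoning that the copy which straddles the zero line will show a large spread. The function names are hypothetical, and n must be greater than zero.

```c
#include <stddef.h>
#include <stdint.h>

#define PHASE_MOD 16u   /* readings are 0..0xF */
#define HALF_TURN 8u    /* 180 degrees in phase steps */

/* Integer mean of n values summing to sum, rounded to nearest. */
static uint8_t rounded_mean(uint32_t sum, size_t n)
{
    return (uint8_t)((sum + n / 2u) / n);
}

/* Average the readings as-is and rotated by 180 degrees, then keep
   whichever copy has the smaller spread. Requires n > 0. */
uint8_t phase_mean_two_means(const uint8_t *readings, size_t n)
{
    uint32_t sum_raw = 0u;
    uint32_t sum_rot = 0u;
    uint64_t sq_raw = 0u;
    uint64_t sq_rot = 0u;

    for (size_t i = 0; i < n; ++i) {
        uint32_t raw = readings[i];
        uint32_t rot = (raw + HALF_TURN) % PHASE_MOD;

        sum_raw += raw;
        sum_rot += rot;
        sq_raw += (uint64_t)raw * raw;
        sq_rot += (uint64_t)rot * rot;
    }

    /* n * sum(x^2) - (sum x)^2 equals n^2 times the variance, so it is
       never negative and the two can be compared without dividing. */
    uint64_t spread_raw = (uint64_t)n * sq_raw - (uint64_t)sum_raw * sum_raw;
    uint64_t spread_rot = (uint64_t)n * sq_rot - (uint64_t)sum_rot * sum_rot;

    if (spread_raw <= spread_rot) {
        return rounded_mean(sum_raw, n);
    }
    /* Undo the 180 degree rotation, wrapping modulo PHASE_MOD. */
    return (uint8_t)((rounded_mean(sum_rot, n) + PHASE_MOD - HALF_TURN) % PHASE_MOD);
}
```

With the readings 0, F, 0, F ... from the original problem, the raw copy has a large spread and the rotated copy (8, 7, 8, 7 ...) a tiny one, so the rotated mean is chosen and the result is 0 rather than 7.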

Friday, November 21, 2008

Modulo means

Normally on this blog I'm either giving my opinions on embedded matters or offering tips on how to do things better. Well, today I'm turning the tables, as I'd like your help. Yesterday I ran into a rather perplexing problem, and I'd be interested to see whether any of my readers can solve it.

In a product I am working on, there is a phase comparator generating difference readings in the range 0 - 0xF. The phase comparator is somewhat noisy, and so I want to obtain a moving average of the phase differences. Typically, to implement a moving average filter, one sums the elements in a buffer and divides by the number of elements to obtain the arithmetic mean. Indeed we can do this here, provided that we don't flip back and fore across the zero line. If we do cross the zero line, then the method breaks down. For example, if successive phase differences are 0, F, 0, F, 0, F .... 0, F, then the simple arithmetic mean of these numbers will be 7.5, about half way round the circle, instead of some value between F and 0.

You may think that the answer is to switch to signed arithmetic and operate over the range -8 ... +7. However, a little thought will show that you have merely moved the problem: what happens when the system is close to -8, such that the values alternate between -8, 7, -8, 7 ... -8, 7?
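Both failure modes are easy to reproduce. In this sketch (hypothetical helper names, integer arithmetic as a small target would use), the unsigned readings 0, F, 0, F ... average to 7, because integer division truncates 7.5, while the signed readings -8, 7, -8, 7 ... average to 0, because -4 / 8 truncates toward zero. Either way the answer is half a turn away from the true value.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive mean of unsigned readings in 0..15 */
uint8_t naive_mean_unsigned(const uint8_t *readings, size_t n)
{
    uint32_t sum = 0u;

    for (size_t i = 0; i < n; ++i) {
        sum += readings[i];
    }
    return (uint8_t)(sum / n);
}

/* Naive mean of signed readings in -8..+7 */
int8_t naive_mean_signed(const int8_t *readings, size_t n)
{
    int32_t sum = 0;

    for (size_t i = 0; i < n; ++i) {
        sum += readings[i];
    }
    return (int8_t)(sum / (int32_t)n);
}
```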

Thus, can you come up with a robust, efficient solution to compute the mean of an array of modulo numbers?

The problem is solvable, as one of the engineers I'm working with hit upon not one, but two possible solutions (nice work, Kyle). However, I'd be interested in other possible approaches.

I'll publish Kyle's method(s) next week.

Tuesday, November 04, 2008

Dogging your watchdog

Most embedded systems employ watchdog timers. It's not my intention today to talk about why to use watchdog timers, or indeed how to use them. Rather, I assume you know the answers to these questions. Instead, I'll pass on some tips for tracking down those unexpected watchdog resets that can occur during the development process.

To track these problems down, you need to find out where the watchdog reset is occurring. Unfortunately, this isn't easy, since by definition a watchdog reset resets the processor, typically destroying all the state information that could be used to debug the problem. To get around this, here are a few things you can try.
  1. Place a break point on the (watchdog) reset vector. Although this will typically not stop the processor from being reset, it will ensure that none of your variables get initialized by your start up code. As a result, you should be able to use your debugger to examine these variables - which may give you an insight into what is going wrong.
  2. Certain processor architectures allow the watchdog timer to be switched from a classic watchdog (when the timer times out, the processor is reset) to a special form of timer, complete with its own interrupt vector. Although I rarely use this mode of operation in release code, it is very useful for debugging. Simply reconfigure the watchdog to generate an interrupt upon timeout, and place a break point in the watchdog's ISR. Then when the watchdog times out, your debugger will stop at the break point. It's then a simple matter of stepping out of the ISR to return to the exact point in your code where the watchdog timeout occurred.
  3. If neither of the above methods is available to you, and you are genuinely clueless as to where to start looking, then a painful but workable solution is to 'instrument' entry into each function. This consists of some code placed at the start of every function, whose job is to record the ID of the function in some form of storage that is not affected by a watchdog reset, so that you can identify the offending function after the reset has occurred. This isn't quite as bad as it sounds, provided you are good with macros and a scripting language such as Perl, and are aware of common compiler vendor extensions such as the macro __FUNCTION__. Of course, if you are that good, the chances are you won't be clueless as to why you are taking a watchdog reset!
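A minimal sketch of such instrumentation, using the C99 standard __func__ identifier rather than the __FUNCTION__ extension. NOINIT is a hypothetical placeholder for whatever attribute or #pragma your toolchain uses to place a variable in RAM that the start-up code does not zero; without it, the start-up code would wipe the evidence. The function names are also hypothetical.

```c
/* Hypothetical placeholder: on a real target, define NOINIT as the
   toolchain-specific attribute/pragma that puts a variable in a RAM
   section the start-up code leaves alone. After a cold power-up the
   value is garbage, so only trust it after a watchdog reset. */
#ifndef NOINIT
#define NOINIT
#endif

/* Name of the most recently entered function. The pointer itself is
   volatile so that every store is actually performed. */
NOINIT const char *volatile last_function_entered;

/* Place at the top of every function to be traced. */
#define TRACE_ENTRY() (last_function_entered = __func__)

void read_sensors(void)
{
    TRACE_ENTRY();
    /* ... real work ... */
}

void update_outputs(void)
{
    TRACE_ENTRY();
    /* ... real work ... */
}
```

After a watchdog reset, halt in the reset handler (as in tip 1) and inspect last_function_entered in the debugger. Storing a pointer to the name costs one store per call; on a very small part you might store a numeric ID instead.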
I'll leave it to another post to talk about the sort of code that often causes watchdog timeouts.

Wednesday, October 08, 2008

Bug cluster phenomenon

I was debugging a piece of code recently when I realized that there was a scenario, albeit unlikely, in which a divide by zero could occur. Rather than just fix the bug and move on, I invoked what I call the "bug cluster phenomenon" rule. What, you may ask, is this rule? Well, it has two variants. The first is as follows:

"Where there is one bug, there is usually another". I've observed this phenomenon over many years. What seems to happen is that when I (or anyone else, for that matter) am writing a block of code, I get interrupted, or I'm tired, or my focus is elsewhere. As a result, when I create one bug, I usually create several others while I am at it. Thus when I find a bug in a function, I always assume that it has company nearby. In short, finding a bug in a function always triggers a top to bottom review of that function and its neighbors. This has dramatically reduced my debugging time over the years, and I strongly recommend you adopt it.

The second variant of the rule is as follows:

"Logical errors normally have company". I've also observed this phenomenon over many years. In this case, it seems that if you have made a particular error in logic in one place in the code, the chances are you have made the same error elsewhere. In the case of the divide by zero issue mentioned in the introduction, this prompted me to wonder if I had any other possible divide by zero errors lurking in my code. As a result, I performed a search through the entire project - and sure enough I found a few other cases where there existed the possibility of a divide by zero error. Thus finding one bug caused me to fix several. That's efficient debugging!

Incidentally, I was able to find all the divisions in my code quickly because I am absolutely anal about having a space on either side of an operator. Thus, I needed to search for only two strings: " / " and " /= ". I've observed that many people are lackadaisical about this, such that you'll often see expressions such as "y=a/b". These people have no option other than to search for just "/", which of course also returns every line with a comment, or to construct a more sophisticated regular expression search, which again takes time and is error prone.
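To illustrate why the convention pays off, here is the same two-string search as a tiny C helper (hypothetical; in practice you would use grep or your editor's search). Note how the unspaced "y=a/b" slips straight through, which is exactly the problem with the lackadaisical style.

```c
#include <stdbool.h>
#include <string.h>

/* True if a source line contains a division written with a space on
   either side of the operator: " / " or " /= ". */
bool line_has_division(const char *line)
{
    return (strstr(line, " / ") != NULL) || (strstr(line, " /= ") != NULL);
}
```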

Thus I have three pieces of advice to pass on:
1. When you find a bug, look nearby for more.
2. If the bug belongs to a particular class of bug, then search your code to see whether you have made the same mistake elsewhere.
3. Write your code so that it is trivial to search for certain constructs. It will save you time in the long run.

Monday, September 08, 2008

Efficient C Tips #4 - Use Speed Optimization

Back in July 2008 I promised that the next blog post would be on why you should use speed optimization instead of size optimization. Well, four other posts somehow got in the way, for which I apologize. Anyway, on to the post!

In "Efficient C Tips #2" I made the case for always using full optimization on your released code. Back when I was a lad, the conventional wisdom when it came to optimization was to use the following algorithm:

1. Use size optimization by default
2. For those few pieces of code that get executed the most, use speed optimization.

This algorithm was based on the common observation that most code is executed infrequently, and so in the grand scheme of things its execution time is irrelevant. Furthermore, since memory is constrained and expensive, code that is rarely executed should consume as few resources (i.e. memory) as possible. At first blush, this approach seems reasonable. However, IMHO it was flawed back then and is definitely flawed now. Here is why:

1. In an embedded system, you typically are not sharing memory with other applications (unlike on a general purpose computer). Thus there are no prizes for using less than the available memory. Of course, if by using size optimization you can fit the application into a smaller memory device, then use size optimization and use the smaller and cheaper part. However, in my experience this rarely happens. Instead, you typically have a system that comes with, say, 32K, 64K or 128K of Flash. If your application consumes 50K with speed optimization and 40K with size optimization, then you'll still be using the 64K part, and so size optimization has bought you nothing. Conversely, speed optimization will also cost you nothing, but your code will presumably run faster and consume less power.

2. In an interesting quirk of optimization technology, it turns out that in some cases speed optimization can result in a smaller image than size optimization! The converse is almost never true. See, however, this article that I wrote, which discusses one possible exception. Thus, even if you are memory constrained, try speed optimization.

3. Size optimization comes with a potentially very big downside. After a compiler has done all the usual optimizations (constant folding, strength reduction etc.), a compiler that is set up for size optimization will usually go hunting for repeated code. It looks at the object code and identifies small blocks of assembly language that are used repeatedly throughout the application. (This is often called procedural abstraction or cross-call optimization; it is distinct from classic common sub-expression elimination, which speed optimization performs too.) These common sequences are converted into subroutines. This process can be repeated ad nauseam, such that one subroutine calls another, which calls another and so on. As a result, an innocuous looking piece of C code can be translated into a call tree that nests many levels deep, and there is the rub. Although this technique can dramatically reduce code size, it comes at the price of increasing the call stack depth. Thus code that runs fine in debug mode may well suffer from a call stack overflow when you turn on size optimization. Speed optimization will not do this to you!

4. As I mentioned in "Efficient C Tips #2", one downside of optimization is that it can rearrange instruction sequences such that the special access requirements often imposed by watchdogs, EEPROM etc. are violated. In my experience, this only happens with size optimization, never with speed optimization. Note that I don't advocate relying on this; it is, however, a bonus if you have forgotten to follow the advice I give in "Efficient C Tips #2" for these cases.

The bottom line - speed optimization is superior to size optimization. Now I just have to get the compiler vendors to select speed optimization by default!

Thursday, September 04, 2008

Low cost tools

Like many of you, I subscribe to Jack Ganssle's newsletter (If you don't then you should - go to http://ganssle.com/). In his latest newsletter #164 (alas not yet posted to the web) there is a thread on tools for monitoring serial protocols such as I2C. I was quite interested in this because it so happens I use some of the tools mentioned. What really struck me though was the fact that someone was looking for low cost tools.

I'm always baffled when I see this. If I believe the salary surveys, most engineers in the USA are earning well over $100K. Throw in benefits and your average engineer costs his / her employer about $200K a year, or close to $100 per working hour. Why, then, do employers balk at spending a few thousand dollars on a decent tool? I've seen people spend days on compiler problems because they are using a "free" tool; I've had people tell me that they don't use Lint because it's too expensive (<$200!); I've seen people struggle for days simply because their oscilloscope isn't up to the job. In all these cases, the cost in terms of their time dwarfs the equipment / tool cost.

What I want are great tools. I want tools that are intuitive to use, that work really well, are tolerant of my occasional ham-fistedness and that I trust. For example, I have a Fluke 87 multimeter sitting next to me. It costs quadruple what a Radio Shack special costs. It's worth every penny.

Here's an ending thought. You are going in for open heart surgery. The surgeon comes out and says "don't worry - I've got some great low cost tools to use on you". And we wonder why engineers don't get the respect that doctors do.

Tuesday, August 12, 2008

Have you looked at your linker output file recently?

Of all the myriad files involved in a typical embedded firmware project, probably the two most feared (and yes, I do mean feared) are the linker control file (which tells the linker how to link your application) and the linker output file. Today it's the latter that I'll be talking about.

The linker output file contains a wealth of information about the way your application has been put together. Unfortunately, much of it is in such a cryptic format that examining the file is a painful process. Indeed, for this reason, I suspect that most projects are completed with nothing more than a cursory look at this file.

This is a shame, because examination of the linker output file can significantly reduce your debugging time. To show you what I mean, consider my typical action sequence when I first start coding up a project.

1. Write a module.
2. Compile module and correct all errors and warnings.
3. Lint module and correct all complaints from Lint.
4. Repeat steps 1, 2 & 3 until I have sufficient modules to be able to generate a linkable image.
5. Link image and repeat steps 1-4 until the linker has no warnings or errors.
6. Examine the linker output file.

I'd wager that most developers out there would be reaching for the debugger at step 6. I do not, because I can typically find some bugs simply by looking at the linker output. For example, consider this code sequence:


if (0 == var)
{
 function_a();
}
else if (1 == var)
{
 function_b();
}
else if (2 == var)
{
 function_b();
}
else
{
 function_d();
}


I make these sorts of copy and paste errors all the time. In this case, when var is 2, I meant to call function_c() but inadvertently ended up calling function_b() again. Since function_b() exists, the compiler is happy and so there are typically no warnings.

So how does looking at the linker output file help me in this case? Well, if you have a decent linker it will give you a list of all the functions that aren't called and that consequently have been stripped out of the final image. If in perusing this list I see that function_c() is listed as uncalled, then I immediately know I've got a bug somewhere. Typically tracking it down is very easy.

I'll leave for another day the other ways I use the linker output file to debug code.

Thursday, August 07, 2008

Improvements versus Features

I'm taking a slight detour from my usual topics to blather about what I see as an unfortunate trend that is making its way from the PC world to the embedded world. My perception is that as more embedded systems get sophisticated user interfaces, the desire to add features seems inescapable. While I don't see adding features as bad, per se, doing so instead of improving the product is a bad thing. What do I mean by improving the product? Well, typically those things that most users don't understand, for example noise floors, power consumption, SNR, software reliability and so on.

In the days before user interfaces, pretty much the only way to improve a product was to work on the "invisible" parameters. Today, it's often far easier to add a new feature than it is to labor at, for example, wringing a few more dB of performance out of that digital filter while keeping the number of clock cycles unchanged.

Am I tilting at windmills? I don't think so. Is my plea pointless? Probably. However, the next time someone comes along asking for a YANF (Yet Another New Feature), do them and yourself a favor and ask how time spent on the YANF compares with time spent on improving the product.

Friday, August 01, 2008

Efficient C Tips #3 - Avoiding post increment / decrement

It always seems counterintuitive to me, but post increment / decrement operations in C / C++ often result in inefficient code, particularly when de-referencing pointers. For example:


for (i = 0, ptr = buffer; i < 8; i++)
{
*ptr++ = i;
}


This code snippet contains two post increment operations. With most compilers, you'll get better code quality by re-writing it like this:


for (i = 0, ptr = buffer; i < 8; ++i)
{
*ptr = i;
++ptr;
}


Why is this, you ask? Well, the best explanation I've come across to date is this one on the IAR website:

Certainly taking the time to understand what's going on is worthwhile. However, if it makes your head hurt then just remember to avoid post increment / decrement operations.

Incidentally, you may find that on your particular target it makes no difference. However, this is simply because your target processor directly supports the addressing modes required to make post increments efficient. If you are interested in writing code that is efficient on any target, then avoid the use of post increment / decrement.

You may also wonder just how much this saves you. I've run some tests on various compilers / targets and have found that this coding style reduces object code size by anywhere from zero to several percent. I've never seen it increase the code size. More to the point, in loops, using a pre-increment can save you a load / store operation per increment per loop iteration. These savings can add up to some serious time.
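If you want to check this on your own target, here are both loops in compilable form, assuming a uint8_t index and an 8 byte buffer (the original snippet does not show the declarations; the function names are hypothetical). The two functions produce identical results; any difference shows up only in the compiler's assembly listing, so compile with full optimization and compare the two.

```c
#include <stdint.h>

#define BUF_LEN 8u

/* Original form: post-increment of both the index and the pointer. */
void fill_post(uint8_t *buffer)
{
    uint8_t i;
    uint8_t *ptr;

    for (i = 0u, ptr = buffer; i < BUF_LEN; i++) {
        *ptr++ = i;
    }
}

/* Rewritten form: pre-increments, with the pointer bump on its own line. */
void fill_pre(uint8_t *buffer)
{
    uint8_t i;
    uint8_t *ptr;

    for (i = 0u, ptr = buffer; i < BUF_LEN; ++i) {
        *ptr = i;
        ++ptr;
    }
}
```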

Saturday, July 05, 2008

Efficient C Tips #2 - Using the optimizer

In my first post on "Efficient C" I talked about how to use the optimal integer data type to achieve the best possible performance. In this post, I'll talk about using the code optimization settings in your compiler to achieve further performance gains.

I assume that if you are reading this, then you are aware that compilers have optimization settings or switches. Invoking these settings usually has a dramatic effect on the size and speed of the compiled image. Typical results that I have observed over the years are a 40% reduction in code size and a halving of execution time for fully optimized versus non-optimized code. Despite these impressive numbers, I'd say about half of the code that I see (and I see a lot) is released to the field without full optimization turned on. When I ask developers about this, I typically get one of the following explanations:

1. I forgot to turn the optimizer on.
2. The code works fine as is, so why bother optimizing it?
3. When I turned the optimizer on, the code stopped working.

The first answer is symptomatic of a developer that is just careless. I can guarantee that the released code will have a lot of problems!

The second answer on the face of it has some merit. It's the classic "if it ain't broke, don't fix it" argument. However, notwithstanding that your code will take longer to execute and thus almost certainly consume more energy (see my previous post on "Embedded Systems and the Environment"), it also means that there are potential problems lurking in your code. I address this issue below.

The third answer is of course the most interesting. You have a "perfectly good" piece of code that is functioning just fine, yet when you turn the optimizer on, the code stops working. Whenever this happens, the developer typically blames the "stupid compiler" and moves on. Well, after having this happen to me a fair number of times over my career, I'd say that the chances that the compiler is to blame are less than 1 in 10. The real culprit is normally the developer's poor understanding of the rules of the programming language and how compilers work.

Typically when a compiler is set up to do no optimization, it generates object code for each line of source code in the order in which the code is encountered and then simply stitches the result together (for the compiler aficionados out there I know it's more involved than this - but it serves my point). As a result, code is executed in the order in which you write it, constants are tested to see if they have changed, variables are stored to memory and then immediately loaded back into registers, invariant code is repeatedly executed within loops, all the registers in the CPU are stacked in an ISR and so on.

Now, when the optimizer is turned on, it rearranges code execution order, looks for constant expressions, redundant stores, common sub-expressions, unused registers and so on, and eliminates everything that it perceives to be unnecessary. And therein, dear reader, lies the source of most of the problems. What the compiler perceives as unnecessary, the coder thinks is essential, and indeed is relying upon the "unnecessary" code being executed.

So what's to be done about this? Firstly, you have to understand what the keyword volatile means and does. Even if you think you understand volatile, go and read this article I wrote a number of years back for Embedded Systems Programming magazine. I'd say that well over half of the optimization problems out there relate to failure to use volatile correctly.
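The classic case is a flag shared between an interrupt handler and the main loop. In this minimal sketch (the ISR name is hypothetical, and the polling bound exists only so the example terminates on a host), removing volatile would allow the optimizer to read the flag once, keep it in a register and never see the ISR's update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between an ISR and the main loop, so it must be volatile:
   every read in the loop below then really goes to memory. */
static volatile bool rx_ready = false;

/* Hypothetical receive ISR; the real vector name is target specific. */
void uart_rx_isr(void)
{
    rx_ready = true;
}

/* Poll for the flag at most max_polls times; true if it was seen.
   A real design must also consider an interrupt arriving between the
   read and the clear. */
bool wait_for_rx(uint32_t max_polls)
{
    for (uint32_t n = 0u; n < max_polls; ++n) {
        if (rx_ready) {
            rx_ready = false;   /* consume the event */
            return true;
        }
    }
    return false;
}
```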

The second problematic area concerns specialized protective hardware such as watchdogs. To make inadvertent modification of certain registers less likely, CPU manufacturers insist upon a certain set of instructions being executed in order within a certain time. An optimizer can often break these specialized sequences. In such cases, the best bet is to put each specialized sequence into its own function and then use the appropriate #pragma directive to disable optimization of that function.

Now, what if you are absolutely sure that you are using volatile appropriately and correctly, and that specialized coding sequences have been protected as suggested, yet your code still does not work when the optimizer is turned on? The next thing to look for is software timing sequences, either explicit or implicit. The explicit timing sequences are things such as software delay loops, and are easy to spot. The implicit ones are a bit tougher and typically arise when you are doing something like bit-banging a peripheral, where the instruction cycle time implicitly acts as a setup or hold time for the hardware being addressed.
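An explicit software delay loop is the easiest example. In this minimal sketch, without volatile the loop has no observable effect, so an optimizer is entitled to delete it; with volatile the loop survives, but the time per iteration still depends on the optimization level and the target, so it is not a calibrated delay. The function name is hypothetical, and returning the counter simply makes the behavior checkable.

```c
#include <stdint.h>

/* Crude software delay. The volatile counter forces every load and store
   of i to be performed, so the optimizer cannot remove the loop. */
uint32_t delay_loop(uint32_t count)
{
    volatile uint32_t i;

    for (i = 0u; i < count; ++i) {
        /* busy wait */
    }
    return i;
}
```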

OK, what if you've checked for software timing and things still don't work? In my experience you are now into what I'll call the "Suspect Code / Suspect Compiler (SCSC)" environment. With an SCSC problem, the chances are you've written some very complex, convoluted code. With this type of code, two things can happen:

1. You are working in a grey area of the language (i.e. an area where the behavior is not well specified by the standard). Your best defense against this is to use Lint from Gimpel. Lint will find your questionable coding constructs. Once you have fixed them, you'll probably find your optimization problems have gone away.
2. The optimizer is genuinely getting confused. Although this is regrettable, the real blame may lie with you for writing gnarly code. The bottom line in my experience is that optimizers work best on simple code. Of course, if you have written simple code and the optimizer is getting it wrong, then do everyone a favor and report it to the compiler vendor.

In my next post I'll take on the size / speed dichotomy and make the case for using speed rather than size as the "usual" optimization method.

Friday, June 20, 2008

Embedded Systems and the Environment

With the recent run-up in the price of oil, it seems as if everyone is talking about energy and how to conserve it. For most people, the only impact they can have on the environment is through their own individual actions and choices. Engineers, however, are in a different position, because at a professional level the design choices we make can have a profound effect on the environment. If we believe the figures about the number of embedded processors shipped each year (billions), and we make the very conservative estimate that each processor is in a system that consumes 1 Wh per day, then the annual energy consumption of new embedded systems runs to at least 1E9 * 1 Wh * 365 = 365 gigawatt hours, an average power consumption of around 41 megawatts. If we assume that the average life of an embedded system is 5 years, then the embedded systems out there are burning about 200 megawatts. That's a lot of power, folks.
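Working the figures through, where the last step assumes that five years' worth of shipments are in service at any one time:

```latex
\begin{aligned}
E_{\text{year}} &= 10^{9}\ \text{systems} \times 1\ \tfrac{\text{Wh}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{year}} = 3.65\times10^{11}\ \text{Wh} = 365\ \text{GWh}\\
P_{\text{avg}} &= \frac{3.65\times10^{11}\ \text{Wh}}{8760\ \text{h}} \approx 4.17\times10^{7}\ \text{W} \approx 41.7\ \text{MW}\\
P_{\text{installed}} &\approx 5 \times 41.7\ \text{MW} \approx 208\ \text{MW}
\end{aligned}
```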

Now here's the interesting thing. Most embedded projects are for products that are made in the thousands. Individually, these products' power consumption is irrelevant. Collectively, it is huge. Thus if as an industry we made a concerted effort to reduce the power consumption of our products, the benefits to society would be substantial. So how exactly do we do this? Although a lot of the power consumption comes from the hardware design, the firmware design can also have a dramatic impact on the overall power consumption of the system. In my next posting I'll look at some of the ways you can design your system firmware so as to minimize power consumption.

Sunday, June 15, 2008

Efficient C Tips #1 - Choosing the correct integer size

From time to time I write articles for Embedded Systems Design magazine. A number of these articles have concentrated on how to write efficient C for an embedded target. Whenever I write these articles I always get emails from people asking me two questions:

1. How did you learn this stuff?
2. Is there somewhere I can go to learn more?

The answer to the first question is a bit long winded and consists of:
1. I read compiler manuals (yes, I do need a life).
2. I experiment.
3. Whenever I see a strange coding construct, I ask the author why they are doing it that way. From time to time I pick up some gems.
4. I think hard about what the compiler has to do in order to satisfy a particular coding construct. It's really helpful if you know assembly language for this stage.

The answer to the second question is short: No!

To help rectify this, I'll shortly be offering a one day course on how to write efficient C for embedded systems, details of which will be posted on my website soon.

In the interim, I'd like to offer up my first tip on how to choose the correct integer size.

In my experience in writing programs for both embedded systems and computers, I'd say that greater than 95% of all the integers used by those programs could fit into an 8 bit variable. The question is, what sort of integer should one use in order to make the code the most efficient? Most computer programmers who use C will be puzzled by this question. After all, the data type 'int' is supposed to be an integer type that is at least 16 bits wide and that represents the natural word length of the target system. Thus, one should simply use the 'int' data type.

In the embedded world, however, such a trite answer will quickly get you into trouble - for at least three reasons.
1. For 8 bit microcontrollers, the natural word length is 8 bits. However, you can't represent an 'int' data type in 8 bits and remain C99 compliant. Some compiler manufacturers eschew C99 compliance and make the 'int' type 8 bits (at least one PIC compiler does this), while others simply say "we are compliant, and if you are stupid enough to use an 'int' when another data type makes more sense, then that's your problem".
2. For some processors there is a difference between the natural word length of the CPU and the natural word length of the (external) memory bus. Thus the optimal integer type can actually depend upon where it is stored.
3. The 'int' data type is signed. Much, indeed most, of the embedded world is unsigned, and those of us that have worked in it for a long time have found that working with unsigned integers is a lot faster and a lot safer than working with signed integers, or even worse a mix of signed and unsigned integers. (I'll make this the subject of another blog post).

Thus the bottom line is that using the 'int' data type can get you into a world of trouble. Most embedded programmers are aware of this, which is why when you look at embedded code you'll see a veritable maelstrom of user defined data types such as UINT8, INT32, WORD, DWORD etc. Although these should ensure that there is no ambiguity about the data type being used for a particular construct, they still don't solve the problem of whether the data type is optimal. For example, consider the following simple code fragment for doing something 100 times:


TBD_DATATYPE i;

for (i = 0; i < 100; i++)
{
// Do something 100 times
}


Please ignore all issues other than this one: what data type should the loop variable 'i' be?

Well, evidently it needs to be at least 8 bits wide, and so we would appear to have a choice of 8, 16, 32 or even 64 bits as our underlying data type. Now if you are writing code for a particular CPU, then you should know whether it is an 8, 16, 32 or 64 bit CPU, and thus you could make your choice based on this factor alone. However, is a 16 bit integer always the best choice for a particular 16 bit CPU? And what if you are trying to write portable code that is supposed to be used on a plethora of targets? Finally, what exactly do we mean by 'optimal' or 'efficient' code?

I wrestled with these problems for many years before finally realizing that the C99 standards committee had already solved this problem for us. Quite a few people now know that the C99 standard standardized the names of specific-width integer types (int8_t, uint8_t, int16_t etc.). What isn't so well known is that it also defined "minimum width" and "fastest minimum width" types. To see if your compiler is C99 compliant, open up stdint.h. If it is compliant, then as well as the uint8_t etc. data types, you'll also see at least two other sections: minimum width types and fastest minimum width types. An example will help clarify the situation:

Fixed width unsigned 8 bit integer: uint8_t
Minimum width unsigned 8 bit integer: uint_least8_t
Fastest minimum width unsigned 8 bit integer: uint_fast8_t

Thus a uint8_t (where the platform provides one) is guaranteed to be exactly 8 bits wide.
A uint_least8_t is the smallest integer type guaranteed to be at least 8 bits wide.
A uint_fast8_t is the fastest integer type guaranteed to be at least 8 bits wide.

So we can now finally answer our question. If we are trying to consume the minimum amount of data memory, then our TBD_DATATYPE should be uint_least8_t. If we are trying to make our code run as fast as possible then we should use uint_fast8_t.
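Applied to the earlier loop, a minimal sketch looks like this (the function is hypothetical; it just counts iterations so the behavior is checkable). The actual width of uint_fast8_t is chosen by the implementation, so check your own stdint.h to see what you get on your target.

```c
#include <stdint.h>

/* Loop counter chosen for speed: uint_fast8_t holds at least 0..255 but
   may be wider if a wider type is faster on the target. */
uint32_t count_iterations(void)
{
    uint32_t iterations = 0u;

    for (uint_fast8_t i = 0u; i < 100u; ++i) {
        ++iterations;   /* Do something 100 times */
    }
    return iterations;
}
```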

Thus the bottom line is this: if you want to start writing efficient, portable embedded code, the first step you should take is to start using the C99 'least' and 'fast' data types. If your compiler isn't C99 compliant, then complain until it is - or change vendors.

If you make this change I think you'll be pleasantly surprised at the improvements in code size and speed that you'll achieve.


Friday, June 06, 2008

Thoughts on the optimal time to test code

Today I'd like to take on one of the sacred cows of the embedded industry, namely the temporal relationship between writing code and testing it. The conventional wisdom seems to be as follows:

"Write a small piece of code. As soon as possible test the code. Repeat until the task is complete"


I know that for many of you, my merely having the temerity to suggest that this might be sub-optimal will put me firmly into the category of hopeless heretic. Well, before you write me off as a lunatic, let me tell you about an alternative approach, how I stumbled upon it and why I think it has much to commend it.

Being in the consulting business I'm typically working on multiple projects at once. Often a given project will be put on hold for any number of reasons which aren't germane to this post. As a result, it's not uncommon for me to write some code, compile it and then not touch it again for several months. I then find myself in the position of having to test / debug code that I wrote months ago. Having now done this many times, I've come to the conclusion that rather than this being a problem, it is instead the optimal temporal relationship between coding and testing.

How can this be, you ask? Surely after a multi-month hiatus the code is no longer fresh in your mind, and so it must be that much more difficult to test and debug? Well, the answer is of course yes - the code is no longer fresh in my mind, and yes, it does make it a little harder to test and debug - in the short term. Those last four words are the point of my argument.

Why do we write code? Most people would claim that we write code in order to make a functional product. I disagree with this assertion. I think we write code so that the people coming after us can understand it and modify it. This rather strange claim is based upon the studies showing that companies spend far more money maintaining code than they do writing it. Thus the smart way to write code is to do so in a manner that gives preeminent importance to its long term maintenance. So how does one do this? Well, that's a topic for another post. What I can tell you is that having to test and debug code that you wrote several months ago is a terrific way to see the code as the person maintaining it will see it. You'll see the inadequate or plain wrong comments. You'll see the copy and paste errors. You'll see where you got tired and took a short cut, and you'll see those stupid mistakes caused by the telephone ringing at the wrong time.

Indeed, because you don't expect the code to work (after all, it's never been tested), I find you cast a very jaundiced eye over it - and in the process find a plethora of the mistakes that one typically finds by sitting in front of a debugger. Maybe it's just me, but I'd rather find bugs via code inspection than by fighting the debug environments common to most embedded systems.

So in a nutshell, I think the optimal way to write and test code is as follows:

1. Write the code. Make sure it compiles and is Lint free.
2. Wait a few months.
3. Reread the code looking for the usual suspects of bad / wrong comments, copy and paste errors, sloppy coding etc.
4. Test it.

The person that maintains your code (quite likely a future version of you) will thank you for doing it this way.


Tuesday, May 20, 2008

visualSTATE

I have been writing this blog for about 18 months now, and in reviewing my posts I've noticed that they are often critical of technologies, manufacturers and/or products. Well, today is a first for me, because I'd like to offer my first product endorsement. The endorsement goes to visualSTATE from IAR. I've been using this product for about as long as I've had this blog and have concluded that it represents the biggest step forward in productivity for me since I made the move from assembly language to C. (Yes folks, the move from C to C++ was a virtual non-event for me, as I found almost no improvement in my productivity - mainly, I suspect, because I have written object oriented C for years).

Anyway, back to the topic of visualSTATE. If you aren't familiar with it, then you should be. It allows you to design complex, hierarchical state machines with ease and to push a button and obtain code that just seems to work. I have now completed three projects using this tool and am well on the way to finishing a fourth. In all cases, the boost to my productivity has been astonishing. I find that I spend most of my time on the functional design and almost no time on debugging the high level application.

visualSTATE's main strengths seem to be in the following areas:

1. Products that are highly modal - i.e. a product can be in one of N operating modes depending upon circumstances.
2. User interfaces. I've had great success with products that contain bespoke LCDs and membrane keypads.
3. Products that contain complex sequencing requirements, particularly when coupled with a plethora of failure modes that have to be handled.

I've found the learning curve on visualSTATE to be quite long - but definitely worth it. Although you can certainly be up and running in a day or so, it took me a lot longer to work out how best to partition a problem between visualSTATE and traditional code. However, with experience I now rarely get it wrong.

I've also found some very nice and unexpected benefits from visualSTATE. To wit:

1. Code reuse. visualSTATE does of course require some code support. However, I've found that a lot of this code can be reused. As a result, I can now bring up a new board with a visualSTATE processing engine running on it in a matter of hours. Try doing that with your average RTOS.
2. Although we all know that lots of small functions are "better" than a few big functions, human nature being what it is, we tend to just expand an existing function rather than decomposing it into its constituent parts. Well, when using visualSTATE I find that it almost forces one into writing lots of small (less than 5 lines) functions. I suspect that these small functions are part of the reason that my visualSTATE projects just seem to work with almost no debugging time.
3. Documentation. As well as the documentation benefits associated with small functions (i.e. the comments actually match the code!), visualSTATE comes with a terrific documentation tool. Many of my clients quite rightly demand excellent documentation on the designs I do for them. The documentation engine in visualSTATE makes this a breeze!
4. Communication. My clients often ask questions such as "what does the code do if ...". In a traditional project this usually means poring over complex code trying to ascertain the answer. With visualSTATE projects I find that most of the time I simply look at the state charts. Since the state charts are effectively the code (since they are tied together), I can give an answer quickly and authoritatively - which makes my clients happy and helps assure me of future business.

All in all, kudos to IAR for such a great tool.


Sunday, May 11, 2008

Integer Log functions

A few months ago I wrote about a very nifty square root function in Jack Crenshaw's book "Math Toolkit for Real-time Programming". As elegant as the square root function is, it pales in comparison to what Crenshaw calls his 'bitlog' function. This is a routine that computes the log (to base 2 of course) of an integer - and does it in amazingly few cycles and with amazing accuracy. The code in the book is for a 32 bit integer; the code I present here is for a 16 bit integer. Although you are of course free to use this code as is, I strongly suggest you buy Crenshaw's book and read about this function. You'll see it truly is a work of art. BTW, one of the things I really like about Crenshaw is that he takes great pains to note that he didn't invent this algorithm. Rather, he credits Tom Lehman. Kudos to Lehman.


#include <stdint.h>

/**
FUNCTION: bitlog

DESCRIPTION:
Computes an approximation to 8 * (log(base 2)(x) - 1).

PARAMETERS:
- The uint16_t value whose log we desire

RETURNS:
- An approximation to 8 * (log(base 2)(x) - 1)

NOTES:
- For x <= 8 the result is the linear approximation 2 * x.

**/
uint16_t bitlog(uint16_t x)
{
    uint8_t b;
    uint16_t res;

    if (x <= 8) /* Shorten computation for small numbers */
    {
        res = 2 * x;
    }
    else
    {
        b = 15; /* Find the highest non zero bit in the input argument */
        while ((b > 2) && ((int16_t)x > 0))
        {
            --b;
            x <<= 1;
        }
        /* The leading one is now in bit 15; keep the three bits below it
           as the fractional part of the log, in eighths */
        x &= 0x7000;
        x >>= 12;

        res = x + 8 * (b - 1);
    }

    return res;
}
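To see how bitlog() arrives at its answer, it helps to split it into two steps: the position b of the highest set bit is the integer part of log2(x), and the three bits immediately below that leading one act as a linear approximation to the fractional part, in eighths. The result is therefore 8 * (b - 1) plus those three bits. The hypothetical bitlog_explicit() below is a sketch that restates the same computation with that split made explicit; it is meant as an aid to understanding, not as Crenshaw's or Lehman's code. For x = 1000, for instance, b = 9 and the next three bits are 111, giving 7 + 64 = 71, against an exact 8 * (log2(1000) - 1) of about 71.7.

```c
#include <stdint.h>

/* Hypothetical restatement of bitlog() with the exponent / mantissa split explicit. */
uint16_t bitlog_explicit(uint16_t x)
{
    uint8_t b = 0u;     /* bit position of the most significant 1 = floor(log2(x)) */
    uint16_t mantissa;  /* the three bits immediately below the leading 1 */

    if (x <= 8u)
    {
        return (uint16_t)(2u * x);  /* same linear shortcut as bitlog() */
    }

    /* Find the leading one; the b < 15 guard avoids shifting by 16. */
    while ((b < 15u) && ((x >> (b + 1u)) != 0u))
    {
        ++b;
    }

    /* x > 8 guarantees b >= 3, so this shift count is never negative. */
    mantissa = (uint16_t)((x >> (b - 3u)) & 0x7u);

    return (uint16_t)(mantissa + 8u * (b - 1u));
}
```

Because both versions take the same three bits below the leading one, they return identical results; the shift-left loop in bitlog() is simply a cheaper way of locating that leading one on a small CPU.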



Saturday, April 12, 2008

IEC60730

Atmel has a very interesting application note on IEC60730 Class B compliance. If you aren't aware of IEC60730, there is a nice introduction here. In a nutshell, IEC60730 is a safety standard related to household appliances, and Class B is one of the classes of software it defines. Part of IEC60730 requires that one actively monitor a microcontroller (if one is used) to check that it is functioning correctly. This seems to be a reasonable thing to do. However, as the Atmel application note shows, meeting this requirement requires one to constantly do things such as test memory, confirm that timers are operating at the correct frequencies and so on. Again, conceptually this doesn't seem unreasonable. However, my concern is that the very act of confirming that the hardware is functioning could result in a system failure at a critical point, thus creating the very problem the standard is designed to prevent.

For example, it's hard to argue with the contention that the stack is the most used portion of memory in most microcontrollers. I think most engineers would agree that if the memory used for the stack malfunctioned, then disastrous things would most likely occur. On this basis, a regular check of the stack memory would seem to be in order. Maybe it's just me, but the thought of running a memory test on the stack area of a processor while simultaneously trying to respond to interrupts etc. seems like a very tall order. Indeed, I can easily envisage a piece of code that is designed to test the stack area malfunctioning, causing a system crash and potentially causing the very thing it's designed to avoid.
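To make "test memory" concrete, here is a minimal sketch of a March C- test, a standard RAM test algorithm, written against a caller-supplied buffer. It is not Atmel's code, and no claim is made that their note uses this particular algorithm. Note that the test destroys the contents of the region it tests, so a real implementation has to save and restore the data and keep everything else (interrupts included) away from that region for the duration - which is exactly the difficulty described above when the region is the stack. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* March C- over len bytes at mem, using all-zeros / all-ones byte patterns.
   DESTRUCTIVE: the previous contents are lost. Returns true if no fault was seen. */
bool march_c_minus(volatile uint8_t *mem, size_t len)
{
    size_t i;

    for (i = 0u; i < len; i++)              /* M0: any order, write 0 */
    {
        mem[i] = 0x00u;
    }
    for (i = 0u; i < len; i++)              /* M1: ascending, read 0, write 1 */
    {
        if (mem[i] != 0x00u) { return false; }
        mem[i] = 0xFFu;
    }
    for (i = 0u; i < len; i++)              /* M2: ascending, read 1, write 0 */
    {
        if (mem[i] != 0xFFu) { return false; }
        mem[i] = 0x00u;
    }
    for (i = len; i-- > 0u; )               /* M3: descending, read 0, write 1 */
    {
        if (mem[i] != 0x00u) { return false; }
        mem[i] = 0xFFu;
    }
    for (i = len; i-- > 0u; )               /* M4: descending, read 1, write 0 */
    {
        if (mem[i] != 0xFFu) { return false; }
        mem[i] = 0x00u;
    }
    for (i = 0u; i < len; i++)              /* M5: any order, read 0 */
    {
        if (mem[i] != 0x00u) { return false; }
    }
    return true;
}
```

Even this simple version touches every byte of the region six times, which gives a feel for why such tests are usually run in small chunks and why their own correctness matters so much.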

I think what it comes down to is this. The reliability of hardware seems to me to be several orders of magnitude better than the reliability of software. Thus using software to validate hardware seems problematic. I'll be very interested to see what happens the first time someone gets hurt as a result of a malfunction in software written to conform to IEC60730. If you don't think this is likely, take a look at the size of the object code produced by Atmel's suggested tests. Then consider that many household appliances use microcontrollers that contain just a few kbytes of object code - and that the IEC60730 code will thus make up a very large fraction of the delivered code. On a simplistic statistical basis, we can assume that if 30% of the code in a product is related to IEC60730 compliance, then 30% of the bugs will be in that code. Given what the code has to do, my money is on the IEC60730 compliance code having a much higher bug rate than the general application. Thus the probability of a failure occurring in the IEC60730 code is high - and someone will get hurt when the code fails.

As a parting thought, how exactly does one set about testing code that is designed to detect hardware failures internal to an integrated circuit? Although I'm sure I could come up with test protocols for some hardware, I suspect that the Heisenberg uncertainty principle will ensure that the very act of testing the test results in a flawed test.



Monday, February 04, 2008

The perils of overloading

This post is coming to you from Sweden - a very fine country that I heartily recommend visiting if you get the chance. (If you're wondering why I'm in Sweden - I'm here on business, as one of my clients is located in Gothenburg). Anyway, the fact that I'm in Sweden is relevant to this post, as to get here I had to put myself at the mercy of United Airlines. Now the fact that the flight over here was less than perfect won't be news to those of you who travel regularly. However, the reason that the flight was a disaster is relevant, as I'll now try to explain...

Upon arrival at the United check-in desk at Dulles airport, I was greeted by an array of self check-in kiosks, with a total of one real live human being to take care of baggage check-in. Thinking myself to be computer savvy, I negotiated the check-in kiosk with ease, only to be told that:
  1. I had to see the human in order to check my bags in, and
  2. The system was unable to assign me a seat and that seat assignment would be done at the gate.
The first instruction was par for the course, while the second I found very strange. Anyway, I shrugged my shoulders and went over to the sole person working the desk. There was one gentleman in front of me. This gentleman, not unreasonably, asked if he could use some of his frequent flier miles to upgrade to business class. "No problem," said the United employee, who proceeded to rattle the keys. After 5 minutes he announced that although the system was showing that seats were available in business class, it refused to allow him to assign a seat. This was the second clue that things were heading south in a hurry. It then took the clerk another 10 minutes to wait-list the gentleman (giving a total processing time of 15 minutes). Although it's possible the clerk was incompetent, I got the impression that he really knew what he was doing and was just being stymied by the system.

Anyway, I checked my bag in and proceeded to the gate. When I got to the gate, I found another 100+ passengers who also had no seat assignments. When I eventually got called to the counter, I found a harried woman with a sea of printed boarding passes in front of her. She was manually searching through them trying to find my name. Eventually she found it and handed it over. My nature being what it is, I politely inquired as to the reason for this astonishingly strange system of assigning seats and issuing boarding passes. Apparently this was the opportunity the clerk had been waiting for to vent her frustration, as she gladly explained to me that the powers that be had overbooked the flight.

And so, my gentle reader, we come to the point of this post. It was apparent that the United system was unable to handle an overbooked flight correctly and, rather than degrade gracefully, had all but collapsed. At this point I started making some snarky comments to myself about database programmers: surely they worked in that field because they couldn't handle the rigors of the embedded / real time world, and any half decent embedded systems person would never make such an elementary mistake. It was then that I had my epiphany. We make the same mistake in the embedded world all the time. When was the last time you used RMA (rate monotonic analysis) to guarantee that all your tasks would meet their scheduling deadlines? How many failures of embedded systems are caused by overloading (or over-scheduling) and the failure to correctly assign task priorities? How many times do weird things happen in your code that you just shrug off as "one of those things"? In short, I found myself cutting a break to the poor sod who wrote United's code. I was still ticked off though!
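For anyone who hasn't done the exercise, here is a minimal sketch of response-time analysis, an exact schedulability test for fixed-priority preemptive scheduling that goes naturally with RMA. It assumes independent periodic tasks whose deadline equals their period, no blocking on shared resources and zero context-switch overhead. The task_t type and rma_schedulable() are hypothetical names, and execution times and periods can be in any one consistent time unit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task description. */
typedef struct
{
    uint32_t wcet;    /* C: worst-case execution time */
    uint32_t period;  /* T: period, also taken as the deadline */
} task_t;

/* tasks[] must be sorted highest priority first (for RMA: shortest period first).
   Returns true if every task's worst-case response time fits within its period. */
bool rma_schedulable(const task_t *tasks, size_t n)
{
    for (size_t i = 0u; i < n; i++)
    {
        uint32_t r = tasks[i].wcet;  /* initial response-time estimate */

        for (;;)
        {
            uint32_t next = tasks[i].wcet;

            /* Each higher-priority task j preempts ceil(r / T_j) times. */
            for (size_t j = 0u; j < i; j++)
            {
                next += ((r + tasks[j].period - 1u) / tasks[j].period) * tasks[j].wcet;
            }
            if (next > tasks[i].period)
            {
                return false;  /* deadline missed: the system is overloaded */
            }
            if (next == r)
            {
                break;         /* fixed point reached: r is the response time */
            }
            r = next;
        }
    }
    return true;
}
```

The iteration only ever increases r, so it either settles or exceeds the period. With tasks {C=1, T=4}, {C=2, T=6}, {C=3, T=13} the utilization is about 0.81, above the Liu and Layland bound of about 0.78 for three tasks, yet the lowest priority task's worst-case response is 10 against a period of 13, so the set is schedulable. The set {2, 4}, {3, 6} has a utilization of exactly 1.0, and its second task's response time climbs past its period.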


Sunday, January 27, 2008

A new way to tell if something is an embedded system

Periodically someone tries to come up with a definition of an embedded system. For example, there is an excellent and oft cited definition here. What got me thinking about this topic is the latest gadget I love to hate - my Verizon Treo phone running Windows Mobile. A few years ago, there would have been no doubt that a cell phone was an embedded system. Today, the Treo, the iPhone etc. all run versions of traditional computer operating systems, and are much more like computers than embedded systems. So the question is, what are they - embedded systems or computers?

Well today I offer a new simple test to tell if these devices are fish or fowl (foul is perhaps more appropriate), to wit:

"Is the device a pain in the neck to use?" If the answer is "yes", then it's a computer. My Treo is a computer. Enough said!


Friday, January 18, 2008

Electronic Component Footprints

As well as writing code and designing hardware, I also do PCB layout. I started doing this after I discovered it was often faster for me to lay out a board myself than to try to convey all my requirements to a layout person. If you've ever done PCB layout, you'll know that getting information about a device's footprint is a real pain. What you may not know is that this is a major source of errors on printed circuit boards, resulting in costly board re-spins and project delays. These errors come about for several reasons.
  1. Getting the information. Many manufacturers include packaging information directly in the part's data sheet. Other manufacturers (TI being a principal offender) instead just cite a packaging part number and say something trite like "See our website for the latest information". One is then forced into searching a gigantic web site to discover that packaging style WP8 is what the rest of the world calls SO8. I don't mind them decoupling the packaging information from the part data sheet. I just wish they'd get with the program and discover something called hyperlinking (it's only been around since the 1960s).
  2. Footprints are usually dimensioned as if they were a mechanical part. Unfortunately, the layout package I use (and I suspect most of the others) treats a footprint as an electrical component. This results in all the pads being on an X-Y grid, with pin 1 usually at (0,0). What this means is that one has to spend time performing a series of elementary trigonometric calculations in order to work out exactly where to place the pads. As you may imagine, this is a major source of error in footprint creation. The frustrating thing for me is that for the mechanical person providing the footprint information, it would be trivial to have their CAD system generate the information in a form that is directly usable.
  3. Many suppliers of mechanical components now offer solid models of their parts on their websites. Typically the models are offered in a number of formats (ProEngineer, Solid Works etc.). Thus, if I'm using, say, a valve from such a supplier, I don't have to create the model. I just download it and incorporate it into my working drawing. Why then do suppliers of electronic components not do the same thing for part footprints? I suspect the answer is that no one ever selected a part for a design because it made the layout person's job easier.
  4. Lastly, you may be unaware that the footprint for a surface mount part differs depending on whether it is to be reflow soldered or wave-soldered. Some companies (mainly in Europe) supply both footprints. Too many however simply supply the reflow footprint and leave it up to the lowly layout person to try and work out what the footprint should be for wave soldering.
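For the simplest case, a dual-row package such as SO8, the pad placement arithmetic reduces to multiplication once the datasheet's dimensions have been turned into a pad pitch and a pad-centre row spacing. The sketch below assumes pin 1 at (0,0), Y increasing upward and pins numbered counter-clockwise when viewed from the top; conventions vary between layout packages, and dual_row_pad() is a hypothetical helper. Dimensions are integer micrometres to avoid rounding surprises. The 1.27 mm pitch used in the example is the usual SO8 pitch, but the 5.4 mm row spacing is an assumed value, so take the real one from the land pattern you are targeting.

```c
#include <stdint.h>

/* Pad centre position in micrometres, relative to pin 1. */
typedef struct
{
    int32_t x_um;
    int32_t y_um;
} pad_pos_t;

/* Centre of pad `pin` (1-based) for a dual-row package with `pins` total pins,
   pad pitch `pitch_um` and pad-centre row spacing `span_um`. */
pad_pos_t dual_row_pad(int32_t pin, int32_t pins, int32_t pitch_um, int32_t span_um)
{
    pad_pos_t p;
    int32_t half = pins / 2;

    if (pin <= half)
    {
        /* Left row: pin 1 at the top, counting down the page. */
        p.x_um = 0;
        p.y_um = -(pin - 1) * pitch_um;
    }
    else
    {
        /* Right row: counting back up, so the last pin sits opposite pin 1. */
        p.x_um = span_um;
        p.y_um = -(pins - pin) * pitch_um;
    }
    return p;
}
```

For an SO8 with the assumed values, pin 4 lands at (0, -3810) and pin 5 directly opposite it at (5400, -3810).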
So what's the point of this screed? Well, our industry is all about getting products to market as soon as possible at the lowest possible cost. Component manufacturers could help their customers (which in turn would help them) achieve this goal by simply providing information that removed the footprint bottleneck.


Sunday, January 13, 2008

Omniscient Code Generation

Hi Tech Software has recently been making a lot of noise about its "Omniscient Code Generation". In a nutshell, the technology appears to defer code generation until the entire program has been compiled, and then looks at everything before generating the final object code. The claimed end result is a dramatically more compact (and presumably faster running) program image. I haven't had a chance to play with the compiler yet (in part because it's still in beta testing). If they have done what they claim, then Hi Tech should be commended. On my list of things to check out about the technology will be:
  • Is the technology smart enough to track function calls made via function pointers? If it is, then this is truly a neat piece of technology. If instead this is one of the product's limitations, then its usefulness to me plummets.
  • Does the technology also track function calls from within interrupts? My experience is that interrupt handling is still the poor relation of compiler technology. If Hi Tech does this, then I'll be impressed.
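The function pointer question is easiest to see with a dispatch table. In the sketch below (all names hypothetical), dispatch() never names either handler, so a tool that builds its call graph only from direct calls cannot tell that handle_start() and handle_stop() may be called from dispatch(). Any whole-program optimization that relies on knowing who calls whom has to follow the pointers in the table to get this right.

```c
#include <stdint.h>

/* Hypothetical command handlers: neither is ever called by name. */
static uint8_t handle_start(uint8_t arg) { return (uint8_t)(arg + 1u); }
static uint8_t handle_stop(uint8_t arg)  { return (uint8_t)(arg - 1u); }

/* The only link between dispatch() and the handlers is this table. */
static uint8_t (*const handlers[])(uint8_t) = { handle_start, handle_stop };

uint8_t dispatch(uint8_t cmd, uint8_t arg)
{
    /* Indirect call: the callee is chosen at run time by cmd. */
    return handlers[cmd & 1u](arg);
}
```

Calls made from interrupt handlers raise the same issue from the other direction: the ISR is never called from anywhere in the program, yet it can run at almost any point.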
Also of interest to me is how other compiler manufacturers will respond. Keil has performed global register coloring on its 8051 compiler for years. I suspect that the Hi Tech approach is a step beyond this, so there's a chance that Keil will finally be knocked from its #1 position in 8051 code generation. IAR offers a multi unit compilation option with some of its compilers. However, this option isn't integrated into its Embedded Workbench, so it's practically useless. With Hi Tech offering compilers for ARM, PIC & MSP430, I can see this really creating a burst of competition in the industry. Excellent!


Wednesday, August 29, 2007

An unfortunate consequence of a 32-bit world

Back in the bad old days when I was a lad, one learned about microprocessors by programming 8 bit devices in assembly language. In fact I can still remember my first lab assignment - namely to multiply two 8 bit unsigned quantities together to get a 16 bit result (without the use of a hardware multiplier of course). One of the indelible lessons that comes from doing an exercise such as this, is that it can take many instructions to perform even the most innocuous of high level language statements.

I mention this, because today I was looking at some code written by a young engineer who was recommended to me. In examining some of his code, I noticed the following construct:

int ivar;

void some_function(void)
{
    ...
    ++ivar;
    ...
}

interrupt void isr_handler(void)
{
    ...
    --ivar;
    ...
}


Notwithstanding the fact that ivar should have been declared volatile, the most egregious mistake here is the assumption that the statement ++ivar is an atomic operation. Now if one is used to working on 32 bit machines, the idea that incrementing an integer could be anything other than an atomic operation may seem ludicrous (although even there, on a load/store architecture, ++ivar is typically a separate load, add and store, and so is not guaranteed to be atomic either). However, in the 8 or 16 bit world where many of us labor in the embedded space, the idea that incrementing an integer is an atomic operation is equally ridiculous. The trouble with bugs like this is that they are difficult to spot, and may only rear their heads after months or even years of operation.
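One common fix, sketched below, is to declare the variable volatile (which stops the compiler caching it, but does not by itself make anything atomic) and to turn the read-modify-write in the main-line code into a critical section by disabling interrupts around it and then restoring the previous interrupt state. irq_save_and_disable() and irq_restore() are hypothetical placeholders: on a real target they would be your compiler's interrupt intrinsics, and here they only model the state so the sketch compiles on a host. The ISR's --ivar needs no such protection as long as interrupts don't nest, since the main-line code cannot run in the middle of the ISR.

```c
#include <stdint.h>

/* Shared between main-line code and the ISR. */
static volatile int ivar;

/* Hypothetical placeholders for the compiler's interrupt intrinsics.
   They only track an "interrupts enabled" flag so the sketch runs on a host. */
static volatile uint8_t irq_enabled = 1u;

static uint8_t irq_save_and_disable(void)
{
    uint8_t state = irq_enabled;
    irq_enabled = 0u;
    return state;
}

static void irq_restore(uint8_t state)
{
    irq_enabled = state;
}

void some_function(void)
{
    uint8_t state = irq_save_and_disable();  /* critical section begins */
    ++ivar;                                  /* load, add, store can no longer be split by the ISR */
    irq_restore(state);                      /* restore, rather than blindly enable, interrupts */
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling interrupts, keeps some_function() safe to call from code that already has interrupts disabled.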

So, is this a case of an incompetent individual? Although nominally yes, I suspect that the real problem is that he was raised on a diet of big CPUs. Perhaps the universities could do these engineers a favor, and throw away the ARM based evaluation boards and replace them with an 8051 based system.


Thursday, August 02, 2007

Application notes code quality

All manufacturers of microcontrollers publish application notes. Some of these application notes are of course nothing more than gussied up advertising drivel. However, many of these application notes contain useful information that can cut days, and sometimes weeks off a project.
Having read hundreds of these application notes over the 25 years I've been doing this, I've come to the conclusion that whereas the application notes usually get the algorithms correct, the same can't be said for the code. Too often the code is sloppy, with bugs that are apparent merely by code inspection. Maybe it's just me, but whenever I see a sloppy piece of code, it makes me wonder about the underlying quality of the IC design.

I think this is unfortunate, since the manufacturers could do much to improve things in the industry by setting a great example. To this end, I think they should:
  1. Adopt a set of coding standards that all their code adheres to.
  2. Have the code reviewed, such that egregious bugs are caught.
  3. Make the code Lint free
  4. If they are aiming the product at the automotive industry, ensure it is MISRA C compliant.
The advantages to the IC manufacturer are legion:
  1. They look good (never a bad thing)
  2. All their application note code has the same "look and feel". This encourages engineers to use their application notes, and hence their products.
  3. The code in the application note is usable "as is", speeding time to market and generally giving the perception that their product is easy to use.
  4. Less experienced engineers are taught how to do things correctly - which presumably leads to higher quality products - which presumably translates into more sales.
I guess the thing that I find maddening about this, is that the manufacturers probably spend weeks or months developing the application note, and then let themselves down by presenting their solution in such a poor way. When I talk to the marketing folks for the CPU manufacturers, I make a point of bringing some of the more egregious errors to their attention. Perhaps if all of us did this, we could get a bit of a sea change in the industry.


Friday, July 13, 2007

Comments on code comments

People's opinions on code commenting are a bit like their opinions on speeding (you know the adage - anyone who drives faster than you is a maniac, anyone who drives slower than you is a doddering old fool). With this in mind, I recently got into a bit of a disagreement with a faculty member of one of America's finer engineering schools. Here's a summary of our positions.

Me
I've looked at this 750 SLOC file. It contains no header, no comments, or any other explanation as to what it does. The code itself is non-trivial, involving a large amount of recursion, dynamic memory allocation etc and thus what the code does and how it does it, and indeed why it exists is not obvious to me.

Faculty
Based upon the file name, it should be obvious what the code does. If you don't understand the theory of this entity, then you have no business looking at the code. P.S. The code is documented.



Wednesday, June 13, 2007

Size matters

Periodically I get printed propaganda from the semiconductor manufacturers touting their latest and greatest ICs. Evidently the marketing folks are convinced that size matters because the size of the IC is almost the first thing they tell you now. A recent example from Maxim has the headline: "Smallest, Most Efficient and Flexible Notebook Fuel-Gauging Solution".

Well, size does matter. However, it seems to me that the industry has gone too far. More and more devices are being offered only in chip scale packaging (CSP). As a result, it is all but impossible to hand build a prototype, let alone cobble together a breadboard. The upshot is that in many cases it simply doesn't make economic sense to use the part, because CSP requires the prototype board to be machine built at a cost of thousands of dollars.

I think the manufacturers are aware of this problem and are trying to address it by offering evaluation boards. While these are OK for the breadboarding phase, they don't solve the prototyping problem. Furthermore, even if the project can justify the cost of machine built prototypes, probing the part or (heaven forbid) making modifications to the board is virtually impossible. The bottom line, IC manufacturers: offer all your parts in a package that can be handled by people. Please.


Monday, June 04, 2007

Understanding Stack Overflow

I suspect that many, if not all, bloggers are somewhat narcissistic. In my case it shows in that I use one of the free services that keeps track of how many visitors I get and what brought them to this blog. Well, it turns out that many visitors get here not because of the brilliance of my writing, but because they did a Google search on "stack overflow", often qualified by PIC, MSP430 etc. Many of these visitors, I suspect, leave empty handed. Thus, in an attempt to make these visits less pointless, let me give you my take on what causes a stack overflow in an embedded system.

First of all, go read the Wikipedia description of stack overflow. There's nothing wrong with the description - it's just incomplete from an embedded systems perspective.

If you are having problems with 8 bit PICs, then you should read this. For other architectures, read on...

On the assumption that you are getting a stack overflow and that you aren't performing recursion or attempting to allocate a large amount of storage on the stack, what can be going wrong? Here's a check list.
  1. What's your stack size set to? If you don't understand the question, then you need an introductory course in embedded systems programming. If you do understand the question - but don't know the answer - then this is the most likely source of your problem. How can this be, you ask? Well, most embedded systems compilers are designed to work with a particular family of processors. The low end of the family may have a tiny amount of memory (e.g. 128 bytes). As such, setting the default stack size to 16 bytes may be a sensible thing to do. Thus, your first step is to ensure that the stack size is set to something reasonable for your system. Click here for advice on how to do this.
  2. Which stack is overflowing? Many processors / compilers support / implement multiple stacks. A typical dichotomy is a call stack (upon which the return addresses of functions are stored) and a data or parameter stack (upon which automatic variables are stored). If you are using an RTOS, then typically there will be a shared call stack, while each thread will have its own data stack. Thus, is it the shared call stack that is overflowing, or the parameter stack associated with a particular task? Once you've determined which stack is overflowing, finding out exactly what gets placed on that stack will help lead you to the solution. If you can see no obvious high level language construct that is causing the problem, then the single most likely cause of your misery is an interrupt service routine...
  3. An interrupt service routine can use up an extraordinary amount of space on the stack. For a discussion of how this arises and its impact on performance, see this article. This problem is compounded if your system allows interrupts to be nested (that is, it allows an ISR to itself be interrupted).
  4. Certain library functions (printf() and its brethren are prime offenders) can use an enormous amount of stack space.
  5. If you are writing partially in assembly language, are you failing to pop every register that you pushed? This often occurs if you have more than one exit point from a function or ISR.
  6. If you are writing entirely in assembly language, did you set up the stack pointer correctly and do you know which way the stack grows?
  7. Have you made the mistake of programming a microcontroller that you don't understand? For example, low end PIC processors have a tiny call stack which is easily overflowed. If you are programming a PIC and don't know about this limitation, then quite frankly, I'm not surprised you are having problems.
  8. If none of the above solve your problem, then I'm afraid you are most likely dealing with a stack over-write problem. That is, a pointer is being dereferenced in a way that overwrites the stack. This often arises when you allocate an array on the stack and then access an element beyond the end of the array. Lint will find a lot of these problems for you. If you don't know what Lint is, see this article. If you do know what Lint is and aren't using it, then you deserve to be faced with these sorts of problems.
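To make item 8 concrete, here is a deliberately contrived sketch of the classic off-by-one that overwrites the stack, together with a version that cannot overrun its buffer. The function name, the checksum it computes and the buffer size are all hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MSG_BUF_LEN 8u   /* hypothetical buffer size */

/* Copy a received message into a buffer on the stack and return an
   8 bit additive checksum of the bytes actually copied.

   The classic stack over-write is the off-by-one loop
       for (i = 0; i <= len; i++) buf[i] = msg[i];
   which, when len == MSG_BUF_LEN, writes buf[MSG_BUF_LEN]: one byte past
   the end of the array and into whatever the compiler placed next to it
   on the stack. This version clamps len and uses '<' instead. */
static uint8_t checksum_message(const uint8_t *msg, size_t len)
{
    uint8_t buf[MSG_BUF_LEN];   /* automatic array: lives on the stack */
    uint8_t sum = 0u;
    size_t i;

    if (len > MSG_BUF_LEN) {
        len = MSG_BUF_LEN;      /* never copy more than the buffer holds */
    }
    for (i = 0u; i < len; i++) {
        buf[i] = msg[i];
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}
```

A message longer than the buffer is silently truncated here; a real system might instead reject it, but either way the write stays inside the array.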

I have also written a related article on setting your stack size that you may find useful.

Saturday, May 19, 2007

Continued Fractions

Once in a while something happens that makes me realize that techniques that I routinely use are simply not widely known in the embedded world. I had such an epiphany recently concerning continued fractions. If you don't know what these are, then check out this link.

As entertaining as the link is, let me cut to the chase as to why you need to know this technique. In a nutshell, in the embedded world we often need to perform fixed point arithmetic for cost / performance reasons. Although this is not a problem in many cases, what happens when you need to multiply something by, say, 1.2764? The naive way to do this might be:

#include <stdint.h>

uint16_t scale(uint8_t x)
{
    uint16_t y;

    /* Overflows where int is 16 bits: 255 * 12764 = 3,254,820 */
    y = (x * 12764) / 10000;

    return y;
}

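As written, this will fail on a target with 16 bit ints (such as the AVR) because x is promoted to a 16 bit int and the expression (x * 12764) overflows. Thus it's necessary to throw in a cast that forces the whole calculation into expensive 32 bit arithmetic. E.g.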

uint16_t scale(uint8_t x)
{
    uint16_t y;

    /* The cast forces a 32 bit multiply and a 32 bit divide */
    y = ((uint32_t)x * 12764) / 10000;

    return y;
}

Our speedy integer arithmetic isn't looking so good now is it?

What we really want to do is to use a fraction (a/b) that is a close approximation to 1.2764 - but (in this case) has a numerator that doesn't exceed 255 (so that we can do the calculation in 16 bit arithmetic).

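Enter continued fractions. One of the many uses for this technique is finding fractions (a/b) that are approximations to real numbers. Using the calculator here, we get the following results for 1.2764 (the number before each colon is the next term of the continued fraction, and the fraction after it is the resulting convergent):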

Convergents:
1: 1/1 = 1
3: 4/3 = 1.3333333333333333
1: 5/4 = 1.25
1: 9/7 = 1.2857142857142858
1: 14/11 = 1.2727272727272727
1: 23/18 = 1.2777777777777777
1: 37/29 = 1.2758620689655173
1: 60/47 = 1.2765957446808511
1: 97/76 = 1.2763157894736843
1: 157/123 = 1.2764227642276422
2: 411/322 = 1.2763975155279503
3: 1390/1089 = 1.2764003673094582
1: 1801/1411 = 1.2763997165131113
1: 3191/2500 = 1.2764


We get higher accuracy as we go down the list. In this case, I chose the approximation (157 / 123) because it's the most accurate convergent with a numerator less than 255. Thus my code now becomes:

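uint16_t scale(uint8_t x)
{
    uint16_t y;

    /* 157/123 ~= 1.2764; the product is at most 255 * 157 = 40035,
       which fits in 16 bit unsigned arithmetic */
    y = ((uint16_t)x * 157) / 123;

    return y;
}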

The error in the scaling constant is less than 0.002% - but the calculation speed is dramatically improved because I don't need to resort to 32 bit arithmetic. [On an ATmega88 processor, calling scale() for every value from 0-255 took 148,677 cycles for the naive approach and 53,300 cycles for the continued fraction approach.]
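If you want reassurance that the cheaper version gives the same answers, a quick host-side check (not an AVR timing measurement) is to run both versions over every possible input. The function names below are my own. Since 157/123 is slightly larger than 1.2764, and 255 x (157/123 - 1.2764) is about 0.006, the continued fraction version can never return less than the 32 bit version and never more than one count higher.

```c
#include <stdint.h>

/* 32 bit version from the post (the cast avoids the 16 bit overflow) */
static uint16_t scale_naive(uint8_t x)
{
    return (uint16_t)(((uint32_t)x * 12764u) / 10000u);
}

/* Continued fraction version from the post: 157/123 ~= 1.2764 */
static uint16_t scale_cf(uint8_t x)
{
    return (uint16_t)(((uint16_t)x * 157u) / 123u);
}

/* Largest |scale_cf(x) - scale_naive(x)| over every possible input */
static int max_scale_difference(void)
{
    int worst = 0;
    unsigned x;

    for (x = 0u; x <= 255u; x++) {
        int d = (int)scale_cf((uint8_t)x) - (int)scale_naive((uint8_t)x);

        if (d < 0) {
            d = -d;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```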

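Incidentally, you might be wondering whether there are other fractions that give better results than the ones generated by this technique. The mathematicians tell us that no fraction with a smaller denominator comes closer than a convergent. (To be pedantic, the in-between fractions known as semiconvergents can occasionally do better under a size limit: 254/199 = 1.276382 also keeps the numerator below 255 and is very slightly closer to 1.2764 than 157/123 is.)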
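If you'd rather not depend on an online calculator, the convergents are easy to generate yourself. The sketch below (the function name and interface are my own invention) expands 1.2764 = 3191/2500 with Euclid's algorithm and applies the standard recurrences h(n) = a(n)h(n-1) + h(n-2) and k(n) = a(n)k(n-1) + k(n-2). It prints the final term as a single 2 where the calculator split it into 1, 1 (an equivalent way of writing the same continued fraction), so 1801/1411 does not appear.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: expand num/den as a continued fraction, print each
   partial quotient and convergent, and return (via best_h/best_k) the last
   convergent whose numerator does not exceed max_num. */
static void best_convergent(uint32_t num, uint32_t den, uint32_t max_num,
                            uint32_t *best_h, uint32_t *best_k)
{
    /* Seeds for h(n) = a(n)*h(n-1) + h(n-2) and the same for k */
    uint32_t h_prev2 = 0u, h_prev1 = 1u;
    uint32_t k_prev2 = 1u, k_prev1 = 0u;

    *best_h = 0u;
    *best_k = 1u;

    while (den != 0u) {
        uint32_t a = num / den;              /* next partial quotient (Euclid) */
        uint32_t r = num % den;
        uint32_t h = a * h_prev1 + h_prev2;  /* numerator of this convergent */
        uint32_t k = a * k_prev1 + k_prev2;  /* denominator of this convergent */

        printf("%lu: %lu/%lu\n", (unsigned long)a, (unsigned long)h,
               (unsigned long)k);

        /* Numerators never decrease and later convergents are more
           accurate, so the last one in range is the one to keep */
        if (h <= max_num) {
            *best_h = h;
            *best_k = k;
        }

        h_prev2 = h_prev1;
        h_prev1 = h;
        k_prev2 = k_prev1;
        k_prev1 = k;
        num = den;
        den = r;
    }
}
```

Calling best_convergent(3191u, 2500u, 255u, &h, &k) prints the list and leaves h = 157 and k = 123.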

So there you have it. A nifty technique that once you know about it will make you wonder how you got along without it for all these years.



Tuesday, May 01, 2007

H1-b visas and Economics 101

USA Today has a story today about how 123,000 applications were received within 48 hours of this year's H1-b visa lottery being opened on April 1. Given that only 65,000 visas are granted a year, there is clearly a large mismatch between supply and demand. Although the USA Today story talks about some of the sexy positions (Supermodels! Complete with alluring photograph!), the reality is that most of these applications are for the fields of electronics and computing, including embedded systems.

This topic interests me, in part because I came to the USA on a similar visa program (actually an E2 - but that's another story).

Anyway, whenever this topic comes up, there's normally some quote from a high tech industry executive explaining that they simply can't get enough talented folks - and hence the need for the program. Whenever I see this argument advanced, I'm always struck by the failure of the journalist to ask a basic question - namely "What would you do if the program were eliminated?" I suspect that the honest executive would answer:
  1. Lobby like mad to get it reinstated
  2. Pay what I had to to get the talent I needed
  3. Look to put the work where the talent is (i.e. ship it overseas).
Whereas I could probably discourse for a long time on answer 1, it's the other two that intrigue me.

The reality today is that enrollment in engineering is dropping. If one were to look at enrollment excluding first and second generation immigrants, I'd hazard a guess that it has all but collapsed. This is despite the fact that engineering in general (and electrical engineering in particular) is consistently one of the highest paying majors upon graduation, with recent graduates earning about $65K, versus the $30K earned by your typical liberal arts major. So, what would happen if these salaries doubled? Would this be enough to attract more home grown talent into the industry? Economics 101 would suggest that if you raise the salaries high enough then supply will rise to meet the demand. The question is, by how much would salaries have to rise?

Economics 101 also suggests that as the price of a good / service rises, it is highly likely that the consumer will look for a substitute. At present this works by bringing folks in on the H1-b program. If the program was eliminated, then I assume that this would be done by shipping more work overseas.

I guess this leads me to the point of my post. The USA prides itself on its capitalist approach - and the belief that the free market is inherently the best way to solve all (OK, most) problems. As a result, Americans normally abhor government interference in the marketplace. But isn't that exactly what is being done here?

If we genuinely believe in the free market, then the H1-b visa program should be abolished. Salaries would rise for engineers, more students would study engineering - and more work would go overseas. I have no idea whether the end result would be beneficial to engineers or not. It would however be ideologically consistent.

The economic purists might argue that the H1-b visa should be scrapped in the sense that anyone who wished to work here should be allowed to do so. I agree that this is also ideologically consistent. However, the reality is that the USA limits immigration in all fields. Thus to be truly consistent this would require the USA to do the same for all jobs - which is tantamount to saying there are no limits on immigration - something which isn't going to happen.

Saturday, April 21, 2007

Crest factor, Square roots & neat algorithms

I've been programming microcontrollers for about 25 years now - and can count on one hand the number of times I've needed to compute the square root of an integer. This curious drought came to an end recently when I needed to compute the Crest Factor of the line voltage being used to power a product I was designing. (For the uninitiated / rusty out there, Crest Factor is the ratio of the Peak : RMS of a waveform. For example, a sine wave has a CF of 1.414, whereas a square wave has a CF of 1.000.)
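Written out for a waveform sampled as N readings v_n over one cycle (the sampled form is an assumption about how the measurement is made), the definition and the two quoted values are:

```latex
\mathrm{CF} = \frac{V_{\mathrm{peak}}}{V_{\mathrm{RMS}}},
\qquad
V_{\mathrm{RMS}} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} v_n^{2}}

\text{sine: } V_{\mathrm{RMS}} = \frac{V_{\mathrm{peak}}}{\sqrt{2}}
  \;\Rightarrow\; \mathrm{CF} = \sqrt{2} \approx 1.414
\qquad
\text{square: } V_{\mathrm{RMS}} = V_{\mathrm{peak}}
  \;\Rightarrow\; \mathrm{CF} = 1
```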

Why, you might ask, do I need to compute the CF? Well, the product uses triacs to control a number of AC loads. If the system is inadvertently powered from a square wave inverter, or just a really lousy generator, then the triacs will not self-commutate - and I could never turn off the loads. Thus to prevent this unfortunate scenario, I need to know how good (i.e. sinusoidal) the line voltage is. The CF is a direct figure of merit that allows me to make this decision.

Clearly, the computation of CF requires one to compute an RMS voltage, which in turn requires one to calculate the square root of a number. For various reasons, I need to compute the CF on a mains cycle-by-cycle basis, and I'm using a 7.37 MHz ATmega CPU. Thus, the computational efficiency of the algorithm is important.

Now IAR has a nifty little algorithm that computes an approximate square root. See http://supp.iar.com/Support/?note=18180&from=search+result

However, this gets blown away by the algorithm described by Crenshaw in his wonderful book: Math Toolkit for Real-Time Programming, CMP Books. ISBN 1-929629-09-5.

The code in his book is for computing the square root of a 32 bit unsigned integer. I adapted it to give the square root of a 16 bit integer. Here's the code:

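#include <stdint.h>

/* Integer square root of a 16 bit value, adapted from Crenshaw's
   32 bit version. Returns floor(sqrt(val)). */
static inline uint8_t friden_sqrt16(uint16_t val)
{
    uint16_t rem = 0;
    uint16_t root = 0;
    uint8_t i;

    /* One iteration per pair of input bits, most significant pair first */
    for(i = 0; i < 8; i++)
    {
        root <<= 1;
        rem = ((rem << 2) + (val >> 14));  /* bring down the next two bits */
        val <<= 2;
        root++;                            /* form the trial value */
        if (root <= rem)
        {
            rem -= root;                   /* trial fits: this root bit is 1 */
            root++;
        }
        else
        {
            root--;                        /* trial too big: this root bit is 0 */
        }
    }
    return (uint8_t)(root >> 1);           /* root is held doubled */
}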


This will compute the integer square root (i.e. the exact square root, rounded down) of a 16 bit integer in about 268 clock cycles on an AVR - i.e. in about 33 microseconds on an 8 MHz AVR processor.

To Crenshaw's point - don't just blindly use the code, but endeavor to understand how it works. Only then will you see it for what it truly is - a work of art. Thanks Jack.
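To show how the pieces might fit together, here is a sketch of a per-cycle crest factor calculation built on friden_sqrt16() (repeated so the block stands alone). The function name, the assumption that samples are signed, centred on zero and within +/-255 (so that the mean square fits in 16 bits), and the choice to return the crest factor scaled by 1000 are mine for illustration; the post doesn't describe how its samples are gathered. Because the integer square root rounds down, the result reads slightly high: a sampled sine of amplitude 200 comes out at 1418 rather than 1414.

```c
#include <stddef.h>
#include <stdint.h>

/* Crenshaw's integer square root, as adapted above */
static inline uint8_t friden_sqrt16(uint16_t val)
{
    uint16_t rem = 0;
    uint16_t root = 0;
    uint8_t i;

    for (i = 0; i < 8; i++) {
        root <<= 1;
        rem = ((rem << 2) + (val >> 14));
        val <<= 2;
        root++;
        if (root <= rem) {
            rem -= root;
            root++;
        } else {
            root--;
        }
    }
    return (uint8_t)(root >> 1);
}

/* Hypothetical: crest factor x 1000 of one cycle of samples.
   Assumes samples are within +/-255 so the mean square fits in 16 bits.
   Returns 0 for an empty or all-zero cycle. */
static uint16_t crest_factor_x1000(const int16_t *samples, size_t n)
{
    uint32_t sum_sq = 0u;
    uint16_t peak = 0u;
    size_t i;

    if (n == 0u) {
        return 0u;
    }
    for (i = 0u; i < n; i++) {
        int16_t s = samples[i];
        uint16_t mag = (uint16_t)(s < 0 ? -s : s);

        if (mag > peak) {
            peak = mag;                      /* track the peak magnitude */
        }
        sum_sq += (uint32_t)mag * mag;       /* accumulate the squares */
    }

    uint8_t rms = friden_sqrt16((uint16_t)(sum_sq / n));  /* floor(RMS) */
    if (rms == 0u) {
        return 0u;
    }
    return (uint16_t)(((uint32_t)peak * 1000u) / rms);
}
```

A result well below 1414 would then flag a flat-topped, square-ish supply, although where exactly to set the threshold is an application decision.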

Saturday, March 31, 2007

Tool Upgrades

As a consultant that does hardware, firmware & software work for my clients, I use a large array of software tools - half a dozen compilers, schematic capture and PCB layout tools, analysis tools as well as the usual gaggle of productivity tools that non-engineers also use. Throw in the tools for running a business and my PC is a regular treasure trove of applications.

With all these tools, the number of upgrades / updates is starting to get out of hand. Every week, it seems I'm updating a major application. The most common scenario seems to be:
  1. I haven't used a tool in a month or so.
  2. I invoke it - and it tells me that an update is available. Often the update is flagged as 'mandatory' or at least 'recommended'.
  3. I accept the update.
  4. The download proceeds. Some of them are simply enormous (Ever downloaded the Xilinx Webpack IDE?)
  5. The patch then proceeds. The time to execute the patch is often considerable.
  6. Finally - the dreaded 'You must restart your computer' directive. I've a dozen applications open, web pages marked, manuals at strategic places - and now I have to close them all down.
Having gone through all this rigmarole, I can finally start using the tool. Of course by now, I just want to 'get on with it', and so the release notes often get cursory attention. Inevitably, if I do read the release notes then I find the upgrade is completely useless to me (e.g. support for a new device that I'm not using). If I don't read the release notes then of course there's this really neat feature that's been added that really makes life easier - and I don't find out about it until weeks later.

Well - enough complaining. Do I have any suggestions? I think so. I'd like tool vendors to realize that their tool isn't the only one in the box - and that many of us use it on a less than daily basis. With this perspective, I'd like the tool vendors to do the following:
  1. Download upgrades in the background. A lot of applications already do this - they all should.
  2. Inform me there is an update available when I close the tool rather than open it. That way I can allow the update to occur while I'm off doing productive work elsewhere.
  3. Do everything you can to avoid requiring the user to re-boot their computer.
  4. Limit updates to one or two a year. I know product managers want folks on support contracts to feel they are getting their money's worth - but this only works if my life revolves around that tool - and it doesn't!


Thursday, December 14, 2006

Wanted - a new performance metric

In the bad old days, the two major performance concerns in CPU selection were whether a CPU had enough processing power and memory to get the job done. Although these are still issues, it's a rare problem that requires more bandwidth and memory than can be provided by the CPU vendors.

By contrast, today, well over half of the systems I work on are battery powered, and so I find the major question I have when designing an embedded system is 'how long will the battery last?' If you can work this out from studying the data sheets of the various CPU vendors then you're a better engineer than me.

Thus to solve this problem, I propose that we introduce a new performance metric - namely how much energy (in joules) it takes to perform a set of standard tasks. Rather than the usual bunch of quasi-meaningful benchmarks, I'd like to see benchmarks such as:

  1. How much energy does it take to receive and transmit one thousand characters through an asynchronous serial port running at 38400 baud?
  2. How much energy does it take to perform a task switch using a standard RTOS such as uCOS-II?
  3. How much energy does it take to perform one thousand A2D conversions?
  4. How much energy does it take to execute a 64 tap FIR filter?
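The arithmetic behind such a metric is simple: energy is supply voltage times average current times the time the task takes (E = V x I x t). As a sketch of the first benchmark, the snippet below works out how long 1000 characters occupy a 38400 baud line in one direction and the energy that implies. The 8N1 framing (10 bits per character), the 3.3 V supply and the 2 mA average current are hypothetical figures chosen for illustration, not measurements of any real CPU; the hard part, and the reason vendors should publish the numbers, is knowing the real average current.

```c
/* Time taken to send `chars` characters at `baud`, given the number of
   bits on the wire per character (start + data + parity + stop). */
static double uart_time_s(unsigned chars, unsigned baud, unsigned bits_per_char)
{
    return ((double)chars * (double)bits_per_char) / (double)baud;
}

/* Energy in joules: E = V * I * t */
static double energy_joules(double volts, double amps, double seconds)
{
    return volts * amps * seconds;
}

/* Hypothetical figures: 8N1 framing, 3.3 V supply, 2 mA average current */
static double example_uart_energy(void)
{
    double t = uart_time_s(1000u, 38400u, 10u);   /* about 0.26 s */
    return energy_joules(3.3, 0.002, t);          /* about 1.72 mJ */
}
```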

With metrics such as these, the task of choosing the best CPU (and compiler for that matter) would be made much easier. I'm quite prepared to let off the hook those vendors that aren't selling CPUs aimed at the portable market. For the other guys (TI, Atmel, ARM etc) it's time to step up to the wattmeter and be measured.

Friday, December 08, 2006

Wanted - .TEC password

It's time for my first rant - you have been warned!

I recently bought a new computer, complete with a gorgeous 24" flat panel display. The flat panel supports a speaker bar - which I also bought. The installation instructions for the speaker bar are quite straightforward - align the tabs on the bar with the holes in the display, and push until the bar clicks into place.

Well, on my system, there's no click. The display seems to lack the spring loaded latch necessary for this to work.

I have now had four email exchanges with 'technical support'. The first didn't read what I wrote, the second told me that this was a big issue and would take several days to resolve, the third did a keyword search on 'speaker bar' and sent me a bunch of useless links, and the fourth decided that my problem was that I didn't understand the installation instructions - and so sent me another copy of them.

In short, I've been treated like a moron.

I suspect that some / many / most people that contact technical support lack, ahem, technical acumen. Well, if you are reading this blog, the chances are you are not such a person. I also suspect that you've had a similar experience - which got me to thinking. What I need is a .TEC password. Just as Microsoft's .NET Passport lets you manage your net identities, a .TEC password would tell the recipient that they are dealing with someone who really can, at the very least, align two tabs with their mating holes and push - and so should be treated accordingly.

Thanks for listening.

Monday, December 04, 2006

RIP VOIP

As someone that has worked in telecoms, I was excited by the arrival of VOIP. However, after two years of variable quality, extended outages and just plain weird behaviour I've had it. It's clear to me that VOIP just isn't ready for prime time and so I have decided to pull the plug. The latest frustration - an inability to receive incoming calls for the last four days - with no resolution in sight. The technical support department informs me that it's a 'router programming error'. Whether they really mean a router configuration error, or a bug in the router firmware is unclear. Regardless, it's presumably a tough enough problem that it can't be fixed in four days.

The really bad news here is my experience when I tried to get Verizon to provide me with a POTS line. One of my prime reasons for jumping on VOIP as soon as I could was my feeling that Verizon was a dreadful company - one with questionable ethics and really awful customer service. Today, despite calling the number on the Verizon website for 'add a new line', I had to endure a voice prompted menu system and three different people before I could do the most mundane thing Verizon has to offer - order telephone service. For this privilege, Verizon is charging me a $44 start-up fee (to plug a few numbers into a computer) and double the price charged by my VOIP provider. Apparently Verizon's business has not suffered enough - yet.

So what's the relevance of this tale of woe to embedded systems? Not much really, other than to note that when the latest and greatest doesn't live up to its billing - one ends up with very annoyed customers. So next time marketing wants to over-hype what you can deliver, rein them in hard and fast. Your customers will thank you.

Friday, November 24, 2006

Help! My third party source code doesn't comply with my coding standards

Two big trends in the embedded world are on a collision course - and the resolution isn't going to be easy. The two trends are the requirements that all code meet internal coding standards and the use of third party code.

Organizations have gradually been getting religion about having and enforcing coding standards. As well as spelling out what the source code should look like and making rules for what is kosher, many internal standards now also require code to be 'Lint free', and possibly also that it conform to various standards, such as those laid down by MISRA.

Simultaneously, organizations have been striving to improve productivity. One way of doing this is to turn to code re-use. Code re-use is normally discussed in the context of code that you've already developed being re-used in subsequent projects. However, a far more powerful paradigm is to use code that others have developed. Need a CRC algorithm, or a way of computing an MD5 hash? Head to the Internet to find your source code. Need to develop a complex state handler? Hello visualSTATE. Need to develop a GUI? Take your pick from a plethora of component suppliers. Now if you were developing for a PC, most of this code would be supplied in binary format. However, with the plethora of embedded targets and compilers, the chances are you'll get source code that you'll need to compile.

Now, the chances of the source code matching your coding standards are essentially nil. So what is to be done? My experiences to date have been pragmatic - but not pretty.

For small pieces of code, I simply rewrite them to bring them up to standard.
For third party libraries, such as a graphics library, it is usually impractical, if not illegal, to modify the source code, and so one is forced to accept the code as is.
For machine generated code, even if it's small, rewriting the code is pointless, since the chances are you'll be regenerating it later and over-writing your work. Thus, once again, one is forced to accept the code as is.

So what is to be done? At present, my coding standards procedure allows one to issue a variance where code doesn't comply (in pretty much the same way that MISRA allows variances to be issued). Although this is OK, let's recognize it for what it is - a cop out. What we really need are the suppliers of source code to recognize and adhere to various 'standards'. For example:

1. Use the C99 data types, folks. I'm tired of seeing UINT8 definitions everywhere when ISO has stipulated that a uint8_t data type is an 8 bit unsigned type.
2. Make your code Lint free. If you're selling source code, it's in your interest to make it as clean as possible. PC-Lint from Gimpel is the gold standard, so make sure you can pass it with a clean bill of health (and I don't mean by suppressing every complaint it has).
3. Make your code MISRA compliant. MISRA can be a pain - but their intentions are good. If nothing else, making your code MISRA compliant will increase the size of your target market. This issue has been recognized by IAR, whom I'd like to congratulate for making the code generated by the upcoming new release of visualSTATE MISRA compliant.
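For suppliers who can't bring themselves to drop their own type names, the least they could do is define them in terms of the C99 types. A minimal sketch of such a compatibility header follows; UINT8, UINT16 and INT32 stand in for whatever names a particular library uses, and the negative array size trick is a common way to get a compile-time check in C99 (C11 code could use _Static_assert instead).

```c
#include <stdint.h>

/* Legacy names mapped onto the ISO C99 exact-width types, so the whole
   build has exactly one definition of each width. */
typedef uint8_t  UINT8;
typedef uint16_t UINT16;
typedef int32_t  INT32;

/* Compile-time size checks that work in C99: a negative array size is a
   compile error, so the build fails if a mapping is ever wrong. */
typedef char check_uint8_size [(sizeof(UINT8)  == 1u) ? 1 : -1];
typedef char check_uint16_size[(sizeof(UINT16) == 2u) ? 1 : -1];
typedef char check_int32_size [(sizeof(INT32)  == 4u) ? 1 : -1];
```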

What if you are just an honest Joe, just putting code out there for all to use and enjoy? Well why not adhere to the same rules? It'll make your code more useful - and after all isn't that the point of publishing it in the first place?

Friday, November 03, 2006

Unexpected uses and the consequences thereof

I'll pose today's blog in the form of one of those lateral thinking questions - which you may want to try and solve before moving on to the rest of the post.

An engineer walks into a meeting, unpacks his laptop and an Ethernet hub, powers both up and then connects an Ethernet cable between the laptop and the hub. No other connections are made to the hub. Explain.

Well I suppose two obvious answers are that the engineer is nuts (likely), or that the engineer doesn't understand the basics of Ethernet technology (less likely). Of course, in this case, the engineer is me, and while I can't really attest to my mental state, I do know a thing or two about Ethernet. So what is causing this strange behaviour?

Well, like many engineers, I use some very expensive software. The vendors of this software, in an effort to protect their product from unpaid copying, lock the software to the computer's NIC. (For the uninitiated, every Ethernet interface IC on the planet has a unique MAC address. Thus any computer with a NIC has a built-in unique identifier.) Now the vendor of my laptop (Toshiba), in a sensible effort to conserve power, powers down the NIC when it detects no valid signal on the Ethernet port. When the NIC is powered down, it can't respond to requests for its MAC address, and so the copy protection scheme complains and I can't run my expensive software.


Who is to blame here? I can't really fault the SW vendor for wanting to protect their investment, and I can't blame Toshiba for wanting to minimize the power consumption of their product. I suppose it would be nice if Toshiba provided a utility to prevent the auto power down - but that's probably inconsistent with them trying to make the system easy to use for the average consumer. I think the answer is that the fault lies with us in the engineering community. We value great tools, but apparently enough of us (and our employers) are dishonest enough that we'll copy them if we get the chance. Apparently part of the price we pay for this is looking like idiots when we walk into meetings...

Wednesday, October 11, 2006

Knowledge versus Understanding

Every month or two a 'Technical Recruiter' from one of the larger placement companies calls me up to see if I'm available for work. Most of the time I'm not, and so the conversation terminates quite quickly. However, once in a while I am available, and so the inevitable request for an updated resume is made. After sending an updated resume, the 'Technical Recruiter' calls back to discuss what you have to offer.

Well, for the first time in several years I recently went through this rigmarole. The conversation with the recruiter was both illuminating and yet rather depressing. To paraphrase, the conversation went like this:

Recruiter: "What RTOS experience do you have?"

Me: "VxWorks, MicroC/OS-II, Embedded Linux, various bespoke systems"

Recruiter: "No others? "

Me: "Isn't it more important that someone understands the benefits and limitations of an RTOS rather than knowing the particular API of a specific RTOS?"

Recruiter (after a long pause): "Our clients like someone that can hit the ground running."

I see two possibilities here.
1. The 'technical recruiter' has no technical knowledge and is nothing more than a matcher of acronyms and buzz words.
2. His clients really are saying to him, we need someone with experience of XYZ RTOS.

If it's the latter, then it appears that knowledge is a more highly prized commodity than understanding. Personally, given the choice between someone that knows an RTOS API and someone that really understands priority inversion, can discuss the pros and cons of RMA as a scheduling algorithm, and can explain the implications of making an RTOS call from within an ISR, I'd take the latter any day. Of course, one might claim that an experienced user of XYZ RTOS should be aware of these sorts of issues. However, in my experience, large swathes of the folks out there using an RTOS really don't have a clue about what it's doing for them - and what it's costing them.

Thus my point is this. Next time you are looking for help, think about what you'd like the person to understand - as well as what they should know. I suspect you'll end up with better help.


Tuesday, October 03, 2006

Fuse Blown

As a professional developer of embedded systems, I use a lot of sophisticated tools and best practices in order to create the best embedded applications I can. Today, however, I ran into an issue which makes a mockery of much of what I (and presumably you) do.

So what is this issue, you ask? Well, in case you are unaware, at least two very popular microcontroller families (PIC and AVR, and probably others) require one to configure multiple parameters via fuse bits. These parameters typically cover critical hardware settings, such as oscillator type and frequency, brown out settings, code protection bits and so on. These fuse settings are NOT programmable from within the application and hence are typically outside the programmer's direct control. Thus a solution based upon devices in these families consists of both the programming image (i.e. the binary representation of your code) and also the fuse bit settings.

Now, this wouldn't be too bad if there were some way to combine both sets of information into one master programming file. In fact both Microchip and Atmel allow one to do this within their IDEs. However, what happens when one needs to have the microprocessors programmed on a high speed production gang programmer? Well, I found out today - and it isn't pretty.

The procedure is to supply an Intel Hex record file for the application, and to provide the programming house with an email detailing the required fuse settings! So, after using all the sophisticated tools at my disposal to craft a working embedded system, I ultimately have to rely upon the manual transcription of configuration bits into a programmer to ensure that the end product is actually programmed the way I need it to be.

This is patently absurd! We need an industry standard programming file that allows both the program image and the configuration bits to be defined, independent of the manufacturer, so that we can be confident that devices are programmed the way we want them to be. (Incidentally, checking a first article device is only of limited benefit, since in many cases, we want to set a lock bit that prevents anyone (including oneself) from reading anything about the device).

Does anyone out there have any ideas on how we can get this problem solved?

Tuesday, September 26, 2006

Legal Limbo

I don't intend to write much about political issues, since it isn't the purpose of this blog. However, when something arises that affects a lot of us in the high tech industry, then I'll make an exception. The case in point today is the proposed legislation in the US Senate that attempts to codify the administration's position on detaining terrorism suspects, interrogating them etc. I'm no lawyer, so I have to rely upon the expertise of others in understanding what the legislation is all about. In this case, I'm relying upon the testimony of Senator Patrick Leahy.

At this point, you're probably wondering what this has to do with embedded systems. Well, have a read of this excerpt from Senator Leahy (The full text appears here). I'll then make my point.

Today we are belatedly addressing the single most consequential provision of this much-discussed bill, a provision that can be found buried on page 81 of the proposed bill. This provision would perpetuate the indefinite detention of hundreds of individuals against whom the government has brought no charges and presented no evidence, without any recourse to justice whatsoever. That is un-American, and it is contrary to American interests.

Going forward, the bill departs even more radically from our most fundamental values. It would permit the president to detain indefinitely—even for life—any alien, whether in the United States or abroad, whether a foreign resident or a lawful permanent resident, without any meaningful opportunity for the alien to challenge his detention. The administration would not even need to assert, much less prove, that the alien was an enemy combatant; it would suffice that the alien was "awaiting [a] determination" on that issue. In other words, the bill would tell the millions of legal immigrants living in America, participating in American families, working for American businesses, and paying American taxes, that our government may at any minute pick them up and detain them indefinitely without charge, and without any access to the courts or even to military tribunals, unless and until the government determines that they are not enemy combatants. [Emphasis mine]

I'm a legal resident alien in the USA. Huge numbers of the people I know in the embedded systems field are also non-citizens of one form or another. I find this very disturbing and I suspect they will too. Now imagine you were a talented person from overseas who was considering moving to the USA in search of a better life. Maybe it's just me, but I suspect that a lot of those folks would eschew the USA and opt for other pastures. If that happens, then the life blood of the high tech industry in general (and embedded systems in particular) will dry up.

Am I being overly dramatic here? Quite possibly. However, if you were, for instance, a Muslim contemplating a move to the USA, what would you do if this indeed becomes the law of the land?

Saturday, September 23, 2006

Reset Reason

The title of this post is rather ambiguous and can be read several different ways. This is no accident, as it reflects the ambiguity I see concerning the most fundamental event in an embedded system's life: reset. Being a consultant, I get to write a lot of my own code. I also get to read a lot of other people's code, and the one area to which I almost never see much thought given is handling the various causes of a system reset. In the bad old days, you were reset and that was all you knew. Today, however, modern processors contain registers that may be interrogated to determine the cause of the last reset. For example, an AVR processor I am working with lists the following possible causes:
Power Up
Brown Out
External Reset
WatchDog
JTAG


Based on my experience, I'd say that 99% of the embedded systems out there don't care what caused their last reset. This strikes me as foolhardy. At the very least, an embedded system should keep track of the number of times it has taken a watchdog reset for post-deployment quality analysis (you do do this, don't you?). Furthermore, a portable system should take remedial action if it underwent a brown out reset, since that presumably indicates that the battery is failing. As for a JTAG reset, could this be construed as an attempt by someone to determine the inner workings of your system - and if so, what should you do about it?
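As a starting point for that kind of startup code, here is a sketch of classifying and logging the reset cause. The flag bit positions follow the reset flag register (MCUSR, or MCUCSR on some older parts) of JTAG-equipped ATmega devices, but they are redefined here so the sketch builds on a host; check your own part's datasheet. The enum, the log structure and the order in which causes are tested are my own design choices, not anything the AVR mandates.

```c
#include <stdint.h>

/* Reset flag bits as laid out on many ATmega parts */
#define RST_FLAG_POWER_ON  (1u << 0)   /* PORF  */
#define RST_FLAG_EXTERNAL  (1u << 1)   /* EXTRF */
#define RST_FLAG_BROWN_OUT (1u << 2)   /* BORF  */
#define RST_FLAG_WATCHDOG  (1u << 3)   /* WDRF  */
#define RST_FLAG_JTAG      (1u << 4)   /* JTRF  */

typedef enum {
    RESET_UNKNOWN,
    RESET_POWER_ON,
    RESET_BROWN_OUT,
    RESET_EXTERNAL,
    RESET_WATCHDOG,
    RESET_JTAG
} reset_cause_t;

/* Hypothetical post-deployment statistics. On a real system these need
   to live somewhere that survives a reset, such as EEPROM. */
typedef struct {
    uint16_t watchdog_resets;
    uint16_t brown_out_resets;
    uint16_t jtag_resets;
} reset_log_t;

/* Decide the cause from a snapshot of the flags. Assumes startup code
   clears the register after reading it; otherwise a stale power-on flag
   would mask every later cause. The test order is a design choice. */
static reset_cause_t classify_reset(uint8_t flags)
{
    if (flags & RST_FLAG_POWER_ON)  return RESET_POWER_ON;
    if (flags & RST_FLAG_BROWN_OUT) return RESET_BROWN_OUT;
    if (flags & RST_FLAG_WATCHDOG)  return RESET_WATCHDOG;
    if (flags & RST_FLAG_JTAG)      return RESET_JTAG;
    if (flags & RST_FLAG_EXTERNAL)  return RESET_EXTERNAL;
    return RESET_UNKNOWN;
}

/* Record the reset and return its cause so the caller can take remedial
   action (e.g. check the battery after a brown out). */
static reset_cause_t handle_reset(uint8_t flags, reset_log_t *log)
{
    reset_cause_t cause = classify_reset(flags);

    switch (cause) {
    case RESET_WATCHDOG:  log->watchdog_resets++;  break;
    case RESET_BROWN_OUT: log->brown_out_resets++; break;
    case RESET_JTAG:      log->jtag_resets++;      break;
    default:                                       break;
    }
    return cause;
}
```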

I have been involved in systems where support for handling the different reset sources has been added as an afterthought - and it shows. As a result, I've come to the conclusion that the only way to handle this is to think about it from the start, and to know up front what needs to be done for each of the different reset sources. If you go through this exercise, you'll find that your startup code becomes a lot more sophisticated. You'll also find that you've designed a better system - which after all is the point.


Friday, September 22, 2006

Datasheet Errors

As someone who also designs hardware for a living, I spend a lot of time reading datasheets. Recently I have had to deal with a rash of datasheet errors. I'm not talking about minor errors. I'm talking about colossal, fundamental errors, whereby the device simply does not do what the datasheet describes. How can this be, I ask myself? I think there are two possibilities:

1. I have to wonder whether this is one of the consequences of IC design being moved offshore, such that the designers of the IC are not native English speakers and consequently are in no position to proofread the datasheet.

2. With new ICs being introduced at a phenomenal rate, is it simply the case that so much information is being generated that these sorts of errors are to be expected henceforth?

I'm inclined to think that option 2 is the more likely. If this is the case, why aren't the engineers who design these parts insisting on reading the datasheet before it's published? The next thing you know, the software industry will expect its customers to find their bugs for them...


Five Nines Reliability or POTS versus VOIP

My ISP has just recovered from a system-wide crash that resulted in a 20-hour outage. Besides the loss of email etc., it also cost me my VOIP service. After decades of POTS (plain old telephone service), the telephone companies have wrung out all the bugs and we have a service that just works. By contrast, the internet (and by extension VOIP) is still in its infancy, such that its availability is simply pitiful by POTS standards.
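To put this outage in the five-nines terms of the title: 99.999% availability allows only about 5.3 minutes of downtime per year. The small C sketch below does that arithmetic, assuming a 365-day year; the function names `availability` and `allowed_downtime_min` are hypothetical. Even if this 20-hour (1200-minute) outage were the only one all year, availability would drop to roughly 99.77%, which is not even three nines.

```c
/* Minutes in a 365-day year (525,600); leap years are ignored. */
#define MINUTES_PER_YEAR (365.0 * 24.0 * 60.0)

/* Fraction of the period during which the service was up. */
double availability(double downtime_min, double period_min)
{
    return (period_min - downtime_min) / period_min;
}

/* Downtime budget, in minutes per year, implied by a given availability. */
double allowed_downtime_min(double avail)
{
    return (1.0 - avail) * MINUTES_PER_YEAR;
}
```

For comparison, `allowed_downtime_min(0.9999)` gives about 52.6 minutes a year for four nines, so a single 20-hour outage overshoots even that budget by more than twentyfold.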

I'd love to know what percentage of internet outages are caused by software failures - and I'd like those developers to have to pay me for every minute my system is down!


Tuesday, September 19, 2006

Encrypted email and NDAs

Being a consultant, I do business with a lot of different companies, nearly all of which require a Non-Disclosure Agreement (NDA) to be executed. Most of these NDAs require me to protect the company's intellectual property as if it were my own. So far so good. Once the NDA has been executed, however, I'm continually amazed at how often I get sent schematics, source code, technical documents, project plans etc. as attachments to unencrypted email. I digitally sign all my emails and send my public key along with them, so it's a trivial step for people to send me encrypted mail. It makes me wonder how many trade secrets are lost every year simply because the default is to send email as plain text. Shouldn't your company insist that all email be encrypted, and that all external vendors provide their public keys before any sensitive communication takes place?
