Tuesday, November 04, 2008

Dogging your watchdog

Most embedded systems employ watchdog timers. It's not my intention today to talk about why to use watchdog timers, or indeed how to use them. Rather I assume you know the answers to these questions. Instead, I'll pass on some tips for how to track down those unexpected watchdog resets that can occur during the development process.

To help find these problems, it is essential to find out where the watchdog reset is occurring. Unfortunately, this isn't easy, since by definition a watchdog reset will reset the processor, typically destroying all state information that could be used to debug the problem. To get around this problem, here are a few things you can try.
  1. Place a break point on the (watchdog) reset vector. Although this will typically not stop the processor from being reset, it will ensure that none of your variables get initialized by your start up code. As a result, you should be able to use your debugger to examine these variables - which may give you an insight into what is going wrong.
  2. Certain processor architectures allow the action of the watchdog timer to be changed between a classic watchdog (when the timer times out, the processor is reset), to a special form of timer, complete with its own interrupt vector. Although I rarely use this mode of operation in release code, it is very useful for debugging. Simply reconfigure the watchdog to generate an interrupt upon timeout, and place a break point in the watchdog's ISR. Then when the watchdog times out, your debugger will stop at the break point. It's then just a simple matter of stepping out of the ISR to return to the exact point in your code where the watchdog timeout occurred.
  3. If neither of the above methods is available to you, and you are genuinely clueless as to where to start looking, then a painful but workable solution is to 'instrument' entry into each function. This essentially consists of some code that is placed at the start of every function. The code's job is to record the ID of the function into some form of storage that will not be affected by a watchdog reset, such that you can identify the offending function after a watchdog reset has occurred. This isn't quite as bad as it sounds, provided you are good with macros and a scripting language such as Perl, and are aware of the standard C99 identifier __func__ and common compiler vendor extensions such as __FUNCTION__. Of course if you are that good the chances are you won't be clueless as to why you are taking a watchdog reset!
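As a rough illustration of the third tip, here is a minimal sketch of function-entry instrumentation. It assumes a compiler that supports the C99 identifier __func__. The names TRACE_ENTRY, g_last_function, trace_last_function_is, read_sensor and update_display are all made up for the example. On a real target the breadcrumb variable would have to live in memory that the start-up code leaves alone (a "no-init" section or similar); how you arrange that depends on your compiler and linker, so it is only noted in a comment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Breadcrumb: name of the most recently entered function.
 * Assumption: on real hardware this must be placed in RAM that the
 * start-up code neither zeroes nor initializes, otherwise it will not
 * survive the watchdog reset. That placement is toolchain specific and
 * is not shown; here it is an ordinary static so the sketch compiles
 * anywhere.
 */
static const char *volatile g_last_function = NULL;

/* Place at the top of every function you want to track. */
#define TRACE_ENTRY() do { g_last_function = __func__; } while (0)

/* Hypothetical helper: returns 1 if 'name' was the last function entered. */
int trace_last_function_is(const char *name)
{
    const char *last = g_last_function;

    return (last != NULL) && (0 == strcmp(last, name));
}

/* Hypothetical application functions, each instrumented on entry. */
void read_sensor(void)
{
    TRACE_ENTRY();
    /* ... real work would go here ... */
}

void update_display(void)
{
    TRACE_ENTRY();
    /* ... real work would go here ... */
}
```

After a reset you would read the breadcrumb early in main(), before anything overwrites it. Storing a small numeric ID instead of a string pointer works just as well and uses less memory.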
I'll leave it to another post to talk about the sort of code that often causes watchdog timeouts.

Wednesday, October 08, 2008

Bug cluster phenomenon

I was debugging a piece of code recently when I realized that there was a scenario, albeit unlikely, in which a divide by zero could occur. Rather than just fix the bug and move on, I invoked what I call the "bug cluster phenomenon" rule. What you may ask is this rule? Well it has two variants. The first is as follows:

"Where there is one bug, there is usually another". I've observed this phenomenon over many years. What seems to happen is that when I (or anyone else for that matter) am generating a block of code, I get interrupted, or I'm tired or my focus is elsewhere. As a result, when I create one bug, I usually create several others while I am at it. Thus when I find a bug in a function, I always assume that it has company nearby. In short, finding a bug in a function always triggers a top to bottom review of that function and its neighbors. This has dramatically reduced my debugging time over the years - and I strongly recommend you adopt it.

The second variant of the rule is as follows:

"Logical errors normally have company". I've also observed this phenomenon over many years. In this case, it seems that if you have made a particular error in logic in one place in the code, the chances are you have made the same error elsewhere. In the case of the divide by zero issue mentioned in the introduction, this prompted me to wonder if I had any other possible divide by zero errors lurking in my code. As a result, I performed a search through the entire project - and sure enough I found a few other cases where there existed the possibility of a divide by zero error. Thus finding one bug caused me to fix several. That's efficient debugging!

Incidentally, I was able to quickly find all the divisions in my code because I am absolutely anal about having a space on either side of an operator. Thus, I needed to search for only two strings - " / " and " /= ". I've observed that many people are lackadaisical about this, such that you'll often see expressions such as "y=a/b". These people have no option other than to search either for just "/" - which of course returns every line with a comment, or they have to construct a more sophisticated regular expression search - which again takes time and is error prone.

Thus I have three pieces of advice to pass on:
1. When you find a bug, look nearby for more.
2. If the bug was of a particular class of bug, then search your code to see if you have made the same mistake elsewhere.
3. Write your code so that it is trivial to search for certain constructs. It will save you time in the long run.
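Once a search has turned up every division, each one still needs a decision about what happens when the divisor is zero. One possible shape for that guard is sketched below. The helper names checked_div_u32 and average_or_default are hypothetical, and returning a caller-supplied fallback is just one policy among several. Note the spaces around the division operator, which keep it findable with the " / " search.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: divide only when the divisor is non-zero.
 * Returns true and writes the quotient on success; returns false and
 * leaves *quotient untouched when the divisor is zero.
 */
bool checked_div_u32(uint32_t dividend, uint32_t divisor, uint32_t *quotient)
{
    if (0u == divisor)
    {
        return false;
    }
    *quotient = dividend / divisor;
    return true;
}

/* Hypothetical caller: average of a running total over a sample count. */
uint32_t average_or_default(uint32_t total, uint32_t count, uint32_t fallback)
{
    uint32_t average = fallback;

    /* On failure 'average' keeps the fallback value. */
    (void)checked_div_u32(total, count, &average);
    return average;
}
```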

Monday, September 08, 2008

Efficient C Tips #4 - Use Speed Optimization

Back in July 2008 I promised that the next blog post would be on why you should use speed optimization instead of size optimization. Well four other posts somehow got in the way - for which I apologize. Anyway, onto the post!

In "Efficient C Tips #2" I made the case for always using full optimization on your released code. Back when I was a lad, the conventional wisdom when it came to optimization was to use the following algorithm:

1. Use size optimization by default
2. For those few pieces of code that get executed the most, use speed optimization.

This algorithm was based on the common observation that most code is executed infrequently and so in the grand scheme of things its execution time is irrelevant. Furthermore since memory is constrained and expensive, this code that is rarely executed should consume as little resource (i.e. memory) as possible. At first blush, this approach seems reasonable. However IMHO it was flawed back then and is definitely flawed now. Here is why:

1. In an embedded system, you typically are not sharing memory with other applications (unlike on a general purpose computer). Thus there are no prizes for using less than the available memory. Of course, if by using size optimization you can fit the application into a smaller memory device then use size optimization and use the smaller and cheaper part. However in my experience this rarely happens. Instead typically you have a system that comes with say 32K, 64K or 128K of Flash. If your application consumes 50K with speed optimization and 40K with size optimization, then you'll still be using the 64K part and so size optimization has bought you nothing. Conversely, speed optimization will also cost you nothing - but your code will presumably run faster, and consume less power.

2. In an interesting quirk of optimization technology, it turns out that in some cases speed optimization can result in a smaller image than size optimization! It is almost never the case that the converse is true. See however this article that I wrote which discusses one possible exception. Thus even if you are memory constrained, try speed optimization.

3. Size optimization comes with a potentially very big downside. After a compiler has done all the usual optimizations (constant folding, strength reduction etc), a compiler that is set up to do size optimization will usually perform an optimization variously known as "procedural abstraction" or "cross calling" (it is a cousin of, but not the same thing as, classic common sub-expression elimination). What this consists of is looking at the object code and identifying small blocks of assembly language that are used repeatedly throughout the application. These common blocks are converted into subroutines. This process can be repeated ad nauseam such that one subroutine calls another which calls another and so on. As a result an innocuous looking piece of C code can be translated into a call tree that nests many levels deep - and there is the rub. Although this technique can dramatically reduce code size it comes at the price of increasing the call stack depth. Thus code that runs fine in debug mode may well suffer from a call stack overflow when you turn on size optimization. Speed optimization will not do this to you!

4. As I mentioned in "Efficient C Tips #2" one downside of optimization is that it can rearrange instruction sequences such that the special access requirements often needed by watchdogs, EEPROM etc are violated. In my experience, this only happens when one uses size optimization - and never with speed optimization. Note that I don't advocate relying on this; it is however a bonus if you have forgotten to follow the advice I give in "Efficient C Tips #2" for these cases.

The bottom line - speed optimization is superior to size optimization. Now I just have to get the compiler vendors to select speed optimization by default!

Thursday, September 04, 2008

Low cost tools

Like many of you, I subscribe to Jack Ganssle's newsletter (If you don't then you should - go to http://ganssle.com/). In his latest newsletter #164 (alas not yet posted to the web) there is a thread on tools for monitoring serial protocols such as I2C. I was quite interested in this because it so happens I use some of the tools mentioned. What really struck me though was the fact that someone was looking for low cost tools.

I'm always baffled when I see this. If I believe the salary surveys, most engineers in the USA are earning well over $100K. Throw in benefits and your average engineer costs his / her employer about $200K a year, or close to $100 per working hour. Why then do employers balk at spending a few thousand dollars on a decent tool? I've seen people spend days on compiler problems because they are using a "free" tool; I've had people tell me that they don't use Lint because it's too expensive (<$200!); I've seen people struggle for days simply because their oscilloscope isn't up to the job. In all these cases, the cost in terms of their time dwarfs the equipment / tool cost.

What I want are great tools. I want tools that are intuitive to use, that work really well, are tolerant of my occasional ham-fistedness and that I trust. For example, I have a Fluke 87 multimeter sitting next to me. It costs quadruple what a Radio Shack special costs. It's worth every penny.

Here's an ending thought. You are going in for open heart surgery. The surgeon comes out and says "don't worry - I've got some great low cost tools to use on you". And we wonder why engineers don't get the respect that doctors do.

Tuesday, August 12, 2008

Have you looked at your linker output file recently?

Of all the myriad of files involved in a typical embedded firmware project, probably the two most feared (and yes I do mean feared) are the linker control file (which tells the linker how to link your application) and the linker output file. Today it's the latter which I'll be talking about.

The linker output file tells you a myriad of information about the way your application has been put together. Unfortunately, much of it is in such a cryptic format that examination of the file is a painful process. Indeed, for this reason, I suspect that most projects are completed with nothing more than a cursory look at this file.

This is a shame, because examination of the linker output file can significantly reduce your debugging time. To show you what I mean, consider my typical action sequence when I first start coding up a project.

1. Write a module.
2. Compile module and correct all errors and warnings.
3. Lint module and correct all complaints from Lint.
4. Repeat steps 1, 2 & 3 until I have sufficient modules to be able to generate a linkable image.
5. Link image and repeat steps 1-4 until the linker has no warnings or errors.
6. Examine the linker output file.

I'd wager that most developers out there would be reaching for the debugger in step 6. The reason I do not, is because I can typically find some bugs simply by looking at the linker output. For example, consider this code sequence:


if (0 == var)
{
    function_a();
}
else if (1 == var)
{
    function_b();
}
else if (2 == var)
{
    function_b();
}
else
{
    function_d();
}


I make this sort of copy and paste error all the time. In this case, when var is 2, I meant to call function_c but inadvertently I ended up calling function_b again. Since function_b exists, the compiler is happy and so there are typically no warnings.

So how does looking at the linker output file help me in this case? Well, if you have a decent linker it will give you a list of all the functions that aren't called and that consequently have been stripped out of the final image. If in perusing this list I see that function_c() is listed as uncalled, then I immediately know I've got a bug somewhere. Typically tracking it down is very easy.

I'll leave for another day the other ways I use the linker output file to debug code.

Thursday, August 07, 2008

Improvements versus Features

I'm taking a slight detour from my usual topics to blather about what I see as an unfortunate trend that is making its way from the PC world to the embedded world. My perception is that as more embedded systems get sophisticated user interfaces, the desire to add features seems inescapable. While I don't see adding features as bad, per se, doing so instead of improving the product is a bad thing. What do I mean by improving the product? Well, typically those things that most users don't understand, for example noise floors, power consumption, SNR, software reliability and so on.

In the days before user interfaces, pretty much the only way to improve a product was to work on the "invisible" parameters. Today, it's often far easier to add a new feature than it is to labor at, for example, wringing a few more dB of performance out of that digital filter while keeping the number of clock cycles unchanged.

Am I tilting at windmills? I don't think so. Is my plea pointless? Probably. However the next time someone comes along asking for a YANF (Yet Another New Feature), do them and yourself a favor and ask how time spent on the YANF compares to time spent on improving the product.

Friday, August 01, 2008

Efficient C Tips #3 - Avoiding post increment / decrement

It always seems counter intuitive to me, but post increment / decrement operations in C / C++ often result in inefficient code, particularly when de-referencing pointers. For example


for (i = 0, ptr = buffer; i < 8; i++)
{
    *ptr++ = i;
}


This code snippet contains two post increment operations. With most compilers, you'll get better code quality by re-writing it like this:


for (i = 0, ptr = buffer; i < 8; ++i)
{
    *ptr = i;
    ++ptr;
}


Why is this you ask? Well, the best explanation I've come across to date is this one on the IAR website:

Certainly taking the time to understand what's going on is worthwhile. However, if it makes your head hurt then just remember to avoid post increment / decrement operations.

Incidentally, you may find that on your particular target it makes no difference. However, this is purely a result of the fact that your target processor directly supports the required addressing modes to make post increments efficient. If you are interested in writing code that is universally efficient, then avoid the use of post increment / decrement.

You may also wonder just how much this saves you. I've run some tests on various compilers / targets and have found that this coding style reduces the object code size by anywhere from zero to several percent. I've never seen it increase the code size. More to the point, in loops, using a pre-increment can save you a load / store operation per increment per loop iteration. These can add up to some serious time savings.
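If you want to run the same experiment on your own compiler, one way is to wrap each form in its own function and compare the generated listings. The sketch below does that. The function names, BUFFER_LEN and the choice of uint8_t are my assumptions for the example, not part of the original snippets. The two functions produce identical buffer contents; only the generated code may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define BUFFER_LEN 8u

/* Post-increment form, as in the first snippet. */
void fill_post(uint8_t *buffer)
{
    uint8_t i;
    uint8_t *ptr;

    for (i = 0, ptr = buffer; i < BUFFER_LEN; i++)
    {
        *ptr++ = i;
    }
}

/* Pre-increment form, as in the second snippet. */
void fill_pre(uint8_t *buffer)
{
    uint8_t i;
    uint8_t *ptr;

    for (i = 0, ptr = buffer; i < BUFFER_LEN; ++i)
    {
        *ptr = i;
        ++ptr;
    }
}

/* Hypothetical check: returns 1 if both forms fill a buffer identically. */
int fills_match(void)
{
    uint8_t a[BUFFER_LEN];
    uint8_t b[BUFFER_LEN];
    size_t k;

    fill_post(a);
    fill_pre(b);
    for (k = 0; k < BUFFER_LEN; ++k)
    {
        if (a[k] != b[k])
        {
            return 0;
        }
    }
    return 1;
}
```

Compile this with your usual optimization settings and look at the size of fill_post versus fill_pre in the map file or listing.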

Saturday, July 05, 2008

Efficient C Tips #2 - Using the optimizer

In my first post on "Efficient C" I talked about how to use the optimal integer data type to achieve the best possible performance. In this post, I'll talk about using the code optimization settings in your compiler to achieve further performance gains.

I assume that if you are reading this, then you are aware that compilers have optimization settings or switches. Invoking these settings usually has a dramatic effect on the size and speed of the compiled image. Typical results that I have observed over the years are a 40% reduction in code size and a halving of execution time for fully optimized versus non-optimized code. Despite these amazing numbers, I'd say about half of the code that I see (and I see a lot) is released to the field without full optimization turned on. When I ask developers about this, I typically get one of the following explanations:

1. I forgot to turn the optimizer on.
2. The code works fine as is, so why bother optimizing it?
3. When I turned the optimizer on, the code stopped working.

The first answer is symptomatic of a developer that is just careless. I can guarantee that the released code will have a lot of problems!

The second answer on the face of it has some merit. It's the classic "if it ain't broke don't fix it" argument. However, notwithstanding that it means that your code will take longer to execute and thus almost certainly consume more energy (see my previous post on "Embedded Systems and the Environment"), it also means that there are potential problems lurking in your code. I address this issue below.

The third answer is of course the most interesting. You have a "perfectly good" piece of code that is functioning just fine, yet when you turn the optimizer on, the code stops working. Whenever this happens, the developer blames the "stupid compiler" and moves on. Well, after having this happen to me a fair number of times over my career, I'd say that the chances that the compiler is to blame are less than 1 in 10. The real culprit is normally the developer's poor understanding of the rules of the programming language and how compilers work.

Typically when a compiler is set up to do no optimization, it generates object code for each line of source code in the order in which the code is encountered and then simply stitches the result together (for the compiler aficionados out there I know it's more involved than this - but it serves my point). As a result, code is executed in the order in which you write it, constants are tested to see if they have changed, variables are stored to memory and then immediately loaded back into registers, invariant code is repeatedly executed within loops, all the registers in the CPU are stacked in an ISR and so on.

Now, when the optimizer is turned on, the optimizer rearranges code execution order, looks for constant expressions, redundant stores, common sub-expressions, unused registers and so on and eliminates everything that it perceives to be unnecessary. And therein dear reader lies the source of most of the problems. What the compiler perceives as unnecessary, the coder thinks is essential - and indeed is relying upon the "unnecessary" code to be executed.

So what's to be done about this? Firstly, you have to understand what the keyword volatile means and does. Even if you think you understand volatile, go and read this article I wrote a number of years back for Embedded Systems Programming magazine. I'd say that well over half of the optimization problems out there relate to failure to use volatile correctly.
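For anyone who wants a concrete picture of the classic case, here is a minimal sketch of a flag shared between an interrupt handler and a polling loop. The names g_data_ready, uart_rx_isr and wait_for_data are invented for the example, and the "ISR" is an ordinary function because hooking a real vector is target specific. Remove the volatile qualifier and an optimizer is entitled to read the flag once and never look at it again.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Flag shared between an interrupt handler and the main loop. Without
 * 'volatile' the compiler may keep the value in a register and the
 * polling loop below may never see the ISR's write.
 */
static volatile bool g_data_ready = false;

/* Hypothetical ISR: on a real target this would be attached to a vector. */
void uart_rx_isr(void)
{
    g_data_ready = true;
}

/*
 * Poll the flag up to max_polls times. Returns true if the flag was seen
 * set (and clears it), false if the poll budget ran out first.
 */
bool wait_for_data(uint32_t max_polls)
{
    uint32_t polls;

    for (polls = 0; polls < max_polls; ++polls)
    {
        if (g_data_ready)
        {
            g_data_ready = false;
            return true;
        }
    }
    return false;
}
```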

The second problematic area concerns specialized protective hardware such as watchdogs. In an effort to make inadvertent modification of certain registers less likely, the CPU manufacturers insist upon a certain set of instructions being executed in order within a certain time. An optimizer can often break these specialized sequences. In which case, the best bet is to put the specialized sequences into their own function and then use the appropriate #pragma directive to disable optimization of that function.
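What follows is only a sketch of the idea of isolating a protected sequence in its own unoptimized function. The register names WDT_KEY and WDT_CTRL and the key values 0x55 / 0xAA are made up, and the "registers" are plain variables so that the code compiles on a PC. I have also swapped the #pragma for function attributes, because the exact pragma differs from vendor to vendor: the GCC optimize attribute and the Clang optnone attribute are used where available, and for any other compiler the macro expands to nothing and you would substitute whatever your manual specifies.

```c
#include <stdint.h>

/* Per-function "do not optimize" marker; check Clang first, since it
 * also defines __GNUC__. */
#if defined(__clang__)
#define NO_OPTIMIZE __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
#define NO_OPTIMIZE /* assumption: use your compiler's own #pragma here */
#endif

/* Stand-ins for memory-mapped registers; names and keys are hypothetical. */
static volatile uint8_t WDT_KEY;
static volatile uint8_t WDT_CTRL;

/*
 * Hypothetical unlock sequence: two key writes followed by the real write.
 * 'volatile' keeps the three writes in order; disabling optimization keeps
 * the compiler from reshaping the code around them.
 */
NO_OPTIMIZE void wdt_write_ctrl(uint8_t value)
{
    WDT_KEY  = 0x55u;
    WDT_KEY  = 0xAAu;
    WDT_CTRL = value;
}

/* Read back the control register stand-in. */
uint8_t wdt_read_ctrl(void)
{
    return WDT_CTRL;
}
```

Keeping the sequence in one small function also means there is exactly one place to inspect in the listing when you change compiler versions.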

Now what to do if you are absolutely sure that you are using volatile appropriately and correctly and that specialized coding sequences have been protected as suggested, yet your code still does not work when the optimizer is turned on? The next thing to look for are software timing sequences, either explicit or implicit. The explicit timing sequences are things such as software delay loops, and are easy to spot. The implicit ones are a bit tougher and typically arise when you are doing something like bit-banging a peripheral, where the instruction cycle time implicitly acts as a setup or hold time for the hardware being addressed.

OK, what if you've checked for software timing and things still don't work? In my experience you are now into what I'll call the "Suspect Code / Suspect Compiler (SCSC)" environment. With an SCSC problem, the chances are you've written some very complex, convoluted code. With this type of code, two things can happen:

1. You are working in a grey area of the language (i.e. an area where the behavior is not well specified by the standard). Your best defense against this is to use Lint from Gimpel. Lint will find all your questionable coding constructs. Once you have fixed them, you'll probably find your optimization problems have gone away.
2. The optimizer is genuinely getting confused. Although this is regrettable, the real blame may lie with you for writing gnarly code. The bottom line in my experience is that optimizers work best on simple code. Of course, if you have written simple code and the optimizer is getting it wrong, then do everyone a favor and report it to the compiler vendor.

In my next post I'll take on the size / speed dichotomy and make the case for using speed rather than size as the "usual" optimization method.

Friday, June 20, 2008

Embedded Systems and the Environment

With the recent run up in the price of oil, it seems as if everyone is talking about energy and how to conserve it. For most people, the only impact they can have on the environment is through their own individual actions and choices. Engineers, however, are in a different position because at a professional level, the design choices we make can have a profound effect on the environment. If we believe the figures about the number of embedded processors shipped each year (billions) and we make the very conservative estimate that each processor is in a system that consumes 1 Wh per day, then the annual energy consumption of new embedded systems runs to at least 1E9 * 1 * 365 = 365 Gigawatt hours, with an average power consumption of around 41 Megawatts. If we assume that the average life of an embedded system is 5 years, then the embedded systems out there are burning about 200 Megawatts. That's a lot of power folks.
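For anyone who wants to rerun the numbers with their own assumptions, the arithmetic is sketched below. The inputs are the same rough guesses as in the text (1E9 processors a year, 1 Wh per day each, a 5 year life), and the function names are made up for the example. With those inputs the annual energy comes to 3.65E11 Wh, the average power of one year's shipments to roughly 41.7E6 W, and the installed base to roughly 208E6 W.

```c
/* Assumed figures from the text: 1E9 processors per year, 1 Wh/day each. */
#define UNITS_PER_YEAR   1.0e9
#define WH_PER_UNIT_DAY  1.0
#define DAYS_PER_YEAR    365.0
#define HOURS_PER_DAY    24.0

/* Annual energy, in watt hours, used by one year's shipments. */
double annual_energy_wh(void)
{
    return UNITS_PER_YEAR * WH_PER_UNIT_DAY * DAYS_PER_YEAR;
}

/* Average power, in watts, drawn by one year's shipments. */
double average_power_w(void)
{
    return (UNITS_PER_YEAR * WH_PER_UNIT_DAY) / HOURS_PER_DAY;
}

/* Average power, in watts, of everything shipped over 'life_years'. */
double installed_base_power_w(double life_years)
{
    return average_power_w() * life_years;
}
```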

Now here's the interesting thing. Most embedded projects are for products that are made in the thousands. Individually, these products' power consumption is irrelevant. Collectively it is huge. Thus if as an industry we made a concerted effort to reduce the power consumption of our products, the benefits to society would be substantial. So how exactly do we do this? Although a lot of the power consumption comes from the hardware design, the firmware design can also have a dramatic impact on the overall power consumption of the system. In my next posting I'll look at some of the ways you can design your system firmware so as to minimize power consumption.

Sunday, June 15, 2008

Efficient C Tips #1 - Choosing the correct integer size

From time to time I write articles for Embedded Systems Design magazine. A number of these articles have concentrated on how to write efficient C for an embedded target. Whenever I write these articles I always get emails from people asking me two questions:

1. How did you learn this stuff?
2. Is there somewhere I can go to learn more?

The answer to the first question is a bit long winded and consists of:
1. I read compiler manuals (yes, I do need a life).
2. I experiment.
3. Whenever I see a strange coding construct, I ask the author why they are doing it that way. From time to time I pick up some gems.
4. I think hard about what the compiler has to do in order to satisfy a particular coding construct. It's really helpful if you know assembly language for this stage.

The answer to the second question is short: No!

To help rectify this, I'll shortly be offering a one day course on how to write efficient C for embedded systems, details of which will be posted on my website soon.

In the interim, I'd like to offer up my first tip on how to choose the correct integer size.

In my experience in writing programs for both embedded systems and computers, I'd say that greater than 95% of all the integers used by those programs could fit into an 8 bit variable. The question is, what sort of integer should one use in order to make the code the most efficient? Most computer programmers who use C will be puzzled by this question. After all the data type 'int' is supposed to be an integer type that is at least 16 bits that represents the natural word length of the target system. Thus, one should simply use the 'int' data type.

In the embedded world, however, such a trite answer will quickly get you into trouble - for at least three reasons.
1. For 8 bit microcontrollers, the natural word length is 8 bits. However you can't represent an 'int' data type in 8 bits and remain C99 compliant. Some compiler manufacturers eschew C99 compliance and make the 'int' type 8 bits (at least one PIC compiler does this), while others simply say we are compliant and if you are stupid enough to use an 'int' when another data type makes more sense then that's your problem.
2. For some processors there is a difference between the natural word length of the CPU and the natural word length of the (external) memory bus. Thus the optimal integer type can actually depend upon where it is stored.
3. The 'int' data type is signed. Much, indeed most, of the embedded world is unsigned, and those of us that have worked in it for a long time have found that working with unsigned integers is a lot faster and a lot safer than working with signed integers, or even worse a mix of signed and unsigned integers. (I'll make this the subject of another blog post).

Thus the bottom line is that using the 'int' data type can get you into a world of trouble. Most embedded programmers are aware of this, which is why when you look at embedded code, you'll see a veritable maelstrom of user defined data types such as UINT8, INT32, WORD, DWORD etc. Although these should ensure that there is no ambiguity about the data type being used for a particular construct, it still doesn't solve the problem about whether the data type is optimal or not. For example, consider the following simple code fragment for doing something 100 times:


TBD_DATATYPE i;

for (i = 0; i < 100; i++)
{
    // Do something 100 times
}


Please ignore all issues other than this one: what data type should the loop variable 'i' be?

Well evidently, it needs to be at least 8 bits wide and so we would appear to have a choice of 8, 16, 32 or even 64 bits as our underlying data type. Now if you are writing code for a particular CPU then you should know whether it is an 8, 16, 32 or 64 bit CPU and thus you could make your choice based on this factor alone. However, is a 16 bit integer always the best choice for a particular 16 bit CPU? And what about if you are trying to write portable code that is supposed to be used on a plethora of targets? Finally, what exactly do we mean by 'optimal' or 'efficient' code?

I wrestled with these problems for many years before finally realizing that the C99 standards committee has solved this problem for us. Quite a few people now know that the C99 standard standardized the naming conventions for specific integer types (int8_t, uint8_t, int16_t etc). What isn't so well known is that they also defined data types which are "minimum width" and also "fastest width". To see if your compiler is C99 compliant, open up stdint.h. If it is compliant, as well as the uint8_t etc data types, you'll also see at least two other sections - minimum width types and fastest minimum width types. An example will help clarify the situation:

Fixed width unsigned 8 bit integer: uint8_t
Minimum width unsigned 8 bit integer: uint_least8_t
Fastest minimum width unsigned 8 bit integer: uint_fast8_t

Thus a uint8_t is guaranteed to be exactly 8 bits wide.
A uint_least8_t is the smallest integer guaranteed to be at least 8 bits wide.
A uint_fast8_t is the fastest integer guaranteed to be at least 8 bits wide.

So we can now finally answer our question. If we are trying to consume the minimum amount of data memory, then our TBD_DATATYPE should be uint_least8_t. If we are trying to make our code run as fast as possible then we should use uint_fast8_t.
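Put into code, the two choices look something like the sketch below, which assumes a C99 compiler that ships stdint.h. The function names and the 'done' counter are mine, added only so that the loop has an observable result; the loop itself is the "do something 100 times" fragment from earlier.

```c
#include <stdint.h>

/* Loop counter chosen for speed: at least 8 bits, whatever is fastest. */
uint_fast8_t count_iterations_fast(void)
{
    uint_fast8_t i;
    uint_fast8_t done = 0;

    for (i = 0; i < 100; ++i)
    {
        ++done; /* stand-in for "do something 100 times" */
    }
    return done;
}

/* Loop counter chosen for size: the smallest type with at least 8 bits. */
uint_least8_t count_iterations_small(void)
{
    uint_least8_t i;
    uint_least8_t done = 0;

    for (i = 0; i < 100; ++i)
    {
        ++done; /* stand-in for "do something 100 times" */
    }
    return done;
}
```

On many 32 bit targets uint_fast8_t turns out to be a 32 bit type while uint_least8_t stays at 8 bits, but that is the compiler vendor's call, which is rather the point.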

Thus the bottom line is this. If you want to start writing efficient, portable embedded code, the first step you should take is start using the C99 data types 'least' and 'fast'. If your compiler isn't C99 compliant then complain until it is - or change vendors.

If you make this change I think you'll be pleasantly surprised at the improvements in code size and speed that you'll achieve.

Friday, June 06, 2008

Thoughts on the optimal time to test code

Today I'd like to take on one of the sacred cows of the embedded industry, namely the temporal relationship between coding and testing of the aforementioned code. The conventional wisdom seems to be as follows.

"Write a small piece of code. As soon as possible test the code. Repeat until the task is complete"


I know for many of you, my merely having the temerity to suggest this might be sub-optimal will put me firmly into the category of hopeless heretic. Well, before you write me off as a lunatic, let me tell you about an alternative approach, how I stumbled upon it and why I think it has much to commend it.

Being in the consulting business I'm typically working on multiple projects at once. Often a given project will be put on hold for any number of reasons which aren't germane to this post. As a result, it's not uncommon for me to write some code, compile it and then not touch it again for several months. I then find myself in the position of having to test / debug code that I wrote months ago. Having now done this many times, I've come to the conclusion that rather than this being a problem, it is instead the optimal temporal relationship between coding and testing.

How can this be you ask? Surely after a multi-month hiatus, the code is no longer fresh in your mind and so it must make it that much more difficult to test and debug? Well the answer is of course yes - the code is no longer fresh in my mind, and yes it does make it a little harder to test and debug in the short term. In those last words, "in the short term", lies the point of my argument.

Why do we write code? Most people would claim we write code in order to make a functional product. I disagree with this assertion. I think we write code so that people coming after us can understand it and modify it. This rather strange claim is based upon those studies that show that companies spend far more money maintaining code than they do writing it. Thus the smart way to write code is to do so in a manner that gives preeminent importance to the long term maintenance of that code. So how does one do this? Well that's a topic for another post. What I can tell you, is that having to test and debug code that you wrote several months ago is a terrific way for the developer of the code to see the code as someone who'll be maintaining it will see it. You'll see the inadequate or plain wrong comments. You'll see the copy and paste errors. You'll see where you got tired and took a short cut, and you'll see those stupid mistakes caused by the telephone ringing at the wrong time.

Indeed because you don't expect the code to work (after all it's never been tested) I find you cast a very jaundiced eye over the code - and in the process find a plethora of the mistakes that one typically finds by sitting in front of a debugger. Maybe it's just me, but I'd rather find bugs via code inspection than by fighting the debug environments common to most embedded systems.

So in a nutshell, I think the optimal way to write and test code is as follows:

1. Write the code. Make sure it compiles and is Lint free.
2. Wait a few months.
3. Reread the code looking for the usual suspects of bad / wrong comments, copy and paste errors, sloppy coding etc.
4. Test it.

The person that maintains your code (quite likely a future version of you) will thank you for doing it this way.

Tuesday, May 20, 2008

visualSTATE

I have been writing this blog now for about 18 months and in reviewing my posts I've noticed that they are often critical of technologies, manufacturers and/or products. Well today is a first for me, because I'd like to offer my first product endorsement. The endorsement goes to visualSTATE from IAR. I've been using this product for about the same length of time I've had this blog and have concluded that it represents the biggest step forward in productivity for me since I made the move from assembly language to C. (Yes folks, the move from C to C++ was a virtual non-event for me, as I found almost no improvement in my productivity, mainly I suspect because I have written for years in object oriented C).

Anyway, back to the topic of visualSTATE. If you aren't familiar with it, then you should be. It allows you to design complex, hierarchical state machines with ease and to push a button and obtain code that just seems to work. I have now completed three projects using this tool and am well on the way to finishing a fourth. In all cases, the boost to my productivity has been astonishing. I find that I spend most of my time on the functional design and almost no time on debugging the high level application.

visualSTATE's main strengths seem to be in the following areas:

1. Products that are highly modal - i.e. a product can be in one of N operating modes depending upon circumstances.
2. User interfaces. I've had great success with products that contain bespoke LCD and membrane keypads.
3. Products that contain complex sequencing requirements, particularly when coupled with a plethora of failure modes that have to be handled.

I've found the learning curve on visualSTATE to be quite long - but definitely worth it. Although you can certainly be up and running in a day or so, I found that it took me a lot longer to work out how best to partition a problem between visualSTATE and traditional code. However, with experience I'm now finding that I rarely get it wrong anymore.

I've also found some very nice and unexpected benefits from visualSTATE. To wit:

1. Code reuse. visualSTATE does of course require some code support. However, I've found that a lot of this code can be reused. As a result, I can now bring up a new board with a visualSTATE processing engine running on it in a matter of hours. Try doing that with your average RTOS.
2. Although we all know that lots of small functions are "better" than a few big functions, human nature being what it is, we tend to just expand an existing function rather than decomposing it into its constituent parts. Well when using visualSTATE I find that it almost forces one into writing lots of small (less than 5 lines) functions. I suspect that these small functions are part of the reason that my visualSTATE projects just seem to work with almost no debugging time.
3. Documentation. As well as the documentation benefits associated with small functions (i.e. the comments actually match the code!), visualSTATE comes with a terrific documentation tool. Many of my clients quite rightly demand excellent documentation on the designs I do for them. The documentation engine in visualSTATE makes this a breeze!
4. Communication. My clients often ask questions such as "what does the code do if ...". In a traditional project this usually means poring through complex code trying to ascertain the answer. With visualSTATE projects I find that most of the time I simply look at the state charts. Since the state charts are effectively the code (the two are tied together), I can give an answer quickly and authoritatively - which makes my clients happy and helps assure me of future business.

All in all, kudos to IAR for such a great tool.

Sunday, May 11, 2008

Integer Log functions

A few months ago I wrote about a very nifty square root function in Jack Crenshaw's book "Math Toolkit for Real-time Programming". As elegant as the square root function is, it pales in comparison to what Crenshaw calls his 'bitlog' function. This is some code that computes the log (to base 2 of course) of an integer - and does it in amazingly few cycles and with amazing accuracy. The code in the book is for a 32 bit integer; the code I present here is for a 16 bit integer. Although you are of course free to use this code as is, I strongly suggest you buy Crenshaw's book and read about this function. You'll see it truly is a work of art. BTW, one of the things I really like about Crenshaw is that he takes great pains to note that he didn't invent this algorithm. Rather he credits Tom Lehman. Kudos to Lehman.


/**
FUNCTION: bitlog

DESCRIPTION:
Computes an approximation to 8 * (log(base 2)(x) - 1).

PARAMETERS:
- x: the uint16_t value whose log we desire

RETURNS:
- An approximation to 8 * (log(base 2)(x) - 1)

NOTES:
- Inputs of 8 or less return 2 * x.

**/
uint16_t bitlog(uint16_t x)
{
    uint8_t b;
    uint16_t res;

    if (x <= 8) /* Shorten computation for small numbers */
    {
        res = 2 * x;
    }
    else
    {
        b = 15; /* Find the highest non zero bit in the input argument */
        while ((b > 2) && ((int16_t)x > 0))
        {
            --b;
            x <<= 1;
        }
        x &= 0x7000; /* Keep the three bits below the leading one */
        x >>= 12;

        res = x + 8 * (b - 1);
    }

    return res;
}

Saturday, April 12, 2008

IEC60730

Atmel has a very interesting application note on IEC60730 Class B compliance. If you aren't aware of IEC60730, there is a nice introduction here. In a nutshell IEC60730 Class B compliance is a safety standard related to household appliances. Part of IEC60730 requires that one actively monitor that a microcontroller (if one is used) is functioning correctly. This seems to be a reasonable thing to do. However, as the Atmel application note shows, meeting this requirement requires one to constantly do things such as test memory, confirm that timers are operating at the correct frequencies and so on. Again conceptually this doesn't seem unreasonable. However, my concern with this is that the very act of confirming that the hardware is functioning could result in a system failure at a critical point, thus creating the very problem the standard is designed to prevent.

For example, it's hard to argue with the contention that the stack is the most used portion of memory in most microcontrollers. I think most engineers would agree that if the memory used for the stack malfunctioned then disastrous things would most likely occur. On this basis, a regular check of the Stack memory would seem to be in order. Maybe it's just me, but the thought of running a memory test on the stack area of a processor while simultaneously trying to respond to interrupts etc seems like a very tall order. Indeed, I can easily envisage a piece of code that is designed to test the stack area malfunctioning and causing a system crash and potentially causing the very thing it's designed to avoid.

I think what it comes down to is this. The reliability of hardware seems to me to be several orders of magnitude better than the reliability of software. Thus using software to validate hardware seems problematic. I'll be very interested to see what happens the first time someone gets hurt as a result of a malfunction in software written to conform to IEC60730. If you don't think this is likely, take a look at the size of the object code produced by Atmel's suggested tests. Then consider that many household appliances use microcontrollers that contain just a few kbytes of object code - and that the IEC60730 code will thus make up a very large fraction of the delivered code. On a simplistic statistical basis, we can assume that if 30% of the code in a product is related to IEC60730 compliance, then 30% of the bugs will be in that code. Given what the code has to do, my money is that the IEC60730 compliance code will have a much higher bug rate than the general application. Thus the probability of a failure occurring in the IEC60730 code is high - and someone will get hurt when the code fails.

As a parting thought, how exactly does one set about testing code that is designed to detect hardware failures internal to an integrated circuit? Although I'm sure I could come up with some test protocols for some hardware, I suspect that the Heisenberg uncertainty principle will ensure that the very act of testing the test will result in a flawed test.

Monday, February 04, 2008

The perils of overloading

This post is coming to you from Sweden - a very fine country that I heartily recommend visiting if you get the chance. (If you're wondering why I'm in Sweden - I'm here on business as one of my clients is located in Gothenburg). Anyway, the fact that I'm in Sweden is relevant to this post, as to get here I had to put myself at the mercies of United Airlines. Now the fact that the flight over here was less than perfect wouldn't be news to any of you that travel regularly. However, the reason that the flight was a disaster is relevant, as I'll now try and explain...

Upon arrival at the United check in desk at Dulles airport, I was greeted by an array of self check in kiosks, with a total of one real live human being to take care of baggage check in. Thinking myself to be computer savvy, I negotiated the check in kiosk with ease, only to be told that:
  1. I had to see the human in order to check my bags in, and
  2. The system was unable to assign me a seat and that seat assignment would be done at the gate.
The first instruction was par for the course, while the second instruction I found to be very strange. Anyway, I shrugged my shoulders and went over to the sole person working the desk. There was one gentleman in front of me. This gentleman, not unreasonably asked if he could use some of his frequent flier miles to upgrade to business class. No problem said the United employee, who proceeded to rattle the keys. After 5 minutes, he announced that although the system was showing that seats were available in business class, the computer system refused to allow him to assign a seat. This was the second clue that things were heading south in a hurry. It then took the clerk another 10 minutes to wait list the gentleman (giving a total processing time of 15 minutes). Although it's possible the clerk was incompetent, I got the impression that he really knew what he was doing, and was just being stymied by the system.

Anyway, I checked my bag in and proceeded to the gate. When I got to the gate, I found another 100+ passengers that also had no seat assignments. When eventually I got called to the counter, I found a harried woman with a sea of boarding passes printed out in front of her. She was manually searching through them trying to find my name. Eventually she found it and handed it over. My nature being what it is, I politely inquired as to the reason for this astonishingly strange system of assigning seats and issuing boarding passes. Apparently this was the opportunity that the clerk had been waiting for to vent her frustration, as she gladly explained to me that the powers that be had overbooked the flight. And so my gentle reader, we come to the point of this post. It was apparent that the United system was unable to handle an overbooked flight correctly, and rather than degrade gracefully, had all but collapsed. At which point I started making some snarky comments to myself about database programmers and how surely all database programmers worked in that field because they couldn't handle the rigors of the embedded / real time world and that any half decent embedded systems person would never make such an elementary mistake. It was then that I had my epiphany. We make the same mistake in the embedded world all the time. When was the last time you used RMA (Rate monotonic analysis) to guarantee that all your tasks would meet their scheduling deadlines? How many failures of embedded systems are caused by overloading (or over scheduling) and the failure to correctly assign task priorities? How many times do weird things happen in your code that you just shrug off as "one of those things"? In short, I found myself cutting a break to the poor sod that wrote United's code. I was still ticked off though!
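
For anyone who has never actually run the numbers, a schedulability check is not much code. What follows is only a minimal sketch, not something taken from a real project: it uses response-time analysis, the exact test that usually goes hand in hand with RMA, and it assumes fixed priorities assigned rate monotonically (shortest period first), deadlines equal to periods, and no blocking between tasks. The task_t type and the tasks_schedulable() name are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task description: worst-case execution time and period,
   both in the same time unit (e.g. timer ticks). */
typedef struct
{
    uint32_t wcet;
    uint32_t period;
} task_t;

/* Returns true if every task meets its deadline (assumed equal to its
   period). tasks[] must be sorted by ascending period, i.e. highest
   rate-monotonic priority first. */
bool tasks_schedulable(const task_t *tasks, size_t n)
{
    for (size_t i = 0; i < n; ++i)
    {
        uint32_t r = tasks[i].wcet;   /* first guess at the response time */

        assert(tasks[i].period > 0u);

        for (;;)
        {
            uint32_t next = tasks[i].wcet;

            for (size_t j = 0; j < i; ++j)
            {
                /* ceil(r / period): releases of higher priority task j */
                uint32_t releases =
                    (r + tasks[j].period - 1u) / tasks[j].period;
                next += releases * tasks[j].wcet;
            }

            if (next > tasks[i].period)
            {
                return false;         /* deadline missed */
            }
            if (next == r)
            {
                break;                /* converged: r is the response time */
            }
            r = next;
        }
    }

    return true;
}
```

With three tasks of (execution time, period) = (1, 4), (2, 6) and (3, 12) the check passes, with a worst-case response time of 10 for the slowest task. Load the system to (2, 4) and (3, 6) and the second task misses its deadline.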

Sunday, January 27, 2008

A new way to tell if something is an embedded system

Periodically someone tries to come up with a definition of an embedded system. For example there is an excellent and oft cited definition here. What got me thinking about this topic is the latest gadget I love to hate - my Verizon Treo phone running Windows Mobile. A few years ago, there would have been no doubt that a cell phone was an embedded system. Today, the Treo, the iPhone etc are all running versions of traditional computer operating systems, and are much more like computers than embedded systems. So the question is what are they - an embedded system or a computer?

Well today I offer a new simple test to tell if these devices are fish or fowl (foul is perhaps more appropriate), to wit:

"Is the device a pain in the neck to use?" If the answer is "yes", then it's a computer. My Treo is a computer. Enough said!

Friday, January 18, 2008

Electronic Component Footprints

As well as writing code and designing hardware, I also do PCB layout. I started doing this after I discovered it was often faster for me to lay out a board myself than to try and convey all my requirements to a board layout person. If you've ever done PCB layout, you'll know that getting information about a device's footprint is a real pain. What you may not know is that this is a major source of errors on printed circuit boards, resulting in costly board re-spins and project delays. These errors come about for several reasons.
  1. Getting the information. Many manufacturers include packaging information directly in the part's data sheet. Other manufacturers (TI being a principal offender) instead just cite a packaging part number and say something trite like "See our website for the latest information". One is then forced into searching a gigantic web site to discover that packaging style WP8 is what the rest of the world calls SO8. I don't mind them decoupling the packaging information from the part data sheet. I just wish they'd get with the program and discover something called Hyper-linking (it's only been around since the 1960s).
  2. Footprints are usually dimensioned as if they were a mechanical part. By this I mean that the drawing is usually rendered like most mechanical parts. Unfortunately, the layout package I use (and I suspect most of the others) treats a footprint as an electrical component. This results in all the pads being on an X-Y grid, with pin 1 usually being at (0,0). What this usually means is that one has to spend time performing a series of elementary trigonometric calculations in order to work out where to place the pads exactly. As you may imagine, this is a major source of error in footprint creation. The frustrating thing for me is that for the mechanical person providing the footprint information, it would be trivial to have their CAD system generate the information in a way that is directly usable.
  3. Many suppliers of mechanical components now offer solid models of their parts on their websites. Typically the models are offered in a number of formats (ProEngineer, Solid Works etc). Thus, if I'm using say a valve from this supplier, I don't have to create the model. I just download it and incorporate it into my working drawing. Why then do suppliers of electronic components not do the same thing for part footprints? I suspect the answer is that no one ever selected a part to use in a design because it made the layout person's job easier.
  4. Lastly, you may be unaware that the footprint for a surface mount part differs depending on whether it is to be reflow soldered or wave-soldered. Some companies (mainly in Europe) supply both footprints. Too many however simply supply the reflow footprint and leave it up to the lowly layout person to try and work out what the footprint should be for wave soldering.
So what's the point of this screed? Well, our industry is all about getting products to market as soon as possible at the lowest possible cost. Component manufacturers could help their customers (which in turn would help them) achieve this goal by simply providing information that removed the footprint bottleneck.

Sunday, January 13, 2008

Omniscient Code Generation

Hi Tech Software has recently been making a lot of noise about its "Omniscient Code Generation". In a nutshell, the technology appears to defer code generation until the entire program has been compiled, and then look at everything before generating the final object code. The end result is a dramatically more compact (and presumably faster running) program image. I haven't had a chance to play with the compiler yet (in part because it's still in beta testing). If they have done what they claim, then Hi Tech should be commended. On my list of things to check out about the technology will be:
  • Is the technology smart enough to track function calls via function pointers? If it is, then this is truly a neat piece of technology. If instead, it's one of the limitations of the product, then its usefulness to me has just plummeted.
  • Does the technology also track function calls from within interrupts? My experience is that interrupt handling is still the poor relation of compiler technology. If Hi Tech does this, then I'll be impressed.
Also of interest to me is how other compiler manufacturers will respond. Keil has performed global register coloring on its 8051 compiler for years. I suspect that the Hi Tech approach is a step beyond this, so there's a chance that Keil will be finally knocked from their #1 position in 8051 code generation. IAR offers a multi unit compilation option with some of its compilers. However, this option isn't integrated into its Embedded Workbench, so it's practically useless. With Hi Tech offering compilers for ARM, PIC & MSP430 I can see this really creating a burst of competition in the industry. Excellent!

Wednesday, August 29, 2007

An unfortunate consequence of a 32-bit world

Back in the bad old days when I was a lad, one learned about microprocessors by programming 8 bit devices in assembly language. In fact I can still remember my first lab assignment - namely to multiply two 8 bit unsigned quantities together to get a 16 bit result (without the use of a hardware multiplier of course). One of the indelible lessons that comes from doing an exercise such as this, is that it can take many instructions to perform even the most innocuous of high level language statements.
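
To give a flavor of that exercise for those who never had to do it, here is a rough sketch of the classic shift-and-add approach, written in C rather than assembly language. It is an illustration only (the mul8x8() name is made up), but each pass through the loop maps onto a handful of instructions on an 8 bit device, which is rather the point.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit unsigned values into a 16-bit result using only
   shifts and adds, as one would on a CPU with no hardware multiplier. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t product = 0u;
    uint16_t addend = a;   /* multiplicand, shifted left once per pass */

    while (b != 0u)
    {
        if ((b & 1u) != 0u)
        {
            /* This bit of the multiplier is set: add the shifted
               multiplicand into the running total. */
            product = (uint16_t)(product + addend);
        }
        addend = (uint16_t)(addend << 1);
        b >>= 1;
    }

    return product;
}
```

On an 8 bit machine every one of those 16 bit adds and shifts is itself at least two instructions, complete with carry handling, so a single innocent looking multiplication in C can easily turn into dozens of executed instructions.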

I mention this, because today I was looking at some code written by a young engineer who was recommended to me. In examining some of his code, I noticed the following construct:

int ivar;

void some_function(void)
{
    ...
    ++ivar;
    ...
}

interrupt void isr_handler(void)
{
    ...
    --ivar;
    ...
}


Notwithstanding the fact that ivar should have been declared volatile, the most egregious mistake here was the assumption that the statement ++ivar is an atomic operation. Now if one is used to working on 32 bit machines, the concept of incrementing an integer being anything other than an atomic operation is of course ludicrous. However, in the 8 or 16 bit world where many of us labor in the embedded space, the idea of incrementing an integer being an atomic operation is equally ridiculous: the increment compiles to a multi-instruction read-modify-write sequence, and if the interrupt fires part way through it, one of the two updates is lost or the value is corrupted. The trouble with bugs like this is that they are difficult to spot, and will only rear their head after months or even years of operation.
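
One common way to repair the construct is to make the variable volatile and wrap the foreground access in a critical section. The sketch below is a host-runnable illustration under stated assumptions: disable_interrupts() and enable_interrupts() are hypothetical stand-ins for whatever intrinsics or macros your compiler provides, and the 'interrupt' keyword is left out so that it builds anywhere. Note too that many 32 bit load/store CPUs also perform the increment as separate load, add and store instructions, so the same protection is often needed there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt control. On real
   hardware these would be compiler intrinsics or macros; here they only
   track state so that the sketch compiles and runs on a host. */
static volatile uint8_t irq_disable_depth = 0u;

static void disable_interrupts(void)
{
    ++irq_disable_depth;
}

static void enable_interrupts(void)
{
    assert(irq_disable_depth > 0u);   /* catch an unbalanced enable */
    --irq_disable_depth;
}

static volatile int ivar;   /* shared with the ISR, hence volatile */

void some_function(void)
{
    disable_interrupts();   /* start of critical section */
    ++ivar;                 /* read-modify-write cannot be interrupted */
    enable_interrupts();    /* end of critical section */
}

/* Stands in for the ISR. It is not itself interrupted by the foreground
   code, so it needs no extra protection in this simple case. */
void isr_handler(void)
{
    --ivar;
}
```

Real code usually saves the previous interrupt state and restores it afterwards, rather than unconditionally enabling interrupts, so that the function remains safe when called with interrupts already disabled.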

So, is this a case of an incompetent individual? Although nominally yes, I suspect that the real problem is that he was raised on a diet of big CPUs. Perhaps the universities could do these engineers a favor, and throw away the ARM based evaluation boards and replace them with an 8051 based system.

Thursday, August 02, 2007

Application notes code quality

All manufacturers of microcontrollers publish application notes. Some of these application notes are of course nothing more than gussied up advertising drivel. However, many of these application notes contain useful information that can cut days, and sometimes weeks off a project.
Having read hundreds of these application notes over the 25 years I've been doing this, I've come to the conclusion that whereas the application notes usually get the algorithms correct, the same can't be said for the code. Too often the code is sloppy, with bugs that are apparent merely by code inspection. Maybe it's just me, but whenever I see a sloppy piece of code, it makes me wonder about the underlying quality of the IC design.

I think this is unfortunate, since the manufacturers could do much to improve things in the industry by setting a great example. To this end, I think they should:
  1. Adopt a set of coding standards that all their code adheres to.
  2. Have the code reviewed, such that egregious bugs are caught.
  3. Make the code Lint free.
  4. If they are aiming the product at the automotive industry, ensure it is MISRA C compliant.
The advantages to the IC manufacturer are legion:
  1. They look good (never a bad thing)
  2. All their application note code has the same "look and feel". This encourages engineers to use their application notes, and hence their products.
  3. The code in the application note is usable "as is", speeding time to market and generally giving the perception that their product is easy to use.
  4. Less experienced engineers are taught how to do things correctly - which presumably leads to higher quality products - which presumably translates into more sales.
I guess the thing that I find maddening about this, is that the manufacturers probably spend weeks or months developing the application note, and then let themselves down by presenting their solution in such a poor way. When I talk to the marketing folks for the CPU manufacturers, I make a point of bringing some of the more egregious errors to their attention. Perhaps if all of us did this, we could get a bit of a sea change in the industry.

Wednesday, June 13, 2007

Size matters

Periodically I get printed propaganda from the semiconductor manufacturers touting their latest and greatest ICs. Evidently the marketing folks are convinced that size matters because the size of the IC is almost the first thing they tell you now. A recent example from Maxim has the headline: "Smallest, Most Efficient and Flexible Notebook Fuel-Gauging Solution".

Well size does matter. However, it seems to me that the industry has gone too far. More and more devices are being offered only in chip scale packaging (CSP). As a result, it is all but impossible to hand build a prototype, let alone cobble together a breadboard. The result of this is that in many cases it doesn't make economic sense to use the part, simply because CSP requires the prototype board to be machine built at a cost of thousands of dollars.

I think the manufacturers are aware of this problem and are trying to address it by offering evaluation boards. While these are OK for the breadboarding phase, they don't solve the prototyping problem. Furthermore even if the project can justify the cost of machine built prototypes, probing the part or (heaven forbid) making modifications to the board is virtually impossible. The bottom line, IC manufacturers: offer all your parts in a package that can be handled by people. Please.

Monday, June 04, 2007

Understanding Stack Overflow

I suspect that many, if not all bloggers are somewhat narcissistic. In my case it shows through in that I use one of the free services that keeps track of how many visitors I get and what brought them to this blog. Well, it turns out that many of the visitors to this blog get here not because of the brilliance of my writing, but because they did a Google search on "stack overflow" often qualified by PIC, or MSP430 etc. For many of these visitors I suspect they leave empty handed. Thus in an attempt to make these visits less pointless, let me give you my take on what causes a stack overflow in an embedded system.

First of all, go read the Wikipedia description of stack overflow. There's nothing wrong with the description - it's just incomplete from an embedded systems perspective.

On the assumption that you are getting a stack overflow and that you aren't performing recursion or attempting to allocate a large amount of storage on the stack, what can be going wrong? Here's a check list.
  1. What's your stack size set to? If you don't understand the question then you need an introductory course to embedded systems programming. If you do understand the question - but don't know the answer - then this is the most likely source of your problem. How can this be you ask? Well, most embedded systems compilers are designed to work with a particular family of processors. The low end of the family may have a tiny amount of memory (e.g. 128 bytes). As such setting the default stack size to 16 bytes may be a sensible thing to do. Thus, your first step is to ensure that the stack size is set to something reasonable for your system.
  2. Which stack is overflowing? Many processors / compilers support / implement multiple stacks. A typical dichotomy is a call stack (upon which the return addresses of functions are stored) and a data or parameter stack (upon which automatic variables are stored). If you are using an RTOS, then typically there will be a shared call stack while each thread will have its own data stack. Thus is it the shared call stack that is overflowing, or is it the parameter stack associated with a particular task? Once you've made the determination which stack is overflowing then finding out exactly what gets placed on that stack will help lead you to the solution to your problem. If you can see no obvious high level language construct that is causing the problem, then the single most likely cause of your misery is an interrupt service routine...
  3. An interrupt service routine can use up an extraordinary amount of space on the stack. For a discussion of how this arises and its impact on performance, see this article. This problem is compounded if your system allows interrupts to be nested (that is, it allows an ISR to itself be interrupted).
  4. Certain library functions (printf() and its brethren are prime offenders) can use an enormous amount of stack space.
  5. If you are writing partially in assembly language, are you failing to pop every register that you pushed? This often occurs if you have more than one exit point from a function or ISR.
  6. If you are writing entirely in assembly language, did you set up the stack pointer correctly and do you know which way the stack grows?
  7. Have you made the mistake of programming a microcontroller that you don't understand? For example, low end PIC processors have a tiny call stack which is easily overflowed. If you are programming a PIC and don't know about this limitation, then quite frankly, I'm not surprised you are having problems.
  8. If none of the above solve your problem, then I'm afraid you are most likely into a stack over-write problem. That is, a pointer is being de-referenced that results in the stack being overwritten. This can often arise when you allocate an array on the stack and then access an element beyond the end of the array. Lint will find a lot of these problems for you. If you don't know what Lint is, see this article. If you do know what Lint is and aren't using it then you deserve to be faced with these sorts of problems.
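
If you want a number to go with the first item on the check list, one widely used trick (not something covered above, so treat it as an aside) is to "paint" the stack with a known pattern at start up and later see how much of the pattern has been overwritten. The sketch below is a minimal host-runnable illustration. It assumes a stack that grows down from the top of the region towards its base, and the function names and the 0xA5 fill value are made up. On a real target the region would be the linker-defined stack area rather than an ordinary array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5u   /* arbitrary, but unlikely to occur in long runs */

/* Fill a stack region with a known pattern. On a target this would be
   done at start up, before the region is in use. */
void stack_paint(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, STACK_FILL, size);
}

/* Returns the number of bytes that have been used since painting,
   assuming the stack grows down from base + size towards base. */
size_t stack_high_water(const uint8_t *base, size_t size)
{
    size_t untouched = 0u;

    assert(base != NULL);

    /* Walk up from the far end of the stack until the pattern stops. */
    while ((untouched < size) && (base[untouched] == STACK_FILL))
    {
        ++untouched;
    }

    return size - untouched;
}
```

If the reported usage equals the size of the region, the stack has either filled completely or overflowed. The result can under-report slightly if the deepest bytes written happen to match the fill value.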

Saturday, May 19, 2007

Continued Fractions

Once in a while something happens that makes me realize that techniques that I routinely use are simply not widely known in the embedded world. I had such an epiphany recently concerning continued fractions. If you don't know what these are, then check out this link.

As entertaining as the link is, let me cut to the chase as to why you need to know this technique. In a nutshell, in the embedded world we often need to perform fixed point arithmetic for cost / performance reasons. Although this is not a problem in many cases, what happens when you need to multiply something by say 1.2764? The naive way to do this might be:

uint16_t scale(uint8_t x)
{
    uint16_t y;

    y = (x * 12764) / 10000;

    return y;
}

As written, this will fail on a target where int is 16 bits because of numeric overflow in the expression (x * 12764). Thus it's necessary to throw in some very expensive casts. E.g.

uint16_t scale(uint8_t x)
{
    uint16_t y;

    y = ((uint32_t)x * 12764) / 10000;

    return y;
}

Our speedy integer arithmetic isn't looking so good now is it?

What we really want to do is to use a fraction (a/b) that is a close approximation to 1.2764 - but (in this case) has a numerator that doesn't exceed 255 (so that we can do the calculation in 16 bit arithmetic).

Enter continued fractions. One of the many uses for this technique is finding fractions (a/b) that are approximations to real numbers. In this case using the calculator here, we get the following results:

Convergents:
1: 1/1 = 1
3: 4/3 = 1.3333333333333333
1: 5/4 = 1.25
1: 9/7 = 1.2857142857142858
1: 14/11 = 1.2727272727272727
1: 23/18 = 1.2777777777777777
1: 37/29 = 1.2758620689655173
1: 60/47 = 1.2765957446808511
1: 97/76 = 1.2763157894736843
1: 157/123 = 1.2764227642276422
2: 411/322 = 1.2763975155279503
3: 1390/1089 = 1.2764003673094582
1: 1801/1411 = 1.2763997165131113
1: 3191/2500 = 1.2764


We get higher accuracy as we go down the list. In this case, I chose the approximation (157 / 123) because it's the highest accuracy fraction that has a numerator less than 255. Thus my code now becomes:

uint16_t scale(uint8_t x)
{
    uint16_t y;

    y = ((uint16_t)x * 157) / 123;

    return y;
}

The error is less than 0.002% - but the calculation speed is dramatically improved because I don't need to resort to 32 bit arithmetic. [On an ATmega88 processor, calling scale() for every value from 0-255 took 148,677 cycles for the naive approach and 53,300 cycles for the continued fraction approach.]

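Incidentally, you might be wondering if there are other fractions that give better results than the ones generated by this technique. The mathematicians tell us that no fraction with a denominator as small as a convergent's gets closer than that convergent does. The fractions that lie between two successive convergents can occasionally squeeze out a little more accuracy within a fixed limit (here 254/199 is marginally closer than 157/123), but the convergents are the place to start.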
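
If you'd rather not depend on an online calculator, the convergents fall straight out of Euclid's algorithm. The following is a minimal sketch, assuming the target ratio is given as an exact integer fraction (12764/10000 for 1.2764). The best_convergent() name and its interface are made up for this illustration, and it only considers the convergents themselves, not the intermediate fractions between them.

```c
#include <assert.h>
#include <stdint.h>

/* Finds the last continued-fraction convergent of num/den whose numerator
   does not exceed max_num. The result is returned as *a / *b. If even the
   first convergent is too large, the result is 0/1. */
void best_convergent(uint32_t num, uint32_t den, uint32_t max_num,
                     uint32_t *a, uint32_t *b)
{
    uint32_t h_prev = 1u, h_prev2 = 0u;   /* previous two numerators */
    uint32_t k_prev = 0u, k_prev2 = 1u;   /* previous two denominators */

    assert(den != 0u);

    *a = 0u;
    *b = 1u;

    while (den != 0u)
    {
        uint32_t term = num / den;               /* next continued-fraction term */
        uint32_t h = term * h_prev + h_prev2;    /* next numerator */
        uint32_t k = term * k_prev + k_prev2;    /* next denominator */
        uint32_t rem = num % den;

        if (h > max_num)
        {
            break;   /* this convergent no longer fits the limit */
        }

        *a = h;
        *b = k;

        h_prev2 = h_prev;
        h_prev = h;
        k_prev2 = k_prev;
        k_prev = k;

        num = den;   /* Euclid's step */
        den = rem;
    }
}
```

Calling best_convergent(12764, 10000, 255, &a, &b) walks through 1/1, 4/3, 5/4 and so on, and stops with a = 157 and b = 123, because the next convergent (411/322) has a numerator that is too large.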

So there you have it. A nifty technique that once you know about it will make you wonder how you got along without it for all these years.

Tuesday, May 01, 2007

H1-b visas and Economics 101

USA Today has a story today about how 123,000 applications were received within 48 hours of this year's H1-b visa lottery being opened on April 1. Given that there are 65,000 visas granted a year, there seems to be a large mismatch between supply and demand. Although the USA Today story talks about some of the sexy positions (Supermodels! Complete with alluring photograph!), the reality is that most of these applications are for the fields of electronics and computing, including embedded systems.

This topic interests me, in part because I came to the USA on a similar visa program (actually an E2 - but that's another story).

Anyway, whenever this topic comes up, there's normally some quote from a high tech industry executive explaining that they simply can't get enough talented folks - and hence the need for the program. Whenever I see this argument advanced, I'm always struck by the failure of the journalist to ask a basic question - namely "What would you do if the program was eliminated?" I suspect that the honest executive would answer:
  1. Lobby like mad to get it reinstated
  2. Pay what I had to to get the talent I needed
  3. Look to put the work where the talent is (i.e. ship it overseas).
Whereas I could probably discourse for a long time on answer 1, it's the other two that intrigue me.

The reality today is that enrollment in engineering is dropping. If one were to look at non first / second generation immigrant enrollment, I'd hazard a guess that it has all but collapsed. This is despite the fact that engineering in general (and electrical engineering in particular) is always one of the highest paying jobs upon graduation, with recent graduates earning about $65K, versus the $30K earned by your typical liberal arts major. So, what would happen if these salaries doubled? Would this be enough to attract more home grown talent into the industry? Economics 101 would suggest that if you raise the salaries high enough then supply will rise to meet the demand. The question is, by how much would salaries have to rise?

Economics 101 also suggests that as the price of a good / service rises, it is highly likely that the consumer will look for a substitute. At present this works by bringing folks in on the H1-b program. If the program was eliminated, then I assume that this would be done by shipping more work overseas.

I guess this leads me to the point of my post. The USA prides itself on its capitalist approach - and the belief that the free market is inherently the best way to solve all (OK, most) problems. As a result, Americans normally abhor government interference in the market place. But isn't that exactly what is being done here?

If we genuinely believe in the free market, then the H1-b visa program should be abolished. Salaries would rise for engineers, more students would study engineering - and more work would go overseas. I have no idea whether the end result would be beneficial to engineers or not. It would however be ideologically consistent.

The economic purists might argue that the H1-b visa should be scrapped in the sense that anyone who wished to work here should be allowed to do so. I agree that this is also ideologically consistent. However, the reality is that the USA limits immigration in all fields. Thus to be truly consistent this would require the USA to do the same for all jobs - which is tantamount to saying there are no limits on immigration - something which isn't going to happen.

Saturday, April 21, 2007

Crest factor, Square roots & neat algorithms

I've been programming microcontrollers for about 25 years now - and can count on one hand the number of times I've needed to compute the square root of an integer. This curious drought came to an end recently when I needed to compute the Crest Factor of the line voltage being used to power a product I was designing. (For the uninitiated / rusty out there, Crest Factor is the ratio of the peak value of a waveform to its RMS value. For example, a sine wave has a CF of 1.414, whereas a square wave has a CF of 1.000).

Why, you might ask, do I need to compute the CF? Well, the product uses triacs to control a number of AC loads. If the system is inadvertently powered from a square wave inverter, or just a really lousy generator, then the triacs will not self-commutate - and I could never turn off the loads. Thus to prevent this unfortunate scenario, I need to know how good (i.e. sinusoidal) the line voltage is. The CF is a direct figure of merit that allows me to make this decision.

Evidently, the computation of CF requires one to compute an RMS voltage, which in turn requires one to calculate the square root of a number. For various reasons, I need to compute the CF on a mains cycle by cycle basis - and I'm using a 7.37 MHz ATmega CPU. Thus, the computational efficiency of the algorithm is important.
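To make the arithmetic concrete, here is a minimal sketch of the calculation. It assumes one mains cycle of rectified 8 bit samples and reports the CF multiplied by 100. The names crest_factor_x100() and isqrt16() are hypothetical, the sample format is an assumption, and the square root shown is just a simple bit-by-bit placeholder so that the sketch stands alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder integer square root (an assumption for this sketch):
   tries each result bit from the most significant downwards. */
static uint8_t isqrt16(uint16_t v)
{
    uint8_t root = 0;
    uint8_t bit;

    for (bit = 0x80; bit != 0; bit >>= 1)
    {
        uint8_t trial = (uint8_t)(root | bit);

        if ((uint16_t)((uint16_t)trial * trial) <= v)
        {
            root = trial;
        }
    }
    return root;
}

/* Crest factor x 100 for n rectified 8 bit samples.
   Returns 0 if there are no samples or the RMS value is zero.
   Assumes n is small enough that the sum of squares fits in 32 bits. */
static uint16_t crest_factor_x100(const uint8_t *samples, size_t n)
{
    uint32_t sum_sq = 0;
    uint8_t peak = 0;
    uint8_t rms;
    size_t i;

    if (n == 0)
    {
        return 0;
    }
    for (i = 0; i < n; i++)
    {
        sum_sq += (uint32_t)samples[i] * samples[i];
        if (samples[i] > peak)
        {
            peak = samples[i];
        }
    }
    /* The mean square is at most 255 * 255, so it fits in 16 bits. */
    rms = isqrt16((uint16_t)(sum_sq / n));
    if (rms == 0)
    {
        return 0;
    }
    return (uint16_t)(((uint16_t)peak * 100u) / rms);
}
```

With this scaling a square wave reports 100 and a clean sine wave should come out near 141. Because the RMS value is truncated to an integer, the reported CF errs slightly on the high side.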

Now IAR has a nifty little algorithm that computes an approximate square root. See http://supp.iar.com/Support/?note=18180&from=search+result

However, this gets blown away by the algorithm described by Crenshaw in his wonderful book: Math Toolkit for Real-Time Programming, CMP Books. ISBN 1-929629-09-5.

The code in his book is for computing the square root of a 32 bit unsigned integer. I adapted it to give the square root of a 16 bit integer. Here's the code:

static inline uint8_t friden_sqrt16(uint16_t val)
{
    uint16_t rem = 0;
    uint16_t root = 0;
    uint8_t i;

    for (i = 0; i < 8; i++)
    {
        root <<= 1;
        rem = ((rem << 2) + (val >> 14));
        val <<= 2;
        root++;
        if (root <= rem)
        {
            rem -= root;
            root++;
        }
        else
        {
            root--;
        }
    }
    return (uint8_t)(root >> 1);
}


This will compute the exact integer square root of a 16 bit integer (that is, the true root rounded down) in about 268 clock cycles on an AVR - i.e. in about 33 microseconds on an 8 MHz AVR processor.
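If you'd like to check the 'exact' claim for yourself, a brute force test over all 65,536 inputs is cheap on a PC. The sketch below is one way to do it. The routine is repeated so that the block compiles by itself, and friden_sqrt16_errors() is a hypothetical helper name of my own.

```c
#include <stdint.h>

/* Copy of the 16 bit routine above so this sketch compiles by itself. */
static uint8_t friden_sqrt16(uint16_t val)
{
    uint16_t rem = 0;
    uint16_t root = 0;
    uint8_t i;

    for (i = 0; i < 8; i++)
    {
        root <<= 1;
        rem = (uint16_t)((rem << 2) + (val >> 14));
        val <<= 2;
        root++;
        if (root <= rem)
        {
            rem -= root;
            root++;
        }
        else
        {
            root--;
        }
    }
    return (uint8_t)(root >> 1);
}

/* Counts the inputs for which the result r fails r*r <= val < (r+1)*(r+1).
   A return value of 0 means the routine is exact for every 16 bit input. */
static uint32_t friden_sqrt16_errors(void)
{
    uint32_t v;
    uint32_t errors = 0;

    for (v = 0; v <= UINT16_MAX; v++)
    {
        uint32_t r = friden_sqrt16((uint16_t)v);

        if ((r * r > v) || ((r + 1) * (r + 1) <= v))
        {
            errors++;
        }
    }
    return errors;
}
```

The loop counter is deliberately 32 bits wide. A 16 bit counter would wrap at 65,535 and the loop would never terminate.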

To Crenshaw's point - don't just blindly use the code, but endeavor to understand how it works. Only then will you see it for what it truly is - a work of art. Thanks Jack.

Saturday, March 31, 2007

Tool Upgrades

As a consultant who does hardware, firmware & software work for my clients, I use a large array of software tools - half a dozen compilers, schematic capture and PCB layout tools, analysis tools as well as the usual gaggle of productivity tools that non-engineers also use. Throw in the tools for running a business and my PC is a regular treasure trove of applications.

With all these tools, the number of upgrades / updates is starting to get out of hand. Every week, it seems I'm updating a major application. The most common scenario seems to be:
  1. I haven't used a tool in a month or so.
  2. I invoke it - and it tells me that an update is available. Often the update is 'mandatory' or at least 'recommended'.
  3. I accept the update.
  4. The download proceeds. Some of them are simply enormous (Ever downloaded the Xilinx Webpack IDE?)
  5. The patch then proceeds. The time to execute the patch is often considerable.
  6. Finally - the dreaded 'You must restart your computer' directive. I've a dozen applications open, web pages marked, manuals at strategic places - and now I have to close them all down.
Having gone through all this rigmarole, I can finally start using the tool. Of course by now, I just want to 'get on with it', and so the release notes often get cursory attention. Inevitably, if I do read the release notes then I find the upgrade is completely useless to me (e.g. support for a new device that I'm not using). If I don't read the release notes then of course there's this really neat feature that's been added that really makes life easier - and I don't find out about it until weeks later.

Well - enough complaining. Do I have any suggestions? I think so. I'd like tool vendors to realize that their tool isn't the only one in the box - and that many of us use it on a less than daily basis. With this perspective, I'd like the tool vendors to do the following:
  1. Download upgrades in the background. A lot of applications already do this - they all should.
  2. Inform me there is an update available when I close the tool rather than open it. That way I can allow the update to occur while I'm off doing productive work elsewhere.
  3. Do everything you can to avoid requiring the user to re-boot their computer.
  4. Limit updates to one or two a year. I know product managers want folks on support contracts to feel they are getting their money's worth - but this only works if my life revolves around that tool - and it doesn't!

Thursday, December 14, 2006

Wanted - a new performance metric

In the bad old days, the two major performance concerns in CPU selection were whether a CPU had enough processing power and memory to get the job done. Although these are still issues, it's a rare problem that requires more bandwidth and memory than can be provided by the CPU vendors.

By contrast, today, well over half of the systems I work on are battery powered, and so I find the major question I have when designing an embedded system is 'how long will the battery last?' If you can work this out from studying the data sheets of the various CPU vendors then you're a better engineer than me.

Thus to solve this problem, I propose that we introduce a new performance metric - namely how much energy (Joules) does it take to perform a set of standard tasks. Rather than the usual bunch of quasi meaningful benchmarks, I'd like to see benchmarks such as:

  1. How much energy does it take to receive and tra