Efficient C Tips #1 - Choosing the correct integer size
From time to time I write articles for Embedded Systems Design magazine. A number of these articles have concentrated on how to write efficient C for an embedded target. Whenever I write these articles I always get emails from people asking me two questions:
1. How did you learn this stuff?
2. Is there somewhere I can go to learn more?
The answer to the first question is a bit long winded and consists of:
1. I read compiler manuals (yes, I do need a life).
2. I experiment.
3. Whenever I see a strange coding construct, I ask the author why they are doing it that way. From time to time I pick up some gems.
4. I think hard about what the compiler has to do in order to satisfy a particular coding construct. It's really helpful if you know assembly language for this stage.
The answer to the second question is short: No!
To help rectify this, in my copious free time I'll consider putting together a one-day course on how to write efficient C for embedded systems. If this is of interest to you then please contact me via my website soon.
In the interim, I'd like to offer up my first tip on how to choose the correct integer size.
In my experience of writing programs for both embedded systems and computers, I'd say that more than 95% of all the integers used by those programs could fit into an 8 bit variable. The question is, what sort of integer should one use in order to make the code the most efficient? Most computer programmers who use C will be puzzled by this question. After all, the data type 'int' is supposed to be an integer type of at least 16 bits that represents the natural word length of the target system. Thus, one should simply use the 'int' data type.
In the embedded world, however, such a trite answer will quickly get you into trouble - for at least three reasons.
1. For 8 bit microcontrollers, the natural word length is 8 bits. However, you can't represent an 'int' data type in 8 bits and remain C99 compliant. Some compiler manufacturers eschew C99 compliance and make the 'int' type 8 bits (at least one PIC compiler does this), while others simply say, in effect, "we are compliant, and if you are stupid enough to use an 'int' when another data type makes more sense then that's your problem".
2. For some processors there is a difference between the natural word length of the CPU and the natural word length of the (external) memory bus. Thus the optimal integer type can actually depend upon where it is stored.
3. The 'int' data type is signed. Much, indeed most, of the embedded world is unsigned, and those of us that have worked in it for a long time have found that working with unsigned integers is a lot faster and a lot safer than working with signed integers, or even worse a mix of signed and unsigned integers. (I'll make this the subject of another blog post).
Thus the bottom line is that using the 'int' data type can get you into a world of trouble. Most embedded programmers are aware of this, which is why when you look at embedded code, you'll see a veritable maelstrom of user defined data types such as UINT8, INT32, WORD, DWORD etc. Although these should ensure that there is no ambiguity about the data type being used for a particular construct, they still don't solve the problem of whether the data type is optimal or not. For example, consider the following simple code fragment for doing something 100 times:
TBD_DATATYPE i;
for (i = 0; i < 100; i++)
{
    // Do something 100 times
}
Please ignore every issue except one: what data type should the loop variable 'i' be?
Well evidently, it needs to be at least 8 bits wide and so we would appear to have a choice of 8, 16, 32 or even 64 bits as our underlying data type. Now if you are writing code for a particular CPU then you should know whether it is an 8, 16, 32 or 64 bit CPU and thus you could make your choice based on this factor alone. However, is a 16 bit integer always the best choice for a particular 16 bit CPU? And what if you are trying to write portable code that is supposed to be used on a plethora of targets? Finally, what exactly do we mean by 'optimal' or 'efficient' code?
I wrestled with these problems for many years before finally realizing that the C99 standards committee has solved this problem for us. Quite a few people now know that the C99 standard standardized the naming conventions for specific integer types (int8_t, uint8_t, int16_t etc). What isn't so well known is that they also defined data types which are "minimum width" and also "fastest width". To see if your compiler is C99 compliant, open up stdint.h. If it is compliant, as well as the uint8_t etc data types, you'll also see at least two other sections - minimum width types and fastest minimum width types. An example will help clarify the situation:
Fixed width unsigned 8 bit integer: uint8_t
Minimum width unsigned 8 bit integer: uint_least8_t
Fastest minimum width unsigned 8 bit integer: uint_fast8_t
Thus a uint8_t is guaranteed to be exactly 8 bits wide.
A uint_least8_t is the smallest integer guaranteed to be at least 8 bits wide.
A uint_fast8_t is the fastest integer guaranteed to be at least 8 bits wide.
So we can now finally answer our question. If we are trying to consume the minimum amount of data memory, then our TBD_DATATYPE should be uint_least8_t. If we are trying to make our code run as fast as possible then we should use uint_fast8_t.
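As a minimal sketch of those two choices, here is the loop written once with each type. The function names, the LOOP_COUNT macro and the counter standing in for "do something" are my own illustrative assumptions, not part of the fragment above.

```c
#include <assert.h>
#include <stdint.h>

#define LOOP_COUNT 100u   // illustrative name for the "100 times" in the fragment

// Speed first: the fastest unsigned type that is at least 8 bits wide.
unsigned int count_with_fast(void)
{
    unsigned int work = 0u;   // stands in for "do something"
    uint_fast8_t i;

    for (i = 0u; i < LOOP_COUNT; i++)
    {
        work++;
    }
    return work;
}

// Data memory first: the smallest unsigned type that is at least 8 bits wide.
unsigned int count_with_least(void)
{
    unsigned int work = 0u;
    uint_least8_t i;

    for (i = 0u; i < LOOP_COUNT; i++)
    {
        work++;
    }
    return work;
}

// Usage sketch: both loops do the same work; only the counter's storage differs.
void check_loop_choices(void)
{
    assert(count_with_fast() == LOOP_COUNT);
    assert(count_with_least() == LOOP_COUNT);
    // The 'least' type is never larger than the 'fast' type.
    assert(sizeof(uint_least8_t) <= sizeof(uint_fast8_t));
}
```

On an 8 bit target the two types will typically be the same size. On a larger CPU the 'fast' type is often wider, which is exactly the trade of data memory for speed described above.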
Thus the bottom line is this. If you want to start writing efficient, portable embedded code, the first step you should take is to start using the C99 'least' and 'fast' data types. If your compiler isn't C99 compliant then complain until it is - or change vendors.
If you make this change I think you'll be pleasantly surprised at the improvements in code size and speed that you'll achieve.


9 Comments:
You mentioned that a lot of programmers use their own defined types: UINT8, INT32, WORD, DWORD, and such.
Before C99, I could understand that. But I see this still happen in projects that are using C99 compliant toolchains. Do you have any idea why people continue to do this instead of use the C99 types? Because I sure don't - in some cases, they simply did "typedef uint32_t UINT32;", so they certainly knew about the existence of the C99 type in the first place ...
I think it's partly in house coding standards and partly personal preference. The former is understandable; the latter is inexcusable. Personally I hate the syntax that the C99 committee came up with. However I still use it.
Hi
Thanks for the tip, I didn't know about these optimized types. I like the C99 syntax; unfortunately many companies use their own data types. This makes code less readable, less portable and a source of bugs as well.
Best Regards,
Vlad
Indeed they do. The sooner that everyone starts using the C99 syntax the better IMHO.
I've written a lot of embedded code over the years, in some cases code that has to compile on many platforms (one case - the code compiles on at least 12 different processors, with toolchains ranging from 8 bit embedded, 16 bit, 32 bit, some using off the shelf compilers, some gcc, some borland c, some linux gcc...)
In this case, C99 compliance can't be assured. The only way to go is a big fat file full of #if's that detect the particular compiler, and define the in-house standards int8u, int8s, int16u, etc etc (including a fast_int type as well).
This works exceptionally well when you want extremely portable code and can't assume C99 for anything.
One thing overlooked in this enthusiasm for C99 is code validation: Let's say the compiler uses a two byte native operand for the "fast" type in the example. And let's say someone comes back and creates a bug by extending the loop limit to 300. Maybe the compiler notices this and generates a warning based on the compare, but probably it doesn't. And the code tests great. So now we think the code is solid. But, of course, it breaks as soon as we port it to "fast is 8 bits"--possibly in a non-obvious way.
Mike Layton
Mike - I like your observation. It is indeed a weakness of these C99 data types that I had not previously considered.
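Mike's scenario can be sketched as follows. This is only an illustrative model and not code from anyone in this thread: LOOP_LIMIT and all of the function names are hypothetical, and the 8 bit counter is simulated with a mask so that the effect shows up on any host. The #if guard checks the limit against 255, the only range that every conforming uint_fast8_t is guaranteed to hold, so it fires on every target rather than only on the one that breaks.

```c
#include <assert.h>
#include <stdint.h>

#define LOOP_LIMIT 100u   // hypothetical; change to 300u and the guard below fires

// Portable guard: uint_fast8_t is only guaranteed to hold 0..255.
#if LOOP_LIMIT > 255u
#error "LOOP_LIMIT does not fit in the guaranteed range of uint_fast8_t"
#endif

// The loop itself, safe on every target because of the guard above.
unsigned int run_guarded_loop(void)
{
    unsigned int passes = 0u;
    uint_fast8_t i;

    for (i = 0u; i < LOOP_LIMIT; i++)
    {
        passes++;
    }
    return passes;
}

// Model of the loop on a target where the 'fast' type is exactly 8 bits:
// the counter wraps from 255 to 0. Returns the number of passes executed,
// or give_up_after if the loop had still not finished by then.
unsigned long passes_on_8bit_target(unsigned int limit, unsigned long give_up_after)
{
    unsigned long passes = 0ul;
    unsigned int i = 0u;

    while ((i < limit) && (passes < give_up_after))
    {
        passes++;
        i = (i + 1u) & 0xFFu;   // simulate an 8 bit counter
    }
    return passes;
}

// Usage sketch: a limit of 100 finishes, a limit of 300 is still spinning
// when we give up after 1000 passes.
void show_wrap_hazard(void)
{
    assert(run_guarded_loop() == LOOP_LIMIT);
    assert(passes_on_8bit_target(100u, 1000ul) == 100ul);
    assert(passes_on_8bit_target(300u, 1000ul) == 1000ul);
}
```

The point of the guard is that it compares against the guaranteed minimum range rather than against UINT_FAST8_MAX, which would only catch the problem after the port to the 8 bit target.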
"Indeed they do. The sooner that everyone starts using the C99 syntax the better IMHO.
1/25/2010 2:33 PM"
Nigel, I understand your enthusiasm, but C99 was adopted in 2000, your comment is about 10 years later - perhaps it's time to think of why everyone hasn't started using it and get back to coding.
JW
ashleigh: for my portable code using C99 standard integers, I try not to litter my code with #if's, but instead just have a ports/xx directory that includes a stdint.h for the compiler that is lacking C99 support for standard integers.
Nigel: as for using fast/least standard integers in a project, I have two rules about optimizing that I learned while studying Extreme Programming:
Rule 1: Don't do it.
Rule 2: (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution.