In my first post on “Efficient C” I talked about how to use the optimal integer data type to achieve the best possible performance. In this post, I’ll talk about using the code optimization settings in your compiler to achieve further performance gains.
I assume that if you are reading this, then you are aware that compilers have optimization settings or switches. Invoking these settings usually has a dramatic effect on the size and speed of the compiled image. Typical results that I have observed over the years are a 40% reduction in code size and a halving of execution time for fully optimized versus non-optimized code. Despite these amazing numbers, I’d say that about half of the code I see (and I see a lot) is released to the field without full optimization turned on. When I ask developers about this, I typically get one of the following explanations:
1. I forgot to turn the optimizer on.
2. The code works fine as is, so why bother optimizing it?
3. When I turned the optimizer on, the code stopped working.
The first answer is symptomatic of a developer who is just careless. I can guarantee that the released code will have a lot of problems!
The second answer on the face of it has some merit. It’s the classic “if it ain’t broke, don’t fix it” argument. However, quite apart from the fact that your code will take longer to execute and will thus almost certainly consume more energy (see my previous post on “Embedded Systems and the Environment”), it also means that there are potential problems lurking in your code. I address this issue below.
The third answer is of course the most interesting. You have a “perfectly good” piece of code that is functioning just fine, yet when you turn the optimizer on, the code stops working. Whenever this happens, the developer blames the “stupid compiler” and moves on. Well, after having this happen to me a fair number of times over my career, I’d say that the chances that the compiler is to blame are less than 1 in 10. The real culprit is normally the developer’s poor understanding of the rules of the programming language and how compilers work.
Typically, when a compiler is set up to do no optimization, it generates object code for each line of source code in the order in which the code is encountered and then simply stitches the results together. (For the compiler aficionados out there: I know it’s more involved than this, but it serves my point.) As a result, code is executed in the order in which you write it, conditions involving constants are evaluated at run time as if the constants might have changed, variables are stored to memory and then immediately loaded back into registers, invariant code is repeatedly executed within loops, all the CPU registers are stacked in an ISR, and so on.
Now, when the optimizer is turned on, it rearranges the order of execution, looks for constant expressions, redundant stores, common sub-expressions, unused registers and so on, and eliminates everything that it perceives to be unnecessary. And therein, dear reader, lies the source of most of the problems. What the compiler perceives as unnecessary, the coder thinks is essential, and is indeed relying upon the “unnecessary” code being executed.
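To make the idea concrete, here is a minimal sketch of one such transformation, loop-invariant code motion. The function and parameter names are hypothetical. The second function is roughly what an optimizer is entitled to turn the first into, because the observable result is the same; the closing comment shows how the very same reasoning breaks code that relies on a read the compiler considers redundant.

```c
#include <stddef.h>

/* As written: gain * scale never changes inside the loop, yet a
 * non-optimizing compiler typically recomputes it on every pass. */
void apply_gain_as_written(int *buf, size_t n, int gain, int scale)
{
    for (size_t i = 0; i < n; ++i) {
        buf[i] = buf[i] * (gain * scale);
    }
}

/* Hand-written approximation of what the optimizer may do: the invariant
 * product is hoisted out of the loop and computed once. */
void apply_gain_hoisted(int *buf, size_t n, int gain, int scale)
{
    const int k = gain * scale;
    for (size_t i = 0; i < n; ++i) {
        buf[i] = buf[i] * k;
    }
}

/* The same reasoning is what bites polling code. If 'ready' is an ordinary
 * (non-volatile) global that only an ISR changes, then in
 *
 *     while (!ready) { }
 *
 * the compiler sees nothing in the loop that modifies 'ready', so it may
 * read it once outside the loop, leaving a loop that never exits. */
```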
So what’s to be done about this? First, you have to understand what the keyword volatile means and does. Even if you think you understand volatile, go and read this article I wrote a number of years back for Embedded Systems Programming magazine. I’d say that well over half of the optimization problems out there relate to failure to use volatile correctly.
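As a minimal host-side sketch of the ISR case: a signal handler stands in for an interrupt service routine, and the flag it sets is declared `volatile sig_atomic_t` so that the busy-wait loop re-reads it on every pass. The names `data_ready`, `on_event()` and `wait_for_event()` are hypothetical. On a real target the handler would be an ISR, and the flag might equally be a memory-mapped status register accessed through a `volatile` pointer.

```c
#include <signal.h>

/* Flag shared between "interrupt" context and the main line (hypothetical
 * name). volatile sig_atomic_t is the type the C standard guarantees can be
 * safely written from a signal handler. */
static volatile sig_atomic_t data_ready = 0;

/* Stand-in for an ISR: its only job is to set the flag. */
static void on_event(int sig)
{
    (void)sig;
    data_ready = 1;
}

/* Returns 1 once the event has been seen, or -1 if the handler could not
 * be installed. Because data_ready is volatile, each loop iteration reads
 * it from memory instead of reusing a cached copy. */
int wait_for_event(void)
{
    data_ready = 0;
    if (signal(SIGINT, on_event) == SIG_ERR) {
        return -1;
    }
    (void)raise(SIGINT);   /* simulate the interrupt firing */
    while (!data_ready) {
        /* busy-wait: without volatile the optimizer may hoist this read */
    }
    return 1;
}
```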
The second problematic area concerns specialized protective hardware such as watchdogs. To make inadvertent modification of certain registers less likely, CPU manufacturers insist that a specific set of instructions be executed in order and within a certain time. An optimizer can often break these specialized sequences. In that case, the best bet is to put each specialized sequence into its own function and then use the appropriate #pragma directive to disable optimization of that function.
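A minimal sketch of that approach, assuming GCC. The watchdog key register and the 0x55/0xAA unlock values are hypothetical placeholders (the real ones come from your device's datasheet), and the register is modeled as a plain `volatile` variable so the code builds on a host. The pragma is compiler-specific: clang, for instance, uses `#pragma clang optimize off`, and GCC's own manual describes its `optimize` attribute and pragma as intended for debugging, so check what your embedded toolchain recommends.

```c
#include <stdint.h>

/* Hypothetical watchdog key register. On real hardware this would be a
 * memory-mapped address from the device header. */
static volatile uint8_t WDT_KEY_REG;

#define WDT_KEY_1 0x55u   /* hypothetical unlock values */
#define WDT_KEY_2 0xAAu

/* GCC-specific: compile only the following function without optimization. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")
#endif

/* The whole timed sequence lives in one small function, isolated from
 * surrounding code that the optimizer might otherwise interleave with it. */
void wdt_kick(void)
{
    WDT_KEY_REG = WDT_KEY_1;
    WDT_KEY_REG = WDT_KEY_2;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Declaring the register `volatile` already stops the compiler from merging or dropping the two writes; the separate, unoptimized function is the extra protection against other code being moved into the sequence.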
Now, what should you do if you are absolutely sure that you are using volatile appropriately and correctly, and that specialized coding sequences have been protected as suggested, yet your code still does not work when the optimizer is turned on? The next things to look for are software timing sequences, either explicit or implicit. The explicit timing sequences are things such as software delay loops, and are easy to spot. The implicit ones are a bit tougher and typically arise when you are doing something like bit-banging a peripheral, where the instruction cycle time implicitly acts as a setup or hold time for the hardware being addressed.
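The following sketch shows both cases, assuming a hypothetical GPIO output register and pin assignments (modeled as a `volatile` variable so it builds on a host). `spin_delay()` is an explicit delay loop whose counter is declared `volatile` so the optimizer cannot delete the empty loop, and `bitbang_send()` calls it explicitly for setup and hold time rather than relying on instruction timing. The delay is still only approximate, since it depends on clock speed and code generation; where precise timing matters, a hardware timer is the safer choice.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped GPIO output register. */
static volatile uint8_t GPIO_OUT;
#define CLK_PIN  (1u << 0)   /* hypothetical pin assignments */
#define DATA_PIN (1u << 1)

/* Explicit software delay: the volatile counter forces every iteration to
 * be performed, so an optimizer cannot remove the "useless" loop. */
static void spin_delay(uint32_t count)
{
    for (volatile uint32_t i = 0; i < count; ++i) {
        /* intentionally empty */
    }
}

/* Shift a byte out MSB first on the hypothetical DATA/CLK pins. The setup
 * and hold times are explicit delays, not a side effect of slow code. */
void bitbang_send(uint8_t byte, uint32_t ticks)
{
    for (int bit = 7; bit >= 0; --bit) {
        if (byte & (1u << bit)) {
            GPIO_OUT |= DATA_PIN;
        } else {
            GPIO_OUT &= (uint8_t)~DATA_PIN;
        }
        spin_delay(ticks);            /* data setup time before the edge */
        GPIO_OUT |= CLK_PIN;          /* rising clock edge */
        spin_delay(ticks);            /* clock high / hold time */
        GPIO_OUT &= (uint8_t)~CLK_PIN;
    }
}
```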
OK, what if you’ve checked for software timing and things still don’t work? In my experience, you are now into what I’ll call the “Suspect Code / Suspect Compiler (SCSC)” environment. With an SCSC problem, the chances are that you’ve written some very complex, convoluted code. With this type of code, one of two things can happen:
1. You are working in a grey area of the language (i.e., an area where the behavior is not well specified by the standard). Your best defense against this is to use Lint from Gimpel, which will flag your questionable coding constructs. Once you have fixed them, you’ll probably find that your optimization problems have gone away.
2. The optimizer is genuinely getting confused. Although this is regrettable, the real blame may lie with you for writing gnarly code. The bottom line, in my experience, is that optimizers work best on simple code. Of course, if you have written simple code and the optimizer is getting it wrong, then do everyone a favor and report it to the compiler vendor.
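Signed integer overflow is a classic example of the grey area described in the first item above. The sketch below (function name hypothetical) shows, in a comment, a check that often appears to work in an unoptimized build but relies on undefined behavior, followed by a version that stays within the rules of the language by comparing against the limits before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Grey-area version (do not use): signed overflow is undefined behavior,
 * so an optimizer may assume a + b never wraps and fold the test to false.
 *
 *     bool add_overflows_bad(int a, int b) { return b > 0 && a + b < a; }
 *
 * Unoptimized, this often "works" because the hardware addition wraps. */

/* Well-defined version: returns true if a + b would overflow an int.
 * Neither INT_MAX - b (for b > 0) nor INT_MIN - b (for b <= 0) can
 * itself overflow, so no undefined behavior is involved. */
bool add_overflows(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    return a < INT_MIN - b;
}
```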
In my next post I’ll take on the size / speed dichotomy and make the case for using speed rather than size as the “usual” optimization method.