Optimization
- Cost-benefit analysis: Does it really matter if
a part of your code is slow? This depends on usage
patterns and user expectations.
- It helps to be aware of language-specific performance
issues (especially C++). These issues need to be understood
and dealt with in early design stages, because they can
affect the design in fundamental ways (e.g. virtual
functions being replaced by static polymorphism).
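For a concrete sketch of the static-polymorphism idea (the
names below are invented for illustration), the Curiously
Recurring Template Pattern resolves the call at compile time
instead of through a vtable:

#include <cstdio>

// Static polymorphism via CRTP: the base class knows the derived
// type at compile time, so the call in total_area() is dispatched
// statically and can be inlined; no vtable lookup is involved.
template <class Derived>
struct Shape {
    double area() const {
        return static_cast<const Derived*>(this)->area_impl();
    }
};

struct Circle : Shape<Circle> {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area_impl() const { return 3.141592653589793 * r * r; }
};

template <class T>
double total_area(const T* shapes, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += shapes[i].area();   // static dispatch
    return sum;
}

int main() {
    Circle circles[3] = { Circle(1.0), Circle(2.0), Circle(3.0) };
    std::printf("%f\n", total_area(circles, 3));
    return 0;
}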
- Make use of the large base of existing libraries.
Build your application on top of libraries which offer
high performance. Writing an efficient library is a
lot of work; you don't want to do it yourself.
- Use a profiler before, during, and after you optimize
to gauge improvements and ensure you aren't wasting your
time trying to tune a routine that accounts for only 2% of
the runtime.
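When a full profiler is not at hand, even a rough wall-clock
timer can confirm whether a change moved the needle. A sketch
(seconds_for is a made-up helper, not a replacement for a real
profiler):

#include <chrono>
#include <cstdio>

// Time the same routine before and after a change to gauge the
// improvement; a real profiler still gives the whole-program picture.
template <class F>
double seconds_for(F f, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        f();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

int main() {
    volatile double sink = 0.0;
    double t = seconds_for([&] {
        for (int i = 0; i < 1000; ++i) sink = sink + i * 0.5;
    }, 10000);
    std::printf("%.4f s for 10000 iterations\n", t);
    return 0;
}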
- Look for algorithm and data structure (trees, hash
tables, spatial data structures) improvements first.
These will offer the largest performance savings and will
outlive platform-specific tuning. Maybe you have O(N)
algorithms which could be replaced by O(log N) ones.
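For example, a hypothetical membership test against a large,
mostly static table:

#include <algorithm>
#include <vector>

// The linear scan is O(N) per query; keeping the table sorted and
// using binary search makes each query O(log N).
bool contains_linear(const std::vector<int>& values, int x) {
    return std::find(values.begin(), values.end(), x) != values.end();
}

bool contains_sorted(const std::vector<int>& sorted_values, int x) {
    return std::binary_search(sorted_values.begin(), sorted_values.end(), x);
}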
- Be aware that what is good for one architecture may be
bad for another. If you have to support many platforms,
you should code in a way that guarantees good (not
necessarily stellar) performance on all platforms. Note that
some libraries are able to tune themselves for particular
architectures; if you build on top of them, problem solved.
- Understand which optimizations the compiler is
competent to perform, and let it do them. You should
be optimizing at a level above that of the compiler,
or you are wasting your time.
- Be aware of platform tools: profilers, hardware counters,
assembly annotators, etc. Some compilers (e.g. Intel, SGI)
have full-blown performance tuning environments.
- Experiment with optimization flags. Sometimes higher
levels of optimization produce slower code. Measure compile
time/execution time tradeoffs: who cares if a 3% performance
improvement requires two more hours of compilation? I will
sometimes hack up a script to run through the relevant
combinations of optimization flags and measure compile time
vs run time.
- Try using the "assume no aliasing" flag on your compiler
to see if there is a big performance improvement. This is a
dangerous flag to use, but it provides a quick way to
determine if aliasing is an issue. If it is, you might be
able to use the restrict keyword to fix the problem.
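A minimal sketch, using the widely supported __restrict
spelling (the C99 keyword is restrict; support in C++
compilers varies):

// Telling the compiler that dst and src never overlap lets it keep
// values in registers, reorder loads and stores, and vectorize the
// loop more aggressively.
void scale(float* __restrict dst, const float* __restrict src,
           float k, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = k * src[i];
}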
- You can commit efficiency sins of the grossest sort as
long as your inner loops are tight. You don't have to avoid
slow language features such as virtual functions, new and
delete, or even scripting (e.g. driving your C++ program
from Python/Perl/Eiffel etc.). Just keep these features away
from your inner loops. It's all about amortization.
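For instance, hoisting an allocation out of the inner loop
amortizes its cost (a sketch with made-up names):

#include <cstddef>
#include <vector>

// The slow version pays for a heap allocation on every row; the fast
// version pays once and reuses the buffer, keeping new/delete out of
// the inner loop.
void process_rows_slow(const std::vector<std::vector<double> >& rows) {
    for (std::size_t r = 0; r < rows.size(); ++r) {
        std::vector<double> scratch(rows[r].size());  // allocation per row
        for (std::size_t i = 0; i < rows[r].size(); ++i)
            scratch[i] = rows[r][i] * 2.0;
    }
}

void process_rows_fast(const std::vector<std::vector<double> >& rows) {
    std::vector<double> scratch;                      // allocated once, reused
    for (std::size_t r = 0; r < rows.size(); ++r) {
        scratch.resize(rows[r].size());
        for (std::size_t i = 0; i < rows[r].size(); ++i)
            scratch[i] = rows[r][i] * 2.0;
    }
}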
There are many optimizations which are usually useless and
only serve to obscure the code. These optimizations make no
difference because of modern architectures and compilers:
- Replacing floating-point math with integer arithmetic.
On some platforms, integer math is slower than floating-point
math. Floating-point has been optimized to death by the
hardware gurus, and it's amazingly efficient.
- Deciding what data should be in registers. The register
keyword is obsolete. Compilers do a much better job of
register assignment than you will.
- Loop unrolling. The compiler almost always does a better
job. When some compilers encounter manually unrolled loops,
they will de-unroll them, then unroll them properly.
- Reversing loops: for (i = N-1; i >= 0; i--). On some
platforms this can increase performance by 3%; on others
it can decrease performance by 10%.
- Replacing multiplications with additions. Modern CPUs
can issue one multiplication each clock cycle. Many of the
"strength reduction" tricks from the 70s and 80s will
now result in slower code. Replacing floating-point
division with multiplication is still relevant; some
compilers will do this automatically if a certain option
is specified.
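A sketch of the one trick that still pays off, a
loop-invariant floating-point division replaced by
multiplication with its reciprocal (note the result can
differ in the last bit, which is why some compilers only do
it under a "fast math"-style option):

void normalize(float* v, int n, float norm) {
    float inv = 1.0f / norm;      // one division, hoisted out of the loop
    for (int i = 0; i < n; ++i)
        v[i] *= inv;              // n multiplications instead of n divisions
}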
Tips for getting efficient code generation from the compiler:
- Data type size: the integer/word type is the most
fundamental and normally the most efficient type for a given
processor. In practice, using a smaller type for a local
variable is computationally inefficient: the value is first
promoted to an integer, the operation is performed, and the
result is then truncated back into the original type, unless
the processor uses an 8- or 16-bit word size. This is a
speed-memory trade-off.
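A sketch of the promote-operate-truncate cost (the narrow
accumulator also wraps modulo 256, so the two functions are
not equivalent; the point is the extra truncation work):

#include <cstdint>

// On a 32/64-bit processor, the uint8_t accumulator must be cut back
// to 8 bits after every addition, while the int accumulator maps
// directly onto a full-width register.
int sum_narrow(const std::uint8_t* a, int n) {
    std::uint8_t sum = 0;          // promoted, added, then truncated each time
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

int sum_wide(const std::uint8_t* a, int n) {
    int sum = 0;                   // stays at the natural word size
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}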
- Variable signedness: Unsigned variables can result in
more efficient generated code for certain operations
(division, modulus) than signed variables. Most
architectures' ABIs (Application Binary Interfaces) define
the plain char type as whichever signedness is most efficient
on that target. So, unless a character explicitly needs to be
signed or unsigned, use the plain char type.
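For example, division by a power of two:

// The unsigned version can compile to a single shift; the signed
// version needs extra instructions to round negative values toward
// zero correctly.
unsigned udiv8(unsigned x) { return x / 8u; }
int      sdiv8(int x)      { return x / 8; }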
- Access types:
- Use the static keyword (or C++ anonymous namespaces) for
global data whenever possible. In some cases, this allows
the compiler to deduce things about the access patterns to a
variable. On the other hand, avoid static function-local
data. The data's value needs to be preserved between calls
to the function, making it very expensive to access and
requiring permanent RAM storage.
- The const keyword is helpful in a couple of ways. First,
constant variables can usually be allocated to a read-only
data section, which can save on the amount of RAM required
if the read-only data is allocated to flash or ROM. Second,
when applied to a pointer or reference parameter, it allows
the compiler to recognize certain cases where a variable is
guaranteed not to be modified through the pointer by a
function call.
- The restrict keyword tells the compiler that the specified
pointer is the only way to access a given piece of data
(there is no alias of this pointer). This can allow the
compiler to optimize more aggressively.
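A small sketch tying these access-type hints together (the
names are hypothetical, embedded-flavored):

namespace {                          // anonymous namespace: internal linkage,
    int call_count = 0;              // so the compiler sees every access
}

const int weights[4] = {1, 2, 4, 8}; // const table: can live in ROM/flash

// const pointer parameter: the function promises not to modify *samples.
int weighted_sum(const int* samples, int n) {
    ++call_count;
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += samples[i] * weights[i & 3];
    return total;
}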
- Scope: Declare a variable in the inner-most scope
possible, particularly when its address needs to be taken.
In such cases, the compiler must keep the variable around on
the stack until it goes out of scope. This can inhibit certain
optimizations that depend on the variable being dead.
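A sketch (read_value is a made-up helper):

#include <cstdio>

// Hypothetical helper that writes a result through a pointer.
static void read_value(int* out) { *out = 42; }

// 'value' is declared in the innermost scope in which its address is
// needed, so the compiler only has to keep it addressable inside the
// loop body rather than for the whole function.
void narrow_scope_example(int n) {
    for (int i = 0; i < n; ++i) {
        int value = 0;
        read_value(&value);
        std::printf("%d\n", value + i);
    }
}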
- Pointers: It is usually more efficient to have a function
return a scalar value (i = Function();) than to pass the
scalar by reference or by address (Function(&i);), as the
reference or pointer forces the compiler to allocate the
variable on the stack, all but assuring less efficient code
generation.
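A sketch with invented names:

// Returning the scalar by value lets the result live entirely in
// registers; the by-pointer form forces the caller's variable to
// have a stack address.
int parse_digit_by_value(char c) {
    return c - '0';
}

void parse_digit_by_pointer(char c, int* out) {
    *out = c - '0';
}

// Usage:
//   int a = parse_digit_by_value('7');       // preferred
//   int b; parse_digit_by_pointer('7', &b);  // b must be addressable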