This page describes some common compiler/linker optimizations.

-O3

-O3 enables several additional compiler optimizations, such as tree vectorization and loop unswitching, and favors speed over code size somewhat more aggressively than -O2, e.g. by inlining more aggressively (-finline-functions).

It is available on any platform supported by gcc.

OpenMP

OpenMP is a simple API that makes it easier for a programmer to make use of multi-core or multi-processor systems, e.g. by automatically splitting marked loops into several threads.

Example:

 #pragma omp parallel for
 for(int i=0; i<100; i++)
    do_something(i);

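This splits the 100 iterations among a team of threads. The OpenMP runtime picks the team size (by default typically one thread per available core), so at most 100 threads could do useful work here.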
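A slightly fuller sketch of the same pattern, with a result that can be checked. The function name sum_of_squares is made up for illustration. Built with gcc -fopenmp, the iterations are shared among the threads of a team. Built without it, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical example function: returns the sum of i*i for i in [0, n). */
long sum_of_squares(int n)
{
    long sum = 0;

    assert(n >= 0);

    /* reduction(+:sum) gives every thread a private partial sum and
       adds the partial sums together when the loop has finished. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (long)i * i;

    return sum;
}
```

Without the reduction clause, all threads would update sum at the same time, which is a data race.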

It is available on platforms supported by gcc that can use libgomp, gcc's OpenMP library. This includes most platforms that support POSIX threads, but initially not Android.

Loop parallelization

Loop parallelization takes OpenMP a step further by automatically determining which loops are suitable for "#pragma omp parallel for" and similar constructs. Code that was written without multiprocessing in mind can then take advantage of multicore/SMP systems (to some extent) without being modified. This applies to most code written specifically for ARM platforms, since multicore/SMP ARM systems are quite new.

Compiler flag: -ftree-parallelize-loops=<X> (where X is the number of threads to be optimized for - typically the number of CPU cores in the target system)

Available on anything supported by gcc that has both libgomp and graphite (incl. CLooG, PPL or ISL) - the original Android toolchain has neither of those.
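As a rough illustration, the loop below is the kind of candidate the auto-parallelizer looks for: every iteration is independent, and the arrays are declared not to overlap. The function name scale_add and the thread count in the build line are assumptions made for this sketch. Whether gcc actually parallelizes a given loop depends on its own profitability analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dst[i] = a[i] * k + b[i].
   No iteration reads anything written by another iteration, and
   restrict promises the compiler that the arrays do not overlap.

   Possible build line (the thread count 4 is only an example):
     gcc -O2 -ftree-parallelize-loops=4 -c scale_add.c */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}
```

A loop that carries a value from one iteration to the next (for example dst[i] = dst[i - 1] + a[i]) cannot be split up this way.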

binutils: --hash-style=gnu

By default, ld creates SysV-style hash tables for symbol lookup in shared libraries. With --hash-style=gnu, it emits GNU-style hash tables instead, which makes symbol lookup by the dynamic linker a lot faster.

binutils: -Bsymbolic-functions

Speeds up the dynamic linker by binding references to global functions inside a shared library to the definitions in that library at link time. This is safe only where it is known not to break things, i.e. for libraries whose users do not try to override their symbols. It is probably safe to assume that e.g. skia and opengl could benefit.
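The effect is easiest to see on a call between two exported functions of the same library. The sketch below uses a made-up library (libshape) and made-up function names. Only the linker option itself comes from the text above.

```c
/* libshape.c: hypothetical shared library source.

   Possible build line:
     gcc -shared -fPIC -Wl,-Bsymbolic-functions -o libshape.so libshape.c */

/* Global (exported) function. */
int shape_area(int w, int h)
{
    return w * h;
}

/* Another exported function that calls the first one. */
int shape_volume(int w, int h, int d)
{
    /* In a default -fPIC build this call typically goes through the PLT,
       so that an executable or another library could override
       shape_area. With -Bsymbolic-functions the linker binds it to the
       definition above, and no lookup is needed at run time. */
    return shape_area(w, h) * d;
}
```

The trade-off is that a program that defines its own shape_area no longer changes what shape_volume calls.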

More Details

binutils/gcc: -flto, -fwhole-program

Link-Time Optimization causes code to be optimized again at link time, when the compiler knows which functions are called from which parts of the code, which functions are only called with constant parameters, etc.
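A minimal sketch of the situation described above: a function that would normally live in its own source file and is only ever called with a constant argument. The file and function names are hypothetical, and both parts are shown in one block so the example stays self-contained.

```c
/* --- would live in scale.c --- */
int scale(int value, int factor)
{
    return value * factor;
}

/* --- would live in user.c --- */
int scale(int value, int factor);   /* normally declared in a header */

int doubled(int value)
{
    /* The only call site passes the constant 2. Compiling user.c on its
       own, the compiler cannot see the body of scale. At link time with
       -flto it can, so it may inline the call or specialize scale for
       factor == 2. */
    return scale(value, 2);
}

/* Possible build lines:
     gcc -O2 -flto -c scale.c user.c main.c
     gcc -O2 -flto scale.o user.o main.o -o app */
```

Note that -flto has to be passed both when compiling and when linking.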

gcc: -mtune=cortex-a9 (or whatever the actual target CPU is)

The Android build system uses -march=armv7-a, which is good, but it doesn't do any tuning for the specific CPU type (e.g. cortex-a8 vs. cortex-a9).

gcc: -fvisibility-inlines-hidden

Don't export C++ inline methods in shared libraries. This makes the symbol table smaller, improving startup time and disk space efficiency.

gcc: -fstrict-aliasing -Werror=strict-aliasing

Currently, Android uses -fno-strict-aliasing unconditionally for thumb code, to work around some pieces of code that violate strict aliasing rules. Using -Werror=strict-aliasing, we can determine which pieces of code are affected and either fix them or limit the use of -fno-strict-aliasing to the specific files that need it. This enables the rather useful strict-aliasing optimization for the rest of the build.
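A typical violation, and one common way to fix it, looks roughly like this. The function name float_bits is made up for the sketch, and the test values assume a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Violates the strict aliasing rules: a float object is read through an
   lvalue of type uint32_t. Shown only as a comment because its behaviour
   is undefined:

     uint32_t bits = *(uint32_t *)&f;
*/

/* Aliasing-safe alternative: copy the bytes of the float. Compilers
   usually turn a small fixed-size memcpy like this into a plain move. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    /* Assumption of this sketch: float is 32 bits wide. */
    assert(sizeof bits == sizeof f);

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code fixed this way no longer needs -fno-strict-aliasing.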

gcc: Investigate Graphite optimizations that aren't even enabled at -O3

  • -fgraphite-identity
  • -floop-block
  • -floop-interchange
  • -floop-strip-mine
  • -ftree-loop-distribution
  • -ftree-loop-linear
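To give an idea of what these loop transformations aim for, the sketch below shows a loop nest before and after a manual loop interchange. The function names are hypothetical. With -floop-interchange on a Graphite-enabled gcc, the compiler may perform this kind of rewrite itself, though it is not guaranteed to.

```c
#include <assert.h>

/* Walks a row-major matrix column by column: the inner loop jumps
   `cols` elements on every step, which is unfriendly to the cache. */
long sum_column_first(const int *m, int rows, int cols)
{
    long sum = 0;

    assert(m != 0 && rows >= 0 && cols >= 0);

    for (int c = 0; c < cols; c++)
        for (int r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}

/* The same computation after interchanging the two loops: the inner
   loop now touches consecutive memory locations. */
long sum_row_first(const int *m, int rows, int cols)
{
    long sum = 0;

    assert(m != 0 && rows >= 0 && cols >= 0);

    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}
```

Both functions return the same value. Only the memory access order differs.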

gcc: Investigate SMS optimizations that aren't even enabled at -O3

Technical Details

  • -fmodulo-sched
  • -fmodulo-sched -fmodulo-sched-allow-regmoves

WorkingGroups/ToolChain/OptimizationDescriptions (last modified 2011-09-07 11:35:56)