obsolete - rolled into TRs and blueprints
This page collects areas the Toolchain WG might move into in the future.
Note that there is a lot of duplication between this, the blueprints, and the CSL contract work items.
We emphasise performance while being neutral on correctness.
The register allocator is designed around the needs of architectures with a low register count and restrictive register classes. The ARM architecture has many general purpose registers, so an allocator built on different assumptions may give better code.
The ARM and, to a lesser extent, Thumb-2 ISAs allow conditional execution of instructions. This can be used in many situations to eliminate an expensive branch. The middle end expands and transforms branches. The ARM backend tries to recombine the RTL back into conditional instructions, but often can't due to the middle end transforms.
Richard: could you describe what the change may be? Is it expanding the conditionals earlier, or tracking them through the middle end?
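As a small illustration of the kind of source involved, here is a sketch of a clamp written once with explicit branches and once as a chain of selects. The function names are made up for this example. Whether a given GCC version turns either form into conditionally executed instructions depends on the version and flags, so treat this as something to inspect with -S rather than a guaranteed result.

```c
#include <stdint.h>

/* Branchy form: each test may become a compare plus a conditional branch. */
int32_t clamp_branch(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* Select form: each step is a simple "pick one of two values", which a
   back end can map onto a compare followed by a conditional move. */
int32_t clamp_select(int32_t x, int32_t lo, int32_t hi)
{
    x = (x < lo) ? lo : x;
    x = (x > hi) ? hi : x;
    return x;
}
```

Both functions return the same value for any lo <= hi, so comparing the generated assembly for the two shows how much of the original structure survives the middle end.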
ARMv6 has some byte and half-word vector instructions that work on the main register set. Implement support for them in the compiler.
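To make the idea concrete, the sketch below emulates in portable C what a four-lane byte add in a core register computes: four independent 8-bit additions, each wrapping modulo 256. On ARMv6 the UADD8 instruction performs this in one step (and also sets the GE flags, which this sketch does not model). The function name add_bytes is invented for this example.

```c
#include <stdint.h>

/* Add four packed unsigned bytes lane by lane, wrapping within each lane. */
uint32_t add_bytes(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the cleared top bit absorbs the
       carry so it cannot spill into the neighbouring lane. */
    uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);

    /* Fold in the top bit of each lane with XOR, which is addition
       without carry-out. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

For instance, add_bytes(0x01FF7F80, 0x01010180) gives 0x02008000: the 0xFF + 0x01 and 0x80 + 0x80 lanes wrap to zero without disturbing their neighbours.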
The vectoriser is designed for a vector unit where all registers are the same width. NEON has 64-bit and 128-bit registers and a significant cost in changing between them.
Implement the Cortex-A9 pipeline description.
Implement improvements for Cortex-A+MP. What are these? How is multi-processing exposed to the compiler?
Describe the separate ARM and NEON pipelines and the cost of moving data back and forth between them.
Note that there is a conflict here between 'Linaro the general purpose system' and the specific products that a vendor will ship. Linaro should run at its best on all systems, while a specific product may only have support for the final platform.
Implement the mem* and str* routines in assembler.
Implement IFUNCs in the linker. Supply different flavours of the core library routines that are selected from at runtime by IFUNCs.
Consider runtime specialisation, where a set of functions is profiled at runtime on the specific hardware and workload and the best one is chosen (à la http://www.fftw.org/).
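The selection idea in the items above can be sketched in portable C with a function pointer. This is not real IFUNC support, which resolves the symbol in the dynamic linker; it is a stand-in that times several flavours of a routine once and keeps the fastest. All names here (copy_fn, copy_bytes, copy_unrolled, pick_fastest) are hypothetical, and the "tuned" flavour is only an unrolled loop standing in for a hand-written assembler version.

```c
#include <stddef.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline flavour: copy one byte at a time. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a tuned flavour: four bytes per iteration, then the tail. */
void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 4; n -= 4, d += 4, s += 4) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Run one candidate `reps` times and return the elapsed clock ticks. */
static clock_t time_copy(copy_fn f, void *dst, const void *src,
                         size_t n, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        f(dst, src, n);
    return clock() - start;
}

/* Time every candidate on the given buffers and return the quickest.
   Returns NULL if there are no candidates. Ties keep the earlier entry. */
copy_fn pick_fastest(const copy_fn *cands, size_t count,
                     void *dst, const void *src, size_t n, int reps)
{
    if (count == 0)
        return NULL;

    copy_fn best = cands[0];
    clock_t best_ticks = time_copy(cands[0], dst, src, n, reps);

    for (size_t i = 1; i < count; i++) {
        clock_t ticks = time_copy(cands[i], dst, src, n, reps);
        if (ticks < best_ticks) {
            best_ticks = ticks;
            best = cands[i];
        }
    }
    return best;
}
```

A caller would run pick_fastest once at start-up with buffers representative of the real workload and then call through the returned pointer. Timings from clock() are coarse and noisy, so a real implementation would need larger buffers, more repetitions, and some care that the compiler does not optimise the timed calls away.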
Profile guided optimisation
Link time optimisation
Hard float support
Hard float support may give better performance by making the VFP registers available for parameters and removing the cost of moving floating point values back and forth between the core registers and VFP registers across function calls.
Performance can also be achieved by giving the developer the right set of tools.
These are efforts that might be done in the far future but can't be justified now.
Back end simplification
The ARM back end currently supports three different ISAs, many optional features, and architectures from ARMv4 up. We could simplify by pruning back support.
ARM, Thumb, Thumb-2, ARMv5, ARMv6, and ARMv7 are still current so there may not be much to prune.
Improve compile time
GCC takes a long time to compile code. A faster compile gives a faster debug cycle and a better time to market.
WorkingGroups/ToolChain/Future (last modified 2010-10-19 01:30:53)