Bi-Endian Compiler

Introduction

http://www.drdobbs.com/architecture-and-design/writing-a-bi-endian-compiler/240003090 describes a bi-endian compiler: a compiler in which data can be explicitly marked as big- or little-endian, and which inserts the necessary byte-reversing instructions into the generated code so that the program executes correctly.

This wiki page gives a high-level scope for how this might be implemented in GCC for ARM (both 32-bit and 64-bit).

High Level Scope

From a GCC engineering perspective, implementation will take the following stages:

  • Language Feature Design & ABI issues - ~2 man-months
  • Initial implementation - ~12 man-months
  • Developer Tools to validate correctness - ~3 months
  • Optimization and Improvement - ~6 man-months

Language Feature Design & ABI issues

The article does not provide a detailed enough specification to work from, so a proper language specification will need to be designed, taking ABI issues (and possibly ACLE features) into account.

One initial idea is to use C named address spaces (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1005.pdf) to place little- and big-endian variables in distinct address spaces.

Endianness should describe only the storage format of a value, and reading a value from one endianness and storing it into the other should produce a warning.
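Standard C has no endianness-qualified types, so the following is only a rough emulation under stated assumptions: the hypothetical types be_u32 and le_u32 stand in for a 32-bit integer placed in a big- or little-endian address space, and the accessor names are invented for illustration. Because the two are distinct struct types, assigning one directly to the other is rejected by the compiler, which is stricter than the warning proposed above; converting between them has to go through the value, as be_to_le does.

```c
#include <stdint.h>

/* Hypothetical stand-ins for endian-qualified 32-bit integers.
   Distinct struct types, so "le = be;" does not compile. */
typedef struct { uint8_t bytes[4]; } be_u32;  /* most significant byte first */
typedef struct { uint8_t bytes[4]; } le_u32;  /* least significant byte first */

uint32_t be_load(const be_u32 *p)
{
    return ((uint32_t)p->bytes[0] << 24) | ((uint32_t)p->bytes[1] << 16) |
           ((uint32_t)p->bytes[2] << 8)  |  (uint32_t)p->bytes[3];
}

void be_store(be_u32 *p, uint32_t v)
{
    p->bytes[0] = (uint8_t)(v >> 24);
    p->bytes[1] = (uint8_t)(v >> 16);
    p->bytes[2] = (uint8_t)(v >> 8);
    p->bytes[3] = (uint8_t)v;
}

uint32_t le_load(const le_u32 *p)
{
    return  (uint32_t)p->bytes[0]        | ((uint32_t)p->bytes[1] << 8) |
           ((uint32_t)p->bytes[2] << 16) | ((uint32_t)p->bytes[3] << 24);
}

void le_store(le_u32 *p, uint32_t v)
{
    p->bytes[0] = (uint8_t)v;
    p->bytes[1] = (uint8_t)(v >> 8);
    p->bytes[2] = (uint8_t)(v >> 16);
    p->bytes[3] = (uint8_t)(v >> 24);
}

/* Endianness is a storage property only: converting between formats
   reads the value out of one layout and writes it into the other. */
void be_to_le(le_u32 *dst, const be_u32 *src)
{
    le_store(dst, be_load(src));
}
```

For example, storing 0x11223344 with be_store puts 0x11 in bytes[0]; after be_to_le the little-endian copy has 0x44 in its first byte, while the value read back is unchanged.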

Initial Implementation

The initial implementation will be very basic: every load or store of 'wrong'-endianness data will include a data swizzle (byte reversal) to convert the value to the 'right' endianness.
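A minimal sketch of this naive lowering, written as the C that a compiler might conceptually produce for a 32-bit big-endian object; the helper names are hypothetical. The plain load or store is done with memcpy, and a byte reversal is added on each side whenever the host is little-endian. The host byte order is tested at run time only to keep the sketch portable; a compiler would know it at compile time, and on ARMv6 and later (including AArch64) a single REV instruction can perform the reversal.

```c
#include <stdint.h>
#include <string.h>

/* Run-time probe for host byte order (a compiler would know this statically). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Portable 32-bit byte reversal (the "swizzle"). */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Load a 32-bit big-endian object as a native value. */
uint32_t load_be_object(const void *addr)
{
    uint32_t raw;
    memcpy(&raw, addr, sizeof raw);                       /* plain load */
    return host_is_little_endian() ? bswap32(raw) : raw;  /* inserted swizzle */
}

/* Store a native value into a 32-bit big-endian object. */
void store_be_object(void *addr, uint32_t value)
{
    uint32_t raw = host_is_little_endian() ? bswap32(value) : value; /* inserted swizzle */
    memcpy(addr, &raw, sizeof raw);                                  /* plain store */
}

/* "counter += 1" on a big-endian counter: load, swap, add, swap, store. */
void increment_be_counter(void *counter)
{
    store_be_object(counter, load_be_object(counter) + 1u);
}
```

Even a simple increment therefore costs two extra byte reversals on a little-endian host, which is the overhead the later optimization stage aims to reduce.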

This will require front-end changes, and every load/store pattern in the back end will also need to be updated. Significant testing will be required.

Developer Tools To Validate Correctness

One of the issues with a bi-endian compiler will be making sure that all data accesses use the correct endianness. Whilst the compiler can issue warnings within a translation unit, accesses made across translation units are harder to validate.

Ways of handling this will need to be investigated.

One initial idea is to encourage builds with debug information, and to use the DWARF information to determine whether the declarations of an object in different translation units have the same type, including its endianness.
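DWARF 4 already defines a DW_AT_endianity attribute (with values DW_END_default, DW_END_big and DW_END_little) that can record the byte order of a data object or base type, so one possible shape for the cross-unit check is sketched below. It assumes a hypothetical tool has already extracted, from each translation unit's debug information, one record per global variable holding its name and endianity; the DWARF parsing itself is not shown, and var_record and count_endian_mismatches are invented names.

```c
#include <stddef.h>
#include <string.h>

/* Endianity codes used by the DW_AT_endianity attribute in DWARF 4. */
enum dw_endianity { DW_END_default = 0x00, DW_END_big = 0x01, DW_END_little = 0x02 };

/* Hypothetical per-variable record built from one translation unit's DWARF. */
struct var_record {
    const char *name;             /* variable name (DW_AT_name) */
    enum dw_endianity endianity;  /* DW_AT_endianity, DW_END_default if absent */
};

/* Treat "default" as the target's native order, so an unannotated
   declaration matches an explicitly native-order one. */
static enum dw_endianity effective(enum dw_endianity e, enum dw_endianity native)
{
    return e == DW_END_default ? native : e;
}

/* Count pairs of records that name the same variable but disagree on
   its storage byte order. */
size_t count_endian_mismatches(const struct var_record *recs, size_t n,
                               enum dw_endianity native)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(recs[i].name, recs[j].name) == 0 &&
                effective(recs[i].endianity, native) !=
                    effective(recs[j].endianity, native))
                mismatches++;
    return mismatches;
}
```

A real tool would report the offending units rather than just counting, and would need to compare whole types (for example struct members) rather than only top-level variables.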

Optimization and Improvement

The initial implementation will have a performance cost. However, there are opportunities to improve the generated code. For instance, a = b | c is endianness-neutral (if a, b and c all have the same endianness) and so needs no byte-reversing instructions; SIMD operations are also likely to be endianness-neutral.
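The neutrality claim for a = b | c can be checked directly: OR works on each byte independently, so applying it to byte-reversed operands gives the byte-reversed result. Addition is different, because carries move from lower to higher bytes and their direction changes once the bytes are reversed. The small C sketch below (function names are illustrative only) tests both identities for given operands.

```c
#include <stdint.h>

/* Portable 32-bit byte reversal. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* OR is byte-wise, so it commutes with byte reversal for all inputs:
   the compiler can OR the raw stored bytes and skip both swaps. */
int or_is_endian_neutral(uint32_t b, uint32_t c)
{
    return (bswap32(b) | bswap32(c)) == bswap32(b | c);
}

/* Addition carries between bytes, so in general it does not commute
   with byte reversal and still needs the swaps around it. */
int add_is_endian_neutral(uint32_t b, uint32_t c)
{
    return (uint32_t)(bswap32(b) + bswap32(c)) == bswap32(b + c);
}
```

With b = 0xFF and c = 0x01 the addition check fails, since the carry out of the low byte lands in a different byte once the operands are reversed. AND and XOR are byte-wise in the same way as OR.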

These optimizations should be investigated to see how much benefit they can give.

WorkingGroups/ToolChain/BiEndianCompiler (last modified 2013-01-24 14:16:47)