RFC: Restricted Far Jumps

Restricted Far Jumps (RFJ) is a proposed security feature that is intended to reduce the potential impact of Return Oriented Programming (ROP) attacks. The purpose of this page is to explain the idea and elicit comments and suggestions from the readers.

The core idea is that ROP attacks have execution patterns that differ markedly from those of conventional code: if we can distinguish between the two, we can allow only conventional execution to proceed.

Conventional execution

On a typical Linux/ELF system, a program is a concatenation of object files whose entry points are well defined. Inside an object file, execution can jump around arbitrarily, but calls between object files use entry points that are declared in the metadata of the object file.

ROP execution

Return Oriented Programming attacks use so-called ROP chains to trick a process into performing tasks that it is not supposed to perform. These ROP chains consist of carefully crafted sequences of data values and return addresses that are loaded onto the execution stack by exploiting buffer overflow vulnerabilities. To stage a successful ROP attack, the original code executed by the program needs to be analyzed beforehand to find so-called gadgets: sequences of instructions at the end of functions that can be exploited.

For instance, consider the following two gadgets:

    mov r0, r4
    pop {pc}

and

    pop {r4, pc}

These are the final instructions of two arbitrary functions. Suppose the attacker crafts a stack frame that contains (in ascending address order) the address of the second gadget, the intended value for r0, the address of the first gadget, and the address of a function that takes a single argument in r0. Executing this ROP chain (which happens when the exploited function returns) then results in that function being called with an attacker-chosen value in r0: the second gadget pops the value into r4, and the first gadget moves it from r4 to r0. In a real-world attack, this could, for instance, be a call to execve() with r0 pointing to a buffer containing "/bin/sh".
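The chain above can be traced with a toy interpreter. This is only a sketch: the addresses, the attacker value, and the run_chain() helper are made-up placeholders, and only the two gadgets shown above are modelled.

```python
# Toy model of the two-gadget chain; all addresses are made-up placeholders.
GADGET_MOV = 0x1000      # mov r0, r4 ; pop {pc}
GADGET_POP = 0x2000      # pop {r4, pc}
TARGET_FUNC = 0x3000     # some function taking one argument in r0
ATTACKER_VALUE = 0xCAFE  # value the attacker wants to end up in r0


def run_chain(stack):
    """Interpret the chain; stack[0] is the overwritten return address."""
    regs = {"r0": 0, "r4": 0}
    stack = list(stack)
    pc = stack.pop(0)  # the exploited function's own return
    while True:
        if pc == GADGET_POP:
            regs["r4"] = stack.pop(0)  # pop {r4, pc}: r4 first, then pc
            pc = stack.pop(0)
        elif pc == GADGET_MOV:
            regs["r0"] = regs["r4"]    # mov r0, r4
            pc = stack.pop(0)          # pop {pc}
        else:
            return pc, regs            # control reached a non-gadget address


# Stack contents in ascending address order, as described above.
chain = [GADGET_POP, ATTACKER_VALUE, GADGET_MOV, TARGET_FUNC]
final_pc, final_regs = run_chain(chain)
```

After the run, final_pc holds the address of the target function and final_regs["r0"] holds the attacker's value, which is exactly the state the attacker is after.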

Mitigation

In order to prevent these kinds of attacks, we would like to be able to distinguish between a jump into a designated entry point, and a jump into a gadget. One possible way to accomplish this is explained in the remainder of this document.

Aligned entry points

With a bit of help from the compiler and assembler, we could impose the following rules on generated code:

  • each function shall start at a 64-byte boundary;
  • each bl call shall be located right before a 64-byte boundary, so that lr % 64 == 0;
  • each function epilogue shall be placed such that the final manipulation of pc appears right before a 64-byte boundary.

The result of these rules is that allowable far jumps will always be targeted at 64-byte boundaries. If we manage to uphold this restriction, it should become significantly harder to find usable gadgets, as they will also need to start at such a boundary to be executable.
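The arithmetic behind these rules can be sketched as follows. The helper names are hypothetical, and the sketch assumes ARM (A32) state with fixed 4-byte instructions; Thumb state, where lr carries bit 0 set, is not modelled.

```python
ALIGN = 64     # boundary size proposed in this document
INSN_SIZE = 4  # assumption: ARM (A32) state, fixed 4-byte instructions


def is_allowed_far_target(addr):
    """A far jump target is acceptable only on a 64-byte boundary."""
    return addr % ALIGN == 0


def padding_before_bl(addr):
    """Padding (in bytes) so that a bl placed at addr + padding ends on a boundary."""
    return (-(addr + INSN_SIZE)) % ALIGN


def return_address(bl_addr):
    """Value that bl leaves in lr: the address of the following instruction."""
    return bl_addr + INSN_SIZE
```

For example, a bl that would otherwise sit at 0x1000 needs 60 bytes of padding, which puts it at 0x103c and makes lr equal to 0x1040. The cost of the scheme is visible here: in the worst case, almost a full 64-byte block is spent on padding per call site.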

Functions that do not manipulate the stack pointer at all, or executable fragments like PLT entries, will be placed in the .text section exactly like before. All other functions will receive the treatment described above, and will be placed in a dedicated .text.restricted section, whose base and offset inside the RX segment will be recorded in the ELF metadata using a GNU_REXEC header. This header will be used by the dynamic loader when setting the mapping permissions.

The resulting ELF layout would look something like

   segment     section             protection
+----------+--------------------+-------------+
|          |  .text.restricted  | restricted  |
|   RX     |  .rodata           |          RX |
| segment  +--------------------+-------------+
|          |  .text             | RX          |
|          |  .plt              |             |
+----------+--------------------+-------------+
|          |  .got              | RELRO       |
|   RW     +--------------------+-------------+
| segment  |  .data             | RW          |
|          |  .bss              |             |
+----------+--------------------+-------------+

Enforcing the alignment

In order to make sure the process can only execute far jumps to aligned targets, we need some help from the kernel.

Execution window

We shall define the "execution window" to be the set of executable pages within a process that are currently freely executable, i.e., a branch instruction may be targeted anywhere inside those pages and execution will continue unimpeded. The remaining executable regions are locked for execution and can only be unlocked by branching into them at an aligned offset. Whether the execution window will grow monotonically or will be pruned periodically to remove the regions that have not been used recently is TBD.

Using the XN bit to lock executable pages

On ARM (and most other modern architectures), the hardware supports flagging pages as non-executable. Trying to jump into such a page results in a trap that needs to be handled by the kernel. By setting this XN bit for all resident pages of an executable VMA that have not been unlocked yet, we get the opportunity to inspect the branch target, and make sure it is 64-byte aligned, before allowing code to be executed from such a page. (Copy On Write [COW] is implemented in a similar fashion: read-write VMAs are mapped read-only, and the trap that occurs when writing to such a page results in the page being cloned and the clone being handed to the writing process.)
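The decision the kernel would take on such a trap can be modelled in a few lines. This is a sketch only, not kernel code: handle_exec_fault() and the locked_pages set are hypothetical names, and a 4 KiB page size is assumed.

```python
ALIGN = 64        # boundary size proposed in this document
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def handle_exec_fault(target, locked_pages):
    """Decide what to do when a branch to `target` traps on an XN page.

    locked_pages is a set of page numbers that belong to an executable
    VMA but currently have XN set. Returns "unlock" or "fault".
    """
    page = target // PAGE_SIZE
    if page not in locked_pages:
        return "fault"          # not a locked executable page: genuine violation
    if target % ALIGN != 0:
        return "fault"          # unaligned entry: possibly a jump into a gadget
    locked_pages.discard(page)  # clear XN: the page joins the execution window
    return "unlock"
```

In this model an aligned branch into a locked page unlocks the whole page, while an unaligned one is rejected and leaves the page locked.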

The unlock operation adds the page to the execution window, at which point the least recently used page may be purged and its XN bit set again. (In the case of multithreading, we will likely need some kind of victim list so that pages can be brought back quickly if it turns out that other threads had been executing from them as well.)
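A bounded execution window with eviction could be modelled as below. The class name and the capacity parameter are hypothetical. Note one simplification: executing from an already unlocked page causes no trap, so this model only observes unlock events and evicts the least recently unlocked page rather than the least recently used one. The victim list for multithreading is not modelled.

```python
from collections import OrderedDict


class ExecutionWindow:
    """Toy model of the execution window, tracking page numbers only."""

    def __init__(self, capacity):
        self.capacity = capacity    # hypothetical tunable: max unlocked pages
        self.pages = OrderedDict()  # oldest unlock first

    def unlock(self, page):
        """Add page to the window; return the evicted page (XN set again), if any."""
        if page in self.pages:
            self.pages.move_to_end(page)  # refresh its position
            return None
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            evicted, _ = self.pages.popitem(last=False)
            return evicted
        return None

    def is_unlocked(self, page):
        """True if a branch anywhere into this page would proceed without a trap."""
        return page in self.pages
```

With a capacity of 2, unlocking pages 1, 2, 1 and then 3 evicts page 2, because page 1 was refreshed more recently.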

Consequences for ROP

Under such a regime, ROP attacks will be limited to gadgets that start at a 64-byte boundary (and whose size, given the epilogue placement rule, is therefore a multiple of 64 bytes), and to gadgets that are part of the execution window at the time of the attack. Whether this reduces the attack surface sufficiently to justify the performance penalty is TBD. More elaborate tricks might be possible:

  • deliberately putting illegal instructions on 64-byte boundaries and letting the unlocked code jump over them (or, in the Thumb2 case, taking care to place wide instructions astride 64-byte boundaries);
  • varying the alignment size and the execution window size;
  • selectively enabling the feature for some libraries (libc.so but not libx264.so, for instance).
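The basic restriction described at the start of this section (before any of the additional tricks) can be expressed as a filter over candidate gadget addresses. This is an illustrative sketch: usable_gadgets() is a hypothetical helper, the addresses in the usage note are made up, and a 4 KiB page size is assumed.

```python
ALIGN = 64        # boundary size proposed in this document
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def usable_gadgets(gadget_addrs, window_pages):
    """Return the gadget addresses an attacker could still branch to.

    A gadget remains reachable if it starts on a 64-byte boundary, or if
    its page is already part of the execution window (window_pages is a
    set of page numbers).
    """
    usable = []
    for addr in gadget_addrs:
        aligned = addr % ALIGN == 0
        unlocked = addr // PAGE_SIZE in window_pages
        if aligned or unlocked:
            usable.append(addr)
    return usable
```

For instance, with only page 2 in the window, the candidates 0x1004, 0x1040, 0x2004 and 0x3000 reduce to 0x1040, 0x2004 and 0x3000: the unaligned gadget in the locked page is the one that drops out.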

ardbiesheuvel/RestrictedFarJumps (last modified 2014-01-28 12:00:42)