UPDATE: An updated version of the benchmarks comparing Ubuntu precise armhf/armel can be found here.

Summary

A short description of what the "hard-float" ABI is can be found here. This document is meant to show that using the hardfloat ABI can lead to significant performance benefits compared to the softfp ABI (as currently used in Ubuntu armel).
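In practice, the two ABIs differ in where floating-point arguments and return values travel across a function call. Under softfp they are passed in the core registers r0-r3. Under hardfloat they stay in the VFP registers. The C sketch below is illustrative only. It reports which ABI a file was compiled for, using the predefined macros `__ARM_PCS_VFP`, `__SOFTFP__` and `__ARM_PCS` that GCC and Clang define on 32-bit ARM, as we understand the ARM C Language Extensions. `scale()` is a hypothetical function, and its comments describe the calling-convention difference.

```c
/* Report the floating-point ABI this translation unit was compiled for.
 * Macros (predefined by GCC/Clang on 32-bit ARM EABI targets):
 *   __ARM_PCS_VFP -> hard-float ABI: FP args/results in VFP registers
 *   __SOFTFP__    -> no FP instructions at all (-mfloat-abi=soft)
 *   __ARM_PCS     -> base PCS, i.e. softfp when FP hardware is used */
const char *float_abi(void)
{
#if defined(__arm__) && defined(__ARM_PCS_VFP)
    return "hard";
#elif defined(__arm__) && defined(__SOFTFP__)
    return "soft";
#elif defined(__arm__) && defined(__ARM_PCS)
    return "softfp";
#else
    return "unknown (not 32-bit ARM EABI)";
#endif
}

/* Hypothetical example function. Under softfp, x and k arrive in r0-r3
 * and must be moved into VFP registers before the multiply, and the
 * result moved back to r0/r1. Under hardfloat they arrive in d0/d1 and
 * the result is returned in d0. */
double scale(double x, double k)
{
    return x * k;
}
```

Suppose the same file is built once with, for example, `arm-linux-gnueabi-gcc -O2 -mfloat-abi=softfp -mfpu=vfpv3-d16 -S` and once with `arm-linux-gnueabihf-gcc -O2 -mfloat-abi=hard -S`. Comparing the two assembly listings for `scale()` should then show the extra `vmov` instructions that the softfp build needs.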

The benchmarks are meant only as an indication. You will likely get somewhat different results, but we do not expect a major difference, and certainly not one that would change the overall benefit.

We have compiled a list of applications, which we have tested on two systems: Ubuntu/armel 11.10 "Oneiric" (which uses the softfp ABI) and Debian/armhf "unstable" (which uses the hardfloat ABI). The list is not final; we will continue to add applications to it. The applications have been run on a number of platforms. For now we have started with Genesi's EfikaMX (Freescale i.MX51 at 800MHz) and the Texas Instruments Pandaboard (TI OMAP4 at 1GHz), but we will also try to include other platforms such as the Freescale i.MX53 Quickstart board, the Beagleboard (TI OMAP3 at 600MHz) and perhaps some Tegra2 devices (like the Trimslice and the AC100).

Applications

|| Application || Version || Comment ||
|| povray || 3.6.1 || Extremely heavy on floating point and highly recursive; an extreme example in favour of the hardfloat ABI ||
|| mplayer2 || 2.0-134-g84d8671-8 || Video decoding hardly uses floating point, but audio decoding does ||
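Why is a call-heavy floating-point program like povray the extreme case? Every call that passes or returns a double pays the softfp register-shuffling cost. The C microbenchmark below is a hypothetical illustration of that pattern. The names `depth_sum` and `time_depth_sum` are ours, not povray's, and it is a sketch under that assumption: none of the numbers on this page were produced with it.

```c
#include <time.h>

/* Hypothetical microbenchmark: every call passes two doubles and returns
 * one, which is exactly where softfp and hardfloat differ. noinline keeps
 * the compiler from inlining the recursion, so the calls really happen. */
__attribute__((noinline))
static double depth_sum(double x, double w, int depth)
{
    if (depth == 0)
        return x * w;
    /* Two recursive calls per level: 2^depth leaf calls in total.
     * The result equals 0.75^depth * x * w. */
    return depth_sum(x * 0.5, w, depth - 1) + depth_sum(x * 0.25, w, depth - 1);
}

/* Runs depth_sum once, stores its value in *result and returns the
 * processor time used, in seconds, as measured by clock(). */
double time_depth_sum(int depth, double *result)
{
    clock_t start = clock();
    *result = depth_sum(1.0, 2.0, depth);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare the two ABIs, compile the same file once for armel (softfp) and once for armhf (hardfloat), then run each with a large depth (for example 25). The computed value is `2.0 * 0.75^depth`, so depth 2 gives 1.125, which serves as a quick sanity check that both builds do the same work.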

In addition to specific applications, we also run the Phoronix Test Suite, a framework for measuring system performance in a controlled and well-defined way.

Generic Benchmarks

povray chess2.pov (less is better)

|| Platform || CPU || Frequency || Cores || softfp (mm:ss) || hardfloat (mm:ss) || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 18:20 (1100s) || 5:23 (323s) || 3.406x ||
|| Genesi EfikaMX2 || Freescale i.MX53 (Cortex-A8) || 1GHz || 1 || - || 4:50 || - ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 13:22 (802s) || 1:48 (108s) || 7.426x ||

Hardfloat is the clear winner here; no comment necessary :)
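The "Relative performance (hardfloat/softfp)" column is oriented so that a value above 1.0 always favours hardfloat. For timings (less is better) it is the softfp time divided by the hardfloat time. For scores (more is better) it is the hardfloat score divided by the softfp score. Here is a minimal C helper that expresses this convention; the function name is hypothetical.

```c
#include <stdbool.h>

/* Relative performance as reported in the tables; > 1.0 favours hardfloat.
 *  - "less is better" metrics (run times):         softfp / hardfloat
 *  - "more is better" metrics (scores, MFLOPS...):  hardfloat / softfp */
double relative_performance(double softfp, double hardfloat, bool less_is_better)
{
    return less_is_better ? softfp / hardfloat : hardfloat / softfp;
}
```

For example, `relative_performance(19357, 4606, true)` (povray benchmark.ini on the EfikaMX) gives about 4.203. `relative_performance(22.60, 24.76, false)` (SciMark2 Composite on the EfikaMX) gives about 1.096.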

povray benchmark.ini +Ibenchmark.pov (128x72) (less is better)

|| Platform || CPU || Frequency || Cores || softfp (hh:mm:ss) || hardfloat (hh:mm:ss) || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 05:22:37 (19357s) || 01:16:46 (4606s) || 4.203x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 03:57:10 (14230s) || 00:14:59 (899s) || 15.829x ||
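The povray tables give run times as mm:ss or hh:mm:ss, each with its equivalent in seconds, and the relative performance is computed from the seconds. Below is a small C sketch of that conversion. It assumes non-negative, well-formed fields, and `duration_to_seconds` is a hypothetical helper that is not part of povray or any benchmark tool used here.

```c
#include <stdio.h>

/* Converts an "hh:mm:ss" or "mm:ss" duration, the formats used in the
 * povray tables, into seconds. Returns -1 for anything else. */
long duration_to_seconds(const char *s)
{
    int h = 0, m = 0, sec = 0;
    char extra; /* only filled if there are trailing characters */

    /* Exactly three numeric fields and nothing after them. */
    if (sscanf(s, "%d:%d:%d%c", &h, &m, &sec, &extra) == 3)
        return (long)h * 3600 + (long)m * 60 + sec;
    /* Otherwise, exactly two numeric fields and nothing after them. */
    if (sscanf(s, "%d:%d%c", &m, &sec, &extra) == 2)
        return (long)m * 60 + sec;
    return -1;
}
```

With this helper, `duration_to_seconds("05:22:37")` returns 19357 and `duration_to_seconds("18:20")` returns 1100, matching the values in parentheses in the tables.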

mplayer2 -vo null -nosound -benchmark Sintel_Trailer1.480p.DivX_Plus_HD.mkv (less is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 44.919s || 41.865s ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 15.784s || 15.680s ||

mplayer2 -vo null -nosound -benchmark Sintel_Trailer1.720p.DivX_Plus_HD.mkv (less is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 91.512s || 88.160s ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 33.917s || 32.928s ||

There is a slight difference in favour of hardfloat on the Cortex-A8, but it disappears on the Cortex-A9. We suspect that the difference on the A8 is caused by another factor and not by the ABI change.

Phoronix Test Suite Benchmarks

Ray-Tracing/C-Ray (less is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 13837.70s || 13497.27s || 1.025x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 1340.03s || 1334.06s || 1.004x ||

Slight gain on Cortex-A8 (2.5%), nothing to shout about.

Graphs for C-Ray

Ray-Tracing/SmallPT (less is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 18074s || 20015s || 0.903x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 2877s || 3030s || 0.949x ||

A loss of 5-10% in performance; needs investigating.

Graphs for SmallPT

Processor/CLOMP (more is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 0.99 || 0.99 || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 2.05 || 2.04 || 0.995x ||

No discernible difference.

Graphs for CLOMP

Image Conversion/DCRaw (less is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 313.81s || 346.47s || 0.905x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 187.72s || 183.81s || 1.021x ||

On the i.MX51 there is an unusual increase in run time for armhf (or, though less likely, an unusual decrease for armel). We will investigate.

Graphs for dcraw

Audio-encoding/ape (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 161.27s || 158.71s || 1.016x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 115.18s || 114.75s || 1.004x ||

Not much of a gain, and no surprise here: APE (Monkey's Audio) encoding is mostly integer arithmetic.

Graphs for Audio-encoding/ape

Audio-encoding/flac (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 139.76s || 133.13s || 1.050x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 59.36s || 56.68s || 1.047x ||

About 5% gain in favour of hardfloat.

Graphs for Audio-encoding/flac

Audio-encoding/mp3 (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 676.63s || 662.96s || 1.020x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 168.66s || 167.62s || 1.006x ||

Not much of a gain, and no surprise here: mp3 encoding is mostly integer arithmetic. The slight difference is probably due to another factor.

Graphs for Audio-encoding/mp3

Audio-encoding/ogg (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 333.41s || 331.20s || 1.007x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 82.26s || 81.58s || 1.008x ||

Same applies.

Audio-encoding/wavpack (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || FAILED || 140.68s || - ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 87.58s || 87.67s || 0.999x ||

Same applies.

Graphs of Audio-encoding/wavpack

Video-encoding/ffmpeg (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 380.26s || 365.72s || 1.040x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 200.72s || 204.10s || 0.983x ||

Graphs for Video-encoding/ffmpeg

Graphics/GraphicsMagick-HWB Color space (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 5 || 5 || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 22 || 21 || 0.955x ||

Graphics/GraphicsMagick-Blur (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 2 || 2 || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 9 || 9 || 1.000x ||

Graphics/GraphicsMagick-Local Adaptive Thresholding (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 1 || 1 || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 4 || 4 || 1.000x ||

Graphics/GraphicsMagick-Resizing (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 3 || 3 || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 15 || 14 || 0.933x ||

Graphics/GraphicsMagick-Sharpening (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 1 || 1 || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 8 || 7 || 0.875x ||

Armhf seems to lose in this case, but the cause is unclear. We are fairly certain that it has nothing to do with the ABI change.

Graphs for GraphicsMagick

Processor/Himeno (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 18.95 MFLOPS || 18.68 MFLOPS || 0.986x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 28.85 MFLOPS || 28.39 MFLOPS || 0.984x ||

Nothing worth commenting on here; hardfloat did not give us any benefit. The small loss of performance is most likely unrelated to the ABI change.

Graphs of Himeno

Processor/MAFFT (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 746.10s || 733.39s || 1.017x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 176.23s || 175.39s || 1.005x ||

A very slight gain (0.5-1.7%) in favour of hardfloat, probably unrelated to the ABI.

Graphs of MAFFT

Processor/N-Queens (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 1152.66s || 1141.09s || 1.010x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 1005.51s || 1005.86s || 0.999x ||

No significant difference.

Processor/Sudokut (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 434.25s || 469.47s || 0.925x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 207.84s || 220.02s || 0.945x ||

A gain of ~5-7% in favour of armel here; we don't know why.

Graphs for Sudokut

Cryptography/openssl (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 2.50 SPS || 2.50 SPS || 1.000x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 3.90 SPS || 3.90 SPS || 1.000x ||

OpenSSL does not use floating point at all, so no difference.

Graphs for Cryptography/OpenSSL

Processor/PyBench (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 38062 ms || 36566 ms || 1.040x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 15390 ms || 18683 ms || 0.824x ||

Mixed impressions here: on the A8 there is a 4% gain, but on the A9 there is an 18% drop in performance!! Definitely needs investigating.

Graphs for PyBench

Processor/Sample-Pi-Program (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 145.24s || 132.00s || 1.100x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 35.26s || 34.00s || 1.037x ||

3-10% gain in favour of hardfloat.

Graphs for Sample-Pi-Program

GUI toolkits/gtkperf [GtkComboBox] (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 348.50s || 398.96s || 0.873x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 300.12s || 171.66s || 1.748x ||

GUI toolkits/gtkperf [GtkComboBoxEntry] (less is better)

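|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 17.31s || 261.66s || 0.066x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||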

GUI toolkits/gtkperf [GtkToggleButton] (less is better)

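|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 292.58s || 66.25s || 4.416x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||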

GUI toolkits/gtkperf [GtkCheckButton] (less is better)

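|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 78.23s || 57.49s || 1.361x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||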

GUI toolkits/gtkperf [GtkRadioButton] (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 140.43s || 85.27s || 1.647x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 75.00s || 40.45s || 1.854x ||

GUI toolkits/gtkperf [GtkTextView - AddText] (less is better)

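|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 12210.22s || 4690.37s || 2.603x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||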

GUI toolkits/gtkperf [GtkTextView - Scroll] (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 1.52s || 3.40s || 0.447x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||

GUI toolkits/gtkperf [GtkDrawingArea - Circles] (less is better)

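|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 443.13s || 485.36s || 0.913x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||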

GUI toolkits/gtkperf [GtkDrawingArea - Pixbufs] (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 24.01s || 22.82s || 1.052x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 15.31s || 10.26s || 1.492x ||

GUI toolkits/gtkperf [TotalTime] (less is better)

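|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 14951.75s || 7673.97s || 1.948x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || - || - || - ||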

This is a weird mix of results: in general hardfloat is clearly faster, but in some cases it gets trounced by softfp. We are comparing two different desktop environments (Debian + GNOME 2 versus Ubuntu + Unity), albeit both running the fbdev X.org driver, so the Ubuntu desktop may simply be better optimized in some areas. That would also be worth backporting to Debian armhf! :)

PENDING: Complete and updated Pandaboard gtkperf benchmarks.

Graphs for gtkperf

GMPbench (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || FAILED || 615.61 score || - ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || FAILED || 1008.20 score || - ||

Graphs for GMPbench

SciMark2/Composite (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 22.60 MFLOPS || 24.76 MFLOPS || 1.096x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 45.28 MFLOPS || 49.06 MFLOPS || 1.083x ||

SciMark2/Fast Fourier Transform (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 13.35 MFLOPS || 14.11 MFLOPS || 1.057x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 13.07 MFLOPS || 13.34 MFLOPS || 1.021x ||

SciMark2/Jacobi (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 29.92 MFLOPS || 31.99 MFLOPS || 1.069x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 100.39 MFLOPS || 100.39 MFLOPS || 1.000x ||

SciMark2/Monte Carlo (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 13.41 MFLOPS || 18.63 MFLOPS || 1.389x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 48.81 MFLOPS || 67.11 MFLOPS || 1.375x ||

SciMark2/Sparse Matrix Multiply (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 39.29 MFLOPS || 36.97 MFLOPS || 0.940x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 28.29 MFLOPS || 28.41 MFLOPS || 1.004x ||

SciMark2/LU Matrix Factorization (more is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 20.01 MFLOPS || 22.09 MFLOPS || 1.104x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 35.83 MFLOPS || 36.06 MFLOPS || 1.006x ||

Graphs for SciMark2

Compression/7-zip (more is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || FAILED || 313 MIPS || - ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 763 MIPS || 771 MIPS || 1.010x ||

No difference; no FP code is expected to be used in compression. For some reason the test failed on EfikaMX/armel. It is interesting, however, that the difference between the A8 and the A9 is due to the two cores and the higher frequency of the OMAP4. In fact, normalizing the A9's hardfloat result to a single core at 800MHz (771 MIPS / 2 × 800/1000) gives 308.40 MIPS, which is ~98.5% of the A8's 313 MIPS: pretty good use of SMP!!
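The normalization in the 7-zip comment divides the Pandaboard score by its core count and scales it linearly from 1GHz down to the EfikaMX's 800MHz. Below is a minimal C sketch of that arithmetic. The helper name is hypothetical, and the linear model is an assumption: it ignores memory bandwidth, cache sizes and imperfect SMP scaling.

```c
/* Scales a throughput score (higher is better, e.g. 7-zip MIPS) measured
 * on `cores` cores at `mhz` MHz to a single core at `target_mhz` MHz.
 * Assumes performance grows linearly with both core count and clock
 * frequency, which is a simplification. */
double normalize_per_core(double score, int cores, double mhz, double target_mhz)
{
    return score / cores * (target_mhz / mhz);
}
```

`normalize_per_core(771, 2, 1000, 800)` gives 308.4, and 308.4 / 313 is about 0.985, the ~98.5% figure in the comment.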

Graphs for 7-zip

Compression/pbzip2 (less is better) (multithreaded)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || FAILED || 606.67s || - ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 204.32s || 203.73s || 1.003x ||

Again the test failed on EfikaMX/armel; we will investigate and update the page with the proper value. In any case, the difference is insignificant, as expected. Again, the difference between the A8 and the A9 is due to the difference in number of cores and frequency. Normalized

Graphs for pbzip2

Compression/lzma (less is better)

|| Platform || CPU || Frequency || Cores || softfp || hardfloat || Relative performance (hardfloat/softfp) ||
|| Genesi EfikaMX || Freescale i.MX51 (Cortex-A8) || 800MHz || 1 || 1617.13s || 1568.44s || 1.031x ||
|| Pandaboard || TI OMAP4 (Cortex-A9) || 1GHz || 2 || 1011.89s || 1012.33s || 0.999x ||

Graphs for lzma



OfficeofCTO/HardFloat/Benchmarks (last modified 2012-05-17 09:51:33)