Summary

A short description of what the "hard-float" ABI is can be found here. This document is meant to serve as an indication, and where possible proof, that using the hardfloat ABI can lead to significant performance benefits compared to the softfp ABI (as currently used in Ubuntu armel).

The benchmarks are meant as an indication only. You will likely get somewhat different results, but we do not expect major differences, and certainly none that would change the overall benefit.

We have compiled a list of applications, which we have tested on two Ubuntu "Precise" 12.04 systems, armel and armhf, using the softfp and hard-float ABIs respectively. The list is not final; we will continue to add applications to it. The applications were tested on a dual-core Cortex-A9 CPU running at 1 GHz.

Applications

Application | Version | Comment
povray | 3.6.1 | Extremely heavy on floating point and highly recursive; an extreme example in favour of the hardfloat ABI.

On top of specific applications, we also ran the Phoronix Test Suite, a framework for measuring system performance in a controlled and well-defined way.

Generic Benchmarks

Benchmark | softfp | hardfloat | Relative performance of hardfloat vs softfp (above 1.000x means hardfloat is better) | Notes
povray chess2.pov (less is better) | 13:19 (799s) | 1:46 (106s) | 7.538x | About 7.5 times faster.
povray benchmark.ini +Ibenchmark.pov (128x128) (less is better) | 05:46:18 (20778s) | 00:22:25 (1345s) | 15.448x | More than 15 times faster!

Phoronix Test Suite Benchmarks

Benchmark | softfp | hardfloat | Relative performance of hardfloat vs softfp (above 1.000x means hardfloat is better) | Notes
Ray-Tracing/C-Ray (less is better) (multithreaded) | 1227.95s | 1226.98s | 1.001x | No significant difference.
Ray-Tracing/POVRay (less is better) (multithreaded) | 6553s | 6494s | 1.009x | No significant difference. Note that this is a different benchmark from the povray runs in the previous table.
Ray-Tracing/SmallPT (less is better) (multithreaded) | 2791s | 2932s | 0.952x | 5% performance loss on hardfloat; needs investigating.
Processor/CLOMP (multithreaded) | 2.05 | 2.05 | 1.000x | No difference.
Image Conversion/DCRaw (multithreaded) | 699.51s | 698.50s | 1.001x | No significant difference.
Audio-encoding/ape (less is better) | 125.81s | 125.945s | 0.999x | No significant difference.
Audio-encoding/flac (less is better) | 59.36s | 56.36s | 1.053x | About 5% gain in favour of hardfloat.
Audio-encoding/mp3 (less is better) | 127.90s | 127.93s | 1.000x | No significant difference.
Audio-encoding/ogg (less is better) | 82.26s | 81.58s | 1.008x | No significant difference.
Audio-encoding/wavpack (less is better) | 87.58s | 87.67s | 0.999x | No significant difference.
Video-encoding/ffmpeg (less is better) | 236.20s | 204.10s | 1.157x | About 16% gain in favour of hardfloat.
Graphics/GraphicsMagick-HWB Color space (more is better) | 23 | 23 | 1.000x | No difference.
Graphics/GraphicsMagick-Blur (more is better) | 12 | 12 | 1.000x | No difference.
Graphics/GraphicsMagick-Local Adaptive Thresholding (more is better) | 4 | 4 | 1.000x | No difference.
Graphics/GraphicsMagick-Resizing (more is better) | 16 | 17 | 0.941x | Slight loss of performance on armhf.
Graphics/GraphicsMagick-Sharpening (more is better) | 8 | 8 | 1.000x | No difference.
Processor/Himeno (more is better) | 71.34 MFLOPS | 88.34 MFLOPS | 1.238x | 24% gain in favour of armhf.
Processor/Hmmer (less is better) | 264.00s | 267.49s | 0.987x | Slight loss for armhf.
Processor/MAFFT (less is better) | 111.30s | 112.31s | 0.991x | Slight loss for armhf.
Processor/Minion [Test: Bibd] (less is better) | 1743.50s | 1720.75s | 1.013x | No significant difference.
Processor/Minion [Test: Graceful] (less is better) | 578.29s | 579.12s | 0.999x | No significant difference.
Processor/Minion [Test: Quasigroup] (less is better) | 1381.32s | 1422.72s | 0.971x | 3% performance loss on hardfloat.
Processor/Minion [Test: Solitaire] (less is better) | 1533.90s | 1489.34s | 1.030x | 3% gain in favour of hardfloat.
Processor/N-Queens (less is better) | 1030.09s | 1030.22s | 1.000x | No significant difference.
Processor/NPB [Test: BT.A] (more is better) | 325.17 Mop/s | 325.31 Mop/s | 1.000x | No significant difference.
Processor/NPB [Test: CG.B] (more is better) | 45.67 Mop/s | 46.21 Mop/s | 1.012x | No significant difference.
Processor/NPB [Test: EP.B] (more is better) | 8.34 Mop/s | 8.28 Mop/s | 0.993x | No significant difference.
Processor/NPB [Test: FT.B] (more is better) | FAIL: OOM | FAIL: OOM | - | Out of memory error.
Processor/NPB [Test: IS.C] (more is better) | 0.45 Mop/s | 0.41 Mop/s | 0.911x | 9% performance loss on hardfloat (*)
Processor/NPB [Test: LU.A] (more is better) | 202.63 Mop/s | 203.04 Mop/s | 1.002x | No significant difference.
Processor/NPB [Test: MG.B] (more is better) | 148.17 Mop/s | 146.88 Mop/s | 0.991x | No significant difference.
Processor/NPB [Test: SP.A] (more is better) | 136.23 Mop/s | 132.97 Mop/s | 0.999x | No significant difference.
Processor/NPB [Test: UA.A] (more is better) | 1.28 Mop/s | 1.27 Mop/s | 0.976x | Less than 3% performance loss on hardfloat (*)
Cryptography/openssl (more is better) | 3.90 SPS | 3.90 SPS | 1.000x | No difference.
Processor/PyBench (less is better) | 19464 ms | 19516 ms | 0.997x | No significant difference.
Processor/ramspeed [Type: Copy - Benchmark: Integer] (more is better) | 331.01 MB/s | 330.75 MB/s | 0.999x | No significant difference.
Processor/ramspeed [Type: Copy - Benchmark: Floating Point] (more is better) | 333.47 MB/s | 333.32 MB/s | 1.000x | No significant difference.
Processor/ramspeed [Type: Scale - Benchmark: Integer] (more is better) | 334.21 MB/s | 334.57 MB/s | 1.001x | No significant difference.
Processor/ramspeed [Type: Scale - Benchmark: Floating Point] (more is better) | 333.87 MB/s | 334.44 MB/s | 1.002x | No significant difference.
Processor/ramspeed [Type: Add - Benchmark: Integer] (more is better) | 745.51 MB/s | 745.01 MB/s | 0.999x | No significant difference.
Processor/ramspeed [Type: Add - Benchmark: Floating Point] (more is better) | 745.92 MB/s | 744.46 MB/s | 0.998x | No significant difference.
Processor/ramspeed [Type: Triad - Benchmark: Integer] (more is better) | 743.04 MB/s | 743.61 MB/s | 1.001x | No significant difference.
Processor/ramspeed [Type: Triad - Benchmark: Floating Point] (more is better) | 737.81 MB/s | 738.42 MB/s | 1.001x | No significant difference.
Processor/ramspeed [Type: Average - Benchmark: Integer] (more is better) | 538.51 MB/s | 538.99 MB/s | 1.001x | No significant difference.
Processor/ramspeed [Type: Average - Benchmark: Floating Point] (more is better) | 537.44 MB/s | 538.28 MB/s | 1.002x | No significant difference.
Processor/Sample-Pi-Program (less is better) | 31.88s | 31.26s | 0.981x | 2% loss for hardfloat.
Processor/Sudokut (less is better) | 211.99s | 206.96s | 1.024x | About 2.5% gain in favour of hardfloat.
Processor/x264 (more is better) | 3.91 FPS | 3.91 FPS | 0.945x |
SciMark2/Composite (more is better) | 45.57 MFLOPS | 48.52 MFLOPS | 1.065x | 6.5% gain in favour of hardfloat.
SciMark2/Fast Fourier Transform (more is better) | 11.67 MFLOPS | 13.00 MFLOPS | 1.114x | 11.4% gain in favour of hardfloat.
SciMark2/Jacobi (more is better) | 100.65 MFLOPS | 100.39 MFLOPS | 0.997x | No significant difference.
SciMark2/Monte Carlo (more is better) | 50.41 MFLOPS | 64.14 MFLOPS | 1.272x | 27.2% gain in favour of hardfloat.
SciMark2/Sparse Matrix Multiply (more is better) | 28.73 MFLOPS | 28.80 MFLOPS | 1.002x | No significant difference.
SciMark2/LU Matrix Factorization (more is better) | 36.34 MFLOPS | 36.29 MFLOPS | 0.999x | No significant difference.
Compression/7-zip (more is better) (multithreaded) | 770 MIPS | 766 MIPS | 0.995x | No significant difference.
Compression/pbzip2 (less is better) (multithreaded) | 200.57s | 200.96s | 0.998x | No significant difference.
Compression/lzma (less is better) | 1003.93s | 998.92s | 1.005x | No significant difference.
GUI toolkits/gtkperf [GtkComboBox] (less is better) | 3.56s | 3.68s | 0.969x | No significant difference.
GUI toolkits/gtkperf [GtkComboBoxEntry] (less is better) | 2.65s | 2.71s | 0.978x | No significant difference.
GUI toolkits/gtkperf [GtkToggleButton] (less is better) | 0.62s | 0.60s | 1.037x | No significant difference.
GUI toolkits/gtkperf [GtkCheckButton] (less is better) | 0.50s | 0.43s | 1.118x | 12% gain for armhf.
GUI toolkits/gtkperf [GtkRadioButton] (less is better) | 0.88s | 0.80s | 1.110x | 11% gain for armhf.
GUI toolkits/gtkperf [GtkTextView - AddText] (less is better) | 1.28s | 1.27s | 1.013x | No significant difference.
GUI toolkits/gtkperf [GtkTextView - Scroll] (less is better) | 1.10s | 1.17s | 0.941x | No significant difference.
GUI toolkits/gtkperf [GtkDrawingArea - Circles] (less is better) | 6.59s | 6.70s | 0.983x | No significant difference.
GUI toolkits/gtkperf [GtkDrawingArea - Pixbufs] (less is better) | 0.39s | 0.39s | 0.992x | No significant difference.
GUI toolkits/gtkperf [TotalTime] (less is better) | 25.21s | 25.43s | 0.991x | Small differences in the gtkperf tests; nothing really significant compared to the variance from run to run.

(*) Lots of swap space was used during this benchmark.



OfficeofCTO/HardFloat/Benchmarks201205 (last modified 2012-06-14 20:53:22)