Glibc ARM performance

SPEC2006

Time spent in glibc during a run with the 'train' data set:

    17.57%  libm-2.17.90.so                
     4.37%  libc-2.17.90.so                

libc

Breakdown of the 4.37% time from libc.so:

    33.23%  [.] _int_free                      
    25.71%  [.] malloc                         
    24.32%  [.] _int_malloc                    
     3.71%  [.] memcpy                         
     3.10%  [.] free                           
     2.64%  [.] strcmp                         
     2.41%  [.] malloc_consolidate             
     1.68%  [.] memset                         

Much of the time spent in the malloc-family functions is attributed to locking sequences such as this one from malloc:

    0.70 :        76a68:       mov     r2, #1
    0.62 :        76a6c:       dmb     sy
   18.13 :        76a70:       ldrex   r3, [r4]
    0.00 :        76a74:       cmp     r3, #0
    0.00 :        76a78:       bne     76a88 <__libc_malloc+0x5c>
    1.24 :        76a7c:       strex   r0, r2, [r4]
    0.00 :        76a80:       cmp     r0, #0
    0.00 :        76a84:       bne     76a70 <__libc_malloc+0x44>
   12.24 :        76a88:       cmp     r3, #0
    0.01 :        76a8c:       dmb     sy

Further info is available in /GlibcMalloc
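For readers who want the C-level equivalent of the annotated sequence, here is a minimal sketch of a lock acquire that a compiler typically lowers to an ldrex/strex retry loop on ARMv7. It is not glibc's actual lock code: the names sketch_lock_t, sketch_trylock and sketch_unlock are hypothetical, and the sketch uses C11 acquire/release ordering, so the barriers a compiler emits may differ from the dmb sy pair in the listing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held. */
typedef atomic_int sketch_lock_t;

/* Try once to take the lock. Returns 1 on success, 0 if it is already held.
   The compare-exchange only stores 1 if the word still contains 0, which is
   the same check the ldrex / cmp / strex sequence performs. */
static int sketch_trylock(sketch_lock_t *lock)
{
    int expected = 0;

    assert(lock != NULL);
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Release the lock by storing 0 with release ordering. */
static void sketch_unlock(sketch_lock_t *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Even when the lock is uncontended, every malloc and free pays for the exclusive load/store pair and its barriers, which is consistent with the sample counts clustering around the ldrex and the instruction after the loop.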

libm

Breakdown of the 17.57% time from libm.so:

    11.44%  [.] __powf_finite         
    11.36%  [.] fesetenv
    10.25%  [.] feholdexcept          
    10.16%  [.] __dubsin              
     9.93%  [.] fesetround            
     8.66%  [.] __sinl                
     8.07%  [.] __exp_finite          
     7.27%  [.] __cosl                
     6.23%  [.] feraiseexcept         
     4.30%  [.] __sqr                 
     3.07%  [.] __mul                 
     2.35%  [.] feupdateenv
     1.33%  [.] __fpclassify          
     1.02%  [.] __cexpl               
     0.94%  [.] __expf_finite         
     0.60%  [.] __powf                
     0.51%  [.] __sincosl             

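The fe* calls are made internally by libm: some functions must compute in the rounding and exception modes specified by C99 regardless of any mode set by the user, so they save, switch, and restore the floating-point environment around the computation.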

Possible improvements

  • ARMv8: use instructions with explicit rounding mode.
  • Avoid context save/restore in functions where VFP cannot cause an exception.

SPEC2000

     2.51%  libm-2.17.90.so            
     1.58%  libc-2.17.90.so            

libc

    34.24%  [.] memcpy                         
    16.86%  [.] memset                         
     8.86%  [.] _int_free                      
     8.41%  [.] _int_malloc                    
     6.34%  [.] _IO_getc                       
     6.27%  [.] malloc                         
     2.69%  [.] __ctype_b_loc                  
     1.63%  [.] free                           
     1.58%  [.] vfprintf                       
     1.57%  [.] __libc_calloc                  
     1.45%  [.] _IO_putc                       
     1.09%  [.] strcmp                         
     0.82%  [.] __GI_____strtod_l_internal     
     0.77%  [.] __printf_fp                    
     0.70%  [.] __GI_____strtol_l_internal     
     0.56%  [.] malloc_consolidate             
     0.54%  [.] _IO_vfscanf                    

libm

    21.43%  [.] __roundl              
    20.57%  [.] __sqrt_finite         
     7.83%  [.] __sinl                
     7.74%  [.] fesetenv
     7.49%  [.] feholdexcept          
     6.74%  [.] fesetround            
     6.59%  [.] __cosl                
     6.43%  [.] __floorf              
     4.86%  [.] feraiseexcept         
     2.67%  [.] __exp_finite          
     1.59%  [.] feupdateenv
     1.41%  [.] __acos_finite         
     0.94%  [.] __hypotf_finite       
     0.86%  [.] __sincosl             
     0.61%  [.] __hypotf              
     0.52%  [.] __pow_finite          

Missing assembly functions

The following functions have assembly versions for x86-64 but not for ARM.

Multiple precision arithmetic

  • __mpn_lshift
  • __mpn_mul_1
  • __mpn_rshift

Floating-point

  • __signbit: not needed
  • __signbitf
  • copysign
  • copysignf
  • copysignl: not applicable
  • finitel
  • scalbnl

Byte order

  • htonl: not needed
  • ntohl

String/mem

  • bcopy: should call memmove
  • bzero: should call memset
  • memcmp
  • mempcpy
  • memrchr
  • stpncpy
  • strcasecmp
  • strcat
  • strchrnul
  • strcspn
  • strncasecmp
  • strncat
  • strncmp
  • strncpy
  • strnlen
  • strpbrk
  • strspn
  • strtok
  • strtok_r
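The notes on bcopy and bzero suggest forwarding to the already optimized memmove and memset rather than writing separate assembly. A minimal sketch of that forwarding follows; the sketch_ prefixes are hypothetical and only avoid clashing with the C library's own symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* bcopy takes (src, dest, n) and must handle overlap, so it maps onto
   memmove with the first two arguments swapped. */
static void sketch_bcopy(const void *src, void *dest, size_t n)
{
    memmove(dest, src, n);
}

/* bzero is memset with a fill value of zero. */
static void sketch_bzero(void *s, size_t n)
{
    memset(s, 0, n);
}
```

With this approach any tuning done for memmove and memset on ARM is inherited automatically, at the cost of one extra call or tail call.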

Wide string

  • wcschr
  • wcscmp
  • wcscpy
  • wcslen
  • wcsrchr
  • wmemcmp

WorkingGroups/ToolChain/LibraryPerformance/GlibcPerformance (last modified 2013-05-29 08:39:44)