ARM32/64: perf: dwarf stack frame unwinding support

Introduction

Linux perf has architecture-specific support code. x86 has DWARF stack frame unwinding support, whilst ARM and ARM64 do not. The goal of this work is to implement it on ARM32/64.

The work for ARMv7 is done under the LEG-760 blueprint; the ARMv8 support is implemented under LEG-815.

The expected result is the backtrace of the user and kernel call chains in the perf output statistics.

Notes:

  1. Unwinding the trace using the frame pointer works on both x86 and ARM, but it depends on the compiler option -fno-omit-frame-pointer; frame pointers are omitted by default since GCC 4.6 for performance reasons (cf. the gcc man page),

  2. the application binaries and libraries need to be compiled with debug information (-g) in DWARF format, in the .debug_frame section of the ELF binaries,

  3. the -dbg flavor of the libraries usually contains the correct debug information,

  4. the debug version of the libraries is found in the /usr/lib/debug/lib directory on the system; perf correctly uses this path to retrieve the debug flavor of the libraries,

  5. both libunwind and libdw are supported and provide the same output. However, libdw is cleaner and faster: the speedup of libdw over libunwind is about 800% on ARMv7 with the stress_bt binary.
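Note 4 amounts to a simple path mapping. The sketch below builds the candidate debug-file path by prefixing the library path with a debug root. It assumes the Ubuntu-style layout in which /usr/lib/debug mirrors the root file system; debug_path_for() is a hypothetical helper, not perf's code, and perf itself may try other layouts as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the debug-info path for a library, assuming a
 * debug root (e.g. "/usr/lib/debug") that mirrors the root file system.
 * Returns 0 on success, -1 if lib_path is not absolute or the result does
 * not fit in `out`. */
static int debug_path_for(const char *debug_root, const char *lib_path,
                          char *out, size_t out_len)
{
    int n;

    assert(debug_root != NULL && lib_path != NULL && out != NULL);
    if (lib_path[0] != '/')
        return -1;
    n = snprintf(out, out_len, "%s%s", debug_root, lib_path);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

For example, debug_path_for("/usr/lib/debug", "/lib/arm-linux-gnueabihf/libc.so.6", buf, sizeof buf) produces "/usr/lib/debug/lib/arm-linux-gnueabihf/libc.so.6".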

Expected Result

The example program consists of a long call chain (foo_1 calling foo_2 calling ... foo_128). foo_128 performs some calculation on u64 variables. The main loop calls foo_1, foo_2 ... foo_128 in order. The source code is found at the end of this page.

Note: because of the complexity and diversity of the call chains, perf cannot synthesize useful statistical information about the example program. The goal of the example program is to exercise call chain unwinding.
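The real stress_bt source is in the tarball referenced at the end of this page. To give a rough idea of its structure only, here is a hypothetical, much shorter version: a chain of 4 functions instead of 128, a leaf doing u64 arithmetic, and a loop that calls each function of the chain in turn. The names foo_1 to foo_4 mirror the description above; run_chain() and the arithmetic are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Leaf of the chain: some u64 arithmetic. It returns the sum of (x + i)
 * for i in [0, 1000), i.e. 1000 * x + 499500. */
__attribute__((noinline)) static u64 foo_4(u64 x)
{
    u64 acc = 0;

    for (u64 i = 0; i < 1000; i++)
        acc += x + i;
    return acc;
}

/* Each level calls the next one. Storing the result in a volatile local
 * keeps the call out of tail position, so the compiler cannot turn it into
 * a jump (which would drop the caller's frame from the backtrace). */
__attribute__((noinline)) static u64 foo_3(u64 x)
{
    volatile u64 r = foo_4(x);
    return r;
}

__attribute__((noinline)) static u64 foo_2(u64 x)
{
    volatile u64 r = foo_3(x);
    return r;
}

__attribute__((noinline)) static u64 foo_1(u64 x)
{
    volatile u64 r = foo_2(x);
    return r;
}

static u64 (*const chain[])(u64) = { foo_1, foo_2, foo_3, foo_4 };

/* Main loop: call foo_1, foo_2, ... foo_4 in order, `iterations` times. */
u64 run_chain(unsigned iterations)
{
    u64 total = 0;

    for (unsigned it = 0; it < iterations; it++)
        for (unsigned k = 0; k < sizeof(chain) / sizeof(chain[0]); k++)
            total += chain[k](it);
    return total;
}
```

Call run_chain() from main, build with -g, and record it with perf record --call-graph dwarf. The samples then land mostly in foo_4, with call chains of varying depth, much like the stress_bt output below.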

More examples follow in the Usage section below.

Example without DWARF unwinding information:

# perf record -- ./stress_bt
# perf report
    98.34%  stress_bt  stress_bt               [.] foo_128             
     0.11%  stress_bt  stress_bt               [.] foo_127                    
     0.10%  stress_bt  libc-2.17-2013.07-2.so  [.] random                     
     0.08%  stress_bt  stress_bt               [.] foo_93                     
     0.07%  stress_bt  stress_bt               [.] foo_89                     
     0.07%  stress_bt  stress_bt               [.] foo_108                    
     0.06%  stress_bt  stress_bt               [.] foo_116                    
     0.06%  stress_bt  stress_bt               [.] foo_123                    
     0.05%  stress_bt  stress_bt               [.] foo_85                     
     0.04%  stress_bt  stress_bt               [.] foo_59                     
     0.04%  stress_bt  stress_bt               [.] foo_82                     
     0.04%  stress_bt  stress_bt               [.] foo_74                     
     ...
     0.01%  stress_bt  [kernel.kallsyms]       [k] unmap_single_vma           
     0.01%  stress_bt  [kernel.kallsyms]       [k] unmapped_area_topdown      
     0.01%  stress_bt  stress_bt               [.] foo_94                     
     0.01%  stress_bt  stress_bt               [.] foo_28                     
     0.01%  stress_bt  stress_bt               [.] foo_49                     
     0.01%  stress_bt  stress_bt               [.] foo_62                     
     0.01%  stress_bt  stress_bt               [.] foo_65                     
     0.01%  stress_bt  [kernel.kallsyms]       [k] __do_fault                 
     0.01%  stress_bt  [kernel.kallsyms]       [k] __sync_icache_dcache       
     0.01%  stress_bt  [kernel.kallsyms]       [k] perf_event_aux             
     0.00%  stress_bt  [kernel.kallsyms]       [k] el0_svc_naked              
     0.00%  stress_bt  [kernel.kallsyms]       [k] do_page_fault              
     0.00%  stress_bt  [kernel.kallsyms]       [k] __inc_zone_state           
     0.00%  stress_bt  [kernel.kallsyms]       [k] handle_mm_fault            
     0.00%  stress_bt  [kernel.kallsyms]       [k] perf_event_aux_ctx         
     0.00%  stress_bt  [kernel.kallsyms]       [k] finish_task_switch         
     0.00%  stress_bt  [kernel.kallsyms]       [k] strlcpy                    

Example with dwarf unwinding information:

# perf record --call-graph dwarf -- ./stress_bt
# perf report --call-graph --stdio
    96.93%  stress_bt  stress_bt               [.] foo_128                    
            |
            --- foo_128
               |          
               |--98.22%-- foo_127
               |          |          
               |          |--99.46%-- foo_126
               |          |          |          
               |          |          |--99.11%-- foo_125
...
               |          |          |          
               |          |           --0.89%-- bar
               |          |                     doit
               |          |                     main
               |          |                     __libc_start_main
               |          |          
               |           --0.54%-- bar
               |                     doit
               |                     main
               |                     __libc_start_main
               |          
               |--0.77%-- bar
               |          doit
               |          main
               |          __libc_start_main
                --1.01%-- [...]

     0.25%  stress_bt  [kernel.kallsyms]       [k] page_mkclean               
            |
            --- page_mkclean
                clear_page_dirty_for_io
                write_cache_pages
                nfs_writepages
                do_writepages
                __filemap_fdatawrite_range
                filemap_write_and_wait_range
                nfs_file_fsync
                vfs_fsync
                nfs_file_flush
                filp_close
                put_files_struct
                exit_files
                do_exit
                do_group_exit
                __wake_up_parent
                ret_fast_syscall

...

     0.14%  stress_bt  stress_bt               [.] foo_127                    
            |
            --- foo_127
                foo_126
                foo_125
                foo_124
                foo_123
                foo_122
                foo_121
                foo_120
               |          
               |--66.66%-- foo_119
               |          foo_118
               |          foo_117
               |          foo_116
               |          foo_115
               |          foo_114
               |          foo_113
               |          foo_112
               |          foo_111
               |          |          
               |          |--50.00%-- foo_110
               |          |          foo_109
               |          |          foo_108
...
               |          |          foo_25
               |          |          foo_24
               |          |          bar
               |          |          doit
               |          |          main
               |          |          __libc_start_main
               |          |          
               |           --50.00%-- bar
               |                     doit
               |                     main
               |                     __libc_start_main
               |          
                --33.34%-- bar
                          doit
                          main
                          __libc_start_main

System setup

  • ARMv8 Foundation model - 1-4 CPUs
  • Marvell ArmadaXP 370 - Quad core ARMv7
  • OMAP4 Pandaboard - Dual core ARMv7
  • Mainline and Linaro kernels - 3.11-3.15
  • Ubuntu 13.07/08 distro

Kernel patches

The kernel patches to tools/perf are necessary to add the feature on ARM. The patches apply to the mainline 3.11-3.15 kernel tree and linux-linaro-tracking kernel tree, linux-linaro-core-tracking branch.

ARMv7

Initial version of the kernel patches, and discussion, at http://www.spinics.net/lists/kernel/msg1597022.html.

Second version, after rework, at http://www.spinics.net/lists/kernel/msg1598951.html.

Third (final?) version, after the addition of the libunwind library check, at http://www.spinics.net/lists/kernel/msg1608919.html. The patches are on their way to mainline via Will D.'s ARM tree, cf. https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/?h=perf/updates.

Update: the patches have been merged into mainline.

ARMv8

  • The ARMv8 patches have been sent to LKML for review and inclusion into mainline, cf. http://www.spinics.net/lists/arm-kernel/msg281077.html,
  • the Foundation emulator builds and runs the perf code (kernel + user space perf + libraries) OK,
  • the testing is OK on the Foundation emulator.

Update: the patches have been merged into mainline.

Dependency

  1. If not present, libunwind will not be linked and so not used by the perf tool,
  2. if present, libunwind >= 1.1 is needed to prevent a segfault when parsing the dwarf info,

  3. libunwind needs to be configured with --enable-debug-frame to prevent a linkage error. Note: --enable-debug-frame is automatically selected on ARM32, not on ARM64.
  4. libdw >= 0.158 is needed. The ARMv7 code is in the master branch; the ARMv8 code is in a WIP branch mjw/aarch64-unwind at https://git.fedorahosted.org/cgit/elfutils.git/.
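The minimum versions above (libunwind >= 1.1, libdw >= 0.158) can be checked with a dotted-version comparison. The sketch below is hypothetical and does not come from perf's build system; the version string could come, for instance, from elfutils' dwfl_version().

```c
#include <assert.h>
#include <string.h>

/* Parse one numeric component and advance past it and its '.' separator.
 * Any other suffix (e.g. "-rc1") is treated as the end of the string. */
static unsigned long next_component(const char **s)
{
    unsigned long v = 0;

    while (**s >= '0' && **s <= '9') {
        v = v * 10 + (unsigned long)(**s - '0');
        (*s)++;
    }
    if (**s == '.')
        (*s)++;
    else
        *s += strlen(*s);
    return v;
}

/* Compare dotted versions ("0.158", "1.1", "1.0.1") like strcmp().
 * Missing components count as 0, so "1.1" equals "1.1.0". */
static int version_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a || *b) {
        unsigned long va = next_component(&a);
        unsigned long vb = next_component(&b);

        if (va != vb)
            return va < vb ? -1 : 1;
    }
    return 0;
}

/* Hypothetical convenience wrapper: is `have` at least `need`? */
static int version_at_least(const char *have, const char *need)
{
    return version_cmp(have, need) >= 0;
}
```

For example, version_at_least("0.157", "0.158") is false, so such a libdw would be rejected.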

Compilation

libunwind

Download:

# git clone git://git.sv.gnu.org/libunwind.git
# cd libunwind

Configuration, compilation and installation:

libunwind depends on the following packages: gcc gawk automake libtool texlive-extra-utils elfutils libdw1

# autoreconf -i
# ./configure --enable-debug-frame

or, to configure a debug build of libunwind instead:

# ./configure CFLAGS="-U_FORTIFY_SOURCE -DDEBUG -g" --enable-maintainer-mode --enable-debug --enable-debug-frame
# make -j`getconf _NPROCESSORS_ONLN`
# make install prefix=...

Run the built-in libunwind test suite:

# make check CFLAGS="-g" UNW_ARM_UNWIND_METHOD=1

libdw (part of elfutils)

Download:

# git clone https://git.fedorahosted.org/git/elfutils.git
# cd elfutils

Configuration, compilation and installation:

libdw depends on the following packages: gcc gawk automake libtool texlive-extra-utils elfutils libdw1

# autoreconf -f -i
# ./configure --enable-maintainer-mode
# make -j`getconf _NPROCESSORS_ONLN`
# make install

Kernel

The perf tool compilation depends on flex bison libelf-dev libunwind8-dev libaudit-dev libdw-dev binutils-dev zlib1g-dev man

Compilation:

# make -C tools/perf

Note: compiling perf requires some libraries (e.g. libelf) to be present, which can be difficult to arrange when cross-compiling. Native compilation, or cross-compilation using bitbake (OpenEmbedded, Arago...), is recommended.

Usage

Prepare the tracing (as root).

The entry /proc/sys/kernel/perf_event_paranoid controls the permissions on profiling events:

  • -1 - Not paranoid at all
  • 0 - Disallow raw tracepoint access for unpriv
  • 1 - Disallow cpu events for unpriv
  • 2 - Disallow kernel profiling for unpriv

# echo -1 > /proc/sys/kernel/perf_event_paranoid
# echo 0 > /proc/sys/kernel/kptr_restrict
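A tool can read the same entry before recording and warn when the level is too strict for call chain profiling. Below is a sketch that parses the value and maps it to the meanings listed above. The function names are hypothetical, and treating values above 2 like 2 is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the contents of perf_event_paranoid, e.g. "2\n".
 * Returns 0 and stores the level on success, -1 on malformed input. */
static int parse_paranoid(const char *text, int *level)
{
    char *end;
    long v;

    assert(text != NULL && level != NULL);
    v = strtol(text, &end, 10);
    if (end == text || (*end != '\0' && *end != '\n'))
        return -1;
    *level = (int)v;
    return 0;
}

/* Restate the meaning of a level, following the list above
 * (values above 2 are treated like 2 here, an assumption). */
static const char *paranoid_meaning(int level)
{
    if (level < 0)
        return "not paranoid at all";
    if (level == 0)
        return "raw tracepoint access disallowed for unprivileged users";
    if (level == 1)
        return "CPU events disallowed for unprivileged users";
    return "kernel profiling disallowed for unprivileged users";
}

/* Hypothetical helper: read the level from a file such as
 * /proc/sys/kernel/perf_event_paranoid; returns -1 on error. */
static int read_paranoid(const char *path, int *level)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_paranoid(buf, level);
}
```

After the echo commands above, read_paranoid("/proc/sys/kernel/perf_event_paranoid", &lvl) would be expected to set lvl to -1.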

Record the samples. The -F option changes the sampling frequency (in Hz); the default value is 1000.

Notes:

  1. The generated perf.data file can be big (~50MB for the stress_bt binary) and CPU-intensive for perf report to parse. A lower -F value reduces the output size,

  2. perf reports statistics about the traced application; the sampling frequency and the execution time of the traced application have some influence on the reported data,
  3. the stress application can be used to generate load on the CPU, VM, etc.
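A back-of-the-envelope estimate shows why -F matters. With --call-graph dwarf, perf copies a chunk of the user stack into every sample (8192 bytes by default), so the file grows roughly linearly with the sampling frequency. For instance, the stress run shown below wrote 7.185 MB, and the report that follows (presumably from the same recording) lists 919 samples, i.e. roughly 8 KB per sample. The formulas are approximations, and the function names and the per-sample overhead parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default size of the user stack dump copied into each sample by
 * "perf record --call-graph dwarf". */
#define DWARF_STACK_DUMP_BYTES 8192u

/* Rough number of samples: frequency x duration x busy CPUs
 * (only time when the profiled tasks actually run is sampled). */
static uint64_t estimate_samples(uint64_t freq_hz, uint64_t seconds,
                                 uint64_t busy_cpus)
{
    return freq_hz * seconds * busy_cpus;
}

/* Rough perf.data size: every sample carries one stack dump, plus an
 * assumed fixed per-sample overhead for headers and registers. */
static uint64_t estimate_bytes(uint64_t samples, uint64_t stack_dump_bytes,
                               uint64_t per_sample_overhead)
{
    return samples * (stack_dump_bytes + per_sample_overhead);
}
```

With these formulas, going from -F 1000 to -F 250 divides the expected file size by four.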

# perf record --call-graph dwarf -- <binary>
# perf record --call-graph dwarf -F 250 -- <binary>
# perf record --call-graph dwarf -- stress --cpu 2 --io 2 --vm 2 --timeout 5s
stress: info: [1893] dispatching hogs: 2 cpu, 2 io, 2 vm, 0 hdd
stress: info: [1893] successful run completed in 6s
[ perf record: Woken up 27 times to write data ]
[ perf record: Captured and wrote 7.185 MB perf.data (~313939 samples) ]
Warning:
Processed 950 events and lost 5 chunks!

Check IO/CPU overload!

Report the tracing statistics:

# perf report
# perf report --sort symbol --call-graph --stdio
# ========
# captured on: Tue Sep 10 07:08:23 2013
# hostname : localhost.localdomain
# os release : 3.11.0-rc4+
# perf version : 3.11.rc7.ge66518
# arch : armv7l
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : (null)
# total memory : 0 kB
# cmdline : 
# event : name = cpu-clock, type = 1, config = 0x0, config1 = 0x0, config2 = 0x0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# pmu mappings: not available
# ========
#
# Samples: 919  of event 'cpu-clock'
# Event count (approx.): 229750000
#
# Overhead                                  Symbol
# ........  ......................................
#
     0.00%  [.] __random_r                        

     0.00%  [k] __memzero                         
            |          
            |--99.46%-- hogvm
            |          
             --0.54%-- __libc_do_syscall

     0.54%  [.] __random                          

     0.54%  [.] hogcpu                            

     0.54%  [.] 0x00000a7c                        
            |          
            |--92.11%-- 0x8a7c
            |          
             --7.89%-- 0x8a84

     7.89%  [k] get_page_from_freelist            
            |
            --- __alloc_pages_nodemask
                handle_pte_fault
                handle_mm_fault
                do_page_fault
                do_DataAbort
                __dabt_usr
                hogvm
...
    50.00%  [k] run_timer_softirq                 
            |
            --- __do_softirq
                irq_exit
                handle_IRQ
                gic_handle_irq
                __irq_usr
                hogcpu
...

perf built-in dwarf unwinding test

perf includes a built-in test for various features. The command perf test list provides the list of tests that can be executed.

Running the perf DWARF unwinding test checks whether the feature actually works on the current platform.

Test run example on ARMv7:

./tools/perf/perf test 23
23: Test dwarf unwind                                      : Ok
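perf test 23 exercises perf's own DWARF unwinding code. As a much smaller, separate sanity check that the toolchain produces usable unwind tables on a target, the sketch below uses glibc's backtrace() from <execinfo.h>. Note that backtrace() relies on the runtime's unwinder, not on perf, libunwind or libdw, so passing it does not prove that perf unwinding works; the function names are hypothetical.

```c
#include <assert.h>
#include <execinfo.h>

/* Innermost function: ask glibc how many frames it can unwind from here. */
__attribute__((noinline)) static int probe_depth(void)
{
    void *frames[128];

    return backtrace(frames, 128);
}

/* Intermediate levels; the volatile locals keep the calls out of tail
 * position so that each function keeps its own frame. */
__attribute__((noinline)) static int middle(void)
{
    volatile int n = probe_depth();
    return n;
}

__attribute__((noinline)) static int outer(void)
{
    volatile int n = middle();
    return n;
}

/* Returns 1 if the unwinder saw at least outer, middle and probe_depth. */
int unwinding_works(void)
{
    return outer() >= 3;
}
```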

Test script

The test script test_perf_unwind.sh tests the presence of the dwarf unwinding capability of the perf binary.

After testing the presence of the binary and the basic setup, perf record -g dwarf and perf report are run and checked for errors.

The script returns 0 if the test passes correctly, 1 otherwise. The log file log.txt contains the full log.

The source code is found at the end of this page.

Speed improvement

The perf/core_unwind_speedup branch of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git implements an optimization of the unwinding code, especially for libunwind.

The results for unwind_speedup (v4) on ARMv7 are:

  • libunwind: execution time reduced by 17% for a light load (i.e. the shallower backtraces of the stress app) up to 25% for deep backtraces (the stress_bt app),

  • libdw: no significant improvement (0-3%).

The results for unwind_speedup (v3) on ARMv7 are:

  • libunwind: execution time reduced by 29% for a light load (i.e. the shallower backtraces of the stress app) up to 49% for deep backtraces (the stress_bt app),

  • libdw: no significant improvement (0-2%).

Note: v3 is 13-25% faster than v4, both with and without the speed-up. The root cause has not been investigated yet.
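The averages and percentages below are plain arithmetic on the three runs reported by time(1): the mean of the three values, then (after - before) / before x 100. A small sketch that reproduces them, with hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>

/* Parse a time(1) value such as "0m13.598s" or "8m6.572s" into seconds.
 * Returns a negative value on malformed input. */
static double parse_time_s(const char *s)
{
    int minutes;
    double seconds;

    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Arithmetic mean of n values. */
static double mean(const double *v, int n)
{
    double sum = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Relative change from `before` to `after`, in percent (negative = faster). */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For example, the libunwind "real" times of the v4 stress run average 13.64 s without and 11.36 s with the speed-up, and pct_change(13.64, 11.36) gives about -16.7%, reported as -17%.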

Here are the details:

unwind_speedup_v4:

./tools/perf/perf record -F 500 --call-graph dwarf -- stress --cpu 2 --io 2 --vm 2 --timeout 20s
stress: info: [3776] dispatching hogs: 2 cpu, 2 io, 2 vm, 0 hdd
stress: info: [3776] successful run completed in 21s
[ perf record: Woken up 1178 times to write data ]
[ perf record: Captured and wrote 308.711 MB perf.data (~13487793 samples) ]

time ./tools/perf/perf report --stdio > /dev/null 2>&1

libunwind
real    0m13.598s 0m13.731s 0m13.589s
user    0m8.720s 0m8.770s 0m8.450s
sys     0m4.860s 0m4.950s 0m5.120s

avg
real    13.64
user    8.65
sys     4.98

libunwind speedup
real    0m11.295s 0m11.525s 0m11.249s
user    0m8.080s 0m8.150s 0m8.000s
sys     0m3.200s 0m3.370s 0m3.230s

avg
real    11.36   -17%
user    8.08    -7%
sys     3.27    -34%

libdw
real    0m27.155s 0m27.466s 0m27.405s
user    0m16.460s 0m16.590s 0m16.690s
sys     0m10.560s 0m10.750s 0m10.590s

avg
real    27.34
user    16.58
sys     10.63

libdw speedup
real    0m27.128s 0m27.356s 0m27.208s
user    0m16.280s 0m16.110s 0m16.420s
sys     0m10.720s 0m11.130s 0m10.660s

avg
real    27.23   -0%
user    16.27   -2%
sys     10.84   +2%


./tools/perf/perf record -F 500 --call-graph dwarf -- ../../test_apps/stress_bt
Total count: 171711327751528502
[ perf record: Woken up 150 times to write data ]
[ perf record: Captured and wrote 37.456 MB perf.data (~1636467 samples) ]

libunwind
real    1m19.460s 1m19.842s 1m19.725s
user    0m51.920s 0m52.520s 0m52.230s
sys     0m27.440s 0m27.230s 0m27.410s

avg
real    79.68
user    52.22
sys     27.36

libunwind speedup
real    0m59.725s 0m59.551s 0m59.494s
user    0m45.950s 0m45.740s 0m46.390s
sys     0m13.720s 0m13.780s 0m13.080s

avg
real    59.59   -25%
user    46.02   -12%
sys     13.53   -50%

libdw
real    0m28.854s 0m28.746s 0m28.754s
user    0m27.450s 0m27.390s 0m27.460s
sys     0m1.390s 0m1.340s 0m1.280s

avg
real    28.78   
user    27.43
sys     1.34

libdw speedup
real    0m27.903s 0m27.887s 0m28.121s
user    0m26.730s 0m26.700s 0m26.950s
sys     0m1.150s 0m1.160s 0m1.140s

avg
real    27.97   -3%
user    26.79   -2%
sys     1.15    -14%

unwind_speedup_v3:

./tools/perf/perf record -F 500 --call-graph dwarf -- stress --cpu 2 --io 2 --vm 2 --timeout 20s
stress: info: [6512] dispatching hogs: 2 cpu, 2 io, 2 vm, 0 hdd
stress: info: [6512] successful run completed in 20s
[ perf record: Woken up 1194 times to write data ]
[ perf record: Captured and wrote 308.300 MB perf.data (~13469847 samples) ]

time ./tools/perf/perf report --stdio > /dev/null 2>&1

libunwind
real    0m11.883s 0m11.988s 0m11.896s
user    0m7.290s 0m7.110s 0m7.310s
sys     0m4.580s 0m4.860s 0m4.560s

avg
real    11.92
user    7.24
sys     4.67

libunwind.speedup
real    0m8.426s 0m8.404s 0m8.744s
user    0m5.660s 0m5.490s 0m5.390s
sys     0m2.750s 0m2.900s 0m3.340s

avg
real    8.52    -29%
user    5.51    -24%
sys     3       -36%

libdw
real    0m25.226s 0m25.274s 0m25.076s
user    0m14.080s 0m14.130s 0m14.250s
sys     0m11.010s 0m11.000s 0m10.680s

avg
real    25.192
user    14.15
sys     10.9

libdw.speedup
real    0m25.174s 0m25.191s 0m25.265s
user    0m14.350s 0m14.190s 0m14.400s
sys     0m10.680s 0m10.870s 0m10.720s

avg
real    25.21   -0%
user    14.31   +1%
sys     10.76   -0%


./tools/perf/perf record --call-graph dwarf -- ../../test_apps/stress_bt
Total count: 171711327751528502
[ perf record: Woken up 1239 times to write data ]
[ perf record: Captured and wrote 309.574 MB perf.data (~13525490 samples) ]

libunwind
real    8m6.572s 8m7.561s 8m6.917s
user    4m23.320s 4m25.310s 4m22.720s
sys     3m42.420s 3m41.470s 3m43.410s

avg
real    487
user    264
sys     222

libunwind.speedup
real    4m4.127s 4m5.623s 4m15.047s
user    2m17.000s 2m16.100s 2m17.690s
sys     1m47.100s 1m49.500s 1m57.330s

avg
real    248             -49%
user    137             -48%
sys     111             -50%

libdw
real    1m9.900s 1m9.986s 1m9.909s
user    0m59.210s 0m59.140s 0m59.120s
sys     0m10.580s 0m10.740s 0m10.680s

avg
real    70
user    59
sys     11

libdw.speedup
real    1m9.362s 1m9.555s 1m9.366s
user    0m58.690s 0m58.660s 0m57.940s
sys     0m10.570s 0m10.790s 0m11.320s

avg
real    69      -0%
user    58      -2%
sys     11      -0%

Debug & Tools

Use an alternate library

Using an alternate libunwind library, for debug:

# make DEBUG=1 LIBUNWIND_DIR=/opt/libunwind/ -C tools/perf

ELF & Dwarf tools

Dump the linked libraries:

# ldd ./stress_bt
        libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6e2e000)
        /lib/ld-linux-armhf.so.3 (0xb6f1b000)

Dump the ELF sections:

# readelf -S ./stress_bt
# readelf -S ./stress_bt | grep debug_frame
  [31] .debug_frame      PROGBITS        00000000 0033c4 000e70 00      0   0  4
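The same check can be done programmatically, for example in a test or build check. Below is a sketch that reads an ELF file and looks up a section name such as ".debug_frame" in its section header string table. It uses only the standard <elf.h> definitions, assumes a well-formed file in the host's byte order, and the helper names are hypothetical; readelf remains the reference tool.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole regular file into a malloc'ed buffer; NULL on error. */
static unsigned char *read_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size = 0;

    *len = 0;
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf != NULL)
        *len = (size_t)size;
    return buf;
}

/* Scan the section headers; Ehdr/Shdr select the 32- or 64-bit layout.
 * Sets `found` to 0 or 1 when the header table is in bounds. */
#define SCAN_SECTIONS(Ehdr, Shdr)                                          \
    do {                                                                   \
        const Ehdr *eh = (const Ehdr *)buf;                                \
        if (len < sizeof(Ehdr) || eh->e_shoff == 0 ||                      \
            eh->e_shstrndx >= eh->e_shnum ||                               \
            eh->e_shoff + (size_t)eh->e_shnum * sizeof(Shdr) > len)        \
            break;                                                         \
        const Shdr *sh = (const Shdr *)(buf + eh->e_shoff);                \
        const Shdr *strs = &sh[eh->e_shstrndx];                            \
        if (strs->sh_offset + strs->sh_size > len)                         \
            break;                                                         \
        const char *names = (const char *)buf + strs->sh_offset;          \
        found = 0;                                                         \
        for (size_t i = 0; i < eh->e_shnum && !found; i++) {               \
            size_t off = sh[i].sh_name;                                    \
            if (off < strs->sh_size &&                                     \
                memchr(names + off, '\0', strs->sh_size - off) != NULL &&  \
                strcmp(names + off, name) == 0)                            \
                found = 1;                                                 \
        }                                                                  \
    } while (0)

/* Returns 1 if the ELF file has a section called `name`, 0 if it has not,
 * -1 if the file cannot be read or parsed. */
static int elf_has_section(const char *path, const char *name)
{
    size_t len;
    int found = -1;
    unsigned char *buf = read_file(path, &len);

    if (buf == NULL)
        return -1;
    if (len >= EI_NIDENT && memcmp(buf, ELFMAG, SELFMAG) == 0) {
        if (buf[EI_CLASS] == ELFCLASS64)
            SCAN_SECTIONS(Elf64_Ehdr, Elf64_Shdr);
        else if (buf[EI_CLASS] == ELFCLASS32)
            SCAN_SECTIONS(Elf32_Ehdr, Elf32_Shdr);
    }
    free(buf);
    return found;
}
```

Since readelf lists a .debug_frame section for the stress_bt binary above, elf_has_section("./stress_bt", ".debug_frame") would be expected to return 1 for it.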

List the symbols from a binary (-T for the dynamic symbol table, -t for the regular symbol table):

# objdump -T /lib/arm-linux-gnueabihf/libc.so.6
...
000293f4  w   DF .text  0000007c  GLIBC_2.4   random_r
...
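The address and size columns (000293f4 and 0000007c for random_r) are what a profiler needs to turn a sampled address into a symbol name. A minimal sketch of that lookup, assuming a table already sorted by address and an address already made relative to the library's load address. The random_r values come from the objdump line above; the other entry and all names are hypothetical, and this is not perf's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint64_t start;    /* symbol value, e.g. 0x293f4 for random_r */
    uint64_t size;     /* symbol size, e.g. 0x7c */
    const char *name;
};

/* Binary search a table sorted by start address for the symbol whose
 * [start, start + size) range contains addr. Returns NULL if none does. */
static const struct sym *sym_lookup(const struct sym *tab, size_t n,
                                    uint64_t addr)
{
    size_t lo = 0, hi = n;

    /* Find the first entry whose start is greater than addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    const struct sym *s = &tab[lo - 1];
    return (addr - s->start < s->size) ? s : NULL;
}
```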

Dump the DWARF call frame information from a binary. For each function (FDE), the output shows its address range and the CFA rules that apply from each IP onwards.

# dwarfdump -f -kf stress_bt
.debug_frame

fde:
<    0><0x0000842c:0x00008498><foo_128><fde offset 0x00000010 length: 0x00000014><eh offset none>
        0x0000842c: <off cfa=00(r13) >
        0x0000842e: <off cfa=04(r13) > <off r14=-4(cfa) >
        0x00008430: <off cfa=24(r13) > <off r14=-4(cfa) >
<    0><0x00008498:0x000084a4><foo_127><fde offset 0x00000028 length: 0x00000014><eh offset none>
        0x00008498: <off cfa=00(r13) >
        0x0000849a: <off cfa=08(r13) > <off r3=-8(cfa) > <off r14=-4(cfa) >
...
<    0><0x00008ccc:0x00008cf2><main><fde offset 0x00000c40 length: 0x00000014><eh offset none>
        0x00008ccc: <off cfa=00(r13) >
        0x00008cce: <off cfa=04(r13) > <off r14=-4(cfa) >
        0x00008cd0: <off cfa=16(r13) > <off r14=-4(cfa) >

cie:
<    0> version                         1
        cie section offset              0 0x00000000
        augmentation
        code_alignment_factor           2
        data_alignment_factor           -4
        return_address_register         14
        bytes of initial instructions   3
        cie length                      12
        initial instructions
         0 DW_CFA_def_cfa r13 0
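To make the notation concrete: cfa=24(r13) means the Canonical Frame Address (CFA) is r13 (SP) + 24, and r14=-4(cfa) means r14 (LR, the return_address_register in the CIE) was saved at CFA - 4. One unwinding step looks up the row covering the current PC, computes the CFA, reloads LR from the stack if it was saved, and uses the CFA as the caller's SP. The sketch below applies the foo_128 rows above to a simulated stack. It is an illustration only, not libunwind or libdw code; it ignores the Thumb bit, the FDE end address and every other CFA instruction, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One CFI row, reduced to the two rules seen in the dump:
 * CFA = r13 + cfa_offset, and r14 (LR) saved at CFA + lr_offset. */
struct cfi_row {
    uint32_t pc;         /* first address the row applies to */
    int32_t cfa_offset;  /* "cfa=NN(r13)" */
    int lr_saved;        /* 1 if an "r14=...(cfa)" rule is present */
    int32_t lr_offset;   /* e.g. -4 for "r14=-4(cfa)" */
};

/* The foo_128 FDE rows from the dwarfdump output above. */
static const struct cfi_row foo_128_rows[] = {
    { 0x842c, 0, 0, 0 },
    { 0x842e, 4, 1, -4 },
    { 0x8430, 24, 1, -4 },
};

struct regs {
    uint32_t pc, sp, lr;  /* r15, r13, r14 */
};

/* Simulated stack memory: words[i] lives at address base + 4 * i. */
struct stack {
    const uint32_t *words;
    size_t nwords;
    uint32_t base;
};

static int read_word(const struct stack *st, uint32_t addr, uint32_t *out)
{
    if (addr < st->base || (addr - st->base) % 4 != 0 ||
        (addr - st->base) / 4 >= st->nwords)
        return -1;
    *out = st->words[(addr - st->base) / 4];
    return 0;
}

/* Unwind one frame with rows sorted by pc, assuming r->pc lies inside the
 * FDE's range. Returns 0 on success, -1 if no row applies or the saved LR
 * is outside the simulated stack. */
static int unwind_step(const struct cfi_row *rows, size_t n,
                       const struct stack *st, struct regs *r)
{
    const struct cfi_row *row = NULL;
    uint32_t cfa, ra;

    /* Pick the last row whose start address is <= pc. */
    for (size_t i = 0; i < n && rows[i].pc <= r->pc; i++)
        row = &rows[i];
    if (row == NULL)
        return -1;

    cfa = r->sp + (uint32_t)row->cfa_offset;
    ra = r->lr;  /* before the prologue saves it, LR is still live */
    if (row->lr_saved &&
        read_word(st, cfa + (uint32_t)row->lr_offset, &ra) != 0)
        return -1;

    r->sp = cfa;  /* the caller's SP is the CFA */
    r->pc = ra;   /* resume in the caller at the return address */
    /* r->lr is left as is: the caller's own CFI row says where its LR is. */
    return 0;
}
```

Repeating the step with the new PC and the caller's rows walks up the call chain; libunwind and libdw do this in essence, with the full set of CFA instructions.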

elfutils/libdw tools

Extract stack info from a running process or a core file:

elfutils# LD_LIBRARY_PATH=backends:libelf:libdw src/stack -p <binary_pid> --list-modules --build-id --verbose

Example:

elfutils# LD_LIBRARY_PATH=backends:libelf:libdw src/stack -p `pidof stress_bt` --list-modules --build-id --verbose
PID 7843 - process module memory map
0x0000000000400000-0x0000000000412000 stress_bt
  [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]
  /mnt/linaro/perf_libunwind/test_app/stress_bt
  -
0x0000007fb741f000-0x0000007fb7561000 libc-2.17-2013.07-2.so
  /lib/libc-2.17-2013.07-2.so
  /lib/.debug/libc-2.17-2013.07-2.so
0x0000007fb7565000-0x0000007fb7580000 ld-2.17-2013.07-2.so
  /lib/ld-2.17-2013.07-2.so
  /lib/.debug/ld-2.17-2013.07-2.so
0x0000007fb758e000-0x0000007fb7590000 [vdso: 7843]
0x0000007fb7590000-0x0000007fb7593000 ld-2.17-2013.07-2.so
  /lib/ld-2.17-2013.07-2.so
  /lib/.debug/ld-2.17-2013.07-2.so
PID 7843 - process
TID 7843:
#0  0x00000000004005dc     foo_128 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x5dc
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:232
#1  0x0000000000400640 - 1 foo_127 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x63f
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:246
#2  0x0000000000400654 - 1 foo_126 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x653
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:246
#3  0x0000000000400668 - 1 foo_125 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x667
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:246
#4  0x000000000040067c - 1 foo_124 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x67b
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:247
#5  0x0000000000400690 - 1 foo_123 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x68f
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:247
#6  0x00000000004006a4 - 1 foo_122 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x6a3
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:247
#7  0x00000000004006b8 - 1 foo_121 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x6b7
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:247
#8  0x00000000004006cc - 1 foo_120 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x6cb
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:248
#9  0x00000000004006e0 - 1 foo_119 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x6df
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:248
#10 0x00000000004006f4 - 1 foo_118 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x6f3
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:248
#11 0x0000000000400708 - 1 foo_117 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x707
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:248
#12 0x000000000040071c - 1 foo_116 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x71b
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:249
#13 0x0000000000400730 - 1 foo_115 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x72f
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:249
#14 0x0000000000400744 - 1 foo_114 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x743
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:249
#15 0x0000000000400758 - 1 foo_113 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x757
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:249
#16 0x000000000040076c - 1 foo_112 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x76b
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:250
#17 0x0000000000400780 - 1 foo_111 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x77f
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:250
#18 0x0000000000400794 - 1 foo_110 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x793
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:250
#19 0x00000000004007a8 - 1 foo_109 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x7a7
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:250
#20 0x00000000004007bc - 1 foo_108 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x7bb
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:251
#21 0x00000000004007d0 - 1 foo_107 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x7cf
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:251
#22 0x00000000004007e4 - 1 foo_106 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x7e3
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:251
#23 0x00000000004007f8 - 1 foo_105 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x7f7
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:251
#24 0x000000000040080c - 1 foo_104 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x80b
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:252
#25 0x0000000000400820 - 1 foo_103 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x81f
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:252
#26 0x0000000000400834 - 1 foo_102 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x833
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:252
#27 0x0000000000400848 - 1 foo_101 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x847
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:252
#28 0x000000000040085c - 1 foo_100 - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x85b
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:253
#29 0x00000000004011c0 - 1 bar - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x11bf
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:383
#30 0x0000000000401274 - 1 doit - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x1273
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:421
#31 0x00000000004012b0 - 1 main - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x12af
    /mnt/linaro/perf_libunwind/test_app/stress_bt.c:432
#32 0x0000007fb743f944 - 1 __libc_start_main - /lib/libc-2.17-2013.07-2.so
    /usr/src/debug/eglibc/2.17-r5/eglibc-linaro-2.17-2013.07-2/csu/libc-start.c:258
#33 0x000000000040047c - 1 $x - /mnt/linaro/perf_libunwind/test_app/stress_bt
    [6b3437517fdd3d4faed0ca45f04698c7f8ee39cf]@0x400000+0x47b
src/stack: dwfl_thread_getframes tid 7843 at 0x40047b in /mnt/linaro/perf_libunwind/test_app/stress_bt: no matching address range
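In the output above, frame #0 is the interrupted instruction, while the other frames are return addresses. stack marks those with "- 1" and looks up pc - 1 so that the address falls inside the call instruction, which is why foo_127 at 0x400640 is reported as @0x400000+0x63f. A sketch of that address-to-module mapping, using two ranges from the memory map above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct module {
    uint64_t start, end;  /* [start, end) from the process memory map */
    const char *name;
};

/* Find the module containing a frame's PC and the module-relative offset.
 * For caller frames (is_activation == 0) the PC is a return address, so
 * pc - 1 is used instead. Returns NULL if no module contains the address. */
static const struct module *locate(const struct module *mods, size_t n,
                                   uint64_t pc, int is_activation,
                                   uint64_t *offset)
{
    uint64_t addr = is_activation ? pc : pc - 1;

    for (size_t i = 0; i < n; i++) {
        if (addr >= mods[i].start && addr < mods[i].end) {
            *offset = addr - mods[i].start;
            return &mods[i];
        }
    }
    return NULL;
}
```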

ToDo

  1. Write test procedures and documentation: done, ongoing task,
  2. Implement the feature on ARMv8, using the Foundation model: done,
  3. Integrate libdw on both ARMv7 (done) and ARMv8 (ongoing),
  4. Investigate the compat mode (i.e. profiling an ARMv7 app running on an ARMv8 platform): ongoing. libunwind does not support it out of the box unless perf is linked with multiple libunwind libraries (arm, aarch64). libdw supports it in principle (OK on i386/x86_64), still to be tested on ARMv8.
  5. Follow up on the patches for inclusion in the mainline and Linaro kernels.

Source code

Backtrace stress application

stress_bt.tar.gz

Test script

test_perf_unwind.sh

Fosdem 2015 presentation slides

Fosdem 2015 perf status on ARM and ARM64.pdf

Fosdem 2015 presentation at https://fosdem.org/2015/schedule/event/arm_perf/

perf unwinding feature at http://lwn.net/Articles/499116/.

libunwind info at https://wiki.linaro.org/KenWerner/Sandbox/libunwind.

libunwind git tree at http://git.savannah.gnu.org/gitweb/?p=libunwind.git.

LEG/Engineering/TOOLS/perf-callstack-unwinding (last modified 2015-02-02 09:35:43)