Using Perf on Android

Preparation

To profile your applications or libraries with perf on Android, you need an Android device or emulator. You also need to set up your host machine by following http://source.android.com/source/initializing.html.

Build Android and Flash the Device

It is recommended to build the whole system, including the kernel, from scratch so that you have all the symbol files.

Follow http://source.android.com/source/building.html to set up your host machine and build AOSP. If you are using a HiKey board, also refer to https://github.com/96boards/documentation/wiki/HiKeyGettingStarted.

Build Perf

Make sure your Android device is connected to the host.

Note: the instructions below require a userdebug or eng build of Android.

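# Set up the build environment first (from the root of your AOSP tree):
#   . build/envsetup.sh
#   lunch
adb root      # restart adbd with root permissions
adb remount   # remount the system partitions read-write
m -j 32 perf  # build perf
adb sync      # push the newly built files to the device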

Prepare Symbol Files

On Android, Java sources are compiled to dex bytecode. The dex bytecode is then compiled into oat files, either on the target or ahead of time on the host. Oat files are valid ELF files. By default, however, oat files contain no ELF debug information, and they are not present in $ANDROID_PRODUCT_OUT/symbols. Every oat file does contain the information ART needs for deoptimization and exception handling, so the ELF debug information can be regenerated from the oat files.
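Because oat files are ordinary ELF files, a pulled oat file can be sanity-checked by looking at its ELF header. The sketch below is not part of the ART or perf tooling; it only reads the standard ELF identification bytes (the magic 0x7F 'E' 'L' 'F' and the EI_CLASS byte) and reports whether the file is 32-bit or 64-bit ELF. The class name OatElfCheck is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class OatElfCheck {
    // e_ident[0..3] is 0x7F 'E' 'L' 'F' for every ELF file.
    private static final byte[] ELF_MAGIC = {0x7F, 'E', 'L', 'F'};
    // e_ident[4] (EI_CLASS): 1 = ELFCLASS32, 2 = ELFCLASS64.
    private static final int EI_CLASS = 4;

    /** Returns "ELF32", "ELF64", or null if the bytes are not an ELF header. */
    static String elfClass(byte[] header) {
        if (header.length <= EI_CLASS) {
            return null;
        }
        for (int i = 0; i < ELF_MAGIC.length; i++) {
            if (header[i] != ELF_MAGIC[i]) {
                return null;
            }
        }
        switch (header[EI_CLASS]) {
            case 1: return "ELF32";
            case 2: return "ELF64";
            default: return null;
        }
    }

    /** Reads only the first EI_CLASS + 1 bytes of the file. */
    static String elfClass(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return elfClass(in.readNBytes(EI_CLASS + 1));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + elfClass(Paths.get(arg)));
        }
    }
}
```

Run it with the path of a pulled oat file as its argument; an oat file compiled for arm64 would be expected to report ELF64.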

Run the following script to symbolize all oat files on your device.

bash art/tools/symbolize.sh

If you are using AOSP master, the script ignores most oat files. To fix this, cherry-pick the following patch and run the script again.

cd art
git fetch https://android.googlesource.com/platform/art refs/changes/71/147871/1 && git cherry-pick FETCH_HEAD

Run Java on Android

Sample Source

Main.java

public class Main {
    public static void main(String[] args) {
        long startTime = System.nanoTime();
        // Print the sum of 0..1,000,000,000, then the elapsed time in milliseconds.
        System.out.println(sum(1000 * 1000 * 1000));
        System.out.println((System.nanoTime() - startTime) / 1000 / 1000);
    }

    // A simple hot loop for perf to sample.
    static long sum(long count) {
        long res = 0;
        for (long i = 0; i <= count; ++i) {
            res += i;
        }
        return res;
    }
}
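The first line the program prints is the value returned by sum(1000*1000*1000). The closed form below is a quick sanity check on that output; it also shows that the result fits comfortably in a Java long, whose maximum is $2^{63}-1 \approx 9.22 \times 10^{18}$. The second line is the elapsed time in milliseconds, which depends on the device.

```latex
\sum_{i=0}^{10^9} i = \frac{10^9\,(10^9 + 1)}{2} = 500\,000\,000\,500\,000\,000
```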

Compile Java Source to Class File

javac -g src/Main.java -d class

Compile Class File to Dex File

dx --dex --debug --output=out/Main.jar class

Push to Target

adb push out/Main.jar /data/local/tmp/Main.jar

Run the Dex File

The additional option "-Xcompiler-option -g" tells dex2oat to generate oat files with debug information that perf can use.

adb shell dalvikvm32 -Xcompiler-option -g -cp /data/local/tmp/Main.jar Main
adb shell dalvikvm64 -Xcompiler-option -g -cp /data/local/tmp/Main.jar Main

Using Perf

To keep the examples simple, only the commands for aarch64 are listed.

Perf Record

adb shell perf record -g -o /data/local/tmp/perf.data dalvikvm64 -Xcompiler-option -g -cp /data/local/tmp/Main.jar Main

Pull Perf Data

adb pull /data/local/tmp/perf.data perf.data

Pull Symbol File

Because "-Xcompiler-option -g" was used in the previous command, the generated oat file already contains debug information, so we can pull it directly without symbolizing it.

adb pull /data/dalvik-cache/arm64/data@local@tmp@Main.jar@classes.dex $ANDROID_PRODUCT_OUT/symbols/data/dalvik-cache/arm64/data@local@tmp@Main.jar@classes.dex
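The cache file name in the command above is derived from the jar path on the device. The sketch below reproduces that naming as inferred from this single example only (leading '/' dropped, every '/' replaced by '@', then '@classes.dex' appended); it is an assumption, not ART's actual implementation, and other Android versions may name cache files differently. It also builds the matching destination under $ANDROID_PRODUCT_OUT/symbols, the directory later passed to perf with --symfs. The class and method names are hypothetical.

```java
final class DalvikCachePaths {
    // Hypothetical helper: naming inferred from the single example on this page.
    static String cacheFileName(String dexLocation) {
        String trimmed = dexLocation.startsWith("/") ? dexLocation.substring(1) : dexLocation;
        return trimmed.replace('/', '@') + "@classes.dex";
    }

    // Location of the compiled file on the device for a given ISA, e.g. "arm64".
    static String devicePath(String isa, String dexLocation) {
        return "/data/dalvik-cache/" + isa + "/" + cacheFileName(dexLocation);
    }

    // Mirror of the device path under the host symbols directory.
    static String hostSymbolPath(String productOut, String isa, String dexLocation) {
        return productOut + "/symbols" + devicePath(isa, dexLocation);
    }

    public static void main(String[] args) {
        // Falls back to the literal variable name if ANDROID_PRODUCT_OUT is unset.
        String productOut = System.getenv().getOrDefault("ANDROID_PRODUCT_OUT", "$ANDROID_PRODUCT_OUT");
        String jar = "/data/local/tmp/Main.jar";
        System.out.println("adb pull " + devicePath("arm64", jar) + " "
                + hostSymbolPath(productOut, "arm64", jar));
    }
}
```

Keeping the device path intact under the symbols directory is what lets perf find the file when it joins the --symfs prefix with the path recorded in perf.data.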

Perf Report

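  • Source path information is not yet written into the debug sections, so the source file has to be available in the current directory; this is why Main.java is copied there first.
  • aarch64-linux-android-objdump supports both aarch32 and aarch64.
  • --symfs $ANDROID_PRODUCT_OUT/symbols tells perf where to look up symbol files.
  • -k <kernel_folder>/vmlinux is optional and is used to resolve kernel symbols.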

cp src/Main.java .
perf report --objdump=aarch64-linux-android-objdump -k <kernel_folder>/vmlinux --symfs $ANDROID_PRODUCT_OUT/symbols -i perf.data

Flame Graph

Flame Graph is a useful tool for visualizing perf results and can help you understand the behavior of an application.

Download the FlameGraph scripts.

git clone https://github.com/brendangregg/FlameGraph.git

Generate the graph. flamegraph.pl writes an SVG file, which can be opened in a web browser.

perf script -k <kernel_folder>/vmlinux --symfs $ANDROID_PRODUCT_OUT/symbols -i perf.data | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg
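stackcollapse-perf.pl turns the per-sample call stacks printed by perf script into folded lines of the form root;...;leaf count, which flamegraph.pl then draws. The sketch below models only that folding step: it takes stacks that have already been parsed into frame names (leaf first, the order perf script prints them) instead of parsing perf script's text output. The class FoldStacks and the frame names in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FoldStacks {
    /** Collapses leaf-first sample stacks into sorted "root;...;leaf count" lines. */
    static List<String> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (List<String> leafFirst : samples) {
            List<String> rootFirst = new ArrayList<>(leafFirst);
            Collections.reverse(rootFirst);
            counts.merge(String.join(";", rootFirst), 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three hypothetical samples: two inside sum(), one directly in main().
        List<List<String>> samples = List.of(
                List.of("long Main.sum(long)", "void Main.main(java.lang.String[])"),
                List.of("long Main.sum(long)", "void Main.main(java.lang.String[])"),
                List.of("void Main.main(java.lang.String[])"));
        fold(samples).forEach(System.out::println);
    }
}
```

Identical stacks merge into one line whose count becomes the width of the corresponding box in the flame graph.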

Perf Annotate

perf annotate shows how samples are distributed over the instructions of a single method. The same view is also available from perf report.

perf annotate --objdump=aarch64-linux-android-objdump -k <kernel_folder>/vmlinux --symfs $ANDROID_PRODUCT_OUT/symbols -i perf.data "long Main.sum(long)"
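To illustrate what the per-instruction percentages mean, here is a simplified model: it counts the samples whose program counter falls inside one method's address range and reports each hit address's share of them. This is an assumption-laden sketch, not perf's implementation; perf can weight samples by their period and handle several events, which this ignores. The class name and addresses are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class AnnotateSketch {
    /**
     * Of the samples whose PC lies in [start, end), returns each hit
     * address's share in percent. Every sample counts as 1 (no periods).
     */
    static Map<Long, Double> percentByAddress(long start, long end, long[] samplePcs) {
        Map<Long, Integer> hits = new TreeMap<>();
        int total = 0;
        for (long pc : samplePcs) {
            if (pc >= start && pc < end) {
                hits.merge(pc, 1, Integer::sum);
                total++;
            }
        }
        Map<Long, Double> percent = new TreeMap<>();
        for (Map.Entry<Long, Integer> e : hits.entrySet()) {
            percent.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return percent;
    }

    public static void main(String[] args) {
        // Hypothetical method occupying [0x1000, 0x1010) and five sampled PCs.
        long[] pcs = {0x1000L, 0x1004L, 0x1004L, 0x1004L, 0x2000L};
        percentByAddress(0x1000L, 0x1010L, pcs).forEach((addr, pct) ->
                System.out.printf("0x%x: %.1f%%%n", addr, pct));
    }
}
```

The sample at 0x2000 lies outside the method, so it does not count toward the method's total.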

Frame Unwinding Support

With the default AOSP master, perf may only be able to unwind the stack using the frame pointer register on arm32 and arm64. This works fine for aarch64 code generated by GCC or Clang, but not for oat files: ART uses the frame pointer register as an ordinary callee-save register for better performance, so in oat code it does not hold a chain of frame records.
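A minimal model of frame-pointer unwinding shows why oat code breaks it. In the AArch64 procedure call standard, the frame pointer (x29) points at a frame record whose lower word is the caller's saved x29 and whose higher word is the saved link register (x30); an unwinder follows that chain. The sketch below simulates stack memory with a map from address to 64-bit word; the class name, the addresses, and the map-based memory are hypothetical, and this is not perf's unwinder. If x29 holds ordinary data instead, as it can in ART-compiled code, it no longer points at a frame record and the walk stops early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FramePointerWalk {
    /**
     * Follows frame records: the word at fp is the caller's saved fp and the
     * word at fp + 8 is the saved link register (return address).
     * 'stack' stands in for memory; an address missing from it ends the walk.
     */
    static List<Long> walk(long pc, long fp, Map<Long, Long> stack, int maxFrames) {
        List<Long> pcs = new ArrayList<>();
        pcs.add(pc);
        while (fp != 0 && pcs.size() < maxFrames) {
            Long savedFp = stack.get(fp);
            Long savedLr = stack.get(fp + 8);
            if (savedFp == null || savedLr == null) {
                break; // fp does not point at a frame record: the chain is broken
            }
            pcs.add(savedLr);
            if (savedFp != 0 && savedFp <= fp) {
                break; // the stack grows down, so caller records must sit higher
            }
            fp = savedFp;
        }
        return pcs;
    }

    public static void main(String[] args) {
        // Two frame records: the inner one at 0x7000 links to the outer one at 0x7100.
        Map<Long, Long> stack = Map.of(
                0x7000L, 0x7100L, 0x7008L, 0x400100L,
                0x7100L, 0L, 0x7108L, 0x400200L);
        System.out.println(walk(0x400000L, 0x7000L, stack, 16)); // intact chain
        System.out.println(walk(0x400000L, 42L, stack, 16));     // x29 reused as data
    }
}
```

With the intact chain the walk recovers both return addresses; with x29 holding 42 it returns only the starting pc.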

ART can generate CFI (call frame information) in DWARF sections, so in theory the stack could be unwound if perf supported DWARF-based stack unwinding. For more information, refer to ARM32/64: perf: dwarf stack frame unwinding support

LMG/Engineering/UsingPerfOnAndroid (last modified 2015-06-18 05:32:32)