Overview

The oprofile utility profiles the whole system using the CPU's performance monitoring unit. It needs a kernel module to access the hardware performance counters. Because oprofile uses sampling, its profiling overhead is very low, but the results are approximate. Letting oprofile run long enough usually gives good results.
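A rough way to see why run time matters, under a simple binomial model that treats samples as independent draws (an idealization, not a figure oprofile reports): if a run collects N samples in total and a symbol truly accounts for a fraction p of them, the observed share has a relative standard error of

```latex
\frac{\sqrt{p(1-p)/N}}{p} \;=\; \sqrt{\frac{1-p}{pN}} \;\approx\; \frac{1}{\sqrt{n}}, \qquad n = pN \ \text{(for small } p\text{)}
```

So, under this model, a symbol with about 100 samples is known to roughly ±10% and one with about 10,000 samples to roughly ±1%. At a constant sampling rate, quadrupling the run time halves the error.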

Project Information

Installation Instructions

Oprofile is installed by default in the Linaro developer image and is also available from the Overlay PPA. To install it on any Linaro image, run:

sudo apt-get install oprofile

Using oprofile

Crash course:

# opcontrol --no-vmlinux
# opcontrol --start
# # do things you want to profile
# opcontrol --stop
# opreport

In a more realistic setup, you would use --vmlinux=/boot/.. instead of --no-vmlinux, pointing it at the debug symbols of the running kernel. Most users are interested in the CPU_CYCLES event, but oprofile can sample any event the CPU supports. For example, the common ARMv7-A events are:

PMNC_SW_INCR : Software increment of PMNC registers 
IFETCH_MISS : Instruction fetch misses from cache or normal cacheable memory
ITLB_MISS : Instruction fetch misses from TLB
DCACHE_REFILL : Data R/W operation that causes a refill from cache or normal cacheable memory
DCACHE_ACCESS : Data R/W from cache
DTLB_REFILL : Data R/W that causes a TLB refill
DREAD : Data read architecturally executed (note: architecturally executed = for instructions that are unconditional or that pass the condition code)
DWRITE : Data write architecturally executed
INSTR_EXECUTED : All executed instructions
EXC_TAKEN : Exception taken
EXC_EXECUTED : Exception return architecturally executed
CID_WRITE : Instruction that writes to the Context ID Register architecturally executed
PC_WRITE : SW change of PC, architecturally executed (not by exceptions)
PC_IMM_BRANCH : Immediate branch instruction executed (taken or not)
PC_PROC_RETURN : Procedure return architecturally executed (not by exceptions)
UNALIGNED_ACCESS : Unaligned access architecturally executed
PC_BRANCH_MIS_PRED : Branch mispredicted or not predicted. Counts pipeline flushes because of misprediction
PC_BRANCH_MIS_USED : Branch or change in program flow that could have been predicted 
CPU_CYCLES : Number of CPU cycles
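The crash-course steps, the --vmlinux option and event selection can be combined into a small shell helper. This is a sketch assuming the legacy opcontrol interface used above; the helper names (run, profile), the DRY_RUN switch and the sample count of 100000 are illustrative assumptions, not oprofile defaults. It selects the event with --event=NAME:COUNT and asks opreport for a per-symbol breakdown with --symbols.

```shell
#!/bin/sh
# Sketch of an oprofile session using the legacy opcontrol interface.
# Source this file, then call: profile <command> [args...]
# All settings are illustrative assumptions; override them via the environment:
#   VMLINUX  kernel image with debug symbols (empty means --no-vmlinux)
#   EVENT    event name, e.g. CPU_CYCLES (confirm it with ophelp)
#   COUNT    take one sample every COUNT events (hypothetical value;
#            ophelp shows the minimum count allowed for each event)
#   DRY_RUN  set to 1 to print the commands instead of running them

VMLINUX=${VMLINUX:-}
EVENT=${EVENT:-CPU_CYCLES}
COUNT=${COUNT:-100000}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Profile one command from start to finish; must run as root.
profile() {
    if [ -n "$VMLINUX" ]; then
        run opcontrol --vmlinux="$VMLINUX"
    else
        run opcontrol --no-vmlinux
    fi
    run opcontrol --event="$EVENT:$COUNT"
    run opcontrol --reset          # discard samples from earlier sessions
    run opcontrol --start
    run "$@"                       # the workload being profiled
    run opcontrol --dump           # flush collected samples
    run opcontrol --stop
    run opreport --symbols         # per-symbol report; needs debug symbols
}
```

For example, after sourcing the file as root, `profile ./my_benchmark` profiles a hypothetical program called my_benchmark, and `DRY_RUN=1 profile ./my_benchmark` only prints the commands it would run.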

Cortex-A9 supports many more events, while some non-Cortex ARM CPUs have no performance monitoring support at all.

Limitations

  • Based on sampling, so all results are estimates
  • Depends on hardware performance counters, which not all CPUs have
  • Needs debug symbols to produce good reports



Platform/DevPlatform/Tools/Oprofile (last modified 2011-09-01 04:19:01)