The oprofile utility lets users profile the whole system using the performance monitoring unit (PMU) of the CPU. Oprofile needs a kernel module to access the hardware performance counters. Because it works by sampling, the profiling overhead is very low, but the results are approximate. Running oprofile long enough to collect many samples usually gives good results.
The oprofile project website is http://oprofile.sourceforge.net/news/
The code is managed with git at http://oprofile.git.sourceforge.net/git/gitweb-index.cgi
The project mailing list can be found at http://marc.info/?l=oprofile-list
- As of 2011 the project is actively developed by many people
Available by default in the Linaro developer image, and also from the Overlay PPA. To install it on any Linaro image, run:
sudo apt-get install oprofile
# opcontrol --no-vmlinux
# opcontrol --start
# # do things you want to profile
# opcontrol --stop
# opreport
For a more realistic scenario, point --vmlinux=/boot/.. at the debug symbols of the kernel being used, instead of passing --no-vmlinux. Most users are interested in the CPU_CYCLES event, but oprofile can sample any event the CPU supports. For example, the common ARMv7-A events are:
PMNC_SW_INCR : Software increment of PMNC registers
IFETCH_MISS : Instruction fetch misses from cache or normal cacheable memory
ITLB_MISS : Instruction fetch misses from TLB
DCACHE_REFILL : Data R/W operation that causes a refill from cache or normal cacheable memory
DCACHE_ACCESS : Data R/W from cache
DTLB_REFILL : Data R/W that causes a TLB refill
DREAD : Data read architecturally executed (note: architecturally executed = for instructions that are unconditional or that pass the condition code)
DWRITE : Data write architecturally executed
INSTR_EXECUTED : All executed instructions
EXC_TAKEN : Exception taken
EXC_EXECUTED : Exception return architecturally executed
CID_WRITE : Instruction that writes to the Context ID Register architecturally executed
PC_WRITE : SW change of PC, architecturally executed (not by exceptions)
PC_IMM_BRANCH : Immediate branch instruction executed (taken or not)
PC_PROC_RETURN : Procedure return architecturally executed (not by exceptions)
UNALIGNED_ACCESS : Unaligned access architecturally executed
PC_BRANCH_MIS_PRED : Branch mispredicted or not predicted. Counts pipeline flushes because of misprediction
PC_BRANCH_MIS_USED : Branch or change in program flow that could have been predicted
CPU_CYCLES : Number of CPU cycles
Cortex-A9 supports many other events as well, while some non-Cortex ARM CPUs may have no performance monitoring support at all.
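The event selection described above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not part of oprofile: the function names run and profile_event, the DRY_RUN variable, and the final --shutdown step are illustrative choices, and the minimum sample count an event accepts depends on the CPU. It uses only the opcontrol options --reset, --no-vmlinux, --event, --start, --stop and --shutdown, plus opreport --symbols.

```shell
#!/bin/sh
# Hypothetical helpers; only opcontrol and opreport are real oprofile tools.

# Run a command, or only print it when DRY_RUN=1 (useful without root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# profile_event EVENT COUNT COMMAND [ARGS...]
# Takes one sample every COUNT occurrences of EVENT while COMMAND runs.
profile_event() {
    event=$1
    count=$2
    shift 2
    run opcontrol --reset                               # discard samples from the current session
    run opcontrol --no-vmlinux --event="$event:$count"  # select the event and sample interval
    run opcontrol --start
    run "$@"                                            # the workload to profile
    run opcontrol --stop
    run opreport --symbols                              # per-symbol breakdown
    run opcontrol --shutdown                            # assumption: stop the daemon so the next run can pick another event
}
```

For example, with DRY_RUN=1 set, calling profile_event DCACHE_REFILL 100000 ./my_app (my_app being a placeholder program) prints the command sequence without touching the profiler, which makes it easy to check before running it as root.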
- Based on sampling, so all results are estimates
- Depends on hardware performance counters, which not all CPUs have
- Needs debug symbols for good reports
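To make the first limitation concrete: each sample stands for roughly one sample interval's worth of events, so a total can only be estimated as samples multiplied by the interval. The helper below is a hypothetical illustration of that arithmetic (the name estimate_events is an assumption, not an oprofile command).

```shell
#!/bin/sh
# estimate_events SAMPLES COUNT
# Hypothetical helper: each sample represents about COUNT occurrences of the
# event, so the estimated total is SAMPLES * COUNT.
estimate_events() {
    echo $(( $1 * $2 ))
}
```

With 2500 samples collected at an interval of 100000, estimate_events 2500 100000 prints 250000000. The fewer samples a symbol has, the less that estimate can be trusted.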
Platform/DevPlatform/Tools/Oprofile (last modified 2011-09-01 04:19:01)