/!\ Needs to be renamed to an engineering unit specification page /!\

Summary

As of Linux 2.6.33, perf-events supports the ARMv6 and ARMv7 architectures. The objectives of this blueprint are to provide Perf and OProfile support for Ubuntu on ARM, and to discuss designs for `uncore' events and visualisation tools.

Release Note

Hardware profiling features for ARM platforms using OProfile and Perf-events.

Rationale

Ubuntu already provides these features on x86 and AMD64, but not yet on ARM.

Design

The Kernel support required is already available in upstream repositories and on the mailing lists, so most of the backend work is already complete. However, the following issues are still undergoing discussion:

System Events [aka Uncore / nest events]

  • This was discussed on the LKML.

  • The sysfs events design described here may solve some of the issues related to system events [particularly, addressing the relevant resources from userspace].

  • The difficulty in obtaining a frame of reference for system events needn't cause a problem as it may be useful to observe the system as a whole over a period of time. For example, breaking down the load on a GPU during video playback.

Graphical Tools

  • Currently, the only graphical tool in the perf framework generates a timeline of the system as an svg file.
  • Graphical tool features and UI. This requires discussion: What do we want to visualise and how do we want to visualise it?

  • Kcachegrind can import sample data from OProfile. A new converter could be written to support perf directly.

Implementation

Kernel

  • If the Kernel being used is 2.6.35 [as proposed for Maverick] then the ARM backend will already be present
  • For earlier Kernel versions, backport ARM backend [perf and OProfile] Kernel patches to the Ubuntu Kernel sources
  • The Kernel configuration for OMAP3 platforms must ensure that CONFIG_OMAP3_EMU is selected in order to enable the counters
  • Ensure that supported boards [Babbage and Dove] expose their hardware performance counter capabilities in their BSP code

BSP support

To register the CPU PMU with the Kernel, BSP code must register a platform_device as follows:

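#include <linux/platform_device.h> /* struct platform_device, platform_device_register() */
#include <asm/pmu.h>               /* ARM_PMU_DEVICE_CPU */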

/* The PMU resource. For multicore platforms, this will be an array of resources */
static struct resource pmu_resource = {
    .start = PMU_IRQ, /* IRQ defined in the board-specific header files */
    .end   = PMU_IRQ,
    .flags = IORESOURCE_IRQ,
};

static struct platform_device pmu_device = {
    .name          = "arm-pmu",
    .id            = ARM_PMU_DEVICE_CPU,
    .num_resources = 1, /* Number of PMUs [cores] */
    .resource      = &pmu_resource,
};

/* Somewhere in the init code: */
...
platform_device_register(&pmu_device);
...

Userspace

  • With Perf available in the Kernel, OProfile can be made available `for free' by updating the OProfile userspace tools to recognise Cortex-A9 and SMP platforms
  • Package linux-tools for ARM [may need patches for Thumb-2]

Symbol resolution

  • Symbol resolution requires that DWARF debug information is available for the applications being profiled.
  • Since Ubuntu packages this debug information separately, as ddebs, some modification to the reporting tools may be required.
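
As a rough illustration of the lookup a reporting tool would need, the sketch below maps a binary's path to the location of its separate debug file. It assumes that the debug file mirrors the binary's absolute path under a debug root such as /usr/lib/debug; that layout, and the helper name debug_file_path, are assumptions made for this example and are not taken from the perf or OProfile sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<debug_root><binary_path>" into out.
 * Assumes the debug file mirrors the binary's absolute path under
 * debug_root (for example "/usr/lib/debug").
 * Returns 0 on success, or -1 if binary_path is not absolute or the
 * result does not fit in out_len bytes.
 */
static int debug_file_path(const char *debug_root, const char *binary_path,
                           char *out, size_t out_len)
{
    assert(debug_root != NULL && binary_path != NULL && out != NULL);

    if (binary_path[0] != '/')
        return -1;

    /* Room is needed for both parts plus the terminating NUL. */
    if (strlen(debug_root) + strlen(binary_path) + 1 > out_len)
        return -1;

    snprintf(out, out_len, "%s%s", debug_root, binary_path);
    return 0;
}
```

Keeping the debug root as a parameter, rather than hard-coding it, would let the same lookup work when the ddebs have been unpacked somewhere other than the default location.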

Future work

There is scope for lots of future work including realtime sample collection from a remote target and the development of graphical frontend tools.

Verbatim Notes from UDS Session

slides: https://wiki.ubuntu.com/Specs/M/ARMDebuggingWithOprofileAndPerf

Discussion

  • platform device infrastructure used for performance monitor unit (PMU) - simple to add, but not typically present for new BSPs --- BSP maintainers need to add it.
  • Probably doesn't work in QEMU.
  • Hardware typically has many counters all over the system -- not currently clear how to export all of this through perf-events. The various parts of the hardware might be asynchronous in terms of measuring the events and hence tying it to a frame of reference is a hard problem.
  • What are the use cases for the other counters on the system?
    • L2 cache is probably the easiest "uncore" event to handle.
    • "uncore" events not a solved problem on any architecture yet -- easy to put in the kernel, but the tools don't really understand non-CPU-specific events yet.
    • may be able to include data about event sources in sysfs
  • kiko: graphing performance profile data over time could be useful
  • linux-tools (see http://packages.ubuntu.com/lucid/linux-tools) is not currently built for ARM, but ought to be straightforward.

  • Tools are in the kernel tree - packaged specific to a kernel version
  • When profiling, restrict the number of running processes to create a stable "environment"

GUIs and frontends

  • The only "GUI" is an SVG generation backend - SVGs tend to be impossible to open if profiling has been run for sufficient time (too much data?)
  • "perf top" command (from linux-tools?) provides real-time view of profiling data on the console.
  • "perf record <command>" runs <command>, capturing profiling data

  • "perf report" reports results
  • "perf annotate" produces source, annotated with profiling data
  • "perf list" lists the event types supported by the system
  • "perf stat" gives overall statistics for a profiling run
  • Tools may not understand getting symbols from ddebs yet (separated debug symbol information packages in Ubuntu)
  • ddebs are not generated for all packages yet, not generated in all archives (e.g., Debian) and not generated for PPAs right now. It would be useful to enable ddeb generation more widely.
  • A converter tool may exist in Kcachegrind to convert profiling data.
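
The perf commands listed above are built on the kernel's perf_event_open system call. The sketch below shows, in the spirit of "perf stat", how a single hardware counter can be read directly from C. The helper names count_user_cycles and spin are hypothetical, and the call will fail (returning -1 here) on systems without a usable PMU or without sufficient permissions, so this is an illustration rather than a substitute for the tools.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Hypothetical helper: count the user-space CPU cycles spent in work().
 * Returns 0 and stores the count in *cycles on success, or -1 if the
 * counter cannot be opened or read.
 */
static int count_user_cycles(void (*work)(void), uint64_t *cycles)
{
    struct perf_event_attr attr;
    ssize_t n;
    int fd;

    assert(work != NULL && cycles != NULL);

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1; /* count user space only */
    attr.exclude_hv = 1;

    /* pid 0, cpu -1: measure the calling thread on whichever CPU it runs. */
    fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    n = read(fd, cycles, sizeof(*cycles));
    close(fd);
    return n == (ssize_t)sizeof(*cycles) ? 0 : -1;
}

/* Example workload: a short busy loop. */
static void spin(void)
{
    volatile unsigned i;
    for (i = 0; i < 1000000; i++)
        ;
}
```

A caller would pass the workload as a function pointer, for example count_user_cycles(spin, &cycles), and treat a return value of -1 as "counter unavailable on this system".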

Portability

  • Userspace tools need to understand what event types exist per platform
  • May not be feasible for the kernel to export this database to userspace --- it may be large for some platforms, including ARM.
  • These tools are not so good at displaying multiple event types simultaneously in a useful way -- an integrated GUI would be nice, but doesn't exist yet.

Streaming

  • Could be useful to stream perf data off a client machine to be processed remotely at a server.
  • Looks feasible, but not implemented yet?
  • Data has headers which need to be parsed up-front, but the main data is a stream and can be processed incrementally over time.
  • Should be possible to extract ddebs on the server (which may be a foreign architecture), but this may not work out of the box. Cross-/multi-arch binutils would also be needed on the server.
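
The incremental processing mentioned above can be sketched as follows. Each perf record starts with a small header that carries the record's total size, so a receiver can walk whatever bytes have arrived and leave a trailing partial record for the next chunk. The struct below mirrors the layout of struct perf_event_header from <linux/perf_event.h>; the helper name consume_records is hypothetical, and the sketch assumes the stream uses the receiver's byte order, which may not hold for a foreign-architecture target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of struct perf_event_header. */
struct record_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size; /* total record size in bytes, header included */
};

/*
 * Hypothetical helper: walk the complete records in buf[0..len) and
 * return the number of bytes consumed, storing the record count in
 * *n_records. A trailing partial record is left unconsumed so that the
 * caller can prepend it to the next chunk received.
 * Returns (size_t)-1 if a record claims to be smaller than its header.
 */
static size_t consume_records(const unsigned char *buf, size_t len,
                              unsigned *n_records)
{
    size_t off = 0;

    assert(buf != NULL && n_records != NULL);
    *n_records = 0;

    while (len - off >= sizeof(struct record_header)) {
        struct record_header hdr;

        /* Copy the header out to avoid unaligned accesses. */
        memcpy(&hdr, buf + off, sizeof(hdr));

        if (hdr.size < sizeof(hdr))
            return (size_t)-1; /* malformed stream */
        if (hdr.size > len - off)
            break;             /* partial record: wait for more data */

        /* A real consumer would dispatch on hdr.type here. */
        off += hdr.size;
        (*n_records)++;
    }
    return off;
}
```

The file-level headers would still need to be parsed once before this loop starts; the sketch covers only the record stream that follows them.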

Action Items

  • Will Deacon: update oprofile userspace
  • Will Deacon: create / provide link to wiki page documenting how to enable perf event support for a new platform.
  • Will Deacon: is a vmlinux file needed to profile inside the kernel?
  • Loic Minier: Enable linux-tools building for armel. Packages may be needed per-platform (i.e., per kernel tree)
  • Dave Martin: Teach the perf userspace tools how to get symbol information out of ddebs, not necessarily on /usr/lib/debug (for cross-arch support).
  • scottb: Work out how to get cross-profiling to work (getting the right binaries and debug symbol archives onto the server). Use sshfs to access the target's filesystem? (but may be a space issue)

Cycles/1011/Blueprints/DebuggingWithOprofile (last modified 2011-03-25 18:15:44)