This patch provides alternative performance counter support using the Linux kernel Performance Counter subsystem and the libpfm4 library. It does not remove the existing perfctr performance counter support, but it does offer a number of benefits to the JikesRVM community.
Libpfm4 leverages new kernel support (>=2.6.31) for a common performance monitoring framework. This framework is in the mainline kernel and extends beyond hardware monitoring events to include kernel events. Perfctr, in comparison, is an external patch that only provides support for hardware performance counters.
Perfctr requires JikesRVM to keep a translation table from easy-to-read mnemonics (e.g. L1_MISS) to hardware-specific event IDs. Each hardware update potentially requires this table to be updated (currently there are no translations for Intel Nehalem processors). Validating the table is an onerous task, and because it is not widely used, mistakes can go unnoticed. Libpfm4 removes the maintenance burden of such tables from JikesRVM by providing the necessary translation services. With its wider user base, libpfm4 gives higher confidence that mnemonics are translated to the correct event IDs.
Perfctr exposes hardware counter overflows to software via Linux signals, but JikesRVM currently has no support for processing these signals. I have witnessed counter overflows whilst using perfctr, but there was no user-visible indication that this had happened. Support for both overflowing hardware counters and overflowing internal representations of these counters needs to be stronger so that the community can have faith in these values. The new kernel perf_event support offers a less intrusive approach than adding perfctr signal handling to JikesRVM. All kernel-managed event counters are virtual 64-bit counters that handle underlying hardware overflows.
JikesRVM's perfctr support only ever exposed one hardware counter, even if more were supported. This patch exposes multiple counters, and events may be multiplexed onto the available counters.
Some usage instructions are below.
Prerequisites:
1. A modern (>=2.6.31) Linux kernel with the Performance Counter subsystem compiled in. This code is in the mainline kernel, and some Linux distributions compile it in by default (e.g. Ubuntu 10.04).
2. The libpfm4 library available from http://perfmon2.sourceforge.net/
For the impatient...
$ git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4
$ cd libpfm4
$ sudo make install
If you are building on x86_64, see the notes at the end of this message.
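The kernel prerequisite can be checked before starting a build. The following is a minimal sketch and not part of the patch: the helper name kernel_at_least is hypothetical, and it only compares version numbers, so it cannot tell you whether the Performance Counter subsystem was actually compiled into your kernel.

```python
import platform
import re

def kernel_at_least(release, minimum=(2, 6, 31)):
    """Return True if a kernel release string such as '2.6.32-21-generic'
    is at least `minimum`. Hypothetical helper, for illustration only."""
    # Take the leading dotted numeric part, ignoring distribution suffixes.
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad both tuples to the same length so that, for example,
    # (3, 0) compares correctly against (2, 6, 31).
    width = max(len(parts), len(minimum))
    parts += (0,) * (width - len(parts))
    wanted = tuple(minimum) + (0,) * (width - len(minimum))
    return parts >= wanted

if __name__ == "__main__":
    release = platform.release()
    print(release, "ok" if kernel_at_least(release) else "too old")
```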
Usage:
1. Meet the prerequisites above.
2. Pass -Dconfig.include.pfm=true to ant along with the rest of your build options.
3. Pass a comma-separated list of the performance events you are interested in to the RVM like so:
To find a complete list and description of the events available on your hardware and kernel version, run the showevtinfo program located in <libpfm4-src-dir>/examples.
4. Also specify the -X:gc:harnessAll=true option to the RVM.
Interpreting the results:
If all goes well, you should find something like this at the very end of the RVM's output:
=== MMTk Statistics Totals ==
The first line (with ".mu") is the number of events recorded whilst the mutator was running. The second line (with ".gc") is the number of events whilst the GC was running.
If you specify more performance events than your hardware has counters for, the events may be multiplexed onto the available counters. In that case your output will look like this:
=== MMTk Statistics Totals ==
L1I_MISSES.mu: 1629940088073 (SCALED)
L1I_MISSES.gc: 953482739268 (SCALED)
LLC_MISSES.mu: 1546188225840 (SCALED)
LLC_MISSES.gc: 942745321033 (SCALED)
PERF_COUNT_HW_CACHE_L1D:MISS.mu 19434083 (SCALED)
PERF_COUNT_HW_CACHE_L1D:MISS.gc 5600456 (SCALED)
Events that have been multiplexed, and therefore scaled based on time, are marked with "(SCALED)" after the value. Scaled values cannot be compared directly, as there is no guarantee that both events were measured over the same parts of the program's execution (this can be achieved using counter groups if really needed; see the libpfm4 documentation for more details).
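To illustrate what scaling based on time means: when the kernel multiplexes an event, it can report how long the event was enabled and how long it was actually running on a hardware counter, and the usual estimate is the raw count multiplied by enabled time over running time. The sketch below shows only that arithmetic. The function name and the numbers in the usage line are made up, and whether the patch computes its "(SCALED)" values in exactly this way is an assumption.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate a whole-run count for a multiplexed event.

    raw          -- events counted while the event held a hardware counter
    time_enabled -- total time the event was enabled
    time_running -- time the event was actually being counted
    Returns None if the event was never scheduled onto a counter.
    """
    if time_running == 0:
        return None  # never measured, so there is nothing to extrapolate
    if time_running >= time_enabled:
        return raw  # counted the whole time; no scaling needed
    # Integer arithmetic avoids floating-point rounding on large counts.
    return raw * time_enabled // time_running

# An event that ran for half of the enabled time has its count doubled.
print(scale_count(1000, 200, 100))
```

The estimate assumes the event rate was the same while the event was not being counted, which is one reason two scaled values are not directly comparable.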
Some events cannot be multiplexed and might not be measured during an invocation. In that case the word "CONTENDED" is printed after the counter name and no further information is printed.
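If you want to post-process these totals, a small parser is enough. This is a rough sketch, not something shipped with the patch: the function name parse_totals is hypothetical, the accepted line shapes are inferred from the sample output above (with or without a colon after the counter name), and the exact layout of a "CONTENDED" line is an assumption.

```python
import re

# Counter name, ".mu" or ".gc", an optional colon, then either a number
# (optionally followed by "(SCALED)") or a bare status word such as CONTENDED.
LINE = re.compile(
    r"^(?P<event>\S+)\.(?P<phase>mu|gc):?\s+"
    r"(?:(?P<value>\d+)(?P<scaled>\s+\(SCALED\))?|(?P<status>[A-Z]+))\s*$"
)

def parse_totals(text):
    """Map (event, phase) to (value, note).

    value is an int, or None when only a status word was printed.
    note is "SCALED", a status word, or "" for a plain count.
    """
    results = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # banner lines and anything not shaped like a counter
        key = (m.group("event"), m.group("phase"))
        if m.group("value") is not None:
            note = "SCALED" if m.group("scaled") else ""
            results[key] = (int(m.group("value")), note)
        else:
            results[key] = (None, m.group("status"))
    return results
```

Keeping the note alongside the value makes it harder to compare a scaled estimate with an exact count by accident.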
Best efforts are made to detect when the kernel's virtual 64-bit counter exceeds 63 bits (and so becomes a negative number in Java). Where this is detected, the word "OVERFLOWED" is printed next to the counter name and no further information is printed.
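The limit is 63 bits rather than 64 because the kernel counter is an unsigned 64-bit value while a Java long is signed, so any count with the top bit set reads as negative. The sketch below demonstrates that reinterpretation. The function names are hypothetical and the patch's actual check may differ.

```python
import struct

def as_java_long(unsigned_count):
    """Reinterpret an unsigned 64-bit counter value as a signed 64-bit
    integer, which is how the same bits appear in a Java long."""
    return struct.unpack("<q", struct.pack("<Q", unsigned_count))[0]

def looks_overflowed(unsigned_count):
    """True once the count no longer fits in 63 bits."""
    return as_java_long(unsigned_count) < 0

if __name__ == "__main__":
    for count in (1629940088073, 2**63 - 1, 2**63):
        print(count, as_java_long(count), looks_overflowed(count))
```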
At present libpfm4 is rather unforgiving of incorrect event mnemonics: if you make a typo in an event name, expect a segfault!
Notes on building libpfm4 on x86_64:
This applies to at least version 79c9a0d (April 19th 2010) of libpfm4. On x86_64 JikesRVM builds 32-bit programs, but by default libpfm4 builds 64-bit libraries, which JikesRVM cannot link against. As a quick hack, before building libpfm4 add the following two lines to the end of <libpfm4-src-dir>/config.mk