Statistical Profiling

The HPCToolkit Performance Tools


Version 2017.11

hpcrun-flat is a flat statistical sampling-based profiler. It supports multiple sample sources during one execution and creates an IP (instruction pointer) histogram, or flat profile, for each sample source. hpcrun-flat profiles complex applications (forks, execs, threads and dynamically loaded libraries) and may be used in conjunction with parallel process launchers such as MPICH's mpiexec and SLURM's srun.

See hpctoolkit(1) for an overview of HPCToolkit.



hpcrun-flat [profiling-options] [--] command [command-arguments]

hpcrun-flat [info-options]


hpcrun-flat profiles the execution of an arbitrary command using statistical sampling (rather than instrumentation). It collects per-thread flat profiles, also known as IP (instruction pointer) histograms. Sample points may be generated from multiple simultaneous sampling sources. hpcrun-flat profiles complex applications that use forks, execs, and threads (but not dynamic linking/unlinking); it may be used in conjunction with parallel process launchers such as MPICH's mpiexec and SLURM's srun.

To configure hpcrun-flat's sampling sources, specify events and periods using the -e/--event option. For an event e and period p, a sample is generated after every p instances of e; each sample causes hpcrun-flat to inspect and record information about the monitored command.
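As a rough illustration of the event/period relationship, the sketch below estimates how many samples one event would generate. The helper name expected_samples is hypothetical and is not part of hpcrun-flat; it only applies the "one sample per p instances" rule with integer division.

```shell
# Hypothetical helper (not part of hpcrun-flat): estimate the number of
# samples generated for a single event.
#   $1 = number of times the event occurs during the run
#   $2 = sample period p given to -e event:period
expected_samples() {
    count=$1
    period=$2
    # One sample after every `period` occurrences, so use integer division.
    echo $(( count / period ))
}
```

For example, assuming a 2 GHz processor, one second of execution is about 2000000000 cycles, so expected_samples 2000000000 999999 prints 2000 for the default PAPI_TOT_CYC period. A smaller period yields more samples, and more measurement overhead, for the same run.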

When command terminates, per-thread profiles are written to files with the names of the form:

hpcrun-flat enables a user to abort a process and write the partial profiling data to disk by sending the Interrupt signal (INT or Ctrl-C). This can be extremely useful on long-running or misbehaving applications.

The special option -- can be used to stop hpcrun-flat option parsing; this is especially useful when command takes arguments of its own.
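To see why -- matters, consider a command that has its own -e option. The sketch below is not hpcrun-flat's parser; it is a hypothetical helper, after_dashdash, that only illustrates the convention that everything after the first -- belongs to the profiled command.

```shell
# Hypothetical illustration (not hpcrun-flat's actual parser): print the
# words that follow the first "--", i.e. the command and its own arguments.
after_dashdash() {
    while [ "$#" -gt 0 ]; do
        arg=$1
        shift
        if [ "$arg" = "--" ]; then
            # Everything left belongs to the profiled command.
            printf '%s\n' "$*"
            return 0
        fi
    done
    # No "--" was present on the command line.
    return 1
}
```

With the arguments -e PAPI_TOT_CYC:999999 -- zoo -e input.dat (zoo's -e option and input.dat are made up for this example), the helper prints zoo -e input.dat: the second -e is left for zoo rather than being read as a profiling event.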


command
The command to profile.
command-arguments
Arguments to the command to profile.

Default values for an option's optional arguments are shown in {}.

Options: Informational

-l, --list-events-short
List available events. (N.B.: some may not be profilable)
-L, --list-events-long
Similar to above but with more information.
Print paths for external PAPI and MONITOR.
-V, --version
Print version information.
-h, --help
Print help.
--debug [n]
Debug: use debug level n. {1}

Options: Profiling

-r [yes|no], --recursive [yes|no]
Profile processes spawned by command. {no} (Each process receives its own output file.)

-t mode, --threads mode
Select thread profiling mode {each}: Note that only POSIX threads are supported. Also note that the WALLCLOCK event cannot be used in a multithreaded process.

-e event[:period], --event event[:period]
An event to profile and its corresponding sample period. event may be either a PAPI or native processor event. This option may be passed multiple times. {PAPI_TOT_CYC:999999}
-o [outpath], --output [outpath]
Directory for output data. {.}
--papi-flag flag
Profile style flag. {PAPI_POSIX_PROFIL}


Assume we wish to profile the application zoo. The following examples list some useful events for different processor architectures. In each case, the special option -- is used to clearly demarcate the end of hpcrun-flat options.

  1. hpcrun-flat -e WALLCLOCK -- zoo
  2. Opteron (Rev B-F)
    1. hpcrun-flat -e DC_L2_REFILL:1300013 -e PAPI_L2_DCM:510011 -e PAPI_STL_ICY:5300013 -e PAPI_TOT_CYC:13000019 -- zoo (DC_L2_REFILL is an approximation of L1 D-cache misses).
    2. hpcrun-flat -e PAPI_L2_DCM:510011 -e PAPI_TLB_DM:510013 -e PAPI_STL_ICY:5300013 -e PAPI_TOT_CYC:13000019 -- zoo

  3. Pentium IV
    1. hpcrun-flat -e PAPI_TOT_CYC:30000001 -e PAPI_TOT_INS:3000001 -e PAPI_FP_INS:1000001 -e PAPI_LD_INS:1000001 -e PAPI_TLB_TL:32767 -e PAPI_L2_TCM:32767 -e PAPI_RES_STL:1000001 -e BSQ_cache_reference_RD_3rdL_MISS -- zoo
    2. hpcrun-flat -e PAPI_SR_INS:1000001 -e PAPI_L1_DCM:32767 -e resource_stall_SBFULL:32767 -- zoo
    3. hpcrun-flat -e PAPI_FP_OPS:32767 -e PAPI_BR_MSP:32767 -- zoo

  4. Itanium 2
    1. hpcrun-flat -e BE_EXE_BUBBLE_ALL:344221 -e BE_L1D_FPU_BUBBLE_ALL:344221 -e FE_BUBBLE_ALL:144221 -e PAPI_TOT_CYC:344221 -- zoo
    2. hpcrun-flat -e PAPI_L1_DCM:144221 -e PAPI_FP_OPS:344221 -e PAPI_TOT_CYC:1044221 -- zoo
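The examples above all follow one pattern: one -e per event:period pair, then --, then the application. As a minimal sketch, a wrapper can assemble such a command line from a list of event specifications. The function name hpcrun_flat_cmd is hypothetical; it only prints the command and does not run it.

```shell
# Hypothetical helper: print an hpcrun-flat command line.
#   $1     = application to profile
#   $2...  = event[:period] specifications, one -e is emitted per spec
# The command is only printed, never executed.
hpcrun_flat_cmd() {
    app=$1
    shift
    cmd="hpcrun-flat"
    for spec in "$@"; do
        cmd="$cmd -e $spec"
    done
    # Always emit "--" so the application's name and arguments are unambiguous.
    printf '%s -- %s\n' "$cmd" "$app"
}
```

Calling hpcrun_flat_cmd zoo PAPI_L1_DCM:144221 PAPI_FP_OPS:344221 PAPI_TOT_CYC:1044221 reproduces the second Itanium 2 example. Printing rather than executing makes it easy to review the event list before starting a long run.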


Sample sources

hpcrun-flat uses the PAPI library to provide access to hardware performance counter events. If you have not configured HPCToolkit to use the PAPI library, you will be unable to measure hardware performance counter events.

The PAPI library supports a large collection of hardware counter events. Some events have standard names across all platforms, e.g. PAPI_TOT_CYC, the event that measures total cycles. In addition to events whose names begin with the PAPI_ prefix, platforms also provide access to a set of native events with names that are specific to the platform's processor. A complete list of events supported by the PAPI library for your platform may be obtained by using the -l/--list-events-short or -L/--list-events-long option. Any event whose name begins with the PAPI_ prefix and that is listed as "Profilable" can be used as an event in a sampling source, provided it does not conflict with another event.

The precise rules for selecting good events and periods are complex.


See Also

hpctoolkit(1).


Version: 2017.11 of 2011/02/22.

License and Copyright

© 2002-2018, Rice University.
See README.License.


Nathan Tallent
John Mellor-Crummey
Rob Fowler
Rice HPCToolkit Research Group
Email: hpctoolkit-forum =at= rice.edu
WWW: http://hpctoolkit.org.