hpcstruct:
Recovery of Static Program Structure

The HPCToolkit Performance Tools

2021/09/15

Version 2021.05.15-master

hpcstruct - recovers the static program structure of the CPU or GPU binaries referenced in a measurement directory, or of a single CPU or GPU binary. The static structure includes procedures, inlined functions, loops, and source lines.

See hpctoolkit(1) for an overview of HPCToolkit.

Synopsis

hpcstruct [options] measurement-directory

hpcstruct [options] binary

Description

hpcstruct recovers the program structure for all CPU and GPU binaries referenced by a directory containing HPCToolkit performance measurements. If needed, one can apply hpcstruct to recover program structure for an individual CPU or GPU binary.

During execution of any application, HPCToolkit records the name of the application binary and the names of any shared libraries used. During execution of a GPU-accelerated application, HPCToolkit also records GPU binaries in the application's measurement directory.

Normally, hpcstruct is run against an application's HPCToolkit measurement directory, which directs it to analyze all CPU and GPU binaries recorded within.

When analyzing a measurement-directory, hpcstruct writes its results into a subdirectory of the directory. It analyzes the application and all the shared libraries used during the run, as well as any GPU binaries recorded in the directory. It also puts links to all shared libraries in another subdirectory. When hpcprof is applied to a measurement directory that contains program structure files, those program structure files will be used to help attribute performance measurements.

When analyzing a single CPU or GPU binary b, hpcstruct writes its results only to the file 'basename(b).hpcstruct' and does not create a link to the binary.
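As a minimal sketch of the naming rule above, the following Python function derives the structure file name from a binary's path. The function name and the example paths are hypothetical, and the sketch assumes the name is formed simply by appending '.hpcstruct' to the file's base name, as the text states.

```python
import os


def struct_file_name(binary_path: str) -> str:
    """Return the default structure file name for a single binary.

    Assumption: the name is the binary's base name (directories removed)
    with '.hpcstruct' appended, following 'basename(b).hpcstruct'.
    """
    return os.path.basename(binary_path) + ".hpcstruct"


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    for path in ("/opt/app/bin/sweep3d", "lib/libfoo.so"):
        print(path, "->", struct_file_name(path))
```

The -o option described under the output options replaces this default name when hpcstruct is invoked on a single binary.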

Program structure is a mapping from addresses of machine instructions in a binary to source code contexts; this mapping is used to attribute measured performance metrics back to source code. A strength of hpcstruct is its ability to attribute metrics to inlined functions and loops; such mappings are especially useful for understanding the performance of programs generated using template-based programming models.
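The mapping described above can be pictured as a sorted table of address ranges, each tagged with a source context. The Python sketch below is only an illustration: the Context record, the addresses, and the file names are all hypothetical, and it does not reflect hpcstruct's actual data structures or its output file format.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Context(NamedTuple):
    start: int                 # first address covered (inclusive)
    end: int                   # one past the last address covered (exclusive)
    procedure: str             # enclosing procedure
    inlined_from: tuple        # chain of inlined functions, outermost first
    loop_line: Optional[int]   # header line of the enclosing loop, if any
    file: str                  # source file of the instruction
    line: int                  # source line of the instruction


# Hypothetical, hand-written ranges, sorted by start address.
RANGES = [
    Context(0x401000, 0x401010, "main", (), None, "main.cpp", 12),
    Context(0x401010, 0x401030, "main",
            ("std::vector<double>::operator[]",), 14, "stl_vector.h", 1043),
    Context(0x401030, 0x401040, "main", (), 14, "main.cpp", 15),
]
STARTS = [r.start for r in RANGES]


def lookup(addr: int) -> Optional[Context]:
    """Return the context whose range contains addr, or None."""
    i = bisect_right(STARTS, addr) - 1   # last range starting at or before addr
    if i >= 0 and addr < RANGES[i].end:
        return RANGES[i]
    return None


if __name__ == "__main__":
    ctx = lookup(0x401018)
    print(ctx.procedure, ctx.inlined_from, ctx.loop_line, ctx.file, ctx.line)
```

A sample taken at an instruction address can then be charged to the procedure, inlined function, loop, and source line returned by the lookup, which is the kind of attribution the paragraph above describes.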

hpcstruct is designed for analysis of optimized binaries created from C, C++, Fortran, CUDA, HIP, and DPC++ source code. Because hpcstruct's algorithms exploit the line map and debug information recorded in an application binary during compilation, we recommend, for best results, that binaries be compiled with standard debug information or, at a minimum, line map information. Typically, this is accomplished by passing a '-g' option to each compiler along with any optimization flags. See the HPCToolkit manual for more information.

To accelerate analysis of a measurement directory, which contains references to an application as well as any shared libraries and/or GPU binaries it uses, hpcstruct employs multiple threads by default. Multiple small binaries are analyzed concurrently, using one thread per binary. By default, this analysis will use half of the threads in the CPU set for the process. Binaries larger than a certain threshold (see the --psize option and its default) are analyzed using multiple threads. By default, large binaries will be analyzed using min(half of the threads in the CPU set for the process, 16) threads.
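The default thread counts described above amount to a small calculation. The Python sketch below reproduces it under stated assumptions: "half" is taken to mean integer division with a floor of one thread, and the CPU set is read with os.sched_getaffinity where available. The function names are hypothetical, and hpcstruct's own rounding may differ.

```python
import os

LARGE_BINARY_THREAD_CAP = 16  # cap stated in the text for large binaries


def default_pool_threads(cpu_set_size: int) -> int:
    """Half of the threads in the CPU set (assumed: rounded down, at least 1)."""
    return max(1, cpu_set_size // 2)


def default_large_binary_threads(cpu_set_size: int) -> int:
    """min(half of the threads in the CPU set, 16), as described in the text."""
    return min(default_pool_threads(cpu_set_size), LARGE_BINARY_THREAD_CAP)


def current_cpu_set_size() -> int:
    """Size of this process's CPU set; falls back to the CPU count elsewhere."""
    if hasattr(os, "sched_getaffinity"):      # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    n = current_cpu_set_size()
    print("CPU set size:", n)
    print("threads for concurrent small binaries:", default_pool_threads(n))
    print("threads per large binary:", default_large_binary_threads(n))
```

Under these assumptions, a process with a 64-thread CPU set would use 32 threads for small binaries and 16 threads for each large binary; the -j option overrides these defaults.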

Arguments

measurement-directory
A measurement directory of an application, either GPU-accelerated or not. Applying hpcstruct to a measurement directory analyzes the application, all shared libraries referenced during the data-collection run, as well as any GPU binaries recorded in the measurement directory during execution.

binary
File containing an executable, a dynamically-linked shared library, or a GPU binary recorded by HPCToolkit as a program executes. Note that hpcstruct does not recover program structure for libraries that binary depends on. To recover that structure, run hpcstruct on each dynamically-linked shared library or relink your program with static versions of the libraries. hpcstruct is not normally invoked on a single binary.
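When single-binary mode is used and structure is also wanted for the libraries a binary depends on, one hpcstruct invocation per library is needed. The Python sketch below only builds those command lines, using the 'hpcstruct binary' form from the synopsis; the helper name and the library paths are hypothetical.

```python
import shlex


def per_library_commands(libraries):
    """Build one 'hpcstruct <library>' argument list per shared library."""
    return [["hpcstruct", lib] for lib in libraries]


if __name__ == "__main__":
    # Hypothetical library paths, used only for illustration.
    libs = ["/opt/app/lib/libphysics.so", "/opt/app/lib/libmesh.so"]
    for argv in per_library_commands(libs):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Each argument list could be passed to subprocess.run(argv, check=True) to execute it. Running hpcstruct on the measurement directory instead avoids this step, since it analyzes the application and all recorded libraries in one invocation.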

Default values for an option's optional arguments are shown in {}.

Options: Informational

-V, --version
Print version information.

-h, --help
Print help message.

-v num, --verbose num
Generate progress messages to stderr, at verbosity level num. {1}

Options: Override parallel defaults

-j num, --jobs num
Use num threads in hpcstruct.

--psize n
Size, in bytes, of a binary that causes hpcstruct to analyze it using multiple threads. Binaries with fewer than n bytes are analyzed concurrently with one another, using one thread per binary. {100000000}
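The effect of the --psize threshold can be sketched as a simple partition of binaries by file size. In the Python sketch below, the function name and the file sizes are hypothetical, and a binary of exactly n bytes is assumed to count as large, since the text only says that binaries with fewer than n bytes are treated as small.

```python
DEFAULT_PSIZE = 100_000_000  # default threshold shown in {} above


def split_by_psize(sizes, psize=DEFAULT_PSIZE):
    """Partition binaries into (small, large) name lists by size in bytes.

    small: analyzed concurrently, one thread per binary.
    large: analyzed one at a time using multiple threads.
    Assumption: a binary of exactly psize bytes is treated as large.
    """
    small = sorted(name for name, size in sizes.items() if size < psize)
    large = sorted(name for name, size in sizes.items() if size >= psize)
    return small, large


if __name__ == "__main__":
    # Hypothetical binaries and sizes, used only for illustration.
    sizes = {"app": 250_000_000, "libm.so": 2_000_000, "kernel.gpubin": 40_000}
    small, large = split_by_psize(sizes)
    print("small:", small)
    print("large:", large)
```

For real files, the sizes could be gathered with os.path.getsize. Lowering n moves more binaries into the multithreaded group; raising it moves more into the one-thread-per-binary group.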

Options: Override structure recovery defaults

--cpu "yes"/"no"
Analyze CPU binaries referenced in a measurement directory. {"yes"}

--gpu "yes"/"no"
Analyze GPU binaries referenced in a measurement directory. {"yes"}

--gpucfg "yes"/"no"
Compute loop nesting structure for GPU machine code. {"no"}

Options to control output:

-o filename, --output filename
Write the output to filename. This option is only applicable when invoking hpcstruct on a single binary.

Options for Developers:

--jobs-struct num
Use num threads for the program structure analysis phase of hpcstruct.

--jobs-parse num
Use num threads for the parse phase of hpcstruct.

--jobs-symtab num
Use num threads for the symbol table analysis phase of hpcstruct.

--show-gaps
Developer option to write a text file describing all the "gaps" found by hpcstruct, i.e. address regions not identified as belonging to a code or data segment by the ParseAPI parser used to analyze application executables. The file is named outfile.gaps, which by default is appname.hpcstruct.gaps.

--time
Display the time and space usage per phase in hpcstruct.

Examples

  1. Assume we have used HPCToolkit to collect performance measurements for the (optimized) CPU binary sweep3d and that performance measurement data for the application is in the measurement directory hpctoolkit-sweep3d-measurements. Assume that sweep3d was compiled with debugging information using the -g compiler flag in addition to any optimization flags. To recover program structure in sweep3d and any shared libraries used during the run for use with hpcprof(1), execute:
        hpcstruct hpctoolkit-sweep3d-measurements
    

    The output is placed in a subdirectory of the measurement directory.
    These program structure files are used to interpret the performance measurements in hpctoolkit-sweep3d-measurements when hpcprof is run:
        hpcprof hpctoolkit-sweep3d-measurements
    

  2. Assume we have used HPCToolkit to collect performance measurements for the (optimized) GPU-accelerated CPU binary laghos, which offloaded computation onto one or more Nvidia GPUs. Assume that performance measurement data for the application is in the measurement directory hpctoolkit-laghos-measurements.
    Assume that the CPU code for laghos was compiled with debugging information using the -g compiler flag in addition to any optimization flags and that the GPU code the application contains was compiled with line map information (-lineinfo).
    To recover program structure information for the laghos CPU binary, and any shared libraries it used during the run, as well as any GPU binaries it used, execute:
        hpcstruct hpctoolkit-laghos-measurements
    

    The measurement directory will be augmented with program structure information recovered for the laghos binary, any shared libraries it used, and any GPU binaries it used. All will be stored in subdirectories of the measurement directory. To interpret the performance measurements using this program structure information, execute:
        hpcprof hpctoolkit-laghos-measurements
    

Notes

  1. For best results, an application binary should be compiled with debugging information. To generate debugging information while also enabling optimizations, use the appropriate variant of -g for the following compilers:
    • GNU compilers: -g
    • Intel compilers: -g -debug inline_debug_info
    • IBM compilers: -g -fstandalone-debug -qfulldebug -qfullpath
    • PGI compilers: -gopt
    • Nvidia's nvcc:
          -lineinfo provides line mappings for optimized or unoptimized code
          -G provides line mappings and inline information for unoptimized code

  2. While hpcstruct attempts to guard against inaccurate debugging information, some compilers (notably PGI's) often generate invalid and inconsistent debugging information. Garbage in; garbage out.

  3. C++ mangling is compiler specific. On non-GNU platforms, hpcstruct tries both the platform's and GNU's demanglers.

See Also

hpctoolkit(1).

Version

Version: 2021.05.15-master

License and Copyright

Copyright
© 2002-2021, Rice University.
License
See README.License.

Authors

Rice University's HPCToolkit Research Group
Email: hpctoolkit-forum =at= rice.edu
WWW: http://hpctoolkit.org.