hpcstruct: Recovery of Static Program Structure

The HPCToolkit Performance Tools

2020/07/15

Version 2020.08-develop

hpcstruct - recovers the static program structure of a CPU or GPU binary, including procedures, inlined functions, loops, and source lines

See hpctoolkit(1) for an overview of HPCToolkit.


Synopsis

hpcstruct [options] binary

hpcstruct [options] measurement directory of GPU-accelerated application

Description

Given an application binary, a shared library, or a GPU binary, hpcstruct recovers the program structure of its object code by analyzing available information about loop nests, inlined functions, and the mapping between machine instructions and source lines. Program structure is a mapping of a program's object code to its static source-level structure.

By default, when analyzing a CPU binary b, hpcstruct writes its results to the file 'basename(b).hpcstruct'. To improve the attribution of performance measurements to program source code, one can pass one or more program structure files (e.g., for an executable and/or one or more shared libraries) to HPCToolkit's analysis tool hpcprof along with one or more HPCToolkit performance measurement directories.
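The naming rule and the way several structure files reach hpcprof can be sketched in a few lines of Python. This is an illustration only: default_struct_file and hpcprof_argv are hypothetical helpers, and the sketch assumes each structure file was written under its default name. It builds the hpcprof argument list, with one -S option per structure file, without running anything.

```python
import os


def default_struct_file(binary: str) -> str:
    """Default hpcstruct output name for a CPU binary: basename(binary).hpcstruct."""
    return os.path.basename(binary) + ".hpcstruct"


def hpcprof_argv(binaries: list[str], measurement_dirs: list[str]) -> list[str]:
    """Hypothetical helper: build an hpcprof command line with one -S option
    per structure file, followed by the measurement directories."""
    argv = ["hpcprof"]
    for binary in binaries:
        # Assumes hpcstruct was run without -o, so the default name applies.
        argv += ["-S", default_struct_file(binary)]
    return argv + list(measurement_dirs)
```

For example, hpcprof_argv(["./sweep3d", "/usr/lib/libfoo.so"], ["hpctoolkit-sweep3d-measurements"]) returns ['hpcprof', '-S', 'sweep3d.hpcstruct', '-S', 'libfoo.so.hpcstruct', 'hpctoolkit-sweep3d-measurements']; libfoo.so is a placeholder library name.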

During execution of a GPU-accelerated application on an NVIDIA GPU, HPCToolkit records NVIDIA 'cubin' GPU binaries in the application's measurement directory. To attribute performance to GPU functions in a GPU-accelerated application, apply hpcstruct to the application's HPCToolkit measurement directory to analyze all GPU binaries recorded within it. When analyzing a measurement directory that includes GPU binaries, hpcstruct writes the program structure files it produces into that measurement directory. When hpcprof is later applied to the measurement directory, it uses these program structure files to help attribute GPU performance measurements.

hpcstruct is designed primarily for highly optimized binaries created from C, C++, Fortran, and CUDA source code. Because hpcstruct's algorithms exploit a binary's debugging information, for best results binary should be compiled with standard debugging information or, at a minimum, line map information. Note: even when a CPU or GPU binary is optimized, it must also be compiled with line mappings for the results to be useful; for CPU binaries, this normally means compiling with the -g flag.

For faster analysis of large binaries or many GPU binaries, we recommend using the -j option to employ multithreading. As many as 32 cores can be used profitably to analyze large CPU binaries or the GPU binaries in the measurement directory of a GPU-accelerated application.

Arguments

binary
File containing an executable or dynamically-linked shared library. Note that hpcstruct does not recover program structure for libraries that binary depends on. To recover that structure, run hpcstruct on each dynamically-linked shared library or relink your program with static versions of the libraries.
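Because shared libraries must be analyzed separately, it can help to enumerate them first. The following Python sketch is one possible approach, not part of HPCToolkit: it assumes GNU/Linux ldd output, in which each resolved dependency appears as `name => /path (address)`, and it skips entries with no resolved path (the vDSO, the dynamic loader line, and libraries reported as `not found`). The function names shared_libraries and hpcstruct_commands are hypothetical, and the code only builds argument lists; it does not run hpcstruct.

```python
def shared_libraries(ldd_output: str) -> list[str]:
    """Extract resolved library paths from GNU/Linux ldd output.

    Assumed line format (with leading whitespace):
    'libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x...)'
    """
    libs = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vDSO or the dynamic loader itself
        target = line.split("=>", 1)[1].strip()
        path = target.split(" (", 1)[0]  # drop the load address
        if path.startswith("/"):  # skips 'not found' entries
            libs.append(path)
    return libs


def hpcstruct_commands(binary: str, ldd_output: str, jobs: int = 1) -> list[list[str]]:
    """One hpcstruct invocation for the binary and one per shared library."""
    targets = [binary] + shared_libraries(ldd_output)
    return [["hpcstruct", "-j", str(jobs), target] for target in targets]
```

In practice the ldd text could come from subprocess.run(["ldd", "sweep3d"], capture_output=True, text=True).stdout, and each resulting argument list could then be passed to subprocess.run.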

measurement directory of GPU-accelerated application
A measurement directory of a GPU-accelerated application that employed NVIDIA GPUs. When a GPU-accelerated application runs on an NVIDIA GPU, its 'cubin' GPU binaries are recorded into HPCToolkit's measurement directory. Applying hpcstruct to a measurement directory analyzes any GPU binaries recorded in the measurement directory during execution.

Default values for an option's optional arguments are shown in {}.

Options: Informational

-V, --version
Print version information.

-h, --help
Print help message.

Options: Parallel Usage

-j num, --jobs num
Use num threads for all phases in hpcstruct. {1}

--gpu-size n
Size threshold n, in bytes, for analyzing a single GPU binary in parallel: hpcstruct analyzes each GPU binary of at least n bytes using num threads, where num is the value given to -j. GPU binaries with fewer than n bytes are analyzed concurrently, num at a time. {100000000}
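The interaction of -j and --gpu-size can be pictured as a partition of the recorded GPU binaries by size. The Python sketch below is a simplified, hypothetical model of that policy, not hpcstruct's scheduler: binaries of at least n bytes are each analyzed with num threads, and smaller ones are grouped into batches of up to num binaries that run concurrently. A real scheduler may hand out work dynamically rather than in fixed batches.

```python
def plan_gpu_analysis(sizes: dict[str, int], jobs: int = 1,
                      gpu_size: int = 100_000_000):
    """Simplified model of how -j and --gpu-size split GPU binaries.

    Returns (large, batches):
      large: binaries of at least gpu_size bytes; each one is analyzed
             on its own using `jobs` threads.
      batches: smaller binaries, grouped so that at most `jobs` of them
               are analyzed concurrently.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    large = [name for name, size in sizes.items() if size >= gpu_size]
    small = [name for name, size in sizes.items() if size < gpu_size]
    batches = [small[i:i + jobs] for i in range(0, len(small), jobs)]
    return large, batches
```

Under this model, with -j 4 and the default threshold, a 250,000,000-byte cubin would be analyzed with four threads, while ten small cubins would be processed four at a time.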

Options: Structure recovery

--gpucfg yes/no
Compute loop nesting structure for GPU machine code. Currently, this applies only to NVIDIA CUDA binaries (cubins). Loop nesting structure is only useful with instruction-level measurements collected using PC sampling. {no}

-I path, --include path
Use path when resolving source file names. This option is useful when a compiler records the same filename in different ways within the symbolic information. (Yes, this does happen.) For a recursive search, append a '+' after the last slash, e.g., /mypath/+. This option may appear multiple times.
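To make the meaning of a trailing '+' concrete, here is a hypothetical Python model of how a list of search paths could be used to locate a source file: an ordinary entry is checked directly, while an entry ending in '/+' is searched recursively. The function name resolve_source, the first-match-wins order, and the fallback to the base name of a stale absolute path are assumptions for illustration; hpcstruct's own lookup rules may differ.

```python
import os


def resolve_source(filename: str, include_paths: list[str]):
    """Return the first existing match for `filename`, or None.

    Assumed semantics: '/dir/' is searched directly; '/dir/+' is searched
    recursively through /dir/ and all of its subdirectories.
    """
    if os.path.isabs(filename) and os.path.isfile(filename):
        return filename
    # Assumption: an absolute path that no longer exists is retried by base name.
    name = os.path.basename(filename) if os.path.isabs(filename) else filename
    for entry in include_paths:
        recursive = entry.endswith("/+")
        root = entry[:-1] if recursive else entry  # strip the '+' marker
        if not recursive:
            candidate = os.path.join(root, name)
            if os.path.isfile(candidate):
                return candidate
            continue
        for dirpath, _dirnames, _filenames in os.walk(root):
            candidate = os.path.join(dirpath, name)
            if os.path.isfile(candidate):
                return candidate
    return None
```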

-R 'old-path=new-path', --replace-path 'old-path=new-path'
Replace instances of old-path with new-path in all paths for which old-path is a prefix (e.g., a profile's load map and source code). Use '\' to escape instances of '=' within specified paths. This option may appear multiple times.

Use this option when a profile or executable references files that have been relocated, as might occur after a file system change.
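The argument format and the prefix replacement described for -R can be modeled in Python as follows. The sketch splits an 'old-path=new-path' argument at the single unescaped '=', turns '\=' into a literal '=', and rewrites a path only when it begins with old-path. Trying the pairs in command-line order and stopping at the first match is an assumption made for illustration, as are the function names. On a command line the argument is normally single-quoted, as in the option description, so that the shell passes the backslash through unchanged.

```python
def parse_replace_path(arg: str) -> tuple[str, str]:
    """Split 'old-path=new-path'; '\\=' denotes a literal '=' in either path."""
    parts = [[], []]  # characters of old-path and new-path
    side = 0
    i = 0
    while i < len(arg):
        if arg.startswith("\\=", i):
            parts[side].append("=")  # escaped '=' belongs to the path
            i += 2
        elif arg[i] == "=":
            if side == 1:
                raise ValueError(f"more than one unescaped '=' in {arg!r}")
            side = 1  # switch from old-path to new-path
            i += 1
        else:
            parts[side].append(arg[i])
            i += 1
    if side == 0:
        raise ValueError(f"expected 'old-path=new-path', got {arg!r}")
    return "".join(parts[0]), "".join(parts[1])


def replace_prefix(path: str, replacements: list[tuple[str, str]]) -> str:
    """Rewrite `path` using the first pair whose old-path is a prefix of it."""
    for old, new in replacements:
        if path.startswith(old):
            return new + path[len(old):]
    return path
```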

Options: Output

-o file, --output file
Write results to file. {basename(binary).hpcstruct}

Options for Developers

--jobs-struct num
Use num threads for the program structure analysis phase of hpcstruct.

--jobs-parse num
Use num threads for the parse phase of hpcstruct.

--jobs-symtab num
Use num threads for the symbol table analysis phase of hpcstruct.

--show-gaps
Developer option to write a text file describing all the "gaps" found by hpcstruct, i.e., address regions not identified as belonging to a code or data segment by the ParseAPI parser used to analyze application executables. The file is named outfile.gaps, which by default is appname.hpcstruct.gaps.

--time
Display the time and space usage per phase in hpcstruct.

Examples

  1. Assume we have used HPCToolkit to collect performance measurements for the (optimized) CPU binary sweep3d and that performance measurement data for the application is in the measurement directory hpctoolkit-sweep3d-measurements. Assume that sweep3d was compiled with debugging information using the -g compiler flag in addition to any optimization flags. We wish to recover program structure in sweep3d for use with hpcprof(1). To do this, execute:

        hpcstruct sweep3d
    

    By default, the output is placed in a file named sweep3d.hpcstruct.

    To use the program structure file to help interpret performance measurements in hpctoolkit-sweep3d-measurements, provide the program structure file to hpcprof using the -S option, as shown below:

        hpcprof -S sweep3d.hpcstruct hpctoolkit-sweep3d-measurements
    

    Additional program structure files for any shared libraries used by sweep3d can be passed to hpcprof using additional -S options.

  2. Assume we have used HPCToolkit to collect performance measurements for the (optimized) GPU-accelerated CPU binary laghos, which offloaded computation onto one or more NVIDIA GPUs. Assume that performance measurement data for the application is in the measurement directory hpctoolkit-laghos-measurements.

    Assume that the CPU code for laghos was compiled with debugging information using the -g compiler flag in addition to any optimization flags and that the GPU code the application contains was compiled with line map information (-lineinfo).

    To recover program structure information for the laghos CPU binary, execute:

        hpcstruct laghos
    

    By default, the output is placed in a file named laghos.hpcstruct.

    To recover program structure information for the GPU binaries used by laghos, execute:

        hpcstruct hpctoolkit-laghos-measurements
    

    The measurement directory will be augmented with program structure information recovered for the GPU binaries.

    As shown below, one can provide the program structure files to hpcprof by passing the CPU program structure file explicitly with the -S option and passing the program structure files for the GPU binaries implicitly through the measurement directory.

        hpcprof -S laghos.hpcstruct hpctoolkit-laghos-measurements
    

Notes

  1. For best results, an application binary should be compiled with debugging information. To generate debugging information while also enabling optimizations, use the appropriate variant of -g for the following compilers:
    • GNU compilers: -g
    • Intel compilers: -g -debug inline_debug_info
    • IBM compilers: -g -fstandalone-debug -qfulldebug -qfullpath
    • PGI compilers: -gopt
    • NVIDIA's nvcc:
          -lineinfo provides line mappings for optimized or unoptimized code
          -G provides line mappings and inline information for unoptimized code

  2. While hpcstruct attempts to guard against inaccurate debugging information, some compilers (notably PGI's) often generate invalid and inconsistent debugging information. Garbage in; garbage out.

  3. C++ name mangling is compiler-specific. On non-GNU platforms, hpcstruct tries both the platform's demangler and GNU's demangler.

See Also

hpctoolkit(1).

Version

Version: 2020.08-develop

License and Copyright

Copyright
© 2002-2020, Rice University.
License
See README.License.

Authors

Rice University's HPCToolkit Research Group
Email: hpctoolkit-forum =at= rice.edu
WWW: http://hpctoolkit.org