hpcstruct:
Recovery of Static Program Structure
The HPCToolkit Performance Tools
2021/09/15
Version 2021.10.15-master
hpcstruct
- recovers the static program structure of the CPU and GPU binaries referenced in a measurements directory, or
of a single CPU or GPU binary.
The static structure includes procedures, inlined functions, loops, and source lines.
See hpctoolkit(1)
for an overview of HPCToolkit.
hpcstruct
[options]
measurement-directory
hpcstruct
[options]
binary
hpcstruct
recovers the program structure for all CPU and GPU
binaries referenced by a directory containing HPCToolkit performance
measurements. If needed, one can apply hpcstruct
to recover program
structure for an individual CPU or GPU binary.
During execution of an application, HPCToolkit records the name of the application binary
and the names of any shared libraries used. During execution of a GPU-accelerated application,
HPCToolkit also records GPU binaries in the application's measurement directory.
Normally, hpcstruct
is run against an application's HPCToolkit measurement-directory,
directing it to analyze all CPU and GPU binaries recorded within.
When analyzing a measurement-directory,
hpcstruct
writes its results into a subdirectory of that directory.
It analyzes the application and all the shared libraries used during the run, as well as any
GPU binaries recorded in the directory. It also places links to all shared libraries in another subdirectory.
When hpcprof
is applied to a measurement directory that contains program structure files,
those program structure files will be used to help attribute performance measurements.
When analyzing a CPU or GPU binary b,
hpcstruct
only writes its results to the file
'basename(b).hpcstruct',
and does not link to the binary.
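As a small illustration of the naming rule above, the following Python sketch derives the structure file name for a single binary. The helper name `default_structure_file` and the example path are hypothetical. The sketch shows only the file name, since the text does not say which directory the file is written to.

```python
import os


def default_structure_file(binary_path: str) -> str:
    """Return the file name described above: basename(b).hpcstruct.

    Hypothetical helper for illustration; not part of HPCToolkit.
    """
    return os.path.basename(binary_path) + ".hpcstruct"


# Example with a made-up install path for the sweep3d binary.
print(default_structure_file("/opt/apps/sweep3d/bin/sweep3d"))
```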
Program structure is a mapping from addresses of machine instructions
in a binary to source code contexts; this mapping is used to attribute
measured performance metrics back to source code. A strength of
hpcstruct
is its ability to attribute metrics to inlined functions
and loops; such mappings are especially useful for understanding the
performance of programs generated using template-based programming
models.
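To make the idea of an address-to-context mapping concrete, here is a minimal Python sketch. It is not hpcstruct's internal representation or file format. The class names, fields, addresses, and source locations are all invented for illustration, and the sketch assumes non-overlapping, half-open address ranges.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SourceContext:
    """Hypothetical source context for a range of machine instructions."""
    procedure: str
    inlined_from: Tuple[str, ...]  # inlined call chain, outermost first
    loop_depth: int                # 0 means "not inside any loop"
    file: str
    line: int


class StructureMap:
    """Maps instruction addresses to source contexts (illustrative only)."""

    def __init__(self, ranges: List[Tuple[int, int, SourceContext]]):
        # Each entry is (start, end, context) covering addresses [start, end).
        self._ranges = sorted(ranges, key=lambda r: r[0])
        self._starts = [r[0] for r in self._ranges]

    def lookup(self, addr: int) -> Optional[SourceContext]:
        # Find the last range whose start is <= addr, then check its end.
        i = bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, end, ctx = self._ranges[i]
        return ctx if start <= addr < end else None


# Made-up ranges: a loop body, then code inlined into the same procedure.
smap = StructureMap([
    (0x1000, 0x1040, SourceContext("sweep", (), 1, "sweep.f", 120)),
    (0x1040, 0x1080, SourceContext("sweep", ("flux",), 2, "flux.f", 33)),
])
print(smap.lookup(0x1044))
print(smap.lookup(0x2000))
```

A profiler that has sampled an instruction address can use such a lookup to charge the sample to a procedure, inlined function, loop, and source line.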
hpcstruct
is designed for analysis of optimized binaries created from
C, C++, Fortran, CUDA, HIP, and DPC++ source code. Because hpcstruct's
algorithms exploit the line map and debug information recorded in an
application binary during compilation, for best results, we recommend that
binaries be compiled with standard debug information or, at a minimum,
line map information. Typically, this is accomplished by passing a '-g'
option to each compiler along with any optimization flags. See the
HPCToolkit manual for more information.
To accelerate analysis of a measurement directory, which contains
references to an application as well as any shared libraries
and/or GPU binaries it uses, hpcstruct
employs multiple threads by
default. Multiple small binaries are analyzed concurrently, using one
thread per binary. By default, this analysis will use half of the
threads in the CPU set for the process. Binaries larger than a certain
threshold (see the --psize option and its default) are analyzed using
multiple threads. By default, large binaries will be analyzed using
min(half of the threads in the CPU set for the process, 16) threads.
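The default thread arithmetic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not hpcstruct's code: the function names are hypothetical, "half" is assumed to round down with a floor of one thread, and a binary whose size exactly equals the threshold is assumed to count as large. The threshold default of 100000000 bytes is the documented default of --psize.

```python
import os
from typing import Dict, List, Tuple

DEFAULT_PSIZE = 100_000_000   # documented default of --psize, in bytes
LARGE_BINARY_THREAD_CAP = 16  # cap from "min(half of the threads ..., 16)"


def cpu_set_size() -> int:
    """Number of hardware threads in this process's CPU set."""
    try:
        return len(os.sched_getaffinity(0))  # available on Linux
    except AttributeError:
        return os.cpu_count() or 1           # fallback on other platforms


def default_thread_counts(cpu_threads: int) -> Tuple[int, int]:
    """Return (threads for small binaries, threads per large binary).

    Assumption: "half" rounds down and never drops below 1.
    """
    half = max(1, cpu_threads // 2)
    return half, min(half, LARGE_BINARY_THREAD_CAP)


def split_by_size(sizes: Dict[str, int],
                  psize: int = DEFAULT_PSIZE) -> Tuple[List[str], List[str]]:
    """Split binaries into (small, large) by size in bytes.

    Assumption: a binary of exactly psize bytes is treated as large.
    """
    small = sorted(name for name, size in sizes.items() if size < psize)
    large = sorted(name for name, size in sizes.items() if size >= psize)
    return small, large


pool, per_large = default_thread_counts(cpu_set_size())
print(f"small binaries share {pool} threads; each large binary gets {per_large}")
```

For example, with 64 threads in the CPU set, small binaries would be analyzed by 32 threads, one per binary, and each large binary would be analyzed with 16 threads.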
- measurement directory
-
A measurement directory of an application, either GPU-accelerated or not.
Applying hpcstruct
to a measurement directory analyzes the application, all shared libraries referenced
during the data-collection run, as well as any GPU binaries recorded
in the measurement directory during execution.
- binary
- File containing an executable, a dynamically-linked shared library, or a GPU binary
recorded by HPCToolkit as a program executes.
Note that hpcstruct
does not recover program structure for libraries that binary
depends on.
To recover that structure, run hpcstruct
on each dynamically-linked shared library
or relink your program with static versions of the libraries. Invoking hpcstruct
on a binary
is not normally necessary.
Default values for an option's optional arguments are shown in {}.
- -V, --version
-
Print version information.
- -h, --help
-
Print help message.
- -v num, --verbose num
-
Generate progress messages to stderr, at verbosity level num.
{1}
- -j num, --jobs num
-
Use num
threads in hpcstruct.
- --psize n
-
Size threshold, in bytes, at which hpcstruct
analyzes a binary in parallel using multiple threads.
Binaries with fewer than n
bytes are analyzed
concurrently, using one thread per binary. {100000000}
- --cpu "yes"/"no"
-
Analyze CPU binaries referenced in a measurements directory. {"yes"}
- --gpu "yes"/"no"
-
Analyze GPU binaries referenced in a measurements directory. {"yes"}
- --gpucfg "yes"/"no"
-
Compute loop nesting structure for GPU machine code. {"no"}
- -o filename, --output filename
-
Write the output to filename.
This option is only applicable when invoking
hpcstruct
on a single binary.
- --jobs-struct num
-
Use num
threads for the program structure analysis phase of hpcstruct.
- --jobs-parse num
-
Use num
threads for the parse phase of hpcstruct.
- --jobs-symtab num
-
Use num
threads for the symbol table analysis phase of hpcstruct.
- --show-gaps
-
Developer option to
write a text file describing all the "gaps" found by hpcstruct,
i.e. address regions not identified as belonging to a code or data segment
by the ParseAPI parser used to analyze application executables.
The file is named outfile.gaps,
which by default is
appname.hpcstruct.gaps.
- --time
-
Display the time and space usage per phase in hpcstruct.
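The following Python sketch shows one way a script might assemble an hpcstruct command line from a few of the options listed above. The helper `hpcstruct_argv` and its parameters are hypothetical. The sketch assumes that option values are passed as separate, space-separated arguments and that options precede the target, as in the synopsis. It enforces the documented restriction that -o applies only to a single binary.

```python
from typing import List, Optional


def hpcstruct_argv(target: str, *, target_is_directory: bool,
                   jobs: Optional[int] = None,
                   psize: Optional[int] = None,
                   gpucfg: Optional[bool] = None,
                   output: Optional[str] = None) -> List[str]:
    """Build an argument list for hpcstruct (hypothetical helper)."""
    if output is not None and target_is_directory:
        # -o/--output is documented as applicable only to a single binary.
        raise ValueError("-o applies only when analyzing a single binary")
    if jobs is not None and jobs < 1:
        raise ValueError("jobs must be at least 1")

    argv = ["hpcstruct"]
    if jobs is not None:
        argv += ["-j", str(jobs)]
    if psize is not None:
        argv += ["--psize", str(psize)]
    if gpucfg is not None:
        argv += ["--gpucfg", "yes" if gpucfg else "no"]
    if output is not None:
        argv += ["-o", output]
    argv.append(target)  # synopsis: hpcstruct [options] target
    return argv


print(" ".join(hpcstruct_argv("hpctoolkit-laghos-measurements",
                              target_is_directory=True,
                              jobs=8, gpucfg=True)))
```

Building the command as a list rather than a single string avoids shell quoting problems when the resulting list is handed to a process launcher.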
-
Assume we have used HPCToolkit to collect performance measurements for the (optimized) CPU binary
sweep3d
and that performance measurement data for the application is in the measurement
directory hpctoolkit-sweep3d-measurements.
Assume that sweep3d
was compiled with debugging information using the -g compiler flag in addition to any
optimization flags.
To recover program structure in sweep3d
and any shared libraries used during the run
for use with hpcprof(1), execute:
hpcstruct hpctoolkit-sweep3d-measurements
The output is placed in a subdirectory of the measurements directory.
These program structure files are then used to interpret the performance measurements in hpctoolkit-sweep3d-measurements when running hpcprof:
hpcprof hpctoolkit-sweep3d-measurements
-
Assume we have used HPCToolkit to collect performance measurements for the (optimized) GPU-accelerated
CPU binary laghos,
which offloaded computation onto one or more Nvidia GPUs.
Assume that performance measurement data for the application is in the measurement
directory hpctoolkit-laghos-measurements.
Assume that the CPU code for laghos
was compiled with debugging information using the -g compiler flag in addition to any
optimization flags and that the GPU code the application contains was compiled with line map information (-lineinfo).
To recover program structure information for the laghos CPU binary, and any shared libraries it used
during the run, as well as any GPU binaries it used, execute:
hpcstruct hpctoolkit-laghos-measurements
The measurement directory will be augmented with program structure information recovered for the
laghos binary, any shared libraries it used, and any GPU binaries it used. All will be
stored in subdirectories of the measurements directory. hpcprof then uses this structure information to attribute the measurements:
hpcprof hpctoolkit-laghos-measurements
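Both examples follow the same two-step order: hpcstruct first, then hpcprof on the same measurement directory. A script could encode that order as in the Python sketch below. The function names are hypothetical, and the sketch defaults to a dry run that only prints the commands, so it does not require HPCToolkit to be installed.

```python
import subprocess
from typing import List


def analysis_commands(measurement_dir: str) -> List[List[str]]:
    """Commands from the examples above, in the order they must run."""
    return [
        ["hpcstruct", measurement_dir],  # recover program structure first
        ["hpcprof", measurement_dir],    # then attribute measurements
    ]


def run_analysis(measurement_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands, or execute them when dry_run is False."""
    commands = analysis_commands(measurement_dir)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises if a step fails, so hpcprof is not run
            # after a failed hpcstruct.
            subprocess.run(argv, check=True)
    return commands


run_analysis("hpctoolkit-laghos-measurements")
```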
- For best results, an application binary should be compiled with debugging information.
To generate debugging information while also enabling optimizations,
use the appropriate variant of -g for the following compilers:
- GNU compilers: -g
- Intel compilers: -g -debug inline_debug_info
- IBM compilers: -g -fstandalone-debug -qfulldebug -qfullpath
- PGI compilers: -gopt
- Nvidia's nvcc:
-lineinfo provides line mappings for optimized or unoptimized code
-G provides line mappings and inline information for unoptimized code
- While hpcstruct attempts to guard against inaccurate debugging information,
some compilers (notably PGI's) often generate invalid and inconsistent debugging information.
Garbage in; garbage out.
- C++ mangling is compiler specific. On non-GNU platforms, hpcstruct
tries both the platform's and GNU's demanglers.
hpctoolkit(1).
Version: 2021.10.15-master
- Copyright
- © 2002-2022, Rice University.
- License
- See README.License.
Rice University's HPCToolkit Research Group
Email: hpctoolkit-forum =at= rice.edu
WWW: http://hpctoolkit.org.