hpcstruct:
Recovery of Static Program Structure
The HPCToolkit Performance Tools
2021/09/15
Version 2024.01.1-release
hpcstruct
- recovers the static program structure of CPU or GPU binaries, as cited in a recorded
measurements directory, or in a single CPU or GPU binary.
A binary's static structure includes information about source files, procedures,
inlined functions, loops, and source lines.
See hpctoolkit(1)
for an overview of HPCToolkit.
Table of Contents
Synopsis
hpcstruct
[options]
measurement-directory
hpcstruct
[options]
binary
Description
hpcstruct
generates a program structure file for each CPU and GPU
binary referenced by a directory containing HPCToolkit performance
measurements.
hpcstruct
is usually applied to an application's HPCToolkit measurement-directory,
directing
it to analyze all CPU and GPU binaries measured during an execution.
Although not normally needed, one can apply hpcstruct
to recover program
structure for an individual CPU or GPU binary.
Program structure is a mapping from addresses of machine instructions
in a binary to source code contexts; this mapping is used to attribute
measured performance metrics back to source code. A strength of
hpcstruct
is its ability to attribute metrics to inlined functions
and loops; such mappings are especially useful for understanding the
performance of programs generated using template-based programming
models.
When hpcprof
is invoked on a measurement-directory
that contains program structure files,
they will be used attribute performance measurements.
hpcstruct
is designed to cache its results so that processing multiple measurement directories
can copy previously generated structure files from the cache rather than regenerating the information.
A cache may be specified either on the command line (see the -C option, below) or
by setting the HPCTOOLKIT_HPCSTRUCT_CACHE environment variable.
If a cache is specified, hpcstruct
emits a message for each binary processed saying one of "Added to cache",
"Replaced in cache", "Copied from cache", or "Copied from cache +". The latter indicates that although the
structure file was found in the cache, it came from an identical binary with a different full path,
so it was edited to reflect the actual path.
"Replaced in cache" indicates that a previous version at that path was replaced.
If the --nocache
option is set, hpcstruct
will not use the cache, even if the environment
variable is set, and the message will say "(Cache disabled by user)".
If no cache is specified, the message will say "Cache not specified",
and, unless the --nocache option was set, an ADVICE message urging use of the cache will be written.
Users are strongly urged to use a cache.
When an application execution is measured using hpcrun,
HPCToolkit records the name of the application binary,
and the names of any shared-libraries and GPU binaries it used. As a GPU-accelerated application is measured,
HPCToolkit also records the contents of any GPU binaries it loads in the application's measurement directory.
When analyzing a measurement-directory,
hpcstruct
writes its results into a subdirectory of the directory.
It analyzes the application and all the shared libraries and GPU binaries used during the run.
When rerun against a measurement-directory,
hpcstruct
will reprocess only those GPU binaries that were
previously processed with a different --gpucfg
setting.
To accelerate analysis of a measurement directory, which contains
references to an application as well as any shared libraries
and/or GPU binaries it uses, hpcstruct
employs multiple threads by
default. A pool of threads equal to half of the threads in the CPU set for
the process is used.
Binaries larger than a certain threshold (see the --psize
option and
its default) are analyzed using more OpenMP threads than those smaller than
the threshold.
Multiple binaries are processed concurrently.
hpcstruct
will describe the actual parallelization and concurrency
used when the run starts.
When analyzing a single CPU or GPU binary b,
hpcstruct
writes its results to the file
'basename(b).hpcstruct'
in the current directory.
When rerun against a single binary, it will reprocess that binary and rewrite
its structure file.
hpcstruct
is designed for analysis of optimized binaries created from
C, C++, Fortran, CUDA, HIP, and DPC++ source code. Because hpcstruct's
algorithms exploit the line map and debug information recorded in an
application binary during compilation, for best results, we recommend that
binaries be compiled with standard debug information or at a minimum,
line map information. Typically, this is accomplished by passing a '-g'
option to each compiler along with any optimization flags. See the
HPCToolkit manual for more information.
Arguments
- measurement directory
-
A measurement directory of an application, either GPU-accelerated or not.
Applying hpcstruct
to a measurement directory analyzes the application as well as
all shared libraries and GPU binaries measured during the data-collection run.
- binary
- File containing an executable, a dynamically-linked shared library, or a GPU binary
recorded by HPCToolkit as a program executes.
Note that hpcstruct
does not recover program structure for libraries that binary
depends on.
To recover that structure, run hpcstruct
on each dynamically-linked shared library
or relink your program with static versions of the libraries. Invoking hpcstruct
on a binary
is normally not used.
Default values for an option's optional arguments are shown in {}.
Options: Informational
- -V, --version
-
Print version information.
- -h, --help
-
Print help message.
- -v num, --verbose num
-
Generate progress messages to stderr, at verbosity level num.
{1}
Options: Using the Cache
-c dir, --cache dir
Specify the directory to be used for the structure cache.
--nocache
Specify the the structure cache is not to be used, even if one is specified by the
HPCTOOLKIT_HPCSTRUCT_CACHE environment variable.
Options: Override parallel defaults
-j num, --jobs num
Specify the number of threads to be used. num
OpenMP threads will be used to analyze any large
binaries. A pool of num
threads will be used to
analyze small binaries.
--psize n
hpcstruct will consider any binary of at least
psize
bytes as large. hpcstruct will use more
OpenMP threads to analyze large binaries than
it uses to analyze small binaries. {100000000}
-s v, --stack v
Set the stack size for OpenMP worker threads to .
is a positive number optionally followed by
a suffix: B (bytes), K (kilobytes), M (megabytes),
or G (gigabytes). Without a suffix, will be
interpreted as kilobytes. One can also control the
stack size by setting the OMP_STACKSIZE environment
variable. A '-s ' option takes precedence,
followed by OMP_STACKSIZE. {32M}
Options: override structure recovery defaults
- --cpu "yes"/"no"
-
Analyze CPU binaries references in a measurements directory. {"yes"}
- --gpu "yes"/"no"
-
Analyze GPU binaries references in a measurements directory. {"yes"}
- --gpucfg "yes"/"no"
-
Compute loop nesting structure for GPU machine code. {"no"}
Options to control output:
- -o filename, --output filename
-
Write the output to to filename.
This option is only applicable when invoking
hpcstruct
on a single binary.
Options for Developers:
- --pretty-print
-
Add indenting for more readable XML output.
- --jobs-struct num
-
Use num
threads for the program structure analysis phase of hpcstruct.
- --jobs-parse num
-
Use num
threads for the parse phase of hpcstruct.
- --jobs-symtab num
-
Use num
threads for the symbol table analysis phase of hpcstruct.
- --show-gaps
-
Developer option to
write a text file describing all the "gaps" found by hpcstruct,
i.e. address regions not identified as belonging to a code or data segment
by the ParseAPI parser used to analyze application executables.
The file is named outfile.gaps,
which by default is
appname.hpcstruct.gaps.
- --time
-
Display the time and space usage per phase in hpcstruct.
Options for internal use only
- -M dirname
-
Indicates that hpcstruct was invoked by a script used to process measurement directory
measurement-dir.
This information is used to control messages.
Examples
-
Assume we have used HPCToolkit to collect performance measurements for the (optimized) CPU binary
sweep3d
and that performance measurement data for the application is in the measurement
directory hpctoolkit-sweep3d-measurements.
Assume that sweep3d
was compiled with debugging information using the -g compiler flag in addition to any
optimization flags.
To recover program structure in sweep3d
and any shared libraries used during the run
for use with hpcprof(1)
, execute:
hpcstruct hpctoolkit-sweep3d-measurements
The output is placed in a subdirectory of the measurements directory.
These program structure files are used to interpret performance measurements in hpctoolkit-sweep3d-measurements.
hpcprof hpctoolkit-sweep3d-measurements
-
Assume we have used HPCToolkit to collect performance measurements for the (optimized) GPU-accelerated
CPU binary laghos,
which offloaded computation onto one or more Nvidia GPUs.
Assume that performance measurement data for the application is in the measurement
directory hpctoolkit-laghos-measurements.
Assume that the CPU code for laghos
was compiled with debugging information using the -g compiler flag in addition to any
optimization flags and that the GPU code the application contains was compiled with line map information (-lineinfo).
To recover program structure information for the laghos CPU binary, and any shared libraries it used
during the run, as well as any GPU binaries it used, execute:
hpcstruct hpctoolkit-laghos-measurements
The measurement directory will be augmented with program structure information recovered for the
laghos binary, any shared libraries it used, and any GPU binaries it used. All will be
stored in subdirectories of the measurements directory.
hpcprof hpctoolkit-laghos-measurements
Notes
- For best results, an application binary should be compiled with debugging information.
To generate debugging information while also enabling optimizations,
use the appropriate variant of -g for the following compilers:
- GNU compilers: -g
- Intel compilers: -g -debug inline_debug_info
- IBM compilers: -g -fstandalone-debug -qfulldebug -qfullpath
- PGI compilers: -gopt
- Nvidia's nvcc:
-lineinfo provides line mappings for optimized or unoptimized code
-G provides line mappings and inline information for unoptimized code
- While hpcstruct attempts to guard against inaccurate debugging information,
some compilers (notably PGI's) often generate invalid and inconsistent debugging information.
Garbage in; garbage out.
- C++ mangling is compiler specific. On non-GNU platforms, hpcstruct
tries both platform's and GNU's demangler.
See Also
hpctoolkit(1)
.
Version
Version: 2024.01.1-release
License and Copyright
- Copyright
- © 2002-2023, Rice University.
- License
- See LICENSE.
Authors
Rice University's HPCToolkit Research Group
Email: hpctoolkit-forum =at= rice.edu
WWW: https://hpctoolkit.org.