HPCToolkit provides the hpcviewer performance presentation tool for interactive examination of performance databases. hpcviewer interactively presents context-sensitive performance metrics correlated to program structure and mapped to a program’s source code, if available. It can present an arbitrary collection of performance metrics gathered during one or more runs or compute derived metrics.
hpcviewer can be launched either from a command line (on Linux/Unix platforms) or by clicking the hpcviewer icon (on Windows, Mac OS X, and Linux/Unix platforms). The command line syntax is as follows:
Here, <hpctoolkit-database> is an optional argument to load a database automatically. Without this argument, hpcviewer will prompt for the location of a database.
The possible options are as follows:
Figure 1 shows an annotated screenshot of hpcviewer’s user interface presenting a call path profile. The annotations highlight hpcviewer’s principal window panes and key controls. The browser window is divided into three panes. The Source pane (top) displays program source code. The Navigation and Metric panes (bottom) associate a table of performance metrics with static or dynamic program structure. These panes are discussed in more detail in Section 3.
hpcviewer displays calling-context-sensitive performance data in three different views: a Top-down View, a Bottom-up View, and a Flat View. One selects the desired view by clicking on the corresponding view control tab. We briefly describe the three views and their corresponding purposes.
A calling context for a procedure f consists of the stack of procedure frames active when the call was made to f. Using this view, one can readily see how much of the application’s cost was incurred by f when called from a particular calling context. If finer detail is of interest, one can explore how the costs incurred by a call to f in a particular context are divided between f itself and the procedures it calls. HPCToolkit’s call path profiler hpcrun and the hpcviewer user interface distinguish calling context precisely by individual call sites; this means that if a procedure g contains calls to procedure f in different places, these represent separate calling contexts.
When several levels of the Bottom-up View are expanded, saying that the Bottom-up View apportions metrics of a callee on behalf of its caller can be confusing: what is the caller and what is the callee? In this situation, we can say that the Bottom-up View apportions the metrics of a particular procedure in its various calling contexts on behalf of that context’s caller. Alternatively but equivalently, the Bottom-up View apportions the metrics of a particular procedure on behalf of its various calling contexts.
hpcviewer’s browser window is divided into three panes: the Navigation pane, the Source pane, and the Metric pane. We briefly describe the role of each pane.
The source pane displays the source code associated with the current entity selected in the navigation pane. When a performance database is first opened with hpcviewer, the source pane is initially blank because no entity has been selected in the navigation pane. Selecting any entity in the navigation pane will cause the source pane to load the corresponding file, then scroll to and highlight the line corresponding to the selection. Switching the source pane to a different source file is accomplished by making another selection in the navigation pane.
The navigation pane presents a hierarchical tree-based structure that is used to organize the presentation of an application’s performance data. Entities that occur in the navigation pane’s tree include load modules, files, procedures, procedure activations, inlined code, loops, and source lines. Selecting any of these entities will cause its corresponding source code (if any) to be displayed in the source pane. One can reveal or conceal children in this hierarchy by ‘opening’ or ‘closing’ any non-leaf entry in this view (the leaves are individual source lines).
The nature of the entities in the navigation pane’s tree structure depends upon whether one is exploring the Top-down View, the Bottom-up View, or the Flat View of the performance data.
The header above the navigation pane contains some controls for the navigation and metric view. In Figure 1, they are labeled as “navigation/metric control.”
Flattening and unflattening the navigation hierarchy. Clicking on the flatten button (the icon that shows a tree node with a slash through it) will replace each top-level scope shown with its children. If a scope has no children (i.e., it is a leaf), the node will remain in the view. This flattening operation is useful for relaxing the strict hierarchical view so that peers at the same level in the tree can be viewed and ranked together. For instance, this can be used to hide procedures in the Flat View so that outer loops can be ranked and compared to one another. The inverse of the flatten operation is the unflatten operation, which causes an elided node in the tree to be made visible once again.
Depressing the up arrow button will zoom in to show only information for the selected line and its descendants. One can zoom out (reversing a prior zoom operation) by depressing the down arrow button.
This button is used to automatically find hot call paths with respect to the currently selected metric column. The hot path is computed by comparing parent and child metric values, and showing the chain where the difference is greater than a threshold (50% by default). The threshold value can be changed through the menu File > Preference.
Creating a new metric based on a mathematical formula. See Section 5 for more details.
Showing and hiding metric columns. A dialog box will appear, in which the user can select which columns to show or hide. See Section 8.2 for more details.
Exporting the current metric table to a comma-separated value (CSV) file. This feature exports only the metrics that are currently shown. Metrics that are not shown in the view (those whose scopes are not expanded) are not exported (we assume these metrics are not significant).
Increasing or decreasing the size of the navigation and metric panes.
Showing the graph (plot, sorted plot or histogram) of metric values of the selected node in the CCT for all processes or threads (Section 6.1). This menu is only available if the database was generated by hpcprof-mpi rather than hpcprof.
Showing the CCT and the metrics of selected threads (Section 6.2). This menu is only available if the database was generated by hpcprof-mpi rather than hpcprof.
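The hot-path control described above can be illustrated with a short sketch. It shows one reading of that description and is not hpcviewer's actual implementation: starting from the selected scope, it repeatedly descends into the child with the largest metric value and stops once that child accounts for less than the threshold fraction of its parent's value. The `Node` class, the `hot_path` function, and all metric values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    value: float  # inclusive value of the selected metric (hypothetical data)
    children: List["Node"] = field(default_factory=list)


def hot_path(root: Node, threshold: float = 0.5) -> List[str]:
    """Follow the hottest child while it holds at least `threshold` of its parent."""
    path = [root.name]
    node = root
    while node.children:
        hottest = max(node.children, key=lambda child: child.value)
        # Stop when the cost spreads out: no single child dominates the parent.
        if node.value <= 0 or hottest.value < threshold * node.value:
            break
        path.append(hottest.name)
        node = hottest
    return path


# Hypothetical calling context tree.
example = Node("main", 100.0, [
    Node("solve", 80.0, [
        Node("kernel", 70.0, [Node("a", 30.0), Node("b", 25.0)]),
        Node("io", 5.0),
    ]),
    Node("init", 20.0),
])

default_path = hot_path(example)        # 50% threshold
deeper_path = hot_path(example, 0.3)    # a lower threshold extends the path
```

With the default threshold the path ends at `kernel`, because its hottest child holds less than half of its cost; lowering the threshold lets the path continue one level further.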
The navigation control also provides several context menus, which are opened by right-clicking. As shown in Figure 2, the menus are:
The metric pane displays one or more performance metrics associated with entities to the left in the navigation pane. Entities in the tree view of the navigation pane are sorted at each level of the hierarchy by the metric in the selected column. When hpcviewer is launched, the leftmost metric column is the default selection and the navigation pane is sorted according to the values of that metric in descending order. One can change the selected metric by clicking on a column header. Clicking on the header of the selected column toggles the sort order between descending and ascending.
During analysis, one often wants to consider the relationship between two metrics. This is easier when the metrics of interest are in adjacent columns of the metric pane. One can change the order of columns in the metric pane by selecting the column header for a metric and then dragging it left or right to its desired position. The metric pane also includes scroll bars for horizontal scrolling (to reveal other metrics) and vertical scrolling (to reveal other scopes). Vertical scrolling of the metric and navigation panes is synchronized.
hpcviewer can present an arbitrary collection of performance metrics gathered during one or more runs, or compute derived metrics expressed as formulae with existing metrics as terms.
For any given scope in hpcviewer’s three views, hpcviewer computes both inclusive and exclusive metric values. For the moment, consider the Top-down View. Inclusive metrics reflect costs for the entire subtree rooted at that scope. Exclusive metrics are of two flavors, depending on the scope. For a procedure, exclusive metrics reflect all costs within that procedure but excluding callees. In other words, for a procedure, costs are exclusive with respect to dynamic call chains. For all other scopes, exclusive metrics reflect costs for the scope itself; i.e., costs are exclusive with respect to static structure. The Bottom-up and Flat Views contain inclusive and exclusive metric values that are relative to the Top-down View. This means, e.g., that inclusive metrics for a particular scope in the Bottom-up or Flat View are with respect to that scope’s subtree in the Top-down View.
Call path profile measurements collected by hpcrun correspond directly to the Top-down View. hpcviewer derives all other views from exclusive metric costs in the Top-down View. For the Caller View (the Bottom-up View), hpcviewer collects the cost of all samples in each function and attributes that cost to a top-level entry in the Caller View. Under each top-level function, hpcviewer can look up the call chains of all the contexts in which the function is called. For each function, hpcviewer apportions its costs among the calling contexts in which they were incurred. hpcviewer computes the Flat View by traversing the calling context tree and attributing all costs for a scope to the scope within its static source code structure. The Flat View presents a hierarchy of nested scopes for load modules, files, procedures, loops, inlined code and statements.
file1.c:

    f () {
      g ();
    }

    // m is the main routine
    m () {
      f ();
      g ();
    }

file2.c:

    // g can be a recursive function
    g () {
      if ( . . ) g ();
      if ( . . ) h ();
    }

    h () {
    }
Figure 3 shows an example of a recursive program separated into two files, file1.c and file2.c. In this figure, we use numerical subscripts to distinguish between different instances of the same procedure. In the other parts of this figure, we use alphabetic subscripts. We use different labels because there is no natural one-to-one correspondence between the instances in the different views.
Routine g can behave as a recursive function depending on the value of the condition branch (lines 3–4). Figure 4 shows an example of the call chain execution of the program annotated with both inclusive and exclusive costs. Computation of inclusive costs from exclusive costs in the Top-down View involves simply summing up all of the costs in the subtree below.
In this figure, we can see that on the right path of the routine m, routine g (instantiated in the diagram as g1) performed a recursive call (g2) before calling routine h. Although g1, g2 and g3 are all instances of the same routine (i.e., g), we attribute a different cost to each instance. This separation of cost can be critical for identifying which instance has a performance problem.
Figure 5 shows the corresponding scope structure for the Caller View and the costs we compute for this recursive program. The procedure g, labeled ga where it appears as a root node in the diagram, has a different cost from g as a call site, labeled gb, gc and gd. For instance, in the first tree of this figure, the inclusive cost of ga is 9, which is the sum of the highest cost for each branch in the calling context tree (Figure 4): the inclusive cost of g3 (which is 3) and g1 (which is 6). We do not attribute the cost of g2 here since it is a descendant of g1 (in other words, the cost of g2 is included in g1).
Inclusive costs need to be computed similarly in the Flat View. The inclusive cost of a recursive routine is the sum of the highest cost for each branch in the calling context tree. For instance, in Figure 6, the inclusive cost of gx, defined as the total cost of all instances of g, is 9, which is consistent with the cost in the caller tree. The advantage of attributing different costs to each instance of g is that it enables a user to identify which instance of the call to g is responsible for performance losses.
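The rule for recursive routines can be made concrete with a minimal sketch. It is an illustration under stated assumptions rather than hpcviewer's code: the tree shape follows the description of Figure 4 (with the placement of h under g2 assumed), and the exclusive costs are hypothetical values chosen only so that the inclusive costs of g1 and g3 come out as 6 and 3, the values quoted above. The names `Frame`, `inclusive`, `procedure_inclusive` and `naive_sum` are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    proc: str
    exclusive: float  # hypothetical exclusive cost
    children: List["Frame"] = field(default_factory=list)


def inclusive(frame: Frame) -> float:
    """Inclusive cost: the frame's own cost plus everything in its subtree."""
    return frame.exclusive + sum(inclusive(child) for child in frame.children)


def procedure_inclusive(frame: Frame, proc: str) -> float:
    """Sum the inclusive cost of the outermost instance of `proc` on each branch."""
    if frame.proc == proc:
        # Nested instances of `proc` are already counted inside this subtree.
        return inclusive(frame)
    return sum(procedure_inclusive(child, proc) for child in frame.children)


def naive_sum(frame: Frame, proc: str) -> float:
    """Incorrect alternative: adds every instance, double counting recursion."""
    own = inclusive(frame) if frame.proc == proc else 0.0
    return own + sum(naive_sum(child, proc) for child in frame.children)


# m calls f and g; f calls g (g3); g (g1) calls itself (g2), which calls h.
cct = Frame("m", 1.0, [
    Frame("f", 1.0, [Frame("g", 3.0)]),                   # g3
    Frame("g", 2.0, [Frame("g", 3.0, [Frame("h", 1.0)])]),  # g1 -> g2 -> h
])

g_total = procedure_inclusive(cct, "g")   # g3 (3) + g1 (6)
g_naive = naive_sum(cct, "g")             # also adds g2 (4) a second time
```

Stopping the descent at the first instance of g on each branch is what keeps the cost of g2 from being counted twice.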
Frequently, the data become useful only when combined with other information such as the number of instructions executed or the total number of cache accesses. While users don’t mind a bit of mental arithmetic and frequently compare values in different columns to see how they relate for a scope, doing this for many scopes is exhausting. To address this problem, hpcviewer provides a mechanism for defining metrics. A user-defined metric is called a “derived metric.” A derived metric is defined by specifying a spreadsheet-like mathematical formula that refers to data in other columns in the metric table by using $n to refer to the value in the nth column.
The formula syntax supported by hpcviewer is inspired by spreadsheet-like in-fix mathematical formulae. Operators have standard algebraic precedence.
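As a rough illustration of how a spreadsheet-like formula with $n column references can be evaluated, the following sketch handles only the four arithmetic operators, unary minus and parentheses. It does not reproduce hpcviewer's actual formula language or its column numbering; the `evaluate` function, the column numbers and the values are all hypothetical.

```python
import ast
import operator
import re

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula, columns):
    """Evaluate an infix formula in which $n names metric column n."""
    # "$3" is not valid Python, so rewrite it as the identifier "col3".
    tree = ast.parse(re.sub(r"\$(\d+)", r"col\1", formula), mode="eval")

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and re.fullmatch(r"col\d+", node.id):
            return columns[int(node.id[3:])]
        raise ValueError("unsupported formula element")

    return walk(tree.body)


# Hypothetical row of the metric table for one scope:
# column 1 holds cycles, column 2 holds floating point operations.
row = {1: 1200.0, 2: 400.0}
cycles_per_flop = evaluate("$1 / $2", row)
```

Reusing the host language's parser gives standard algebraic precedence for free, so `$1 + $2 * 2` multiplies before it adds.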
Suppose the database contains information about 5 processes, each with two metrics:
To compute the average number of cycles per floating point operation across all of the processes, we can define a formula as follows:
A derived metric can be created by clicking the Derived metric tool item in the navigation/control pane. A derived metric window will then appear as shown in Figure 7.
The window has two main parts:
Note that the entered formula and the metric name will be stored automatically. One can later review the formula (or metric name) by clicking the small triangle of the combo box (marked with a red circle).
HPCToolkit Experiment databases that have been generated by hpcprof-mpi (in contrast to hpcprof) can be used by hpcviewer to plot graphs of thread-level metric values. This is particularly useful for quickly assessing load imbalance in context across the several threads or processes of an execution. Figure 8 shows hpcviewer rendering such a plot. The horizontal axis shows application processes, ordered by MPI rank. The vertical axis shows metric values for each process. Because hpcviewer can generate scatter plots for any node in the Top-down View, these graphs are calling-context sensitive.
To create a graph, first select a scope in the Top-down View; in the Figure, the top-level procedure main is selected. Then, right-click the selected scope to show the associated context menu. (The menu begins with entries labeled ‘Zoom-in’ and ‘Zoom-out.’) At the bottom of the context menu is a list of metrics that hpcviewer can graph. Each metric contains a sub-menu that lists the three different types of graphs hpcviewer can plot:
Note that the viewers have the following notation for the ranks:
Hence, if the ranks are 0.0, 0.1, … 31.0, 31.1 it means MPI process 0 has two threads: thread 0 and thread 1 (similarly with MPI process 31).
Currently, it is only possible to generate scatter plots for metrics directly collected by hpcrun, which excludes derived metrics created within hpcviewer.
hpcviewer also provides a feature, named Thread View, to view the metrics of a particular thread or set of threads (or processes). First, select a thread or a set of threads of interest. To select a single thread, click on its dot in the plot graph (see Figure 8). Then click the context menu “Show thread X” to activate the thread view.
To select a group of threads, use the thread selection window, opened by clicking the corresponding button in the calling-context view. In the thread selection window, check the checkboxes of the threads of interest. To narrow the list, type a thread name in the filter field of the window. Recall that the format of a thread name is “process_id.thread_id” (see Section 6). Hence, to list just the main threads (thread zero), type ‘.0’ in the filter, and the view lists only threads 0 (such as 1.0, 2.0, 3.0 …).
Once threads have been selected, click OK, and the Thread View (Figure 9) will be activated. The tree of this view is the same as the tree of the calling context view, but with metrics from the selected threads only. If more than one thread is selected, the metrics are the sum of the values of the selected threads.
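The selection and summation steps can be sketched as follows. This is only an illustration: it assumes the filter behaves like a plain substring match on the “process_id.thread_id” label, which may differ from how hpcviewer actually matches filter text, and the function names and metric values are hypothetical.

```python
from collections import Counter


def select_threads(labels, text):
    """Keep the thread labels that contain the filter text (assumed substring match)."""
    return [label for label in labels if text in label]


def thread_view(per_thread, selected):
    """Sum each scope's metric over the selected threads."""
    total = Counter()
    for label in selected:
        total.update(per_thread[label])  # adds values scope by scope
    return dict(total)


# Hypothetical per-thread metric values for two scopes, keyed by thread label.
per_thread = {
    "0.0": {"main": 10.0, "solve": 7.0},
    "0.1": {"main": 4.0, "solve": 4.0},
    "1.0": {"main": 12.0, "solve": 9.0},
    "1.1": {"main": 5.0, "solve": 5.0},
}

main_threads = select_threads(per_thread, ".0")
summed = thread_view(per_thread, main_threads)
```

Here the filter text `.0` keeps thread 0 of each process, and the resulting view holds one summed value per scope.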
Occasionally, it is useful to omit uninteresting nodes of the tree in order to focus on the important parts. For instance, you may want to hide all nodes associated with the OpenMP runtime and show only nodes and metrics from the application. For this purpose, hpcviewer provides filtering to elide nodes that match a filter pattern. hpcviewer allows users to define multiple filters, and each filter is associated with a glob pattern and a type. There are three types of filter: “self only” to omit matched nodes, “descendants only” to exclude only the subtree of the matched nodes, and “self and descendants” to remove matched nodes and their descendants.
Figure 10b: The result of applying the self only filter on node C. Node C is elided and its children (nodes D and E) are attached to the parent of node C. The exclusive cost of node C is also added to node A.
Figure 10c: The result of applying the descendants only filter on node C. All the children of node C (nodes D and E) are elided, and the total of their exclusive cost is added to node C.
Figure 10d: The result of applying the self and descendants filter on node C. Node C and its descendants are elided, and their exclusive cost is added to node A, which is the parent of node C.
Self only: This filter is useful to hide intermediary runtime functions such as pthread or OpenMP runtime functions. All nodes that match filter patterns will be removed, and their children will be attached to the parent of the elided nodes. The exclusive cost of the elided nodes will also be added to the exclusive cost of the parent of the elided nodes. Figure 10b shows the result of filtering node C of the CCT from Figure 10a. After filtering, node C is elided and its exclusive cost is added to the exclusive cost of its parent (node A). The children of node C (nodes D and E) are now the children of node A.
Descendants only: This filter elides only the subtree of the matched node, while the matched node itself is not removed. A common usage of this filter is to exclude any call chains below MPI functions. As shown in Figure 10c, filtering node C causes nodes D and E to be elided, and their exclusive cost is added to node C.
Self and descendants: This filter elides both the matched node and its subtree. This type is useful to exclude unnecessary details such as glibc or malloc functions. Figure 10d shows that filtering node C will elide the node and its children (nodes D and E). The total exclusive cost of the elided nodes is added to the exclusive cost of node A.
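The three filter types can be summarized in a minimal sketch that mirrors the descriptions above. It is not hpcviewer's implementation: it uses Python's `fnmatch` module as a stand-in for the glob matching, the `Node` class and function names are invented, and the exclusive costs assigned to nodes A, C, D and E are hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class Node:
    name: str
    exclusive: float  # hypothetical exclusive cost
    children: List["Node"] = field(default_factory=list)


def subtree_exclusive(node: Node) -> float:
    """Total exclusive cost of a node and all of its descendants."""
    return node.exclusive + sum(subtree_exclusive(c) for c in node.children)


def apply_filter(node: Node, pattern: str, kind: str) -> None:
    """Filter the descendants of `node` in place; `node` itself is never elided."""
    kept = []
    for child in node.children:
        apply_filter(child, pattern, kind)  # filter deeper levels first
        if not fnmatchcase(child.name, pattern):
            kept.append(child)
        elif kind == "self only":
            node.exclusive += child.exclusive   # cost moves to the parent
            kept.extend(child.children)         # children are re-parented
        elif kind == "descendants only":
            child.exclusive = subtree_exclusive(child)  # absorb subtree cost
            child.children = []
            kept.append(child)
        elif kind == "self and descendants":
            node.exclusive += subtree_exclusive(child)  # whole subtree cost
        else:
            raise ValueError("unknown filter type: " + kind)
    node.children = kept


def build_tree() -> Node:
    # Node A is the parent of C; C has children D and E.
    return Node("A", 1.0, [Node("C", 3.0, [Node("D", 4.0), Node("E", 5.0)])])


self_only = build_tree()
apply_filter(self_only, "C", "self only")

descendants_only = build_tree()
apply_filter(descendants_only, "C", "descendants only")

self_and_descendants = build_tree()
apply_filter(self_and_descendants, "C", "self and descendants")
```

In all three cases the total cost of the tree is unchanged; only the place where the elided cost is reported differs.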
The filter feature can be accessed by clicking the menu “Filter” and then the submenu “Show filter property”, which opens a Filter property window (Figure 12). The window consists of a table of filters and a group of action buttons: add to create a new filter; edit to modify a selected filter; and delete to remove a set of selected filters. The table comprises two columns: the left column displays a checkbox showing whether the filter is enabled or disabled, together with a glob-like filter pattern; the second column shows the type of filter (self only, descendants only, or self and descendants). If a checkbox is checked, the filter is enabled; otherwise the filter is disabled.
Caution is needed when using the filter feature, since it can change the shape of the tree and thus lead to a different interpretation of the performance analysis. Furthermore, if the filtered nodes are children of “fake” procedures (such as <program root> and <thread root>), the exclusive metrics in the Caller View and Flat View can be misleading. This occurs because these views do not show “fake” procedures.
In this section we describe some features of hpcviewer that help improve productivity.
The editor pane is used to display a copy of your program’s source code or HPCToolkit’s performance data in XML format; for this reason, it does not support editing of the pane’s contents. To edit your program, you should use your favorite editor to edit your original copy of the source, not the one stored in HPCToolkit’s performance database. Thanks to built-in capabilities in Eclipse, hpcviewer supports some useful shortcuts and customization:
For the metric pane, hpcviewer has some convenient features:
The dialog box contains a list of metric columns sorted according to their order in HPCToolkit’s performance database for the application. Each metric column is prefixed by a check box to indicate if the metric should be displayed (if checked) or hidden (unchecked). To display all metric columns, one can click the Check all button. Clicking Uncheck all will hide all the metric columns.
Finally, the option Apply to all views, when checked, applies the configuration to all views. Otherwise, the configuration is applied only to the current view.
hpcviewer provides five main menus:
This menu includes several menu items for controlling basic viewer operations.
Currently hpcviewer does not support storing a merged database into a file.
This menu only contains one submenu:
This menu is hidden by default and becomes visible only once at least one database is loaded. All actions in this menu are intended primarily for tool developer use.
This menu contains only one submenu, which resets the views to their original default positions. Since hpcviewer is built on top of Eclipse, Eclipse bugs sometimes cause it to fail to reposition its views. A work-around for this issue is ongoing work.
This menu displays information about the viewer. The menu contains two items:
Some important hpcviewer limitations are listed below: