HPCToolkit and Related Publications
Selected Overview Paper
- [1]
- Laksono
Adhianto, Sinchan Banerjee, Mike Fagan,
Mark Krentel, Gabriel Marin, John
Mellor-Crummey, and Nathan R. Tallent.
HPCToolkit: Tools for performance analysis of optimized parallel programs.
Concurrency and Computation: Practice and Experience,
22(6):685–701, 2010.
(PDF)
(doi:10.1002/cpe.1553)
HPCToolkit Papers
- [1]
- Xiaozhu
Meng, Jonathon M. Anderson, John
Mellor-Crummey, Mark W. Krentel, Barton P.
Miller, and Srdan Milakovic.
Parallel binary code
analysis.
In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP '21, pages 76–89, New York,
NY, USA, 2021. Association for Computing Machinery.
(doi:10.1145/3437801.3441604)
- [2]
- Keren Zhou,
Xiaozhu Meng, Ryuichi Sai, and
John Mellor-Crummey.
GPA: A GPU performance
advisor based on instruction sampling.
In 2021 IEEE/ACM International Symposium on Code Generation and
Optimization (CGO), pages 115–125, 2021.
(doi:10.1109/CGO51591.2021.9370339)
- [3]
- K. Zhou,
Y. Hao, J. Mellor-Crummey,
X. Meng, and X. Liu.
GVProf:
A value profiler for GPU-based clusters.
In 2020 SC20: International Conference for High Performance Computing,
Networking, Storage and Analysis (SC), pages 1–16, Los Alamitos, CA,
USA, Nov. 2020. IEEE Computer Society.
(doi:10.1109/SC41405.2020.00093)
- [4]
- Ryuichi Sai,
John Mellor-Crummey, Xiaozhu Meng,
Mauricio Araya-Polo, and Jie Meng.
Accelerating high-order stencils on GPUs.
In 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of
High Performance Computer Systems (PMBS), pages 86–108, 2020.
(doi:10.1109/PMBS51919.2020.00014)
- [5]
- Keren
Zhou, Mark W. Krentel, and John
Mellor-Crummey.
Tools for top-down
performance analysis of GPU-accelerated applications.
In Proceedings of the 34th ACM International Conference on
Supercomputing, ICS '20, New York, NY, USA, 2020. Association for
Computing Machinery.
(doi:10.1145/3392717.3392752)
- [6]
- Keren
Zhou, Mark Krentel, and John Mellor-Crummey.
A tool for top-down
performance analysis of GPU-accelerated applications.
In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP '20, pages 415–416, New York,
NY, USA, 2020. Association for Computing Machinery.
(doi:10.1145/3332466.3374534)
- [7]
- Lai Wei
and John Mellor-Crummey.
Using sample-based time
series data for automated diagnosis of scalability losses in parallel
programs.
In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP '20, pages 144–159, New York,
NY, USA, 2020. Association for Computing Machinery.
(doi:10.1145/3332466.3374538)
- [8]
- P. Taffet and
J. Mellor-Crummey.
Lightweight, packet-centric monitoring of network traffic and congestion
implemented in P4.
In 2019 IEEE Symposium on High-Performance Interconnects (HOTI),
pages 54–58, 2019.
(doi:10.1109/HOTI.2019.00026)
- [9]
- Philip Taffet and
John Mellor-Crummey.
Understanding congestion in
high performance interconnection networks using sampling.
In Proceedings of the International Conference for High Performance
Computing, Networking, Storage and Analysis, SC '19, New York, NY,
USA, 2019. Association for Computing Machinery.
(doi:10.1145/3295500.3356168)
- [10]
- John
Mellor-Crummey.
Piper: Performance insight for programmers and exascale runtimes: Guiding the
development of the exascale software stack.
October 2017.
(doi:10.2172/1400393)
- [11]
- Milind Chabbi and
John Mellor-Crummey.
Contention-conscious,
locality-preserving locks.
In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP '16, New York, NY, USA, 2016.
Association for Computing Machinery.
(doi:10.1145/2851141.2851166)
- [12]
- Chaoran Yang and
John Mellor-Crummey.
A practical solution to the
cactus stack problem.
In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and
Architectures, SPAA '16, pages 61–70, New York, NY, USA, 2016.
Association for Computing Machinery.
(doi:10.1145/2935764.2935787)
- [13]
- Philip
Taffet and Laksono Adhianto.
Addressing challenges in visualizing huge call-path traces.
In 2016 45th International Conference on Parallel Processing Workshops
(ICPPW), pages 319–328, Aug 2016.
(doi:10.1109/ICPPW.2016.53)
- [14]
- Xu Liu
and John Mellor-Crummey.
A tool to analyze the
performance of multithreaded programs on NUMA architectures.
In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP '14, pages 259–272, New York,
NY, USA, 2014. ACM.
(doi:10.1145/2555243.2555271)
- [15]
- Xu Liu
and John Mellor-Crummey.
A tool to analyze the
performance of multithreaded programs on NUMA architectures.
SIGPLAN Not., 49(8):259–272, February 2014.
(doi:10.1145/2692916.2555271)
- [16]
- Xu Liu
and John Mellor-Crummey.
A data-centric profiler
for parallel programs.
In Proceedings of the International Conference on High Performance
Computing, Networking, Storage and Analysis, SC '13, pages
28:1–28:12, New York, NY, USA, 2013. ACM.
(doi:10.1145/2503210.2503297)
- [17]
- Milind
Chabbi, Karthik Murthy, Mike Fagan, and
John Mellor-Crummey.
Critically missing pieces on accelerators: A performance tools perspective.
SC '13: Birds of a Feather Session: Critically Missing Pieces in Heterogeneous
Accelerator Computing, Pavan Balaji (Organizer), November 2013.
- [18]
- Nathan R.
Tallent, John M. Mellor-Crummey, Michael
Franco, Reed Landrum, and Laksono Adhianto.
Scalable fine-grained call path tracing.
In ICS '11: Proc. of the 25th International Conference on
Supercomputing, pages 63–74, New York, NY, USA, 2011. ACM.
(PDF)
(doi:10.1145/1995896.1995908)
- [19]
- Xu Liu
and John Mellor-Crummey.
Pinpointing data locality problems using data-centric analysis.
In CGO '11: Proc. of the 2011 IEEE/ACM International Symposium on Code
Generation and Optimization, pages 171–180, 2011.
(doi:10.1109/CGO.2011.5764685)
- [20]
- Nathan R.
Tallent, Laksono Adhianto, and John M.
Mellor-Crummey.
Scalable identification of load imbalance in parallel executions using call
path profiles.
In SC '10: Proc. of the 2010 ACM/IEEE Conference on
Supercomputing, pages 1–11, Washington, DC, USA, 2010. IEEE Computer
Society.
(PDF)
(doi:10.1109/SC.2010.47)
- [21]
- Laksono
Adhianto, John Mellor-Crummey, and Nathan R.
Tallent.
Effectively presenting call path profiles of application performance.
In PSTI 2010: Workshop on Parallel Software Tools and Tool
Infrastructures, in conjunction with the 2010 International Conference on
Parallel Processing, 2010.
(PDF)
- [22]
- Laksono
Adhianto, Sinchan Banerjee, Mike Fagan,
Mark Krentel, Gabriel Marin, John
Mellor-Crummey, and Nathan R. Tallent.
HPCToolkit: Tools for performance analysis of optimized parallel programs.
Concurrency and Computation: Practice and Experience,
22(6):685–701, 2010.
(PDF)
(doi:10.1002/cpe.1553)
- [23]
- Nathan R.
Tallent, John M. Mellor-Crummey, and Allan
Porterfield.
Analyzing lock contention in multithreaded applications.
In PPoPP '10: Proc. of the 15th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, pages 269–280, New York, NY, USA,
2010. ACM.
(PDF)
(doi:10.1145/1693453.1693489)
- [24]
- Nathan R.
Tallent and John M. Mellor-Crummey.
Identifying performance bottlenecks in work-stealing computations.
Computer, 42(12):44–50, 2009.
(doi:10.1109/MC.2009.396)
- [25]
- Nathan R.
Tallent, John M. Mellor-Crummey, Laksono
Adhianto, Michael W. Fagan, and Mark Krentel.
Diagnosing performance bottlenecks in emerging petascale applications.
In SC '09: Proc. of the 2009 ACM/IEEE Conference on
Supercomputing, pages 1–11, New York, NY, USA, 2009. ACM.
(PDF)
(doi:10.1145/1654059.1654111)
- [26]
- Nathan R.
Tallent, John Mellor-Crummey, and Michael W.
Fagan.
Binary analysis for measurement and attribution of program performance.
In PLDI '09: Proc. of the 2009 ACM SIGPLAN Conference on Programming
Language Design and Implementation, pages 441–452, New York, NY, USA,
2009. ACM.
Distinguished Paper.
(PDF)
(doi:10.1145/1542476.1542526)
- [27]
- Robert
Fowler, Laksono Adhianto, Bronis de Supinski,
Michael Fagan, Todd Gamblin, Mark
Krentel, John Mellor-Crummey, Martin Schulz,
and Nathan Tallent.
Frontiers of performance
analysis on leadership-class systems.
Journal of Physics: Conference Series, 180:012041 (6pp),
2009.
- [28]
- Nathan R.
Tallent and John Mellor-Crummey.
Effective performance measurement and analysis of multithreaded applications.
In PPoPP '09: Proc. of the 14th ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, pages 229–240, New York, NY, USA,
2009. ACM.
(PDF)
(doi:10.1145/1504176.1504210)
- [29]
- L. Adhianto,
M. Fagan, M. Krentel, G. Marin,
J. Mellor-Crummey, and N. R. Tallent.
HPCToolkit: Performance measurement and analysis for supercomputers with
node-level parallelism.
In Workshop on Node Level Parallelism for Large Scale Supercomputers, in
conjunction with Supercomputing 2008, November 2008.
- [30]
- Nathan
Tallent, John Mellor-Crummey, Laksono
Adhianto, Mike Fagan, and Mark Krentel.
HPCToolkit: Performance
tools for scientific computing.
Journal of Physics: Conference Series, 125:012088 (5pp),
2008.
- [31]
- John
Mellor-Crummey and Nathan R. Tallent.
A methodology for accurate, effective and scalable performance analysis of
application programs.
In Workshop on Tools, Infrastructures and Methodologies for the
Evaluation of Research Systems, in conjunction with the 2008 IEEE
International Symposium on Performance Analysis of Systems and
Software, pages 4–11, February 2008.
- [32]
- John
Mellor-Crummey, Nathan R. Tallent, Mike
Fagan, and Jan Odegard.
Application performance profiling on the Cray XD1 using HPCToolkit.
In Proc. of the Cray User's Group, May 2007.
- [33]
- Cristian
Coarfa, John Mellor-Crummey, Nathan Froyd, and
Yuri Dotsenko.
Scalability analysis of SPMD codes using expectations.
In ICS '07: Proc. of the 21st International Conference on
Supercomputing, pages 13–22, New York, NY, USA, 2007. ACM.
(doi:10.1145/1274971.1274976)
- [34]
- Nathan
Froyd, Nathan Tallent, John Mellor-Crummey,
and Robert Fowler.
Call path profiling for unmodified, optimized binaries.
In GCC Summit '06: Proc. of the GCC Developers' Summit, 2006,
pages 21–36, 2006.
- [35]
- Nathan
Froyd, John Mellor-Crummey, and Rob Fowler.
Low-overhead call path profiling of unmodified, optimized code.
In Proc. of the 19th International Conference on Supercomputing,
pages 81–90, New York, NY, USA, 2005. ACM.
(PDF)
(doi:10.1145/1088149.1088161)
- [36]
- John
Mellor-Crummey, Robert Fowler, Gabriel
Marin, and Nathan Tallent.
HPCView: A tool for top-down analysis of node performance.
The Journal of Supercomputing, 23(1):81–104, 2002.
(PDF)
(doi:10.1023/A:1015789220266)
- [37]
- John
Mellor-Crummey, Robert Fowler, and David
Whalley.
Tools for application-oriented performance tuning.
In ICS '01: Proc. of the 15th International Conference on
Supercomputing, pages 154–165, New York, NY, USA, 2001. ACM.
(PDF)
(doi:10.1145/377792.377826)
HPCToolkit Talks and Posters
- [1]
- Milind
Chabbi, Karthik Murthy, Mike Fagan, and
John Mellor-Crummey.
Critically missing pieces on accelerators: A performance tools perspective.
SC '13: Birds of a Feather Session: Critically Missing Pieces in Heterogeneous
Accelerator Computing, Pavan Balaji (Organizer), November 2013.
- [2]
- John
Mellor-Crummey.
HPCToolkit: Sampling-based performance tools for leadership computing.
Productivity Tools for Leadership Science Workshop, Argonne Leadership
Computing Facility Winter Workshop Series, January 2011.
- [3]
- Nathan R.
Tallent.
Performance analysis for parallel programs: From multicore to petascale.
Supercomputing 2010 George Michael HPC Fellow Presentation, November
2010.
- [4]
- John
Mellor-Crummey.
Gaining insight into parallel program performance using sampling.
IBM T. J. Watson Research Center, October 2010.
- [5]
- John
Mellor-Crummey.
A slice of CScADS: Performance tools for petascale platforms.
SciDAC 2010, July 2010.
- [6]
- Nathan R.
Tallent.
Identifying
scalability bottlenecks in large-scale parallel programs using
HPCToolkit.
In Jesus Labarta, Barton P. Miller,
Bernd Mohr, and Martin Schulz, editors,
Program Development for Extreme-Scale Computing, number 10181 in
Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2010. Schloss Dagstuhl -
Leibniz-Zentrum fuer Informatik, Germany.
- [7]
- John
Mellor-Crummey.
HPCToolkit: Sampling-based performance tools for leadership computing.
INCITE Getting Started Workshop, Argonne Leadership Computing Facility, January
2010.
- [8]
- Nathan R.
Tallent.
Performance analysis of parallel programs: From multicore to petascale.
Supercomputing 2009 Doctoral Research Showcase, November 2009.
- [9]
- L. Adhianto,
S. Banerjee, M. Fagan,
M. Krentel, G. Marin,
J. Mellor-Crummey, and N. R. Tallent.
HPCToolkit: Performance tools for scientific computing.
In SC '08: Proc. of the 2008 ACM/IEEE Conference on
Supercomputing, New York, NY, USA, 2008. ACM.
- [10]
- John
Mellor-Crummey, Robert Fowler, and Nathan R.
Tallent.
Practical application performance analysis on Linux systems.
Supercomputing 2004 Tutorial, November 2004.
- [11]
- John
Mellor-Crummey.
HPCToolkit: Multi-platform tools for profile-based performance analysis.
5th International Workshop on Automatic Performance Analysis (APART), November
2003.
(PDF)
- [12]
- Nathan
Froyd, John Mellor-Crummey, and Nathan R.
Tallent.
A sample-driven call stack profiler.
4th Symposium of the Los Alamos Computer Science Institute (LACSI 2003),
October 2003.
- [13]
- Nathan R.
Tallent.
HPCToolkit: Top-down analysis of node performance.
2003 MCS Divisional Seminars and Colloquia, Argonne National Laboratory, August
2003.
- [14]
- John
Mellor-Crummey, Robert Fowler, and David
Whalley.
On providing useful information for analyzing and tuning applications.
In SIGMETRICS '01: Proc. of the 2001 ACM SIGMETRICS International
Conference on Measurement and Modeling of Computer Systems, pages
332–333, New York, NY, USA, 2001. ACM.
(PDF)
(doi:10.1145/378420.378828)
Performance Modeling and Prediction Papers
- [1]
- G. Marin
and J. Mellor-Crummey.
Application insight through performance modeling.
In IPCCC 2007: Proc. of the 26th IEEE International Performance,
Computing, and Communications Conference, pages 65–74, Apr. 2007.
(PDF)
(doi:10.1109/PCCC.2007.358880)
- [2]
- Apan
Qasem, Ken Kennedy, and John Mellor-Crummey.
Automatic tuning of whole applications using direct search and a
performance-based transformation system.
J. Supercomput., 36(2):183–196, 2006.
(PDF)
(doi:10.1007/s11227-006-7957-2)
- [3]
- Gabriel
Marin and John Mellor-Crummey.
Scalable cross-architecture predictions of memory hierarchy response for
scientific applications.
In Proc. of the Sixth Annual Los Alamos Computer Science Institute
Symposium, 2005.
(PDF)
- [4]
- Gabriel
Marin and John Mellor-Crummey.
Cross-architecture performance predictions for scientific applications using
parameterized models.
In SIGMETRICS '04: Proc. of the Joint International Conference on
Measurement and Modeling of Computer Systems, pages 2–13, New York,
NY, USA, 2004. ACM.
(PDF)
(doi:10.1145/1005686.1005691)
[Page last updated: 2011/06/25]