Title |
Case Study: Profiling and Optimization on Advanced Cluster Architectures |
Abstract |
This case study shows how to use profiling tools to assess computational performance and to identify optimization strategies that can improve it. The walk-through focuses on the Stampede2 KNL nodes at TACC, but the tools and approaches apply generally to a variety of advanced computational architectures. The complex internal structure of modern processors makes such profiling and analysis tools especially important, because so many factors contribute that one often cannot predict computational performance a priori. Analyses discussed here include: identification of code hotspots; scaling of computational runtimes with the number of nodes; identification of performance limitations due to inefficient communication with memory; and profiling of vectorization performance. Optimizations emphasized here primarily involve compiler options and small compiler directives that can be added to code, as opposed to large-scale restructuring of algorithms or data structures. The material is based on a webcast presentation given by Shiquan Su (NCAR) and Chad Burdyshaw (NICS) on October 17, 2017, as part of the ECSS Symposium Series. This tutorial augments that presentation by integrating video clips from the webcast with a textual summary of key points, along with further explanations and remarks. The latter part of the joint presentation, given by Burdyshaw, makes use of the Intel performance analysis tools (especially VTune Amplifier XE, or "VTune"); it is emphasized here because it should be pertinent to a wide range of application codes. The first part, given by Su, focuses more on the scientific and numerical background of the particular code being analyzed.
The key take-home messages from Su's presentation are summarized in the Application Context topic (and expanded upon in the Appendix), which serves as a prelude to the more generally applicable topics, Performance Analysis and Vectorization & Parallelization. |
Authors |
['Chris Myers', 'Steve Lantz'] |
Expertise Level |
None |
Learning Outcome |
None |
Learning Resource Type |
asynchronous online training |
Target Group |
['Researchers', 'Research groups', 'Student'] |
Keywords |
['optimization', 'profiling', 'compiler options', 'vectorization', 'roofline model', 'Intel Xeon', 'directives', 'efficiency'] |
Cost |
None |
Duration |
240 |
Language |
en |
License |
None |
Resource URL Type |
URL |
Start Datetime |
None |
URL |
https://cvw.cac.cornell.edu/case-study-opt |
Version Date |
2018-12 |
Provider ID |
urn:ogf.org:glue2:access-ci.org:resource:cider:infrastructure.organizations:898 |
Rating |
None |