Bokeh is a software package in the Python ecosystem for data science that provides support for interactive visualization of data. As such, it is complementary to other visualization packages that focus more on generating static figures for inclusion in publications and reports. Bokeh renders data visualizations within web browsers and Jupyter notebooks, providing capabilities for inspecting different aspects of plotted data, selecting and highlighting subsets of data, setting parameters through graphical user interfaces and direct interactions with plots, and constructing dashboards that integrate different data streams and analyses based on user interactions.
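As a brief illustration of the interactive style described above, the following minimal sketch builds a scatter plot with a few of Bokeh's built-in interaction tools (hover, box select, zoom) and displays it in a web browser. The sample data and plot title are illustrative placeholders, not taken from any particular roadmap example.

```python
# A minimal, self-contained Bokeh sketch: an interactive scatter plot rendered
# in a web browser. The data and names here are illustrative placeholders.
from bokeh.plotting import figure, show

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Request a few of the interactions mentioned above: hover, box select, zoom.
p = figure(title="Example interactive plot",
           tools="hover,box_select,wheel_zoom,reset",
           x_axis_label="x", y_axis_label="y")
p.scatter(x, y, size=12)

show(p)  # opens in a browser; inside Jupyter, call bokeh.io.output_notebook() first
```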
This case study details how to use various profiling tools to assess computational performance and to identify optimization strategies that can improve it. This walk-through focuses on the Stampede2 KNL nodes at TACC, but the tools and approaches are generally applicable to a variety of advanced computational architectures. The complex internal structure of modern processors makes such profiling and analysis tools especially important, as one is often not able to predict computational performance a priori in the face of so many contributing factors. Analyses discussed here include:

- Identification of code hotspots
- Scaling of computational runtimes with number of nodes
- Identification of performance limitations due to inefficient communication with memory
- Profiling of vectorization performance

Optimizations emphasized here primarily involve the use of various compiler options and small compiler directives that can be added to code, as opposed to large-scale restructuring of algorithms or data structures to improve performance.

The material is based on a webcast presentation given by Shiquan Su (NCAR) and Chad Burdyshaw (NICS) on October 17, 2017, as part of the ECSS Symposium Series. This tutorial augments that presentation by integrating video clips from the webcast with a textual summary of key points, along with further explanations and remarks. The latter part of the joint presentation, by Burdyshaw, which leverages the Intel performance analysis tools (especially VTune Amplifier XE, or "VTune"), is emphasized here, as it should be pertinent to a wide range of application codes. The first part of the presentation, by Su, focuses more on the scientific and numerical background of the particular code being analyzed. The key take-home messages from Su's presentation are summarized in the Application Context topic (and expanded upon in the Appendix), which serves as a prelude to the more generally applicable topics, Performance Analysis and Vectorization & Parallelization.
When running a job for any substantial length of time, you hope that nothing interrupts it so that you won't have to start over. But no matter the environment, interruption is always a possibility. Checkpoint/Restart (C/R) solutions allow you to resume a job from approximately where it left off. Checkpointing strategies also offer advantages beyond fault tolerance, so learning about them is likely to benefit your own work. This roadmap provides an overview of checkpoint/restart strategies and a survey of different types of C/R solutions.
The C programming language is heavily used in the scientific and high performance computing community, and it is also the language in which many operating systems are written. Thus, it is important for scientists and engineers to have a good understanding of how the language can assist them with their computing needs. This document is provided for the beginning programmer who has an interest in learning to use the C language effectively. If you have never programmed before, you can also use this document to learn the basic concepts of programming. However, you may want to consult other references as well.
Advanced clusters for High Performance Computing (HPC), such as the Frontera and Stampede2 supercomputers run by the Texas Advanced Computing Center (TACC), are powered by a large number of multi-core processors, abundant memory and cache, and fast interconnects. The Frontera and Stampede2 supercomputers use processors from the Intel Xeon Scalable Processor (SP) product line: Stampede2 is built in part using Intel Skylake processors, and Frontera leverages Intel Cascade Lake chips. This topic presents the salient characteristics of these clusters and of both processors, which represent the first two generations of the Intel Xeon Scalable Processor product line. Stampede2 also contains nodes based on a third-generation SP processor, Ice Lake, which were added in 2022. While we focus on the earlier Intel Xeon Scalable Processors, much of the material in this topic is generally applicable to other advanced cluster architectures built out of a large number of multi-core nodes, providing information on how to use such hardware effectively and scale applications to larger problem sizes.
There are a few simple things one can do in one's code to make the most of typical computing resources, up to and including high performance computing (HPC) resources. This roadmap covers basic aspects of code optimization that can have a big impact, as well as common performance pitfalls to avoid. The roadmap also explains the main features of microprocessor architecture and how they relate to the performance of compiled code.
Transferring data and code between your workstation and a remote computer is a common part of scientific workflows. Sometimes this data can be quite large, and sometimes you wish to transmit your data securely. Data transfers between cloud storage and computing facilities are also becoming increasingly common. There are a number of utilities available to help you accomplish these essential tasks. Your choice of data transfer utility will depend on how much data you are transferring, how you prefer to perform the transfer, and your priorities (including transfer speed, ease of use, security, and validation). This topic presents several data transfer options and their pros and cons, as well as ways to make these transfers faster. While the file transfer techniques presented here are useful in many situations, the included examples will use TACC's Stampede2 and Frontera as the remote computers.
Deep learning comprises a set of methods for Machine Learning and Artificial Intelligence, based on the use of multilayer neural networks to carry out learning. Deep learning techniques can identify patterns even within large data sets, and they often require substantial computational resources for training model parameters and making predictions. The Frontera supercomputer at the Texas Advanced Computing Center (TACC) is built to support large computational workloads such as those involved with deep learning. Software packages such as TensorFlow, Keras, and PyTorch are widely used to build deep learning pipelines.
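To make the pieces of such a pipeline concrete, here is a minimal sketch of a multilayer network built and trained with TensorFlow/Keras (one of the packages named above). The synthetic data, layer sizes, and hyperparameters are arbitrary placeholders chosen for illustration, not a recommended configuration.

```python
# A minimal sketch of a deep learning workflow with TensorFlow/Keras.
# The synthetic data, layer sizes, and hyperparameters are placeholders.
import numpy as np
import tensorflow as tf

# Synthetic data set: 1000 samples with 20 features each, and binary labels.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,)).astype("float32")

# A small multilayer (fully connected) neural network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training (fitting the model parameters) is typically the compute-intensive step.
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Making predictions on new (here, reused) data.
predictions = model.predict(x_train[:5])
print(predictions)
```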
Fortran is one of the premier languages for numerical analysis and high performance computing. For this reason, Fortran compilers are typically available on all large-scale computing platforms, including Stampede2 and Frontera at TACC. Today, Fortran continues to evolve steadily to fit the needs of the (super)computing community. This introductory roadmap focuses mainly on its elementary features.