Bokeh is a software package in the Python ecosystem for data science that provides support for interactive visualization of data. As such, it is complementary to other visualization packages that focus more on enabling the generation of static figures for inclusion in publications and reports. Bokeh renders data visualizations within web browsers and Jupyter notebooks, providing capabilities for inspecting different aspects of plotted data, selecting and highlighting subsets of data, setting parameters through graphical user interfaces and direct interactions with plots, and constructing dashboards that integrate different data streams and analyses based on user interactions.
This case study details how to use various profiling tools to assess computational performance and possible optimization strategies that can improve performance. This walk-through focuses on the Stampede2 KNL nodes at TACC, but the tools and approaches are generally applicable to a variety of advanced computational architectures. The complex internal structure of modern processors makes such profiling and analysis tools especially important, as one is often not able to predict computational performance a priori in the face of so many contributing factors. Analyses discussed here include: identification of code hotspots; scaling of computational runtimes with number of nodes; identification of performance limitations due to inefficient communication with memory; and profiling of vectorization performance. Optimizations emphasized here primarily involve the use of various compiler options and small compiler directives that can be added to code, as opposed to large-scale restructuring of algorithms or data structures to improve performance. The material is based on a webcast presentation given by Shiquan Su (NCAR) and Chad Burdyshaw (NICS) on October 17, 2017, as part of the ECSS Symposium Series. This tutorial augments that presentation by integrating video clips from the webcast with a textual summary of key points, along with further explanations and remarks. The latter part of the joint presentation by Burdyshaw, which leverages the use of the Intel performance analysis tools (especially VTune Amplifier XE or "VTune"), is emphasized here, as it should be pertinent to a wide range of application codes. The first part of the presentation by Su focuses more on the scientific and numerical background of the particular code being analyzed.
The key take-home messages from Su's presentation are summarized in the Application Context topic (and expanded upon in the Appendix), which serves as a prelude to the more generally applicable topics, Performance Analysis and Vectorization & Parallelization.
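As a rough illustration of what "identification of code hotspots" means in practice, the sketch below ranks functions by the time spent in their own bodies. It uses Python's standard-library cProfile rather than VTune, so it stands in for the idea only and is not the workflow from the webcast; the functions cheap_step, expensive_step, workload, and profile_hotspots are hypothetical names introduced for this example.

```python
import cProfile
import pstats


def cheap_step(n):
    # Hypothetical light-weight stage: a single built-in reduction.
    return sum(range(n))


def expensive_step(n):
    # Hypothetical heavy stage: a nested loop that dominates the runtime.
    total = 0
    for i in range(n):
        for j in range(200):
            total += i * j
    return total


def workload():
    # Hypothetical application: one cheap stage followed by one expensive stage.
    cheap_step(1_000)
    return expensive_step(2_000)


def profile_hotspots(func):
    """Run func under cProfile and return (function name, own time) pairs,
    sorted so that the largest own time (the hotspot) comes first."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    ranked = sorted(stats.stats.items(), key=lambda kv: kv[1][2], reverse=True)
    return [(key[2], entry[2]) for key, entry in ranked]


if __name__ == "__main__":
    for name, own_time in profile_hotspots(workload)[:3]:
        print(f"{name:30s} {own_time:.4f} s")
```

The function at the top of the ranking is the natural first target for the compiler options and directives discussed in this case study, since speeding up a function that accounts for little of the runtime yields little overall gain.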
When running a job for any substantial length of time, you hope that nothing interrupts it so that you won't have to start over. But no matter the environment, interruption is always a possibility. Checkpoint/Restart (C/R) solutions allow you to resume a job from approximately where it left off. Checkpointing strategies also have additional benefits beyond fault tolerance, so learning about these strategies is likely to benefit your own work. This roadmap provides an overview of the checkpointing/restart strategy and a survey of different types of C/R solutions.
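To make the idea concrete, here is a minimal sketch of application-level checkpointing in Python, assuming the job's state is a small dictionary that can be serialized as JSON. The function names (save_checkpoint, load_checkpoint, run), the file format, and the fail_at parameter used to simulate an interruption are all assumptions made for this illustration; they do not correspond to any particular C/R solution surveyed in the roadmap.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    # Write to a temporary file in the same directory, then rename it over
    # the old checkpoint, so an interruption never leaves a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    # With no checkpoint on disk, start from the initial state.
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, n_steps, checkpoint_every=10, fail_at=None):
    """Sum the integers 0..n_steps-1, saving progress periodically.

    fail_at is a hypothetical knob that simulates an interruption."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated interruption")
        state["total"] += step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt.json")
    try:
        run(ckpt, 50, fail_at=25)
    except RuntimeError:
        print("interrupted; last saved step:", load_checkpoint(ckpt)["step"])
    print("final total after restart:", run(ckpt, 50))
```

In this sketch the restarted run resumes from the last saved step rather than from the exact point of failure, which is why a job resumes only approximately where it left off: work done since the most recent checkpoint is repeated.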
The C programming language is heavily used in the scientific and high performance computing community, and is also the language in which many operating systems are written. Thus, it is important for scientists and engineers to have a good understanding of how the language can assist them with their computing needs. This document is provided for the beginning programmer who has an interest in learning to effectively use the C language. If you have never programmed before, you can also use this document to learn the basic concepts of programming, though you may want to consult other references as well.
Advanced clusters for High Performance Computing (HPC), such as the Frontera and Stampede2 supercomputers run by the Texas Advanced Computing Center (TACC), are powered by a large number of multi-core processors, abundant memory and cache, and fast interconnects. The Frontera and Stampede2 supercomputers use processors from the Intel Xeon Scalable Processor (SP) product line: Stampede2 is built in part using Intel Skylake processors, and Frontera leverages Intel Cascade Lake chips. This topic presents the salient characteristics of these clusters and of both these processors, which represent the first two generations of the Intel Xeon Scalable Processor product line. Stampede2 also contains nodes based on a third-generation SP processor, Ice Lake, which were added in 2022. While we focus on the earlier Intel Xeon Scalable Processors, much of the material in this topic is generally applicable to other advanced cluster architectures built out of a large number of multi-core nodes, providing information on how to effectively use such hardware and scale applications for larger problem sizes.
There are a few simple things one can do in one's code to make the most of typical computing resources, up to and including high performance computing (HPC) resources. This roadmap covers basic aspects of code optimization that can have a big impact, as well as common performance pitfalls to avoid. The roadmap also explains the main features of microprocessor architecture and how they relate to the performance of compiled code.
This roadmap provides a brief introduction to R. It covers the basic syntax of the language and illustrates some of its data handling and statistical capabilities. The focus will be on how to run R in various environments, particularly on TACC's Stampede3 and Frontera supercomputers, and especially how to run R in parallel. It is by no means a comprehensive discussion of R; for that, refer to the various manuals available on the R website, or to one of the many books on R.
Relational databases are a commonly used and powerful way of storing data that allows robust querying and maintains the consistency of the stored data. Sometimes called "SQL databases", they are queried using the Structured Query Language (SQL), which can also be used to create, populate, and alter the database. Relational database management systems (RDBMSs) and their SQL flavours have much in common across the different distributions; although advanced features may require consulting the documentation specific to a given RDBMS, most operations can be done with standard SQL.
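The following sketch shows SQL being used to create, populate, alter, and query a small database, and shows a consistency rule being enforced. It assumes SQLite through Python's standard-library sqlite3 module purely for convenience; the instrument and reading tables and their columns are hypothetical, and the PRAGMA line is specific to SQLite, which does not enforce foreign keys unless asked to.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# Create: two related tables (hypothetical schema).
conn.execute(
    "CREATE TABLE instrument (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE reading (
           id INTEGER PRIMARY KEY,
           instrument_id INTEGER NOT NULL REFERENCES instrument(id),
           value REAL NOT NULL)"""
)

# Populate: parameterized inserts.
conn.executemany(
    "INSERT INTO instrument (id, name) VALUES (?, ?)",
    [(1, "thermometer"), (2, "barometer")],
)
conn.executemany(
    "INSERT INTO reading (instrument_id, value) VALUES (?, ?)",
    [(1, 20.5), (1, 21.5), (2, 1013.0)],
)

# Alter: add a column to an existing table, then fill it in.
conn.execute("ALTER TABLE instrument ADD COLUMN unit TEXT")
conn.execute("UPDATE instrument SET unit = 'C' WHERE name = 'thermometer'")

# Query: join the tables and aggregate the readings per instrument.
rows = conn.execute(
    """SELECT i.name, COUNT(*), AVG(r.value)
       FROM instrument AS i
       JOIN reading AS r ON r.instrument_id = i.id
       GROUP BY i.name
       ORDER BY i.name"""
).fetchall()

# Consistency: a reading that points at a nonexistent instrument is refused.
try:
    conn.execute("INSERT INTO reading (instrument_id, value) VALUES (99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

if __name__ == "__main__":
    print(rows)
    print("orphan reading rejected:", rejected)
```

Apart from the PRAGMA, the statements are close to standard SQL, so they should carry over to other RDBMSs with little or no change, though details such as column types can differ between systems.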
How do you optimize the overall performance of a big, computationally intensive code? On an HPC cluster like Frontera at TACC, with 8,570 nodes and nearly half a million cores of various kinds, the most beneficial optimizations are likely to be those that improve a code's scalability. Such optimizations allow your code to run on more and more processors. With that goal in mind, we start with grand design principles and strategies, then proceed to a discussion of software interfaces, the network interconnect, and even processor microarchitectures. This content should guide you towards the right level to concentrate your efforts, and give you some ideas about what to do to make your code more efficient.