14:00 – 14:20 | ‘Snoopie: Multi-GPU Communication Monitoring Tool’ by Aditya Sasongko (Koç University)
As data movement becomes increasingly costly and limiting in computing, the need for communication-focused profiling tools has grown significantly, particularly for scaling multi-GPU applications. While current profiling tools, including those provided by GPU manufacturers, are robust in capturing computational operations within individual GPUs, they fall short in monitoring data transfers between pairs of devices and the communication calls made by the NCCL and NVSHMEM libraries.
To address these shortcomings, we introduce a multi-GPU communication profiling tool built on NVBit and based on binary instrumentation. Our tool tracks both peer-to-peer transfers and communication library calls, and it can attribute these communication events to the specific lines of source code and the data objects involved. The tool provides a variety of visualization modes and levels of detail, ranging from a broad overview of data movement across the system down to the precise instructions and memory addresses involved.
14:20 – 14:40 | ‘Tackling the imbalance between computation and I/O’ by Felix Wolf (Technical University of Darmstadt)
Data-intensive applications spend a significant portion of their execution time on I/O operations such as reading input, writing output, or checkpointing intermediate results. This means that any speedups achieved by accelerating computation are limited by the fraction of time spent on I/O. Conversely, to achieve good performance, data-intensive applications depend more heavily than compute-intensive ones on the available I/O bandwidth, which is usually shared with other, potentially data-intensive, applications; inter-application interference can therefore significantly degrade performance. In this talk, we will examine this problem in more detail and propose a novel scheduling algorithm that does not require the explicit scheduling of I/O bandwidth.
14:40 – 15:00 | ‘SuperTwin: A Digital Twin for HPC Machines’ by Kamer Kaya (Sabancı University)
In this talk, we will introduce SuperTwin – a performance visualization and profiling tool that collects data from performance monitoring units (PMUs) to detect and understand performance bottlenecks in HPC applications.
15:00 – 15:20 | ‘Data Movement in Climate Adaptation Digital Twins’ by Christopher Haine (Hewlett Packard Enterprise)
The rise of Digital Twins to help design and test disruptive technology spawns virtual laboratories where data streams between real-life measurements and computational models. This considerable influx of data exacerbates the well-known High Performance Computing problem of dealing with data-intensive workloads. Deficiencies of systems software appear at multiple levels of the stack, from the workflow management level, in orchestrating and coupling applications, down to the memory hierarchy level, where converging memory and storage technologies carry incompatible data-management software. To tie Digital Twin workflows together, we present Maestro, a data- and memory-aware middleware designed to consistently address data movement across the complexity of the software stack and the memory hierarchy.
15:20 – 16:00 | Break
16:00 – 16:20 | ‘Abstractions and tools to manage heterogeneous memory’ by Emmanuel Jeannot (Inria)
In this talk we will discuss and present our recent work on programming models and abstractions to manage heterogeneous memory (i.e., systems with small, fast memory alongside larger, slower memory). We will present our workflow, which consists of analyzing the application, providing hints, and allocating data to the right kind of memory.
16:20 – 17:20 | PANEL 4 (Moderator: Didem Unat)