09:00 – 09:20 | ‘Analysing COVID-19 Epidemiological Simulations from the load balance perspective’ by David E. Singh
EpiGraph is a parallel agent-based simulator that models the propagation of COVID-19 over wide geographical areas, and has been used to support the decision-making of the European Union and Spanish health authorities throughout the COVID-19 crisis. To simulate interactions between individuals realistically, EpiGraph uses complex social models extracted from social networks that are internally implemented as sparse data structures. This talk provides a performance analysis of EpiGraph from the data-locality perspective. Different load-imbalance situations that can arise upon new infection outbreaks are simulated, and practical solutions, including the use of application-level malleability to dynamically increase or reduce the number of processes, are discussed. This work has been developed in the context of the ADMIRE project, funded by the European High-Performance Computing Joint Undertaking.
09:20 – 09:40 | ‘Applications’ Perspective on Dealing with Machine Imbalance’ by Anshu Dubey
Machine imbalance puts the focus squarely on the software architecture of application codes. Abstraction can only go so far. A human in the loop, with the help of tools that make it easy to implement the logic and structure of specializations for different machines, is likely to yield the best results. I will present such an approach, which is implemented for Flash-X, a multiphysics code, and is also being applied to a couple of other projects.
09:40 – 10:00 | ‘Adapting applications to an increasingly heterogeneous hardware landscape: lessons from the Exascale Computing Project’ by Erik W. Draeger
For the past seven years, the U.S. Department of Energy’s Exascale Computing Project (ECP) has funded a comprehensive push to refactor 24 application projects to efficiently utilize exascale computing hardware to solve a varied set of complex science and engineering problems. Ambitious performance and capability goals were set for each application that demanded end-to-end rethinking of traditional approaches. Through detailed performance analysis, integration with optimized co-design frameworks and software libraries, and the use of programming abstractions to manage data placement and kernel execution, ECP applications recently demonstrated substantial capability and performance improvements on newly available exascale machines. Despite significant diversity in the methods and algorithms underlying the ECP application portfolio, several common themes emerged in how best to adapt computational workloads to heterogeneous architectures. In this talk, an overview of best practices and lessons learned on effectively utilizing heterogeneous exascale hardware from the perspective of ECP applications will be presented. The role of data placement and locality in application performance will be discussed along with the anticipated impact of increased hardware diversity and heterogeneity. Guidance will be offered to application developers who need to adapt their codes in preparation for an uncertain architectural landscape.
10:00 – 10:20 | ‘Unlocking Data Locality in Imbalanced Supercomputers: Unveiling the Energy Perspective’ by Osman Seckin Simsek
Recent top supercomputers, such as LUMI and FRONTIER, shift the paradigm from CPU-centric node designs to GPU-centric node designs, which increases the machine imbalance in each node. In this talk, we present performance and data locality results for astrophysical simulations executing on different architectures, revealing the relation between time and energy consumption of data movements both within nodes and across nodes. We investigate the effects of increased machine imbalance on the energy consumption associated with data movements. For our experiments, we use SPH-EXA, a highly scalable and extendable simulation framework for astrophysical and cosmological simulations.
10:20 – 11:00 | Break
11:00 – 11:40 | PANEL 5 (Moderator: Hatem Ltaief)