Dr. Maciej Besta
ETH Zürich
12 December 2023
Research papers covered during the seminar:
Chains, Trees, and Graphs of Thoughts: Demystifying Structured-Enhanced Prompting
James D. Totter
Simula Research Center, Norway
8 December 2022
The FEniCS project shows that it is possible to create user-friendly interfaces for solving partial differential equations (PDEs) with the finite element method. It achieves this by providing a high-level, domain-specific language that makes finite element methods easy to formulate, and then using a specialised compiler to automatically generate the low-level computational kernels that are needed. This talk describes how we extended the existing automated code generation in FEniCS with runtime compilation of CUDA code to offload computations to NVIDIA GPUs without any intervention from the user. Of particular importance is avoiding unnecessary data transfers between host and GPU memory, which can easily negate any benefit of GPU acceleration.
Research papers covered during the seminar:
Mehmet Esat Belviranli, Ph.D.
Colorado School of Mines
1 November 2022
In this talk, we investigate a framework that enables resource-constraint-aware multi-accelerator execution for diversely heterogeneous SoCs. We achieve this by distributing the layers of a neural network (NN) inference across different accelerators so that the trade-off between performance and energy satisfies system constraints. We further explore improving total throughput by concurrently using different types of accelerators to execute NNs in parallel. Our proposed methodology uniquely considers inter-accelerator transition costs, shared-memory contention, and accelerator architectures that embed internal hardware pipelines. We employ empirical performance models and constraint-based optimization problems to determine optimal multi-accelerator execution schedules.
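The layer-to-accelerator assignment described above can be illustrated with a toy example. The sketch below is not the framework from the talk: it assumes made-up per-layer latency and energy figures, two hypothetical accelerators named "gpu" and "dla", and a fixed transition cost, and it replaces the constraint-based optimizer with a brute-force search. It ignores shared-memory contention and internal hardware pipelines.

```python
from itertools import product

# Hypothetical per-layer costs as (latency in ms, energy in mJ) per accelerator.
# These numbers are invented for illustration, not measured.
LAYER_COSTS = [
    {"gpu": (2.0, 8.0), "dla": (3.0, 3.0)},
    {"gpu": (1.0, 5.0), "dla": (4.0, 2.0)},
    {"gpu": (3.0, 9.0), "dla": (3.5, 4.0)},
    {"gpu": (1.5, 6.0), "dla": (2.0, 2.5)},
]
TRANSITION_MS = 0.5  # assumed fixed cost of switching accelerators between layers


def best_schedule(layer_costs, energy_budget_mj, transition_ms=TRANSITION_MS):
    """Return (latency, energy, assignment) with minimal latency under the
    energy budget, or None if no assignment satisfies the budget."""
    accelerators = sorted(layer_costs[0])
    best = None
    # Enumerate every assignment of layers to accelerators (fine for tiny nets).
    for assignment in product(accelerators, repeat=len(layer_costs)):
        latency = sum(layer_costs[i][a][0] for i, a in enumerate(assignment))
        energy = sum(layer_costs[i][a][1] for i, a in enumerate(assignment))
        # Each change of accelerator between consecutive layers adds a cost.
        switches = sum(1 for prev, cur in zip(assignment, assignment[1:]) if prev != cur)
        latency += switches * transition_ms
        if energy > energy_budget_mj:
            continue  # violates the system constraint
        if best is None or latency < best[0]:
            best = (latency, energy, assignment)
    return best
```

With a generous energy budget the search keeps every layer on the faster accelerator; tightening the budget pushes layers onto the lower-energy one. Enumeration grows exponentially with the number of layers, which is one reason a real system would rely on a proper solver.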
Research paper covered during the seminar:
Dr. Arjun Chandra
Graphcore, Norway
28 April 2022
Deep learning models are growing in size and complexity. One of the ways to train them efficiently at scale is to use low precision arithmetic and number formats. This talk will cover some of the key techniques engineered at Graphcore to provide numerically stable training of neural networks in reduced precision whilst maintaining the target FP32 accuracy. These techniques are available for use with Graphcore IPUs via our Poplar SDK.
Topic covered during the seminar:
Dr. Anshu S. Ananda
Indian Institute of Information Technology, Allahabad
31 March 2022
In this talk, I will first show how Powerlist, a data structure that enables us to specify parallel algorithms concisely, can be used as an abstraction for parallelism by describing a method to schedule computations (e.g., matrix multiplication) across a cluster of GPUs. This is realized by implementing Powerlist as a library that automatically partitions the matrices and uses the cuBLAS API for efficient multiplication of the sub-matrices on the individual GPUs. In the second part, I will discuss the prospects of Powerlist as a locality abstraction.
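As a rough illustration of the partitioning idea, the sketch below models a powerlist as a Python list whose length is a power of two and splits the rows of the left matrix recursively by halving (the inverse of the powerlist "tie" constructor). It is not the library from the talk: the function names are hypothetical, and the leaf multiplication, which the talk delegates to cuBLAS on each GPU, is replaced here by plain Python.

```python
def untie(p):
    """Split a powerlist into its first and second halves (inverse of tie)."""
    half = len(p) // 2
    return p[:half], p[half:]


def unzip(p):
    """Split a powerlist into even- and odd-indexed elements (inverse of zip)."""
    return p[0::2], p[1::2]


def matmul_leaf(rows, b):
    """Multiply a block of rows by matrix b. Stand-in for a per-GPU call."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in rows]


def powerlist_matmul(a_rows, b, leaf_rows=1):
    """Compute a_rows * b by recursively halving the rows of the left matrix.

    Uses the identity (A1 | A2) * B = (A1 * B) | (A2 * B), where | is tie,
    so each half can be handled independently (for example, on its own device).
    """
    n = len(a_rows)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("a powerlist must have a power-of-two length")
    if n <= leaf_rows:
        return matmul_leaf(a_rows, b)
    top, bottom = untie(a_rows)
    # Tie the two partial results back together.
    return powerlist_matmul(top, b, leaf_rows) + powerlist_matmul(bottom, b, leaf_rows)
```

The `leaf_rows` parameter controls how large a block is before it is handed to the leaf multiply; `unzip` is included only to show the second way of splitting a powerlist and is not used by the multiplication.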
Topic covered during the seminar:
Alexander Geiß, M.Sc.
Technical University of Darmstadt
3 March 2022
The topic of this talk is Extra-P, an automatic performance-modeling tool that helps the user identify scalability bottlenecks. The talk gives an overview of performance modeling with Extra-P. We start with a brief motivation for performance models and the need for assistance in creating them, followed by an explanation of the most important parts of the underlying method and a discussion of its limitations. Finally, we discuss the recommended workflow for performance modeling with Extra-P, based on a small demonstration.
Research papers covered during the seminar:
Prof. Paul H.J. Kelly
Imperial College London
13 December 2021
The topic of this talk: Domain-specific languages enable us to automate the generation of high-performance code from a high-level abstraction. This talk will show, through a couple of example projects (Firedrake and Devito), that DSLs can deliver productivity, performance, and performance-portability. The key to success is compiler architecture – designing intermediate representations that make optimisations easy and analysis trivial. But the DSL software ecosystem is dysfunctional: DSL compilers (including ours) are typically standalone projects, reliant on support from a narrow developer base. Few, if any, components are shared between DSLs. The talk will conclude with a manifesto for fixing this – building on MLIR to establish community support for code generation tools that underpin multiple front-end DSLs. I will argue that this is in fact the only way we can tackle the complexity involved in achieving high performance for complex applications on diverse hardware.
Research paper covered during the seminar:
Dr. Nehir Sönmez
Barcelona Supercomputing Center
8 November 2021
Research paper covered during the seminar:
Dr. Luc Jaulmes
Barcelona Supercomputing Center
4 October 2021
Research paper covered during the seminar:
Dr. Aleksandar Ilic
4 October 2021
Research papers covered during the seminar:
Dr. Milind Chabbi
Uber Technologies, Inc.
13 July 2021
Research papers covered during the seminar:
Dr. Tan Nguyen
Lawrence Berkeley National Laboratory
21 June 2021
Research papers covered during the seminar:
Dr. Mohamed Wahib
AIST/TokyoTech Open Innovation Laboratory
25 May 2021
Research papers covered during the seminar: