Dr. Arjun Chandra
Graphcore, Norway
28 April 2022
Deep learning models are growing in size and complexity. One of the ways to train them efficiently at scale is to use low-precision arithmetic and number formats. This talk will cover some of the key techniques engineered at Graphcore to provide numerically stable training of neural networks in reduced precision whilst maintaining the target FP32 accuracy. These techniques are available for use with Graphcore IPUs via our Poplar SDK.
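As a minimal illustration of why reduced-precision training needs such techniques, the NumPy sketch below shows static loss scaling, one widely used approach. This is not the Poplar SDK API, and the scale factor is illustrative:

```python
import numpy as np

# A minimal sketch of static loss scaling, one common technique for
# numerically stable reduced-precision training. This is NOT the Poplar
# SDK API; the scale factor below is illustrative.

loss_scale = np.float32(65536.0)        # hypothetical scale, 2**16

grad = np.float32(1e-8)                 # a small gradient kept in FP32
print(np.float16(grad))                 # casts to 0.0: underflow in FP16

scaled = np.float16(grad * loss_scale)  # scale up before casting down
print(scaled)                           # ~6.55e-4: representable in FP16

# Master weights stay in FP32, so we unscale there before the update.
recovered = np.float32(scaled) / loss_scale
print(recovered)                        # ~1e-8: the gradient survives
```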
Dr. Anshu S. Ananda
Indian Institute of Information Technology, Allahabad
31 March 2022
In this talk, I will first show how Powerlist, a data structure that enables us to specify parallel algorithms concisely, can serve as an abstraction for parallelism, by describing a method to schedule computations (e.g., matrix multiplication) across a cluster of GPUs. This is realized by implementing Powerlist as a library that automatically partitions the matrices and uses the cuBLAS API for efficient multiplication of the sub-matrices on the individual GPUs. In the second part, I will discuss the prospects of Powerlist as a locality abstraction.
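As a rough illustration of this partitioning idea (not the actual library from the talk; the `tie` helper, the `LEAF` cut-off, and the use of NumPy in place of cuBLAS are assumptions), the sketch below multiplies matrices by recursively splitting them into independent sub-products that a scheduler could distribute across GPUs:

```python
import numpy as np

# Powerlist-style recursion: a powerlist of length 2^n is either a
# singleton or the concatenation ("tie") of two half-length powerlists,
# which gives a natural recursive partitioning of a matrix multiply.

LEAF = 2  # hypothetical cut-off below which we stop partitioning

def tie(a, b, axis):
    """Concatenate two half powerlists back together."""
    return np.concatenate((a, b), axis=axis)

def matmul(A, B):
    """Block-recursive multiply of square matrices of power-of-two size."""
    n = A.shape[0]
    if n <= LEAF:
        return A @ B  # leaf work: stand-in for a cuBLAS GEMM on one GPU
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The eight sub-products are independent, so a scheduler may
    # distribute them across the GPUs of a cluster.
    top = tie(matmul(A11, B11) + matmul(A12, B21),
              matmul(A11, B12) + matmul(A12, B22), axis=1)
    bot = tie(matmul(A21, B11) + matmul(A22, B21),
              matmul(A21, B12) + matmul(A22, B22), axis=1)
    return tie(top, bot, axis=0)

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(matmul(A, B), A @ B)
```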
Alexander Geiß, M.Sc.
Technical University of Darmstadt
3 March 2022
The topic of this talk is Extra-P, an automatic performance-modeling tool that supports the user in identifying scalability bottlenecks. This talk will give an overview of performance modeling with Extra-P. We start with a brief motivation for performance models and the need for assistance in creating them, followed by an explanation of the most important parts of the underlying method and a discussion of its limitations. Finally, we will discuss the recommended workflow for performance modeling with Extra-P, based on a small demonstration.
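To give a flavour of this kind of empirical modeling (this is not Extra-P's implementation; the candidate exponent sets and the synthetic runtimes are assumptions made for the example), the sketch below fits single-term models of the form c0 + c1 * p^i * log2(p)^j to measurements and keeps the best fit:

```python
import numpy as np

# Fit candidate scaling models to measured runtimes and keep the one
# with the smallest residual. Extra-P's real search space and model
# refinement are richer; this only illustrates the basic idea.

procs = np.array([2, 4, 8, 16, 32, 64], dtype=float)  # measurement points
times = 0.5 + 0.01 * procs * np.log2(procs)           # synthetic runtimes

candidates = [(i, j) for i in (0, 0.5, 1, 2) for j in (0, 1, 2)
              if (i, j) != (0, 0)]

best = None
for i, j in candidates:
    term = procs**i * np.log2(procs)**j
    X = np.column_stack([np.ones_like(procs), term])  # [1, p^i * log^j p]
    coef, *_ = np.linalg.lstsq(X, times, rcond=None)
    err = np.sum((X @ coef - times) ** 2)
    if best is None or err < best[0]:
        best = (err, i, j, coef)

err, i, j, (c0, c1) = best
print(f"t(p) ~ {c0:.3f} + {c1:.4f} * p^{i} * log2(p)^{j}  (SSE={err:.2e})")
```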
Prof. Paul H.J. Kelly
Imperial College London
13 December 2021
Domain-specific languages (DSLs) enable us to automate the generation of high-performance code from a high-level abstraction. This talk will show, through two example projects (Firedrake and Devito), that DSLs can deliver productivity, performance, and performance portability. The key to success is compiler architecture: designing intermediate representations that make optimisations easy and analysis trivial. But the DSL software ecosystem is dysfunctional: DSL compilers (including ours) are typically standalone projects, reliant on support from a narrow developer base, and few, if any, components are shared between DSLs. The talk will conclude with a manifesto for fixing this: building on MLIR to establish community support for code-generation tools that underpin multiple front-end DSLs. I will argue that this is in fact the only way we can tackle the complexity involved in achieving high performance for complex applications on diverse hardware.
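As a toy illustration of this code-generation idea (not Devito's or Firedrake's API; the stencil representation and the `generate` helper are invented for this example), the sketch below turns a high-level 1D stencil description into an executable kernel:

```python
import numpy as np

# A stencil is described as offset/weight pairs at a high level, and a
# tiny "compiler" generates the loop nest as Python source from that IR.
# Real DSL compilers emit optimised C or MLIR instead.

stencil = [(-1, 0.25), (0, 0.5), (1, 0.25)]  # 1D smoothing stencil (the IR)

def generate(stencil):
    expr = " + ".join(f"{w} * u[i{o:+d}]" if o else f"{w} * u[i]"
                      for o, w in stencil)
    src = (
        "def kernel(u, out):\n"
        "    for i in range(1, len(u) - 1):\n"
        f"        out[i] = {expr}\n"
    )
    ns = {}
    exec(src, ns)  # stand-in for emitting and compiling native code
    return ns["kernel"], src

kernel, src = generate(stencil)
print(src)                       # the generated kernel source

u = np.arange(8, dtype=float)
out = np.zeros_like(u)
kernel(u, out)
print(out)
```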
Research paper covered during the seminar:
Architecture and Performance of Devito, a System for Automated Stencil Computation, ACM TOMS, April 2020
Dr. Nehir Sönmez
Barcelona Supercomputing Center
8 November 2021
Research paper covered during the seminar:
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures, ACM TACO, vol. 17, 2020
Dr. Luc Jaulmes
Barcelona Supercomputing Center
4 October 2021
Dr. Aleksandar Ilic
INESC-ID
4 October 2021
Dr. Milind Chabbi
Uber Technologies, Inc.
13 July 2021
Dr. Tan Nguyen
Lawrence Berkeley National Laboratory
21 June 2021
Dr. Mohamed Wahib
AIST/TokyoTech Open Innovation Laboratory
25 May 2021