Projects

Performance Optimizations for Machine Learning Applications

Machine learning algorithms successfully address many types of problems in a variety of fields. Because these algorithms are built on complex data structures processed iteratively, performance optimizations play a crucial role in reducing their execution time. We develop performance optimizations and performance models for machine learning applications.

Data Placement on Heterogeneous Memory Systems

Heterogeneous memory systems are equipped with two or more types of memory that work in tandem to complement each other's capabilities. We study various data placement schemes to assist the programmer in deciding where program objects should be allocated on heterogeneous memory systems.
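
The schemes themselves are described in our papers; as a minimal illustration of the decision a placement scheme automates, the sketch below (using the open-source memkind library; the object names and sizes are hypothetical) puts a bandwidth-critical array in high-bandwidth memory and a rarely touched array in ordinary capacity memory.

    #include <memkind.h>  // open-source allocator for heterogeneous memory
    #include <stdio.h>

    int main(void) {
        size_t n = 1 << 20;  // hypothetical object size (1M doubles each)

        // Bandwidth-critical object: prefer high-bandwidth memory (HBM/MCDRAM),
        // falling back to DDR if no high-bandwidth memory is available.
        double *hot = (double *) memkind_malloc(MEMKIND_HBW_PREFERRED,
                                                n * sizeof(double));

        // Rarely touched object: ordinary capacity memory (DDR) is fine.
        double *cold = (double *) memkind_malloc(MEMKIND_DEFAULT,
                                                 n * sizeof(double));
        if (!hot || !cold) { fprintf(stderr, "allocation failed\n"); return 1; }

        for (size_t i = 0; i < n; i++) hot[i] = 1.0;  // streamed repeatedly
        cold[0] = hot[0];                             // touched once

        memkind_free(MEMKIND_HBW_PREFERRED, hot);
        memkind_free(MEMKIND_DEFAULT, cold);
        return 0;
    }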

TiDA and TiDA-acc: Tiling Abstraction for Data Arrays for CPU and GPU

TiDA is a programming abstraction that centralizes tiling information within array data types, requiring only minimal changes to the source code. The metadata about the data layout can be used by the compiler and runtime to automatically manage parallelism and optimize data locality. TiDA targets NUMA and coherence-domain issues on massively parallel multicore chips.
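
TiDA's actual interface is described in our papers; the hypothetical C++ sketch below illustrates only the central idea: tile geometry is metadata of the array type itself, so loops over tiles can be scheduled by the runtime while user code stays tile-oblivious.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Hypothetical sketch: tile sizes live inside the array type, so the
    // runtime can iterate tiles (and pin them to cores or NUMA domains)
    // without any tiling logic in the user's loop body.
    struct TiledArray2D {
        int nx, ny;   // global extent
        int tx, ty;   // tile extent (layout metadata)
        std::vector<double> data;

        TiledArray2D(int nx_, int ny_, int tx_, int ty_)
            : nx(nx_), ny(ny_), tx(tx_), ty(ty_), data((size_t)nx_ * ny_) {}

        double &at(int i, int j) { return data[(size_t)j * nx + i]; }

        // Visit every tile; fn receives the tile's index bounds.
        template <class Fn>
        void forEachTile(Fn fn) {
            for (int j0 = 0; j0 < ny; j0 += ty)
                for (int i0 = 0; i0 < nx; i0 += tx)
                    fn(i0, std::min(i0 + tx, nx), j0, std::min(j0 + ty, ny));
        }
    };

    int main() {
        TiledArray2D a(1024, 1024, 64, 64);  // tiling chosen once, at allocation
        a.forEachTile([&](int ilo, int ihi, int jlo, int jhi) {
            for (int j = jlo; j < jhi; j++)
                for (int i = ilo; i < ihi; i++)
                    a.at(i, j) = i + j;      // user code stays tile-oblivious
        });
        printf("%f\n", a.at(100, 100));
    }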

Collaborators: Tan Nguyen and John Shalf at Berkeley Lab

Asynchronous Runtime System for AMR

Perilla is a data-driven, task-graph-based runtime system that exploits metadata from the AMReX AMR framework and the TiDA tiling library. Perilla uses this metadata to enable various optimizations at the communication layer, allowing programmers to achieve significant performance improvements with only a modest amount of programming effort.
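
Perilla itself lives inside the AMReX ecosystem; the hypothetical sketch below illustrates only its data-driven execution style, in which a task becomes runnable once all of its incoming dependencies (for example, ghost-cell messages) have arrived.

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <vector>

    // Hypothetical sketch of data-driven task-graph execution:
    // each task fires only after all of its inputs have arrived.
    struct Task {
        std::function<void()> work;
        int pending;                  // unreceived inputs (e.g., ghost messages)
        std::vector<int> successors;  // tasks consuming this task's output
    };

    void run(std::vector<Task> &graph) {
        std::queue<int> ready;
        for (int t = 0; t < (int)graph.size(); t++)
            if (graph[t].pending == 0) ready.push(t);
        while (!ready.empty()) {
            int t = ready.front(); ready.pop();
            graph[t].work();                   // e.g., a stencil on one AMR box
            for (int s : graph[t].successors)  // "message arrived" at successor
                if (--graph[s].pending == 0) ready.push(s);
        }
    }

    int main() {
        std::vector<Task> g(3);
        g[0] = {[] { puts("fill ghost cells"); }, 0, {2}};
        g[1] = {[] { puts("local compute");    }, 0, {2}};
        g[2] = {[] { puts("boundary compute"); }, 2, {}};
        run(g);  // tasks 0 and 1 may run in any order; task 2 runs last
    }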

Collaborators: Tan Nguyen and John Shalf at Berkeley Lab

Prior Projects

EmbedSanitizer: Runtime Race Detection for 32-bit Embedded ARM

EmbedSanitizer is a tool for detecting concurrency data races in 32-bit ARM-based multithreaded C/C++ applications. We motivate the idea of detecting data races in embedded systems software natively, without virtualization, emulation, or resorting to an alternative architecture. Native detection provides more precise results and higher throughput, and hence improves developer productivity.
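
For a flavor of what the tool reports, the sketch below contains the classic bug such detectors flag: two threads update a shared counter without synchronization, so the final value is timing-dependent.

    #include <cstdio>
    #include <thread>

    int counter = 0;  // shared, unsynchronized

    void bump() {
        for (int i = 0; i < 100000; i++)
            counter++;  // racy read-modify-write; a race detector flags this
    }

    int main() {
        std::thread a(bump), b(bump);
        a.join(); b.join();
        // Expected 200000, but lost updates make the result nondeterministic.
        printf("counter = %d\n", counter);
    }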

More information: https://github.com/hassansalehe/embedsanitizer

Contributors: Hassan Salehe Matar, Didem Unat, Serdar Tasiran


Scalable 3D Front Tracking Method

Front tracking is an Eulerian-Lagrangian method for simulating multiphase flows. The method is known for its accurate treatment of interfacial physics and its conservation of mass. Parallelizing front tracking is challenging because two types of grids, structured and unstructured, must be handled at the same time. Our scalable 3D front tracking method optimizes the different types of communication that arise in a parallel implementation.
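
As a minimal sketch of the Eulerian-Lagrangian coupling (in 1D, not our parallel 3D implementation), the code below advects a single Lagrangian marker with a velocity interpolated from a structured Eulerian grid; this interpolation step is precisely where the two grid types must exchange data.

    #include <cstdio>
    #include <vector>

    // 1D sketch of the Eulerian-Lagrangian coupling in front tracking:
    // a Lagrangian marker moves with a velocity interpolated from a
    // structured Eulerian grid (the real method is 3D, with the front
    // represented by an unstructured triangulated surface).
    int main() {
        const int n = 64;
        const double h = 1.0 / n, dt = 0.01;
        std::vector<double> u(n + 1);
        for (int i = 0; i <= n; i++) u[i] = 1.0 - i * h;  // sample velocity field

        double x = 0.3;                        // marker (front point) position
        for (int step = 0; step < 10; step++) {
            int i = (int)(x / h);              // Eulerian cell holding the marker
            double w = x / h - i;              // linear interpolation weight
            double ux = (1 - w) * u[i] + w * u[i + 1];
            x += dt * ux;                      // advect the Lagrangian marker
        }
        printf("marker at x = %f\n", x);
    }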

Collaborators: Metin Muradoğlu and Daulet Izbassarov at Koç University

ExaSAT: A Performance Modeling Framework for ExaScale Co-design

ExaSAT is a comprehensive modeling framework for assessing the sensitivity of exascale applications to different hardware resources. It can statically analyze an application and gather key characteristics about its computation, communication, data access patterns, and data locality. The framework explores design trade-offs and extrapolates application requirements to potential hardware realizations in the exascale timeframe (2020). Finally, ExaSAT forms the groundwork for more detailed studies involving architectural simulations of different system design points.

Project website: http://www.codexhpc.org/?p=98
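
ExaSAT's model is far richer, accounting for working sets, cache behavior, and communication; the sketch below shows only the simplest roofline-style bound such models generalize, with made-up machine parameters: predicted time is the larger of the compute time and the memory-traffic time.

    #include <algorithm>
    #include <cstdio>

    // Roofline-style lower bound on kernel time: limited either by
    // floating-point throughput or by memory bandwidth.
    double modelTimeSec(double flops, double bytes,
                        double peakFlops /* FLOP/s */, double peakBW /* B/s */) {
        return std::max(flops / peakFlops, bytes / peakBW);
    }

    int main() {
        // Assumed example: a 7-point 3D stencil sweep on a 512^3 grid with
        // ~8 flops and ~16 bytes of main-memory traffic per grid point.
        double n = 512.0 * 512 * 512;
        double t = modelTimeSec(8 * n, 16 * n, 1e12, 1e11);  // made-up machine
        printf("modeled time: %.4f s (memory-bound)\n", t);
    }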

Collaborators: Cy Chan, John Shalf and John Bell at Berkeley Lab


Mint Programming Model for GPUs

Mint is a domain-specific programming model and translator that generates highly optimized CUDA C from annotated C source. Mint includes an optimizer that targets 3D stencil methods. The translator generates both host and device code and handles the memory management details, including host-device transfers and shared-memory optimizations. Mint parallelizes loop nests in appropriately annotated C source, performing domain-specific optimizations important in three-dimensional problems. For more information, visit the Mint website, or read our paper and thesis.
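
For a flavor of the programming model, the fragment below annotates a 7-point 3D stencil sweep for Mint; the directive spelling follows our paper, but the clause values are illustrative and the paper remains the authoritative reference for the syntax.

    // Fragment of a 3D heat solver annotated for Mint; the translator turns
    // the loop nest into a CUDA kernel. Flat indexing keeps this plain C/C++.
    #define U(a, z, y, x) (a)[((z) * (m + 2) + (y)) * (n + 2) + (x)]

    void sweep(int n, int m, int k, double c0, double c1,
               double *Uold, double *Unew)
    {
        #pragma mint copy(Uold, toDevice, (n+2), (m+2), (k+2))
        #pragma mint copy(Unew, toDevice, (n+2), (m+2), (k+2))
        #pragma mint parallel
        {
            // tile() suggests the CUDA thread-block shape for this nest.
            #pragma mint for nest(all) tile(16, 16, 1)
            for (int z = 1; z <= k; z++)
                for (int y = 1; y <= m; y++)
                    for (int x = 1; x <= n; x++)
                        U(Unew, z, y, x) = c0 * U(Uold, z, y, x)
                            + c1 * (U(Uold, z, y, x - 1) + U(Uold, z, y, x + 1)
                                  + U(Uold, z, y - 1, x) + U(Uold, z, y + 1, x)
                                  + U(Uold, z - 1, y, x) + U(Uold, z + 1, y, x));
        }
        #pragma mint copy(Unew, fromDevice, (n+2), (m+2), (k+2))
    }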

Collaborators: Scott Baden at Univ. of California, San Diego and Xing Cai at Simula Research Lab