Ph.D. Final Oral Exam: Akash Dutta
Speaker: Akash Dutta
Learning-based Auto-tuning for High Performance Computing
With the gradual stagnation of Moore's law, compiler developers and hardware vendors have increasingly turned to parallelizing source code to improve performance. These parallelization efforts have opened new avenues for performance analysis and improvement. At the same time, the ubiquitous heterogeneity of modern parallel programming and High-Performance Computing (HPC) complicates performance analysis. Meticulously designed compiler-driven optimizations often cannot exploit all of the available performance, and runtime-based analyses come with high overheads. It is essential, therefore, to identify techniques that improve performance while reducing overhead.
This thesis presents five studies with this aim in mind. Each study proposes a novel technique to optimize performance along metrics relevant in HPC landscapes, such as execution time and energy. These studies provide unique insights into how adapting and improving various modeling techniques leads to significant performance gains. The first four chapters build on state-of-the-art deep learning (DL) techniques to improve the performance of parallel kernels and applications, automating the feature generation and extraction process for DL-based performance tuning. The first study identifies common patterns in source code to guide performance optimizations. The second and third studies explore new code modeling approaches for reducing the runtime and energy consumption of parallel applications. The fourth study further reduces the overheads associated with the first three studies and streamlines code modeling and feature generation for DL-based optimization. Compiler-driven semantic and structural features are merged with dynamic runtime features to enhance the selections made by the DL models. Extensive evaluations across a variety of downstream tasks demonstrate that the proposed ideas substantially improve results compared to the prior state of the art.
The fifth work addresses one of the primary shortcomings of the first four chapters by describing HHOTuner, a novel nature-inspired online auto-tuner. HHOTuner works seamlessly with user-defined configurable search spaces and improves results while significantly reducing overheads compared to prior state-of-the-art online auto-tuners.
The techniques developed through these studies outline several unique methods of optimizing performance that improve configuration selection and resource consumption. Moreover, by covering a diverse set of optimization tasks in HPC, this thesis is self-contained and aims to address its own shortcomings. The author hopes that the ideas presented herein will foster new lines of discussion and research and broaden the techniques used for performance optimization in the HPC community.
Committee: Ali Jannesari (Major Professor), Pavan Aduri, Robyn Lutz, Chris Quinn, and Xiaoqiu Huang (Exam Substitute)
Join on Zoom: