Talks and presentations

Single Application Source, Any Hardware System

December 16, 2021

Qualifier Exam Talk, Yale University, Area Exam Talk, New Haven, Connecticut

With growing hardware heterogeneity, programmability without compromising performance is of utmost importance. This research investigated ways to offload work to the underlying hardware, be it a CPU or any accelerator, without making any changes to the application source code. For example, an application traditionally written for an accelerator such as a GPU, in a specific language such as CUDA, should be able to execute on a CPU when the data offload costs make moving it from the CPU to the accelerator not worthwhile. The goal is to ease programmability while guaranteeing performance, with no changes to the application source code.

CUDA Task Launcher for CPU and GPU

October 07, 2021

Internship Talk, Nvidia Research, Santa Clara, California

This was an end-of-internship talk. For the summer research, I investigated ways to execute an unmodified CUDA application as-is on CPU SIMD units as well as on the GPU, depending on what the application and system demanded, to improve performance. In particular, if kernel offload costs were higher than the computation time, it made no sense to migrate tasks to the GPU. We managed this in the system software layer, transparently to the application developer. We showed performance improvements of 1.5X over CPU-only execution and 1.3X over GPU-only execution for data center workloads.

Opt-Gen: An optimizing self-generating optimizer for compilers

June 10, 2015

Undergraduate Thesis Talk, Indian Institute of Technology, Mumbai, University of Pune, Pune, Maharashtra, India

This was the end-of-thesis talk following the undergraduate research work. The project created an optimizer generator that generated compiler optimization passes based on data flow analyses, such as “Liveness Analysis”, “Constant Propagation and Folding”, “Available Expression Analysis”, “Anticipable Expression Analysis” and “Live Pointer Analysis”, and inserted them into the compiler on the fly. The framework also allowed users to provide their own data flow analysis equations and create custom optimization passes.