CUDA Task Launcher for CPU and GPU


This was an end-of-internship talk. For my summer research, I investigated ways to run an unmodified CUDA application as-is on CPU SIMD units as well as on the GPU, choosing between the two according to what the application and system demanded in order to improve performance. In particular, if the kernel offload cost was higher than the computation time, it made no sense to migrate tasks to the GPU. We made this decision in the system software layer, transparently to the application developer. We showed performance improvements of 1.5X over CPU-only execution and 1.3X over GPU-only execution of data center workloads.
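
The offload decision described above can be illustrated with a minimal sketch. This is not the launcher's actual implementation: the names TaskEstimate and choose_device, the three cost fields, and the example numbers are all hypothetical, and the rule of comparing the total GPU time (offload plus kernel) against the CPU time is an assumed reading of "offload cost higher than computation time".

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    """Hypothetical per-task cost estimates, in seconds."""
    cpu_compute_s: float  # estimated run time on the CPU SIMD units
    gpu_compute_s: float  # estimated kernel run time on the GPU
    offload_s: float      # estimated launch plus host/device transfer cost


def choose_device(task: TaskEstimate) -> str:
    """Return "gpu" only when offloading is expected to pay off.

    Assumed rule: the task goes to the GPU only if the offload cost plus
    the GPU kernel time is lower than the time to run it on the CPU.
    """
    gpu_total_s = task.offload_s + task.gpu_compute_s
    return "gpu" if gpu_total_s < task.cpu_compute_s else "cpu"


# Illustrative numbers only, not measurements from the talk.
small = TaskEstimate(cpu_compute_s=0.002, gpu_compute_s=0.0005, offload_s=0.010)
large = TaskEstimate(cpu_compute_s=1.5, gpu_compute_s=0.2, offload_s=0.010)

print(choose_device(small))  # offload dominates, so the task stays on the CPU
print(choose_device(large))  # computation dominates, so the task goes to the GPU
```

In a real system the estimates would have to come from profiling or a cost model inside the system software layer, so the application itself would never call anything like choose_device.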