Single Application Source, Any Hardware System

Date:

With the increasing volume of data and the availability of various accelerators, it is critical to maximize system utility in heterogeneous systems. The fact that companies such as NVIDIA are acquiring Mellanox, Intel is acquiring Altera, and AMD is acquiring Xilinx demonstrates industry awareness of the importance of maximizing system utilization in heterogeneous architectures by leveraging all available compute power from different processors in addition to traditionally targeted processors for specific workloads. We foresee a hardware-agnostic system in which any application, regardless of its conventional hardware target, can execute on any hardware substrate. We demonstrate this by running CUDA programs that were previously only executed on GPUs on CPU SIMD units alongside GPU cores. The challenge here is to retain programmability by leaving the application source code alone. This mapping should be done transparently to the application developer, with comparable performance to make the effort worthwhile while supporting all degrees of parallelism. This vision also has its own set of memory management difficulties, which we go through in depth. For batched workloads that mirror data center conditions, we achieved 1.5X greater performance than a pure CPU-SIMD setup and 1.3X better performance than a pure GPU setup. This demonstrates the potential for characterizing workloads and routing to the appropriate hardware substrate.