Title: Tapir: Embedding Recursive Fork-Join Parallelism into LLVM's Intermediate Representation

Abstract: Tapir is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler's IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the "serial-projection property," which task-parallel programs commonly satisfy, to handle the semantics of these programs without such a rework.
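As a concrete illustration (not drawn from the abstract, but standard in the Cilk literature), the following C function uses cilk_spawn and cilk_sync to express recursive fork-join parallelism. Its serial projection, obtained by simply deleting the two keywords, is an ordinary serial C function that computes the same result; this is the serial-projection property that Tapir exploits.

    #include <cilk/cilk.h>

    /* Recursive fork-join parallelism in Cilk.  Deleting cilk_spawn and
       cilk_sync yields the serial projection: a plain C function that
       computes the same result. */
    long fib(long n) {
      if (n < 2)
        return n;
      long x = cilk_spawn fib(n - 1);  /* spawned child may run in parallel */
      long y = fib(n - 2);             /* continuation executes meanwhile */
      cilk_sync;                       /* wait for the spawned child */
      return x + y;
    }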

For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel, fine-grained tasks in the program's control-flow graph; this ordered representation is what allows the compiler to optimize parallel code with so few modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified fewer than 3000 lines of LLVM's half-million-line core middle-end functionality.

These changes suffice to let LLVM's existing optimizations for serial code work with parallel control constructs such as parallel loops and Cilk's cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling and loop stripmining, which restructure the parallel control flow of the program. By combining existing LLVM optimizations with these new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, than a nearly identical compiler that compiles parallel linguistic constructs in the traditional way. In addition, integrating Tapir/LLVM into the Accelerated Linear Algebra (XLA) compiler in Google's TensorFlow machine-learning framework enables more effective compiler optimization of a variety of neural networks written in TensorFlow, improving their parallel performance by 30%-100% (geometric mean) across a variety of CPU hardware architectures.
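To make the claim about reusing serial optimizations concrete, the sketch below (hypothetical code, not taken from the abstract) shows a parallel loop whose body divides by a pure, loop-invariant call. When the loop body is outlined into parallel-runtime calls early in compilation, as in traditional lowering, the call to norm cannot be hoisted; when the parallel loop is represented directly in the IR, as Tapir does, LLVM's ordinary loop-invariant code motion can move the call out of the loop.

    #include <cilk/cilk.h>
    #include <math.h>

    /* Hypothetical pure helper: Euclidean norm of A[0..n-1]. */
    double norm(const double *A, int n) {
      double sum = 0.0;
      for (int i = 0; i < n; ++i)
        sum += A[i] * A[i];
      return sqrt(sum);
    }

    void normalize(double *restrict out, const double *restrict in, int n) {
      cilk_for (int i = 0; i < n; ++i)
        out[i] = in[i] / norm(in, n);  /* loop-invariant call: hoistable only if
                                          the parallel loop stays visible in the IR */
    }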

Bio: Tao B. Schardl is a Research Scientist in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research combines algorithms and systems to develop technologies that support principled, scientific approaches to writing fast code. He has previously worked on parallel programming models, theories of performance, diagnostic tools, and compilers to simplify the task of software performance engineering. His work on the Tapir/LLVM compiler earned the best paper award at the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming in 2017. Dr. Schardl received his S.B. in Computer Science and Electrical Engineering from MIT in 2009, his M.Eng. in Computer Science and Electrical Engineering from MIT in 2010, and his Ph.D. in Computer Science and Engineering from MIT in 2016.