3:00 pm to 4:00 pm
Event Location: NSH 1507
Bio: Jonathan Ragan-Kelley is a post-doctoral researcher at Stanford, where he works with Pat Hanrahan. He recently finished his PhD in computer science at MIT, where he worked with Frédo Durand and Saman Amarasinghe. His work focuses on high-efficiency computer graphics, at the intersection of graphics with systems, architecture, and compilers. He has worked in architecture, software, compilers, and research at all three major GPU vendors, done research at Adobe, and built a real-time preview system for the special effects industry in collaboration with Industrial Light & Magic and Tippett Studio.
Abstract: Future graphics and imaging applications—from photorealistic real-time rendering, to 4D light field cameras and pervasive sensing, to multi-material 3D printing—demand orders of magnitude more computation than we currently have. The efficiency and performance of an application are determined by the algorithm and the hardware architecture on which it runs, but critically also by the organization of computations and data. Real graphics and imaging applications have complex dependencies, and are limited by locality (the distance over which data has to move, e.g., from nearby caches or faraway main memory) and synchronization. Increasingly, the cost of communication—both within a chip and over a network—dominates computation and power consumption, and limits the gains realized from shrinking transistors.
This talk will describe the Halide language and compiler for image processing. Halide explicitly separates what computations define an algorithm from the choices of execution structure which determine parallelism, locality, memory footprint, and synchronization. For image processing algorithms with the same complexity—even the exact same set of arithmetic operations and data—executing on the same hardware, the order and granularity of execution and placement of data can easily change performance by an order of magnitude because of locality and parallelism. I will show that, for data-parallel pipelines common in graphics, imaging, and other data-intensive applications, the organization of computations and data for a given algorithm is constrained by a fundamental tension between parallelism, locality, and redundant computation of shared values. I will present a systematic model of “schedules” which explicitly trade off these pressures by globally reorganizing the computations and data for an entire pipeline, and an optimizing compiler that synthesizes high performance implementations from a Halide algorithm and a schedule. The end result is much simpler programs, delivering performance often many times faster than the best prior hand-tuned C, assembly, and CUDA implementations, while scaling across radically different architectures, from ARM cores to massively parallel GPUs.
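To make the algorithm/schedule separation concrete, here is a minimal C++ sketch modeled on the well-known separable 3x3 blur example from the Halide tutorials; it is not code from the talk itself, and the tile sizes, vector widths, and choice to recompute blur_x per tile are illustrative assumptions rather than a tuned schedule. It assumes a recent Halide distribution (https://halide-lang.org).

```cpp
// Minimal sketch of Halide's algorithm/schedule separation, modeled on the
// classic separable 3x3 box blur example. Tile sizes and vector widths are
// illustrative, not tuned values from the talk.
#include "Halide.h"
#include <cstdio>
#include <cstdint>
using namespace Halide;

int main() {
    // Some input data to blur.
    Buffer<uint16_t> in(640, 480);
    for (int j = 0; j < in.height(); j++)
        for (int i = 0; i < in.width(); i++)
            in(i, j) = (uint16_t)(i + j);

    Var x("x"), y("y"), xi("xi"), yi("yi");
    Func blur_x("blur_x"), blur_y("blur_y");

    // The algorithm: *what* is computed. A horizontal then a vertical
    // 3-tap average, defined as pure functions over a grid of pixels.
    blur_x(x, y) = (in(x, y) + in(x + 1, y) + in(x + 2, y)) / 3;
    blur_y(x, y) = (blur_x(x, y) + blur_x(x, y + 1) + blur_x(x, y + 2)) / 3;

    // The schedule: *how* it is computed. Tile, vectorize, and parallelize
    // blur_y; compute blur_x per tile, trading a little redundant work at
    // tile boundaries for locality. Changing only these lines reorganizes
    // execution without touching the algorithm above.
    blur_y.tile(x, y, xi, yi, 256, 32)
          .vectorize(xi, 8)
          .parallel(y);
    blur_x.compute_at(blur_y, x)
          .vectorize(x, 8);

    // Compile and run. The output is 2 pixels smaller in each dimension
    // because the stencil reads two pixels beyond each output coordinate.
    Buffer<uint16_t> out = blur_y.realize({in.width() - 2, in.height() - 2});
    printf("out(0, 0) = %d\n", (int)out(0, 0));
    return 0;
}
```

Swapping in a different set of scheduling directives, for example computing blur_x once over the whole image with compute_root() or using a GPU tiling, changes the locality, parallelism, and redundant-computation trade-off described above while the two lines defining the algorithm stay the same.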