research-article

Open access

Schedule Synthesis for Halide Pipelines through Reuse Analysis

Authors:

Savvas Sioutas,

Henk Corporaal,

Lou SomersAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 2

Article No.: 10, Pages 1 - 22

https://doi.org/10.1145/3310248

Published: 18 April 2019 Publication History

All formats PDF

Abstract

Efficient code generation for image processing applications continues to pose a challenge in a domain where high performance is often necessary to meet real-time constraints. The inherently complex structure found in most image-processing pipelines, the plethora of transformations that can be applied to optimize the performance of an implementation, as well as the interaction of these optimizations with locality, redundant computation and parallelism, can be indentified as the key reasons behind this issue. Recent domain-specific languages (DSL) such as the Halide DSL and compiler attempt to encourage high-level design-space exploration to facilitate the optimization process. We propose a novel optimization strategy that aims to maximize producer-consumer locality by exploiting reuse in image-processing pipelines. We implement our analysis as a tool that can be used alongside the Halide DSL to automatically generate schedules for pipelines implemented in Halide and test it on a variety of benchmarks. Experimental results on three different multi-core architectures show an average performance improvement of 40% over the Halide Auto-Scheduler and 75% over a state-of-the art approach that targets the PolyMage DSL.

References

[1]

Andrew Adams, Eino-Ville Talvala, Sung Hee Park, David E. Jacobs, Boris Ajdin, Natasha Gelfand, Jennifer Dolson, Daniel Vaquero, Jongmin Baek, Marius Tico, Hendrik P. A. Lensch, Wojciech Matusik, Kari Pulli, Mark Horowitz, and Marc Levoy. 2010. The Frankencamera: An experimental platform for computational photography. In ACM SIGGRAPH 2010 Papers (SIGGRAPH’10). ACM, New York, NY, Article 29, 12 pages.

Digital Library

[2]

Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O’Reilly, and Saman Amarasinghe. 2014. OpenTuner: An extensible framework for program autotuning. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (PACT’14). ACM, New York, NY, 303--316.

Digital Library

[3]

Jiawen Chen, Sylvain Paris, and Frédo Durand. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Trans. Graph. 26 (2007), 103.

Digital Library

[4]

D. Cociorva, J. W. Wilkins, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan. 2001. Loop optimization for a class of memory-constrained computations. In Proceedings of the 15th International Conference on Supercomputing (ICS’01). ACM, New York, NY, 103--113.

Digital Library

[5]

J. Darbon, A. Cunha, T. F. Chan, S. Osher, and G. J. Jensen. 2008. Fast nonlocal filtering applied to electron cryomicroscopy. In Proceedings of the 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. 1331--1334.

[6]

Zeev Farbman, Raanan Fattal, and Dani Lischinski. 2011. Convolution pyramids. In Proceedings of the 2011 SIGGRAPH Asia Conference (SA’11). ACM, New York, NY.

Digital Library

[7]

Halide. 2018. Halide GitHub Repository (MIT License). Retrieved from https://github.com/halide/Halide (commit 402171e7a4dfacb0bd93297cbdfb600a325fe745).

[8]

Chris Harris and Mike Stephens. 1988. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference. 147--151.

[9]

Justin Holewinski, Louis-Noël Pouchet, and P. Sadayappan. 2012. High-performance code generation for stencil computations on GPU architectures. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). ACM, New York, NY, 311--320.

Digital Library

[10]

Abhinav Jangda and Uday Bondhugula. 2018. An effective fusion and tile size model for optimizing image processing pipelines. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’18). ACM, New York, NY, 261--275.

Digital Library

[11]

Ken Kennedy and Kathryn S. McKinley. 1994. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing, Utpal Banerjee, David Gelernter, Alex Nicolau, and David Padua (Eds.). Springer, Berlin, 301--320.

Digital Library

[12]

J. Kim, J. K. Lee, and K. M. Lee. 2016. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16). 1646--1654.

[13]

Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2007. Effective automatic parallelization of stencil computations. SIGPLAN Not. 42, 6 (Jun. 2007), 235--244.

Digital Library

[14]

Naraig Manjikian and Tarek S. Abdelrahman. 1997. Fusion of loops for parallelism and locality. IEEE Trans. Parallel Distrib. Syst. 8, 2 (Feb. 1997), 193--209.

Digital Library

[15]

Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. 2016. Automatically scheduling halide image processing pipelines. ACM Trans. Graph. 35, 4, Article 83 (Jul. 2016), 11 pages.

Digital Library

[16]

Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. 2015. PolyMage: Automatic optimization for image processing pipelines. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15). ACM, New York, NY, 429--443.

Digital Library

[17]

Catherine Olschanowsky, Michelle Mills Strout, Stephen Guzik, John Loffeld, and Jeffrey Hittinger. 2014. A study on balancing parallelism, data locality, and recomputation in existing PDE solvers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14). IEEE Press, Los Alamitos, CA, 793--804.

Digital Library

[18]

Sylvain Paris, Samuel W. Hasinoff, and Jan Kautz. 2011. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. In ACM SIGGRAPH 2011 Papers (SIGGRAPH’11). ACM, New York, NY, Article 68, 12 pages.

Digital Library

[19]

Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen, and John Cavazos. 2008. Iterative optimization in the polyhedral model: Part Ii, multidimensional time. SIGPLAN Not. 43, 6 (Jun. 2008), 90--100.

Digital Library

[20]

PolyMage project. 2016. PolyMage Repository (Apache 2.0 License 2016). Retrieved from https://bitbucket.org/udayb/polymage (commit 0ff0b46456605a5579db09c6ef98cb247dd2131d).

[21]

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). ACM, New York, NY, 519--530.

Digital Library

[22]

C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz. 2011. Fast cost-volume filtering for visual correspondence and beyond. In Proceedings of the Annual Conference on Computer Vision and Pattern Recognition (CVPR’11). 3017--3024.

Digital Library

[23]

Savvas Sioutas, Sander Stuijk, Henk Corporaal, Twan Basten, and Lou Somers. 2018. Loop transformations leveraging hardware prefetching. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO’18). ACM, New York, NY, 254--264.

Digital Library

[24]

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. http://arxiv.org/abs/1802.04730

[25]

Michael E. Wolf and Monica S. Lam. 1991. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI’91). ACM, New York, NY, 30--44.

Digital Library

[26]

Jingling Xue. 2005. Aggressive loop fusion for improving locality and parallelism. In Proceedings of the 3rd International Conference on Parallel and Distributed Processing and Applications (ISPA’05). Springer-Verlag, Berlin, 224--238.

Digital Library

[27]

Qing Yi and Ken Kennedy. 2004. Improving memory hierarchy performance through combined loop interchange and multi-level fusion. Int. J. High Perf. Comput. Appl. 18, 2 (2004), 237--253.

Digital Library

[28]

Xing Zhou, Jean-Pierre Giacalone, María Jesús Garzarán, Robert H. Kuhn, Yang Ni, and David Padua. 2012. Hierarchical overlapped tiling. In Proceedings of the 10th International Symposium on Code Generation and Optimization (CGO’12). ACM, New York, NY, 207--218.

Digital Library

Cited By

Kanetaka YTakagi HMaeda YFukushima N(2024)SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for HalideIEEE Access10.1109/ACCESS.2023.334566012(7563-7583)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2023.3345660
Yang YBarnes CAdams AFinkelstein A(2022)AδACM Transactions on Graphics10.1145/3528223.353012541:4(1-24)Online publication date: 22-Jul-2022
https://dl.acm.org/doi/10.1145/3528223.3530125
Singh SHegarty JLeather HSteiner BChaudhuri SSutton C(2022)A graph neural network-based performance model for deep learning applicationsProceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming10.1145/3520312.3534863(11-20)Online publication date: 13-Jun-2022
https://dl.acm.org/doi/10.1145/3520312.3534863
Show More Cited By

Index Terms

Schedule Synthesis for Halide Pipelines through Reuse Analysis
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. Context specific languages
      1. Domain specific languages

Recommendations

Schedule Synthesis for Halide Pipelines on GPUs

The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization schedule. However, automatic schedule ...
Automatically scheduling halide image processing pipelines

The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a ...
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
PLDI '13

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 16, Issue 2

June 2019

317 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3325131

Editor:
Koen De Bosschere
Ghent University, Belgium

Issue’s Table of Contents

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 April 2019

Accepted: 01 January 2019

Revised: 01 December 2018

Received: 01 July 2018

Published in TACO Volume 16, Issue 2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

7
Total Citations
View Citations
1,031
Total Downloads

Downloads (Last 12 months)130
Downloads (Last 6 weeks)22

Reflects downloads up to 14 Aug 2024

Other Metrics

View Author Metrics

Citations

Cited By

Kanetaka YTakagi HMaeda YFukushima N(2024)SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for HalideIEEE Access10.1109/ACCESS.2023.334566012(7563-7583)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2023.3345660
Yang YBarnes CAdams AFinkelstein A(2022)AδACM Transactions on Graphics10.1145/3528223.353012541:4(1-24)Online publication date: 22-Jul-2022
https://dl.acm.org/doi/10.1145/3528223.3530125
Singh SHegarty JLeather HSteiner BChaudhuri SSutton C(2022)A graph neural network-based performance model for deep learning applicationsProceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming10.1145/3520312.3534863(11-20)Online publication date: 13-Jun-2022
https://dl.acm.org/doi/10.1145/3520312.3534863
Anderson LAdams AMa KLi TJin TRagan-Kelley J(2021)Efficient automatic scheduling of imaging and vision pipelines for the GPUProceedings of the ACM on Programming Languages10.1145/34854865:OOPSLA(1-28)Online publication date: 20-Oct-2021
https://dl.acm.org/doi/10.1145/3485486
Mara MHeide FZollhöfer MNießner MHanrahan P(2021)Thallo – Scheduling for High-Performance Large-Scale Non-Linear Least-Squares SolversACM Transactions on Graphics10.1145/345398640:5(1-14)Online publication date: 24-Sep-2021
https://dl.acm.org/doi/10.1145/3453986
Sioutas SStuijk SBasten TCorporaal HSomers L(2020)Schedule Synthesis for Halide Pipelines on GPUsACM Transactions on Architecture and Code Optimization10.1145/340611717:3(1-25)Online publication date: 3-Aug-2020
https://dl.acm.org/doi/10.1145/3406117
Adams AMa KAnderson LBaghdadi RLi TGharbi MSteiner BJohnson SFatahalian KDurand FRagan-Kelley J(2019)Learning to optimize halide with tree search and random programsACM Transactions on Graphics10.1145/3306346.332296738:4(1-12)Online publication date: 12-Jul-2019
https://dl.acm.org/doi/10.1145/3306346.3322967

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents