skip to main content
research-article
Open access

Automatically scheduling halide image processing pipelines

Published: 11 July 2016 Publication History
  • Get Citation Alerts
  • Abstract

    The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a schedule), and the Halide compiler carries out the mechanical task of generating platform-specific code that implements the schedule. Unfortunately, designing high-performance schedules for complex image processing pipelines requires substantial knowledge of modern hardware architecture and code-optimization techniques. In this paper we provide an algorithm for automatically generating high-performance schedules for Halide programs. Our solution extends the function bounds analysis already present in the Halide compiler to automatically perform locality and parallelism-enhancing global program transformations typical of those employed by expert Halide developers. The algorithm does not require costly (and often impractical) auto-tuning, and, in seconds, generates schedules for a broad set of image processing benchmarks that are performance-competitive with, and often better than, schedules manually authored by expert Halide developers on server and mobile CPUs, as well as GPUs.

    Supplementary Material

    MP4 File (a83.mp4)

    References

    [1]
    Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M. 2010. The frankencamera: An experimental platform for computational photography. ACM Transactions on Graphics 29, 4 (July), 29:1--29:12.
    [2]
    Ansel, J., Kamil, S., Veeramachaneni, K., Ragan-Kelley, J., Bosboom, J., O'Reilly, U.-M., and Amarasinghe, S. 2014. OpenTuner: An extensible framework for program autotuning. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, ACM, 303--316.
    [3]
    Chen, J., Paris, S., and Durand, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics 26, 3 (July), 103:1--103:9.
    [4]
    Darbon, J., Cunha, A., Chan, T. F., Osher, S., and Jensen, G. J. 2008. Fast nonlocal filtering applied to electron cryomicroscopy. In Biomedical Imaging: From Nano to Macro, 2008. ISBI 2008. 5th IEEE International Symposium on, IEEE, 1331--1334.
    [5]
    Farbman, Z., Fattal, R., and Lischinski, D. 2011. Convolution pyramids. ACM Transactions on Graphics 30, 6 (Dec.), 175:1--175:8.
    [6]
    Harris, C., and Stephens, M. 1988. A combined corner and edge detector. In In Proc. of Fourth Alvey Vision Conference, 147--151.
    [7]
    Hegarty, J., Brunhaver, J., DeVito, Z., Ragan-Kelley, J., Cohen, N., Bell, S., Vasilyev, A., Horowitz, M., and Hanrahan, P. 2014. Darkroom: compiling high-level image processing code into hardware pipelines. ACM Transactions on Graphics 33, 4 (July), 144:1--144:11.
    [8]
    Hegarty, J., Daly, R., DeVito, Z., Ragan-Kelley, J., Horowitz, M., and Hanrahan, P. 2016. Rigel: Flexible multi-rate image processing hardware. ACM Transactions on Graphics 36, 4 (July).
    [9]
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
    [10]
    Krizhevsky, A., Sutskever, I., and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 1097--1105.
    [11]
    Mullapudi, R. T., Vasista, V., and Bondhugula, U. 2015. PolyMage: Automatic optimization for image processing pipelines. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 429--443.
    [12]
    Paris, S., Hasinoff, S. W., and Kautz, J. 2011. Local Laplacian filters: edge-aware image processing with a Laplacian pyramid. ACM Transactions on Graphics 30, 4 (July), 68:1--68:12.
    [13]
    Ragan-Kelley, J., Adams, A., Paris, S., Levoy, M., Amarasinghe, S., and Durand, F. 2012. Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Transactions on Graphics 31, 4 (July), 32:1--32:12.
    [14]
    Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., and Amarasinghe, S. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, 519--530.
    [15]
    Ragan-Kelley, J., Adams, A., and Sharlet, D. 2015. An introduction to Halide. In ACM SIGGRAPH 2015 Courses, ACM, 3:1--3:160.
    [16]
    Rhemann, C., Hosni, A., Bleyer, M., Rother, C., and Gelautz, M. 2011. Fast cost-volume filtering for visual correspondence and beyond. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, 3017--3024.
    [17]
    Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

    Cited By

    View all
    • (2024)(De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional HomomorphismsACM Transactions on Programming Languages and Systems10.1145/3665643Online publication date: 22-May-2024
    • (2024)Parallel Pattern Language Code GenerationProceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3649169.3649245(32-41)Online publication date: 3-Mar-2024
    • (2024)DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN AcceleratorsProceedings of the 21st ACM International Conference on Computing Frontiers10.1145/3649153.3649196(126-137)Online publication date: 7-May-2024
    • Show More Cited By

    Index Terms

    1. Automatically scheduling halide image processing pipelines

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Graphics
      ACM Transactions on Graphics  Volume 35, Issue 4
      July 2016
      1396 pages
      ISSN:0730-0301
      EISSN:1557-7368
      DOI:10.1145/2897824
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 July 2016
      Published in TOG Volume 35, Issue 4

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. halide
      2. image processing
      3. optimizing compilers

      Qualifiers

      • Research-article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)288
      • Downloads (Last 6 weeks)29
      Reflects downloads up to 14 Aug 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)(De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional HomomorphismsACM Transactions on Programming Languages and Systems10.1145/3665643Online publication date: 22-May-2024
      • (2024)Parallel Pattern Language Code GenerationProceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3649169.3649245(32-41)Online publication date: 3-Mar-2024
      • (2024)DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN AcceleratorsProceedings of the 21st ACM International Conference on Computing Frontiers10.1145/3649153.3649196(126-137)Online publication date: 7-May-2024
      • (2024)Compiler-Based Memory Encryption for Machine Learning on Commodity Low-Power DevicesProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641564(198-211)Online publication date: 17-Feb-2024
      • (2024)Optimal Kernel Orchestration for Tensor Programs with KorchProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651383(755-769)Online publication date: 27-Apr-2024
      • (2024)Cocco: Hardware-Mapping Co-Exploration towards Memory Capacity-Communication OptimizationProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624865(69-84)Online publication date: 27-Apr-2024
      • (2024)Optimizing Deep Learning Inference via Global Analysis and Tensor ExpressionsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624858(286-301)Online publication date: 27-Apr-2024
      • (2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: Jan-2024
      • (2024)Amber: A 16-nm System-on-Chip With a Coarse- Grained Reconfigurable Array for Flexible Acceleration of Dense Linear AlgebraIEEE Journal of Solid-State Circuits10.1109/JSSC.2023.331311659:3(947-959)Online publication date: Mar-2024
      • (2024)Template-Based Automatic Library Function Generation with Halide for Compute-Intensive Simulink Models2024 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)10.1109/COOLCHIPS61292.2024.10531173(1-6)Online publication date: 17-Apr-2024
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Get Access

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media