DOI: 10.1145/3460945.3464953

Pure tensor program rewriting via access patterns (representation pearl)

Published: 20 June 2021

Abstract

Tensor kernels in machine learning (ML) often correspond to pure mathematical expressions, making term rewriting an attractive strategy for optimization and mapping to specialized hardware accelerators. However, existing ML intermediate representations (IRs) tend to either be pure but high-level, making low-level rewrites to hardware targets inexpressible, or low-level but impure, hampering the use of term rewriting altogether.
This paper introduces Glenside, a pure IR whose core abstraction—the access pattern—enables low-level, layout-aware, hardware-centric program rewrites. We demonstrate how term rewriting in Glenside can be used to map program fragments to hardware accelerator invocations and automatically discover classic data layout transformations like im2col. Glenside establishes a new foundation for exploring further term rewriting techniques in optimizing low-level tensor programs.
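To make the im2col transformation named above concrete, the following is a minimal editorial sketch in plain NumPy; it is not Glenside's IR or the paper's implementation, and every function name in it is hypothetical. The classic layout trick is to gather each sliding window of the input into one column of a matrix, so that the entire 2D convolution becomes a single dense matrix multiplication, the operation that accelerators natively provide.

import numpy as np

def im2col(image, kh, kw):
    # Gather every kh-by-kw sliding window of `image` (shape C x H x W)
    # into one column of a matrix of shape (C*kh*kw, out_h*out_w).
    c, h, w = image.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = image[:, i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_via_im2col(image, kernels):
    # `kernels` has shape (K, C, kh, kw). Flattening each kernel turns the
    # whole convolution into one matrix multiplication over the columns.
    k, c, kh, kw = kernels.shape
    out = kernels.reshape(k, c * kh * kw) @ im2col(image, kh, kw)
    return out.reshape(k, image.shape[1] - kh + 1, image.shape[2] - kw + 1)

def conv2d_direct(image, kernels):
    # Naive sliding-window convolution, used here only as a reference.
    k, c, kh, kw = kernels.shape
    out_h, out_w = image.shape[1] - kh + 1, image.shape[2] - kw + 1
    out = np.zeros((k, out_h, out_w))
    for f in range(k):
        for i in range(out_h):
            for j in range(out_w):
                out[f, i, j] = (image[:, i:i + kh, j:j + kw] * kernels[f]).sum()
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 8, 8))
ker = rng.standard_normal((4, 3, 3, 3))
assert np.allclose(conv2d_via_im2col(img, ker), conv2d_direct(img, ker))

Both versions compute the same result; the payoff of the rewrite is that the im2col form's inner computation is one large matmul that maps directly onto systolic arrays and similar matrix units. Per the abstract, Glenside's contribution is that term rewriting over access patterns can discover this kind of layout transformation automatically instead of requiring it to be hand-coded.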


Published In

MAPS 2021: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming
June 2021
52 pages
ISBN: 9781450384674
DOI: 10.1145/3460945
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. machine learning compilers
  2. term rewriting

Qualifiers

  • Research-article

Conference

PLDI '21
Article Metrics

  • Downloads (last 12 months): 89
  • Downloads (last 6 weeks): 9
Reflects downloads up to 12 Sep 2024

Cited By

  • Application-level Validation of Accelerator Designs Using a Formal Software/Hardware Interface. ACM Transactions on Design Automation of Electronic Systems 29(2), 1-25, Feb 2024. https://doi.org/10.1145/3639051
  • Guided Equality Saturation. Proceedings of the ACM on Programming Languages 8(POPL), 1727-1758, Jan 2024. https://doi.org/10.1145/3632900
  • Combining E-Graphs with Abstract Interpretation. In Proceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, 1-7, Jun 2023. https://doi.org/10.1145/3589250.3596144
  • Verified tensor-program optimization via high-level scheduling rewrites. Proceedings of the ACM on Programming Languages 6(POPL), 1-28, Jan 2022. https://doi.org/10.1145/3498717
  • IMpress: Large Integer Multiplication Expression Rewriting for FPGA HLS. In 2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 1-10, May 2022. https://doi.org/10.1109/FCCM53951.2022.9786123
  • Automatic Datapath Optimization using E-Graphs. In 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH), 43-50, Sep 2022. https://doi.org/10.1109/ARITH54963.2022.00016
  • Structured Operations: Modular Design of Code Generators for Tensor Compilers. In Languages and Compilers for Parallel Computing, 141-156, Oct 2022. https://doi.org/10.1007/978-3-031-31445-2_10
