Research Article
DOI: 10.1145/3520312.3534863

A graph neural network-based performance model for deep learning applications

Published: 13 June 2022
Abstract

The unprecedented proliferation of machine learning-based software brings an ever-increasing need to optimize the implementation of such applications. State-of-the-art compilers for neural networks, such as Halide and TVM, incorporate a machine learning-based performance model to search the space of valid implementations of a given deep learning algorithm. For a given application, the model predicts the value of performance metrics such as run time without executing the application on hardware. Such models speed up the compilation process by obviating the need to benchmark an enormous number of candidate implementations, referred to as schedules, on hardware. Existing performance models employ feed-forward networks, recurrent networks, or decision tree ensembles to estimate the performance of different implementations of a neural network. Graphs present a natural and intuitive way to model deep-learning networks, where each node represents a computational stage or operation. Incorporating the inherent graph structure of these workloads in the performance model can enable a better representation and learning of inter-stage interactions. The accuracy of the performance model has direct implications on the efficiency of the search strategy, making it a crucial component of this class of deep-learning compilers. In this work, we develop a novel performance model that adopts a graph representation. In our model, each stage of computation represents a node characterized by features that capture the operations performed by the stage. The interaction between nodes is achieved using graph convolutions. Experimental evaluation shows a 7.75× and 12× reduction in prediction error compared to the existing Halide and TVM models, respectively.
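
The abstract describes the model only at a high level: each computation stage is a node carrying a feature vector that encodes the stage's operations, graph convolutions propagate information between connected stages, and the resulting node embeddings are pooled into a single run-time prediction. As a concrete illustration, here is a minimal sketch in PyTorch using the symmetrically normalized graph convolution of Kipf and Welling; the class name, feature dimensions, layer count, and pooling choice are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class StageGraphCostModel(nn.Module):
        # Hypothetical sketch: one node per computation stage; node features
        # describe the stage's operations; edges follow producer-consumer links.
        def __init__(self, in_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.lin1 = nn.Linear(in_dim, hidden_dim)
            self.lin2 = nn.Linear(hidden_dim, hidden_dim)
            self.readout = nn.Linear(hidden_dim, 1)

        def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
            # Treat the stage graph as undirected, add self-loops, and apply the
            # symmetric normalization A_hat = D^{-1/2} (A + I) D^{-1/2}.
            a = adj + adj.t() + torch.eye(adj.size(0))
            d = a.sum(dim=1).pow(-0.5)
            a_hat = d.unsqueeze(1) * a * d.unsqueeze(0)
            # Two graph convolutions: each stage aggregates neighbor features.
            h = torch.relu(a_hat @ self.lin1(x))
            h = torch.relu(a_hat @ self.lin2(h))
            # Mean-pool stage embeddings and regress a single run-time estimate.
            return self.readout(h.mean(dim=0))

    # Toy usage: a 4-stage pipeline with 8 hand-crafted features per stage.
    model = StageGraphCostModel(in_dim=8)
    x = torch.randn(4, 8)                      # per-stage feature vectors
    adj = torch.tensor([[0., 1., 0., 0.],      # stage 0 feeds stage 1, etc.
                        [0., 0., 1., 1.],
                        [0., 0., 0., 1.],
                        [0., 0., 0., 0.]])
    predicted_runtime = model(x, adj)          # scalar tensor of shape (1,)

Such a model would be trained by regressing measured run times of benchmarked schedules; the compiler's search strategy can then rank candidate schedules without executing them on hardware.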




    Published In

    MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming
    June 2022
    79 pages
ISBN: 9781450392730
DOI: 10.1145/3520312

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Code Optimization
    2. Deep Learning Compilers
    3. Graph Neural Networks
    4. Performance Modeling

    Qualifiers

    • Research-article

    Conference

    MAPS '22

