DOI: 10.1145/3624062.3624187
research-article

A Performance-Portable SYCL Implementation of CRK-HACC for Exascale

Published: 12 November 2023
    Abstract

    The first generation of exascale systems will include a variety of machine architectures, featuring GPUs from multiple vendors. As a result, many developers are interested in adopting portable programming models to avoid maintaining multiple versions of their code. It is necessary to document experiences with such programming models to assist developers in understanding the advantages and disadvantages of different approaches.
    To this end, this paper evaluates the performance portability of a SYCL implementation of a large-scale cosmology application (CRK-HACC) running on GPUs from three different vendors: AMD, Intel, and NVIDIA. We detail the process of migrating the original code from CUDA to SYCL and show that specializing kernels for specific targets can greatly improve performance portability without significantly impacting programmer productivity. The SYCL version of CRK-HACC achieves a performance portability of 0.96 with a code divergence of almost 0, demonstrating that SYCL is a viable programming model for performance-portable applications.
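
    The two figures quoted above can be made concrete with a short sketch. It assumes that "performance portability" refers to the commonly used harmonic mean of per-platform efficiencies, and that "code divergence" refers to the average pairwise Jaccard distance between the sets of source lines compiled for each platform. The function names and all input values below are hypothetical and chosen for illustration only; they are not measurements from the paper.

```python
from itertools import combinations


def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    `efficiencies` maps a platform name to an efficiency, or to None when
    the application does not run there. Returns 0.0 if any platform is
    unsupported, following the usual definition of the metric.
    """
    values = list(efficiencies.values())
    if not values or any(e is None or e <= 0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


def code_divergence(sources):
    """Mean pairwise Jaccard distance between per-platform source-line sets.

    `sources` maps a platform name to the set of source-line identifiers
    used when building for that platform. 0.0 means every platform shares
    the same code; 1.0 means no code is shared at all.
    """
    pairs = list(combinations(sources.values(), 2))
    if not pairs:
        return 0.0

    def jaccard_distance(a, b):
        union = a | b
        return 0.0 if not union else 1.0 - len(a & b) / len(union)

    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical inputs (NOT taken from the paper).
example_efficiencies = {"amd": 0.95, "intel": 0.97, "nvidia": 0.96}
example_sources = {
    "amd": set(range(0, 1000)) | {"amd_specialization"},
    "intel": set(range(0, 1000)) | {"intel_specialization"},
    "nvidia": set(range(0, 1000)) | {"nvidia_specialization"},
}

print(f"PP = {performance_portability(example_efficiencies):.3f}")
print(f"CD = {code_divergence(example_sources):.4f}")
```

    Because the harmonic mean is dominated by its smallest term, one poorly performing platform pulls the score down sharply. Under this definition, a value near 1 means the code is close to the best-known performance on every platform measured. A code divergence near 0 means that target-specific specializations make up only a small fraction of the shared source.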

    Published In

    SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
    November 2023
    2180 pages
    ISBN: 9798400707858
    DOI: 10.1145/3624062
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. SYCL
    2. cosmology
    3. performance portability
    4. productivity

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SC-W 2023
