DOI: 10.1145/3605731.3605885
research-article
Public Access

Codelet Pipe: Realization of Dataflow Software Pipelining for Extended Codelet Model

Published: 07 September 2023

Abstract

Dataflow Software Pipelining for the Codelet Model is a coarse-grained code-mapping scheme designed to exploit pipelined parallelism across Codelets executing on different cores. The extended operational semantics of the Codelet Model exploit this coarse-grained pipelined parallelism across loops by placing single-owner FIFO buffers along the dependencies between Codelets. The extended Codelet Model has shown promising performance benefits by using these FIFO buffers for communication between producer and consumer Codelets. These gains can be amplified further by implementing the FIFO buffers efficiently, following hardware-software co-design principles, on architectures that provide explicit access to scratchpad memory close to the compute cores.
In this work, we introduce Codelet Pipe, an efficient hardware-software co-designed communication channel between producer and consumer Codelets that takes advantage of Dataflow Software Pipelining for the Codelet Model. The current implementation of Codelet Pipe exploits the Shared Local Memory architectural feature of the Intel Iris Pro GPU using OpenCL. Codelet Pipe enables users to construct well-structured Codelet Graphs and improves programmability by relieving the user of the responsibility for handling communication between producer-consumer Codelet pairs. We demonstrate performance gains using a set of micro-benchmarks on a GPU architecture of strategic importance for exascale supercomputers.
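
The producer-consumer arrangement described in the abstract can be illustrated with a small host-side sketch. It is not the clCodeletPipe API and does not use OpenCL or Shared Local Memory: the class `CodeletPipe`, its methods `push`, `pop`, and `close`, and the helper `run_pipeline` are hypothetical names chosen for illustration. It assumes a bounded FIFO with exactly one producer and one consumer, so that the consumer loop can start as soon as the first element is available instead of waiting for the producer loop to finish. Because it runs on Python threads, it shows only the communication semantics, not the performance behaviour.

```python
import threading
from collections import deque


class CodeletPipe:
    """Hypothetical bounded FIFO between one producer and one consumer codelet."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._buf = deque()
        self._capacity = capacity
        self._closed = False
        self._cond = threading.Condition()

    def push(self, item):
        """Block while the buffer is full, then append the item."""
        with self._cond:
            while len(self._buf) >= self._capacity:
                self._cond.wait()
            self._buf.append(item)
            self._cond.notify_all()

    def close(self):
        """Signal that the producer will push no further items."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def pop(self):
        """Return (True, item), or (False, None) once closed and drained."""
        with self._cond:
            while not self._buf and not self._closed:
                self._cond.wait()
            if self._buf:
                item = self._buf.popleft()
                self._cond.notify_all()
                return True, item
            return False, None


def run_pipeline(n, capacity=4):
    """Overlap a producer loop (i * i) with a consumer loop (value + 1)."""
    pipe = CodeletPipe(capacity)
    results = []

    def producer():
        for i in range(n):
            pipe.push(i * i)  # first loop: produce one element per iteration
        pipe.close()

    def consumer():
        while True:
            ok, value = pipe.pop()
            if not ok:
                break
            results.append(value + 1)  # second loop: consume in FIFO order

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch the two loop bodies never touch the buffer directly; all synchronization lives inside the pipe, which mirrors the abstract's point about relieving the user of communication handling. The `capacity` parameter bounds how far the producer may run ahead of the consumer.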

    Published In

    ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops
    August 2023
    217 pages
    ISBN: 9798400708428
    DOI: 10.1145/3605731
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Codelet Pipe
    2. Dataflow Model
    3. Dataflow Software Pipelining
    4. Extended Codelet Model
    5. Hardware-Software Co-design
    6. Many-core Architecture
    7. Programmability
    8. exa-scale

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP-W 2023

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%
