General-Purpose Computing with Soft GPUs on FPGAs

Published: 24 January 2018

Abstract

Deploying soft graphics processing units (GPUs) on field-programmable gate arrays (FPGAs) would make FPGA compute power available through a flexible, GPU-like tool flow. Application-specific adaptations, such as selective hardening of floating-point operations and instruction set subsetting, would mitigate the high area and power demands of soft GPUs. This work explores the capabilities and limitations of soft General Purpose Computing on GPUs (GPGPU) for both fixed- and floating-point arithmetic. For this purpose, we have developed FGPU: a configurable, scalable, and portable GPU architecture designed especially for FPGAs. FGPU is open-source and implemented entirely in RTL. It can be programmed in OpenCL and controlled through a Python API. This article introduces its hardware architecture as well as its tool flow. We evaluated the proposed GPGPU approach against multiple other solutions. In comparison to homogeneous Multi-Processor System-On-Chips (MPSoCs), we found that using a soft GPU is a Pareto-optimal solution regarding throughput per area and energy consumption. On average, FGPU achieves 2.9× better compute density and consumes 11.2× less energy than a single MicroBlaze processor when computing in IEEE-754 floating-point format. We measured an average speedup of about 4× over the ARM Cortex-A9 with the NEON vector co-processor for fixed- or floating-point benchmarks. In addition, the biggest FGPU cores we could implement on a Xilinx Zynq-7000 System-On-Chip (SoC) deliver performance similar to that of equivalent implementations with High-Level Synthesis (HLS).

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
Special Section on FCCM 2016 and Regular Papers
March 2018
183 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3178391
Editor: Steve Wilton
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 January 2018
Accepted: 01 December 2017
Revised: 01 November 2017
Received: 01 June 2017
Published in TRETS Volume 11, Issue 1

Author Tags

  1. GPGPU
  2. OpenCL
  3. PYNQ
  4. soft GPUs

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (634-641). DOI: 10.1109/IPDPSW63119.2024.00123. Online publication date: 27-May-2024
  • (2022) Artificial Intelligence for Mass Spectrometry and Nuclear Magnetic Resonance Spectroscopy Using a Novel Data Augmentation Method. IEEE Transactions on Emerging Topics in Computing 10:1 (87-98). DOI: 10.1109/TETC.2021.3131371. Online publication date: 1-Jan-2022
  • (2022) Gaming for Better Psychological Health: A Solution Based on the FPGA Zynq 7000. 2022 8th International Conference on Signal Processing and Communication (ICSC) (602-607). DOI: 10.1109/ICSC56524.2022.10009066. Online publication date: 1-Dec-2022
  • (2022) ICU4SAT: A General-Purpose Reconfigurable Instrument Control Unit Based on Open Source Components. 2022 IEEE Aerospace Conference (AERO) (1-9). DOI: 10.1109/AERO53065.2022.9843414. Online publication date: 5-Mar-2022
  • (2022) Investigating the reliability impacts of neutron-induced soft errors in aerial image classification CNNs implemented in a softcore SRAM-based FPGA GPU. Microelectronics Reliability 138 (114738). DOI: 10.1016/j.microrel.2022.114738. Online publication date: Nov-2022
  • (2022) Evaluating low-level software-based hardening techniques for configurable GPU architectures. The Journal of Supercomputing 78:6 (8081-8105). DOI: 10.1007/s11227-021-04154-z. Online publication date: 1-Apr-2022
  • (2021) A Manycore Vision Processor for Real-Time Smart Cameras. Sensors 21:21 (7137). DOI: 10.3390/s21217137. Online publication date: 27-Oct-2021
  • (2021) Specializing FGPU for Persistent Deep Learning. ACM Transactions on Reconfigurable Technology and Systems 14:2 (1-23). DOI: 10.1145/3457886. Online publication date: 15-Jul-2021
  • (2021) Neutron-induced Faults on CNN for Aerial Image Classification on SRAM-based FPGA Using Softcore GPU and HLS. 2021 21th European Conference on Radiation and Its Effects on Components and Systems (RADECS) (1-4). DOI: 10.1109/RADECS53308.2021.9954517. Online publication date: Sep-2021
  • (2021) AITIA: Embedded AI Techniques for Industrial Applications. 2021 31st International Conference on Field-Programmable Logic and Applications (FPL) (374-375). DOI: 10.1109/FPL53798.2021.00071. Online publication date: Aug-2021
