- research-article, October 2024
Method Bundles
SLE '24: Proceedings of the 17th ACM SIGPLAN International Conference on Software Language Engineering, Pages 190–195, https://doi.org/10.1145/3687997.3695633
Performance-critical systems commonly optimize memory use and locality by selecting among multiple variants of a single logical operation. Algorithm developers then typically rely on ad-hoc API patterns or naming conventions to distinguish the variants. ...
- poster, April 2024
Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming Approach
FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, April 2024, Page 184, https://doi.org/10.1145/3626202.3637593
High-Level Synthesis enables the rapid prototyping of hardware accelerators, by combining a high-level description of the functional behavior of a kernel with a set of micro-architecture optimizations as inputs. Such pragmas may describe the pipelining ...
Predicting Performance and Accuracy of Mixed-Precision Programs for Precision Tuning
ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Article No.: 15, Pages 1–13, https://doi.org/10.1145/3597503.3623338
A mixed-precision program is a floating-point program that utilizes different precisions for different operations, providing the opportunity of balancing the trade-off between accuracy and performance. Precision tuning aims to find a mixed-precision ...
Hardware-Aware Static Optimization of Hyperdimensional Computations
Proceedings of the ACM on Programming Languages (PACMPL), Volume 7, Issue OOPSLA2, Article No.: 222, Pages 1–30, https://doi.org/10.1145/3622797
Binary spatter code (BSC)-based hyperdimensional computing (HDC) is a highly error-resilient approximate computational paradigm suited for error-prone, emerging hardware platforms. In BSC HDC, the basic datatype is a hypervector, a typically large ...
Exocompilation for productive programming of hardware accelerators
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Pages 703–718, https://doi.org/10.1145/3519939.3523446
High-performance kernel libraries are critical to exploiting accelerators and specialized instructions in many applications. Because compilers are difficult to extend to support diverse and rapidly-evolving hardware targets, and automatic optimization ...
- research-article, April 2022
Complexity-guided container replacement synthesis
Proceedings of the ACM on Programming Languages (PACMPL), Volume 6, Issue OOPSLA1, Article No.: 68, Pages 1–31, https://doi.org/10.1145/3527312
Containers, such as lists and maps, are fundamental data structures in modern programming languages. However, improper choice of container types may lead to significant performance issues. This paper presents Cres, an approach that automatically ...
- research-article, August 2021
From ASTs to Machine Code with LLVM
Programming '21: Companion Proceedings of the 5th International Conference on the Art, Science, and Engineering of Programming, Pages 68–76, https://doi.org/10.1145/3464432.3464777
A compiler is a program that translates source code written in a particular language into another language. Internally, the whole process is typically split into multiple stages that handle one particular aspect of this translation. One of these ...
- research-article, November 2020
Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization
Proceedings of the ACM on Programming Languages (PACMPL), Volume 4, Issue OOPSLA, Article No.: 168, Pages 1–29, https://doi.org/10.1145/3428236
Java 8 introduced streams that allow developers to work with collections of data using functional-style operations. Streams are often used in pipelines of operations for processing the data elements, which leads to concise and elegant program code. ...
- extended-abstract, August 2019
Optimizing Data Plane Programs for the Network
NetPL'19: Proceedings of the ACM SIGCOMM 2019 Workshop on Networking and Programming Languages, Page 1, https://doi.org/10.1145/3341561.3349590
With the move of Software-defined networking from fixed to programmable data planes, network functions are written with P4 or eBPF for targets such as programmable switches, CPU based flow processors [5] and commodity CPUs [7]. These data plane programs ...
- research-article, May 2019
Global optimization of operand transfer fusion in heterogeneous computing
SCOPES '19: Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, Pages 49–58, https://doi.org/10.1145/3323439.3323981
We consider the problem of minimizing, for a dataflow graph of kernel calls, the overall number of operand data transfers, and thus, the accumulated transfer startup overhead, in heterogeneous systems with non-shared memory. Our approach analyzes the ...
- research-article, September 2018
Theorem proving for all: equational reasoning in liquid Haskell (functional pearl)
Haskell 2018: Proceedings of the 11th ACM SIGPLAN International Symposium on Haskell, Pages 132–144, https://doi.org/10.1145/3242744.3242756
Equational reasoning is one of the key features of pure functional languages such as Haskell. To date, however, such reasoning always took place externally to Haskell, either manually on paper, or mechanised in a theorem prover. This article shows how ...
Also published in: ACM SIGPLAN Notices, Volume 53, Issue 7
- short-paper, September 2018
Towards automatic restrictification of CUDA kernel arguments
ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Pages 928–931, https://doi.org/10.1145/3238147.3241533
Many procedural languages, such as C and C++, have pointers. Pointers are powerful and convenient, but pointer aliasing still hinders compiler optimizations, despite several years of research on pointer aliasing analysis. Because alias analysis is a ...
- research-article, November 2017
Egeria: a framework for automatic synthesis of HPC advising tools through multi-layered natural language processing
SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 10, Pages 1–14, https://doi.org/10.1145/3126908.3126961
Achieving high performance on modern systems is challenging. Even with a detailed profile from a performance tool, writing or refactoring a program to remove its performance issues is still a daunting task for application programmers: it demands lots of ...
- research-article, October 2017
Making collection operations optimal with aggressive JIT compilation
SCALA 2017: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, Pages 29–40, https://doi.org/10.1145/3136000.3136002
Functional collection combinators are a neat and widely accepted data processing abstraction. However, their generic nature results in high abstraction overheads -- Scala collections are known to be notoriously slow for typical tasks. We show that ...
- research-article, October 2017
GLORE: generalized loop redundancy elimination upon LER-notation
Proceedings of the ACM on Programming Languages (PACMPL), Volume 1, Issue OOPSLA, Article No.: 74, Pages 1–28, https://doi.org/10.1145/3133898
This paper presents GLORE, a novel approach to enabling the detection and removal of large-scoped redundant computations in nested loops. GLORE works on LER-notation, a new representation of computations in both regular and irregular loops. Together ...
- research-article, May 2017
Helping programmers improve the energy efficiency of source code
ICSE-C '17: Proceedings of the 39th International Conference on Software Engineering Companion, Pages 238–240, https://doi.org/10.1109/ICSE-C.2017.80
This paper briefly proposes a technique to detect energy inefficient fragments in the source code of a software system. Test cases are executed to obtain energy consumption measurements, and a statistical method, based on spectrum-based fault ...
- research-article, May 2016
Floating-point precision tuning using blame analysis
Authors: Cindy Rubio-González, Cuong Nguyen, Benjamin Mehne, Koushik Sen, James Demmel, William Kahan, Costin Iancu, Wim Lavrijsen, David H. Bailey, David Hough
ICSE '16: Proceedings of the 38th International Conference on Software Engineering, Pages 1074–1085, https://doi.org/10.1145/2884781.2884850
While tremendously useful, automated techniques for tuning the precision of floating-point programs face important scalability challenges. We present Blame Analysis, a novel dynamic approach that speeds up precision tuning. Blame Analysis performs ...
- article, May 2015
Modeling and optimizing MapReduce programs
Concurrency and Computation: Practice & Experience (CCOMP), Volume 27, Issue 7, Pages 1734–1766, https://doi.org/10.1002/cpe.3333
MapReduce frameworks allow programmers to write distributed, data-parallel programs that operate on multisets. These frameworks offer considerable flexibility to support various kinds of programs and data. To understand the essence of the programming ...
- article, May 2014
Method of Automated Generation of Autotuners for Parallel Programs
Cybernetics and Systems Analysis (KLU-CASA), Volume 50, Issue 3, Pages 465–475, https://doi.org/10.1007/s10559-014-9635-3
This paper introduces a formal model of a method for automated adjustment of parallel applications (autotuning). The software implementation of this model is described in the form of a flexible software framework for automatic generation of autotuners ...
- poster, February 2014
Transformations for throughput optimization in high-level synthesis (abstract only)
FPGA '14: Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays, Page 245, https://doi.org/10.1145/2554688.2554772
Programming productivity of FPGA devices remains a significant challenge, despite the emergence of robust high level synthesis tools to automatically transform codes written in high-level languages into RTL implementations. Focusing on a class of ...