All you need is superword-level parallelism: systematic control-flow vectorization with SLP
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022, Pages 301–315, https://doi.org/10.1145/3519939.3523701
Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally proposed using ...
Autoscheduling for sparse tensor algebra with an asymptotic cost model
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 2022, Pages 269–285, https://doi.org/10.1145/3519939.3523442
While loop reordering and fusion can make big impacts on the constant-factor performance of dense tensor programs, the effects on sparse tensor programs are asymptotic, often leading to orders of magnitude performance differences in practice. Sparse ...
- research-article, June 2020
Automatic generation of efficient sparse tensor format conversion routines
PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2020, Pages 823–838, https://doi.org/10.1145/3385412.3385963
This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping,...
- research-article, June 2018
The three pillars of machine programming
- Justin Gottschlich,
- Armando Solar-Lezama,
- Nesime Tatbul,
- Michael Carbin,
- Martin Rinard,
- Regina Barzilay,
- Saman Amarasinghe,
- Joshua B. Tenenbaum,
- Tim Mattson
MAPL 2018: Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, June 2018, Pages 69–80, https://doi.org/10.1145/3211346.3211355
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements ...
- research-article, June 2015
Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code
- Charith Mendis,
- Jeffrey Bosboom,
- Kevin Wu,
- Shoaib Kamil,
- Jonathan Ragan-Kelley,
- Sylvain Paris,
- Qin Zhao,
- Saman Amarasinghe
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015, Pages 391–402, https://doi.org/10.1145/2737924.2737974
Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 ...
Also Published in:
ACM SIGPLAN Notices: Volume 50 Issue 6, June 2015
- research-article, June 2015
Autotuning algorithmic choice for input sensitivity
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015, Pages 379–390, https://doi.org/10.1145/2737924.2737969
A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an ...
Also Published in:
ACM SIGPLAN Notices: Volume 50 Issue 6, June 2015
- research-article, June 2013
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2013, Pages 519–530, https://doi.org/10.1145/2491956.2462176
Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. ...
Also Published in:
ACM SIGPLAN Notices: Volume 48 Issue 6, June 2013
- research-article, June 2009
PetaBricks: a language and compiler for algorithmic choice
PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2009, Pages 38–49, https://doi.org/10.1145/1542476.1542481
It is often impossible to obtain a one-size-fits-all solution for high performance algorithms when considering different choices for data distributions, parallelism, transformations, and blocking. The best solution to these choices is often tightly ...
Also Published in:
ACM SIGPLAN Notices: Volume 44 Issue 6, June 2009
- proceeding, June 2008
PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation
The Program Committee and I are pleased to present the proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation. This volume contains 34 papers selected out of 184 submissions. We believe this volume demonstrates ...
- Article, May 2003
Meta optimization: improving compiler heuristics with machine learning
PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, June 2003, Pages 77–90, https://doi.org/10.1145/781131.781141
Compiler writers have crafted many heuristics over the years to approximately solve NP-hard problems efficiently. Finding a heuristic that performs well on a broad range of applications is a tedious and difficult process. This paper introduces Meta ...
Also Published in:
ACM SIGPLAN Notices: Volume 38 Issue 5, May 2003
- Article, May 2003
Linear analysis and optimization of stream programs
PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, June 2003, Pages 12–25, https://doi.org/10.1145/781131.781134
As more complex DSP algorithms are realized in practice, there is an increasing need for high-level stream abstractions that can be compiled without sacrificing efficiency. Toward this end, we present a set of aggressive optimizations that target linear ...
Also Published in:
ACM SIGPLAN Notices: Volume 38 Issue 5, May 2003
- Article, May 2001
A unified framework for schedule and storage optimization
PLDI '01: Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, June 2001, Pages 232–242, https://doi.org/10.1145/378795.378852
We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule ...
Also Published in:
ACM SIGPLAN Notices: Volume 36 Issue 5, May 2001
- Article, May 2000
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, August 2000, Pages 145–156, https://doi.org/10.1145/349299.349320
Increasing focus on multimedia applications has prompted the addition of multimedia extensions to most existing general purpose microprocessors. This added functionality comes primarily with the addition of short SIMD instructions. Unfortunately, ...
Also Published in:
ACM SIGPLAN Notices: Volume 35 Issue 5, May 2000
- Article, May 2000
Bitwidth analysis with application to silicon compilation
PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, August 2000, Pages 108–120, https://doi.org/10.1145/349299.349317
This paper introduces Bitwise, a compiler that minimizes the bitwidth (the number of bits used to represent each operand) for both integers and pointers in a program. By propagating static information both forward and backward in the program dataflow ...
Also Published in:
ACM SIGPLAN Notices: Volume 35 Issue 5, May 2000
- Article, June 1993
Communication optimization and code generation for distributed memory machines
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, August 1993, Pages 126–138, https://doi.org/10.1145/155090.155102
This paper presents several algorithms to solve code generation and optimization problems specific to machines with distributed address spaces. Given a description of how the computation is to be partitioned across the processors in a machine, our ...
Also Published in:
ACM SIGPLAN Notices: Volume 28 Issue 6, June 1993