research-article

Open access

Practical Iterative Optimization for the Data Center

Authors:

Lieven Eeckhout,

Xiaobing FengAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 2

Article No.: 15, Pages 1 - 26

https://doi.org/10.1145/2739048

Published: 11 May 2015 Publication History

Abstract

Iterative optimization is a simple but powerful approach that searches the best possible combination of compiler optimizations for a given workload. However, iterative optimization is plagued by several practical issues that prevent it from being widely used in practice: a large number of runs are required to find the best combination, the optimum combination is dataset dependent, and the exploration process incurs significant overhead that needs to be compensated for by performance benefits. Therefore, although iterative optimization has been shown to have a significant performance potential, it seldom is used in production compilers.

In this article, we propose iterative optimization for the data center (IODC): we show that the data center offers a context in which all of the preceding hurdles can be overcome. The basic idea is to spawn different combinations across workers and recollect performance statistics at the master, which then evolves to the optimum combination of compiler optimizations. IODC carefully manages costs and benefits, and it is transparent to the end user. To bring IODC to practice, we evaluate it in the presence of co-runners to better reflect real-life data center operation with multiple applications co-running per server. We enhance IODC with the capability to find compatible co-runners along with a mechanism to dynamically adjust the level of aggressiveness to improve its robustness in the presence of co-running applications.

We evaluate IODC using both MapReduce and compute-intensive throughput server applications. To reflect the large number of users interacting with the system, we gather a very large collection of datasets (up to hundreds of millions of unique datasets per program), for a total storage of 16.4TB and 850 days of CPU time. We report an average performance improvement of 1.48 × and up to 2.08 × for five MapReduce applications, and 1.12 × and up to 1.39 × for nine server applications. Furthermore, our experiments demonstrate that IODC is effective in the presence of co-runners, improving performance by greater than 13% compared to the worst possible co-runner schedule.

References

[1]

F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. F. P. O’Boyle, J. Thomson, M. Toussaint, and C. K. I. Williams. 2006. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’06). 295--305.

Digital Library

[2]

N. Aghdaie and Y. Tamir. 2002. Implementation and evaluation of transparent fault-tolerant Web service with kernel-level support. In Proceedings of 11th International Conference on Computer Communications and Networks (ICCCN’02). IEEE, Los Alamitos, CA, 63--68.

[3]

R. Agrawal and R. Srikant. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of 20th International Conference on Very Large Data Bases (VLDB’94). 487--499.

Digital Library

[4]

P. Berube and J. N. Amaral. 2006. Aestimo: A feedback-directed optimization evaluation tool. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’06). 251--260.

[5]

C. Bienia, S. Kumar, J. P. Singh, and K. Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT’08). 72--81.

Digital Library

[6]

J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. F. P. O’Boyle, and O. Temam. 2007. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’07). 185--197.

Digital Library

[7]

Y. Chen, S. Fang, L. Eeckhout, O. Temam, and C. Wu. 2012. Iterative optimization for the data center. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 49--60.

Digital Library

[8]

Y. Chen, Y. Huang, L. Eeckhout, G. Fursin, L. Peng, O. Temam, and C. Wu. 2010. Evaluating iterative optimization across 1000 data sets. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI’10). 448--459.

Digital Library

[9]

K. D. Cooper, A. Grosul, T. J. Harvey, S. Reeves, D. Subramanian, L. Torczon, and T. Waterman. 2005. ACME: Adaptive compilation made efficient. In Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’05). 69--77.

Digital Library

[10]

K. D. Cooper, P. J. Schielke, and D. Subramanian. 1999. Optimizing for reduced code space using genetic algorithms. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’99). 1--9.

Digital Library

[11]

A. S. Das, M. Datar, A. Garg, and S. Rajaram. 2007. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web (WWW’07). 271--280.

Digital Library

[12]

J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI’04). 107--113.

Digital Library

[13]

T. Dwyer, A. Fedorova, S. Blagodurov, M. Roth, F. Gaud, and J. Pei. 2012. A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’12). IEEE, Los Alamitos, CA, 1--11.

Digital Library

[14]

B. Franke, M. O’Boyle, J. Thomson, and G. Fursin. 2005. Probabilistic source-level optimisation of embedded programs. In Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’05). 78--86.

Digital Library

[15]

G. Fursin, J. Cavazos, M. O’Boyle, and O. Temam. 2007. MiDataSets: Creating the conditions for a more realistic evaluation of iterative optimization. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC’07). 245--260.

Digital Library

[16]

G. Fursin and O. Temam. 2009. Collective optimization. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC’09). 34--49.

Digital Library

[17]

D. Gillick, A. Faria, and J. DeNero. 2006. MapReduce: Distributed Computing for Machine Learning. Technical Report 1--12. University of California, Berkeley, CA.

[18]

Y. Gu and R. L. Grossman. 2009. Sector and sphere: The design and implementation of a high performance data cloud. Theme Issue of the Philosophical Transactions of the Royal Society A: Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructure. 367, 1897, 2429--2445.

[19]

M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. 2001. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE 4th Annual International Workshop on Workload Characterization (WWC’01). 3--14.

Digital Library

[20]

Hadoop. 2014. Apache Hadoop Home Page. Retrieved April 6, 2015, from http://hadoop.apache.org.

[21]

M. Haneda, P. M. W. Knijnenburg, and H. A. G. Wijshoff. 2006. On the impact of data input sets on statistical compiler tuning. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS’06). 385--385.

Digital Library

[22]

K. Hoste and L. Eeckhout. 2008. Cole: Compiler optimization level exploration. In Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’08). 165--174.

Digital Library

[23]

M. Kambadur, T. Moseley, R. Hank, and M. A. Kim. 2012. Measuring interference between live datacenter applications. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’12). 1--12.

Digital Library

[24]

P. Kulkarni, S. Hines, J. Hiser, D. Whalley, J. Davidson, and D. Jones. 2004. Fast searches for effective optimization phase sequences. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’04). 171--182.

Digital Library

[25]

S. Lim, J. Huh, Y. Kim, G. M. Shipman, and C. R. Das. 2012. D-factor: A quantitative model of application slow-down in multi-resource shared systems. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’12). 271--282.

Digital Library

[26]

Z. Liu, H. Li, and G. Miao. 2010. MapReduce-based backpropagation neural network over large scale mobile data. In Proceedings of the 6th International Conference on Natural Computation (ICNC’10). 1726--1730.

[27]

Loongson. 2014. Loongson 2F. Retrieved April 6, 2015, from http://www.loongson.cn/.

[28]

J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. 2011. Bubble-Up: Increasing utilization in modern warehouse scale computers via sensible co-locations. In Proceedings of the International Symposium on Microarchitecture (MICRO’11). 248--259.

Digital Library

[29]

J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. 2010. Contention aware execution: Online contention detection and response. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’10). 257--265.

Digital Library

[30]

M. Marwah, S. Mishra, and C. Fetzer. 2008. Enhanced server fault-tolerance for improved user experience. In Proceedings of the IEEE International Conference on Dependable Systems and Networks with FTCS and DCC (DSN’08). 167--176.

[31]

MovieLens. 2014. MovieLens Data Sets. Retrieved April 6, 2015, from http://www.grouplens.org/node/73.

[32]

M. Nawaz, E. E. Enscore Jr., and I. Ham. 1983. A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. OMEGA 11, 1, 91--95.

[33]

Z. Pan and R. Eigenmann. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’06). 319--332.

Digital Library

[34]

PAPI 5.1. 2013. PAPI: Performance Application Programming Interface. Retrieved April 6, 2015, from http://icl.cs.utk.edu/papi.

[35]

S. Patil and D. J. Lilja. 2010. Using resampling techniques to compute confidence intervals for the harmonic mean of rate-based performance metrics. Computer Architecture Letters 9, 1, 1--4.

Digital Library

[36]

A. Qasem, K. Kennedy, and J. Mellor-Crummey. 2006. Automatic tuning of whole applications using direct search and a performance-based transformation system. Journal of Supercomputing 36, 2, 183--196.

Digital Library

[37]

C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. 2007. Evaluating MapReduce for multi-core and multiprocessor systems. In Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture (HPCA’07). 13--24.

Digital Library

[38]

M. Stephenson. 2006. Automating the Construction of Compiler Heuristics Using Machine Learning. Ph.D. Dissertation. Massachusetts Institute of Technology, Cambridge, MA.

Digital Library

[39]

M. Stephenson and S. Amarasinghe. 2005. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’05). 123--134.

Digital Library

[40]

M. Stephenson, M. Martin, and U. M. O’Reilly. 2003. Meta optimization: Improving compiler heuristics with machine learning. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’03). 77--90.

Digital Library

[41]

L. Tang, J. Mars, and M. L. Soffa. 2011. Contentiousness vs. sensitivity: Improving contention aware runtime systems on multicore architectures. In Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT’11). 12--21.

Digital Library

[42]

A. Verma, X. Llora, D. E. Goldberg, and R. H. Campbell. 2009. Scaling genetic algorithms using MapReduce. In Proceedings of the 9th International Conference on Intelligent Systems Design and Applications (ISDA’09). 13--18.

Digital Library

[43]

M. J. Voss and R. Eigenmann. 2000. ADAPT: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing (ICPP’00). 163--170.

Digital Library

Cited By

Chang RHoang T(2023)Constructing an AI Compiler for ARM Cortex-M DevicesComputer Systems Science and Engineering10.32604/csse.2023.03467246:1(999-1019)Online publication date: 2023
https://doi.org/10.32604/csse.2023.034672
Chen JSuo CJiang JChen PLi XGrundy JPollock LPenta M(2023)Compiler Test-Program Generation via Memoized Configuration SearchProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00172(2035-2047)Online publication date: 14-May-2023
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00172
Liu HXu JChen SGuo T(2022)Compiler Optimization Parameter Selection Method Based on Ensemble LearningElectronics10.3390/electronics1115245211:15(2452)Online publication date: 6-Aug-2022
https://doi.org/10.3390/electronics11152452
Show More Cited By

Index Terms

Practical Iterative Optimization for the Data Center
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Iterative optimization for the data center
ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

Iterative optimization is a simple but powerful approach that searches for the best possible combination of compiler optimizations for a given workload. However, each program, if not each data set, potentially favors a different combination. As a result,...
Deconstructing iterative optimization

Iterative optimization is a popular compiler optimization approach that has been studied extensively over the past decade. In this article, we deconstruct iterative optimization by evaluating whether it works across datasets and by analyzing why it ...
Iterative optimization for the data center
ASPLOS '12

Iterative optimization is a simple but powerful approach that searches for the best possible combination of compiler optimizations for a given workload. However, each program, if not each data set, potentially favors a different combination. As a result,...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 12, Issue 2

July 2015

410 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2775085

Editor:
Koen De Bosschere
Ghent University

Issue’s Table of Contents

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2015

Accepted: 01 February 2015

Revised: 01 January 2015

Received: 01 October 2014

Published in TACO Volume 12, Issue 2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Google Faculty Research Award
Strategic Priority Research Program of CAS
NSFC
Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)
China 1000-talents program
International Collaboration Key Program of CAS
National High Technology Research andDevelopment Program of China
China 10000-talents program
European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC
National High Technology Research and Development Program of China
National Basic Research Program of China
National Natural Science Foundation of China (NSFC)

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

17
Total Citations
View Citations
583
Total Downloads

Downloads (Last 12 months)75
Downloads (Last 6 weeks)11

Reflects downloads up to 24 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chang RHoang T(2023)Constructing an AI Compiler for ARM Cortex-M DevicesComputer Systems Science and Engineering10.32604/csse.2023.03467246:1(999-1019)Online publication date: 2023
https://doi.org/10.32604/csse.2023.034672
Chen JSuo CJiang JChen PLi XGrundy JPollock LPenta M(2023)Compiler Test-Program Generation via Memoized Configuration SearchProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00172(2035-2047)Online publication date: 14-May-2023
https://dl.acm.org/doi/10.1109/ICSE48619.2023.00172
Liu HXu JChen SGuo T(2022)Compiler Optimization Parameter Selection Method Based on Ensemble LearningElectronics10.3390/electronics1115245211:15(2452)Online publication date: 6-Aug-2022
https://doi.org/10.3390/electronics11152452
Chen JSuo C(2022)Boosting Compiler Testing via Compiler Optimization ExplorationACM Transactions on Software Engineering and Methodology10.1145/350836231:4(1-33)Online publication date: 22-Aug-2022
https://dl.acm.org/doi/10.1145/3508362
Faustino ABorin EPereira FNápoli ORosário V(2021)New Optimization Sequences for Code-Size Reduction for the LLVM Compilation InfrastructureProceedings of the 25th Brazilian Symposium on Programming Languages10.1145/3475061.3475085(33-40)Online publication date: 27-Sep-2021
https://dl.acm.org/doi/10.1145/3475061.3475085
Wu WZhu QWang FQi F(2021)CTFS: A Configurable Tuning Framework on SHENWEI SystemProceedings of the 5th International Conference on High Performance Compilation, Computing and Communications10.1145/3471274.3471282(44-49)Online publication date: 18-Jun-2021
https://dl.acm.org/doi/10.1145/3471274.3471282
Silva Ade Lima BPereira FSmith ADemange DGupta R(2021)Exploring the space of optimization sequences for code-size reduction: insights and toolsProceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction10.1145/3446804.3446849(47-58)Online publication date: 2-Mar-2021
https://dl.acm.org/doi/10.1145/3446804.3446849
WU WZHU QWang F(2021)Compiler Autotuning Based on Hot Function for SHENWEI Processor2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS)10.1109/ICCCS52626.2021.9449274(1059-1062)Online publication date: 23-Apr-2021
https://doi.org/10.1109/ICCCS52626.2021.9449274
Sachan AGhoshal B(2021)Learning based compilation of embedded applications targeting minimal energy consumption▪Journal of Systems Architecture: the EUROMICRO Journal10.1016/j.sysarc.2021.102116116:COnline publication date: 1-Jun-2021
https://dl.acm.org/doi/10.1016/j.sysarc.2021.102116
Ahmed HIsmail M(2021)Toward a novel engine for compiler optimization space exploration of big data workloadsSoftware: Practice and Experience10.1002/spe.306552:5(1262-1293)Online publication date: 24-Dec-2021
https://doi.org/10.1002/spe.3065
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents