skip to main content
research-article
Open access

Practical Iterative Optimization for the Data Center

Published: 11 May 2015 Publication History

Abstract

Iterative optimization is a simple but powerful approach that searches the best possible combination of compiler optimizations for a given workload. However, iterative optimization is plagued by several practical issues that prevent it from being widely used in practice: a large number of runs are required to find the best combination, the optimum combination is dataset dependent, and the exploration process incurs significant overhead that needs to be compensated for by performance benefits. Therefore, although iterative optimization has been shown to have a significant performance potential, it seldom is used in production compilers.
In this article, we propose iterative optimization for the data center (IODC): we show that the data center offers a context in which all of the preceding hurdles can be overcome. The basic idea is to spawn different combinations across workers and recollect performance statistics at the master, which then evolves to the optimum combination of compiler optimizations. IODC carefully manages costs and benefits, and it is transparent to the end user. To bring IODC to practice, we evaluate it in the presence of co-runners to better reflect real-life data center operation with multiple applications co-running per server. We enhance IODC with the capability to find compatible co-runners along with a mechanism to dynamically adjust the level of aggressiveness to improve its robustness in the presence of co-running applications.
We evaluate IODC using both MapReduce and compute-intensive throughput server applications. To reflect the large number of users interacting with the system, we gather a very large collection of datasets (up to hundreds of millions of unique datasets per program), for a total storage of 16.4TB and 850 days of CPU time. We report an average performance improvement of 1.48 × and up to 2.08 × for five MapReduce applications, and 1.12 × and up to 1.39 × for nine server applications. Furthermore, our experiments demonstrate that IODC is effective in the presence of co-runners, improving performance by greater than 13% compared to the worst possible co-runner schedule.

References

[1]
F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. F. P. O’Boyle, J. Thomson, M. Toussaint, and C. K. I. Williams. 2006. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’06). 295--305.
[2]
N. Aghdaie and Y. Tamir. 2002. Implementation and evaluation of transparent fault-tolerant Web service with kernel-level support. In Proceedings of 11th International Conference on Computer Communications and Networks (ICCCN’02). IEEE, Los Alamitos, CA, 63--68.
[3]
R. Agrawal and R. Srikant. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of 20th International Conference on Very Large Data Bases (VLDB’94). 487--499.
[4]
P. Berube and J. N. Amaral. 2006. Aestimo: A feedback-directed optimization evaluation tool. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’06). 251--260.
[5]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT’08). 72--81.
[6]
J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. F. P. O’Boyle, and O. Temam. 2007. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’07). 185--197.
[7]
Y. Chen, S. Fang, L. Eeckhout, O. Temam, and C. Wu. 2012. Iterative optimization for the data center. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’12). 49--60.
[8]
Y. Chen, Y. Huang, L. Eeckhout, G. Fursin, L. Peng, O. Temam, and C. Wu. 2010. Evaluating iterative optimization across 1000 data sets. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI’10). 448--459.
[9]
K. D. Cooper, A. Grosul, T. J. Harvey, S. Reeves, D. Subramanian, L. Torczon, and T. Waterman. 2005. ACME: Adaptive compilation made efficient. In Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’05). 69--77.
[10]
K. D. Cooper, P. J. Schielke, and D. Subramanian. 1999. Optimizing for reduced code space using genetic algorithms. In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’99). 1--9.
[11]
A. S. Das, M. Datar, A. Garg, and S. Rajaram. 2007. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web (WWW’07). 271--280.
[12]
J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI’04). 107--113.
[13]
T. Dwyer, A. Fedorova, S. Blagodurov, M. Roth, F. Gaud, and J. Pei. 2012. A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’12). IEEE, Los Alamitos, CA, 1--11.
[14]
B. Franke, M. O’Boyle, J. Thomson, and G. Fursin. 2005. Probabilistic source-level optimisation of embedded programs. In Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’05). 78--86.
[15]
G. Fursin, J. Cavazos, M. O’Boyle, and O. Temam. 2007. MiDataSets: Creating the conditions for a more realistic evaluation of iterative optimization. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC’07). 245--260.
[16]
G. Fursin and O. Temam. 2009. Collective optimization. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC’09). 34--49.
[17]
D. Gillick, A. Faria, and J. DeNero. 2006. MapReduce: Distributed Computing for Machine Learning. Technical Report 1--12. University of California, Berkeley, CA.
[18]
Y. Gu and R. L. Grossman. 2009. Sector and sphere: The design and implementation of a high performance data cloud. Theme Issue of the Philosophical Transactions of the Royal Society A: Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructure. 367, 1897, 2429--2445.
[19]
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. 2001. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE 4th Annual International Workshop on Workload Characterization (WWC’01). 3--14.
[20]
Hadoop. 2014. Apache Hadoop Home Page. Retrieved April 6, 2015, from http://hadoop.apache.org.
[21]
M. Haneda, P. M. W. Knijnenburg, and H. A. G. Wijshoff. 2006. On the impact of data input sets on statistical compiler tuning. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS’06). 385--385.
[22]
K. Hoste and L. Eeckhout. 2008. Cole: Compiler optimization level exploration. In Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO’08). 165--174.
[23]
M. Kambadur, T. Moseley, R. Hank, and M. A. Kim. 2012. Measuring interference between live datacenter applications. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC’12). 1--12.
[24]
P. Kulkarni, S. Hines, J. Hiser, D. Whalley, J. Davidson, and D. Jones. 2004. Fast searches for effective optimization phase sequences. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’04). 171--182.
[25]
S. Lim, J. Huh, Y. Kim, G. M. Shipman, and C. R. Das. 2012. D-factor: A quantitative model of application slow-down in multi-resource shared systems. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’12). 271--282.
[26]
Z. Liu, H. Li, and G. Miao. 2010. MapReduce-based backpropagation neural network over large scale mobile data. In Proceedings of the 6th International Conference on Natural Computation (ICNC’10). 1726--1730.
[27]
Loongson. 2014. Loongson 2F. Retrieved April 6, 2015, from http://www.loongson.cn/.
[28]
J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. 2011. Bubble-Up: Increasing utilization in modern warehouse scale computers via sensible co-locations. In Proceedings of the International Symposium on Microarchitecture (MICRO’11). 248--259.
[29]
J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. 2010. Contention aware execution: Online contention detection and response. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’10). 257--265.
[30]
M. Marwah, S. Mishra, and C. Fetzer. 2008. Enhanced server fault-tolerance for improved user experience. In Proceedings of the IEEE International Conference on Dependable Systems and Networks with FTCS and DCC (DSN’08). 167--176.
[31]
MovieLens. 2014. MovieLens Data Sets. Retrieved April 6, 2015, from http://www.grouplens.org/node/73.
[32]
M. Nawaz, E. E. Enscore Jr., and I. Ham. 1983. A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. OMEGA 11, 1, 91--95.
[33]
Z. Pan and R. Eigenmann. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’06). 319--332.
[34]
PAPI 5.1. 2013. PAPI: Performance Application Programming Interface. Retrieved April 6, 2015, from http://icl.cs.utk.edu/papi.
[35]
S. Patil and D. J. Lilja. 2010. Using resampling techniques to compute confidence intervals for the harmonic mean of rate-based performance metrics. Computer Architecture Letters 9, 1, 1--4.
[36]
A. Qasem, K. Kennedy, and J. Mellor-Crummey. 2006. Automatic tuning of whole applications using direct search and a performance-based transformation system. Journal of Supercomputing 36, 2, 183--196.
[37]
C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. 2007. Evaluating MapReduce for multi-core and multiprocessor systems. In Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture (HPCA’07). 13--24.
[38]
M. Stephenson. 2006. Automating the Construction of Compiler Heuristics Using Machine Learning. Ph.D. Dissertation. Massachusetts Institute of Technology, Cambridge, MA.
[39]
M. Stephenson and S. Amarasinghe. 2005. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’05). 123--134.
[40]
M. Stephenson, M. Martin, and U. M. O’Reilly. 2003. Meta optimization: Improving compiler heuristics with machine learning. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’03). 77--90.
[41]
L. Tang, J. Mars, and M. L. Soffa. 2011. Contentiousness vs. sensitivity: Improving contention aware runtime systems on multicore architectures. In Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT’11). 12--21.
[42]
A. Verma, X. Llora, D. E. Goldberg, and R. H. Campbell. 2009. Scaling genetic algorithms using MapReduce. In Proceedings of the 9th International Conference on Intelligent Systems Design and Applications (ISDA’09). 13--18.
[43]
M. J. Voss and R. Eigenmann. 2000. ADAPT: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing (ICPP’00). 163--170.

Cited By

View all
  • (2023)Constructing an AI Compiler for ARM Cortex-M DevicesComputer Systems Science and Engineering10.32604/csse.2023.03467246:1(999-1019)Online publication date: 2023
  • (2023)Compiler Test-Program Generation via Memoized Configuration SearchProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00172(2035-2047)Online publication date: 14-May-2023
  • (2022)Compiler Optimization Parameter Selection Method Based on Ensemble LearningElectronics10.3390/electronics1115245211:15(2452)Online publication date: 6-Aug-2022
  • Show More Cited By

Index Terms

  1. Practical Iterative Optimization for the Data Center

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Architecture and Code Optimization
    ACM Transactions on Architecture and Code Optimization  Volume 12, Issue 2
    July 2015
    410 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2775085
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 May 2015
    Accepted: 01 February 2015
    Revised: 01 January 2015
    Received: 01 October 2014
    Published in TACO Volume 12, Issue 2

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Iterative optimization
    2. MapReduce
    3. co-run
    4. compiler
    5. data center
    6. server

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Google Faculty Research Award
    • Strategic Priority Research Program of CAS
    • NSFC
    • Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)
    • China 1000-talents program
    • International Collaboration Key Program of CAS
    • National High Technology Research andDevelopment Program of China
    • China 10000-talents program
    • European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC
    • National High Technology Research and Development Program of China
    • National Basic Research Program of China
    • National Natural Science Foundation of China (NSFC)

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)75
    • Downloads (Last 6 weeks)11
    Reflects downloads up to 24 Oct 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Constructing an AI Compiler for ARM Cortex-M DevicesComputer Systems Science and Engineering10.32604/csse.2023.03467246:1(999-1019)Online publication date: 2023
    • (2023)Compiler Test-Program Generation via Memoized Configuration SearchProceedings of the 45th International Conference on Software Engineering10.1109/ICSE48619.2023.00172(2035-2047)Online publication date: 14-May-2023
    • (2022)Compiler Optimization Parameter Selection Method Based on Ensemble LearningElectronics10.3390/electronics1115245211:15(2452)Online publication date: 6-Aug-2022
    • (2022)Boosting Compiler Testing via Compiler Optimization ExplorationACM Transactions on Software Engineering and Methodology10.1145/350836231:4(1-33)Online publication date: 22-Aug-2022
    • (2021)New Optimization Sequences for Code-Size Reduction for the LLVM Compilation InfrastructureProceedings of the 25th Brazilian Symposium on Programming Languages10.1145/3475061.3475085(33-40)Online publication date: 27-Sep-2021
    • (2021)CTFS: A Configurable Tuning Framework on SHENWEI SystemProceedings of the 5th International Conference on High Performance Compilation, Computing and Communications10.1145/3471274.3471282(44-49)Online publication date: 18-Jun-2021
    • (2021)Exploring the space of optimization sequences for code-size reduction: insights and toolsProceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction10.1145/3446804.3446849(47-58)Online publication date: 2-Mar-2021
    • (2021)Compiler Autotuning Based on Hot Function for SHENWEI Processor2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS)10.1109/ICCCS52626.2021.9449274(1059-1062)Online publication date: 23-Apr-2021
    • (2021)Learning based compilation of embedded applications targeting minimal energy consumption▪Journal of Systems Architecture: the EUROMICRO Journal10.1016/j.sysarc.2021.102116116:COnline publication date: 1-Jun-2021
    • (2021)Toward a novel engine for compiler optimization space exploration of big data workloadsSoftware: Practice and Experience10.1002/spe.306552:5(1262-1293)Online publication date: 24-Dec-2021
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media