skip to main content
10.1145/2786805.2786875acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article

Mimic: computing models for opaque code

Published: 30 August 2015 Publication History

Abstract

Opaque code, which is executable but whose source is unavailable or hard to process, can be problematic in a number of scenarios, such as program analysis. Manual construction of models is often used to handle opaque code, but this process is tedious and error-prone. (In this paper, we use model to mean a representation of a piece of code suitable for program analysis.) We present a novel technique for automatic generation of models for opaque code, based on program synthesis. The technique intercepts memory accesses from the opaque code to client objects, and uses this information to construct partial execution traces. Then, it performs a heuristic search inspired by Markov Chain Monte Carlo techniques to discover an executable code model whose behavior matches the opaque code. Native execution, parallelization, and a carefully-designed fitness function are leveraged to increase the effectiveness of the search. We have implemented our technique in a tool Mimic for discovering models of opaque JavaScript functions, and used Mimic to synthesize correct models for a variety of array-manipulating routines.

References

[1]
C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003.
[2]
O. Bastani, S. Anand, and A. Aiken. Specification inference using context-free language reachability. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, pages 553–566, 2015.
[3]
A. W. Biermann, R. I. Baum, and F. E. Petry. Speeding up the synthesis of programs from traces. IEEE Trans. Comput., 24(2):122–136, Feb. 1975.
[4]
S. Forrest, T. Nguyen, W. Weimer, and C. Le Goues. A genetic programming approach to automated software repair. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO ’09, pages 947–954, New York, NY, USA, 2009. ACM.
[5]
E. M. Gold. Language identification in the limit. Information and Control, 10, 1967.
[6]
S. Gulwani, W. R. Harris, and R. Singh. Spreadsheet data manipulation using examples. Commun. ACM, 55(8):97–105, 2012.
[7]
S. H. Jensen, M. Madsen, and A. Møller. Modeling the HTML DOM and browser API in static analysis of javascript web applications. In SIGSOFT/FSE’11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC’11: 13rd European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011, pages 59–69, 2011.
[8]
S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracle-guided component-based program synthesis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South Africa, 1-8 May 2010, pages 215–224, 2010.
[9]
V. Kuncak, M. Mayer, R. Piskac, and P. Suter. Complete functional synthesis. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’10, pages 316–329, New York, NY, USA, 2010. ACM.
[10]
T. A. Lau, P. Domingos, and D. S. Weld. Learning programs from traces using version space algebra. In Proceedings of the 2nd International Conference on Knowledge Capture (K-CAP 2003), October 23-25, 2003, Sanibel Island, FL, USA, pages 36–43, 2003.
[11]
M. Madsen, B. Livshits, and M. Fanning. Practical static analysis of javascript applications in the presence of frameworks and libraries. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013, pages 499–509, 2013.
[12]
M. D. Network. Array.prototype. https://developer. mozilla.org/en-US/docs/Web/JavaScript/ Reference/Global_Objects/Array/prototype. Accessed: 2015-03-15.
[13]
M. D. Network. Proxy - JavaScript. https://developer.mozilla.org/en-US/docs/Web/ JavaScript/Reference/Global_Objects/Proxy. Accessed: 2015-03-16.
[14]
V. K. Palepu, G. H. Xu, and J. A. Jones. Improving efficiency of dynamic analysis with dynamic dependence summaries. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013, Silicon Valley, CA, USA, November 11-15, 2013, pages 59–69, 2013.
[15]
D. Qi, W. N. Sumner, F. Qin, M. Zheng, X. Zhang, and A. Roychoudhury. Modeling software execution environment. In Working Conference on Reverse Engineering, WCRE’12, pages 415–424, Washington, DC, USA, 2012. IEEE Computer Society.
[16]
E. Schkufza, R. Sharma, and A. Aiken. Stochastic superoptimization. In V. Sarkar and R. Bod´ık, editors, ASPLOS, pages 305–316. ACM, 2013.
[17]
M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis. Program Flow Analysis: Theory and Applications, 1981.
[18]
A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, and V. Saraswat. Combinatorial sketching for finite programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, pages 404–415, New York, NY, USA, 2006. ACM.
[19]
M. Sridharan, S. Artzi, M. Pistoia, S. Guarnieri, O. Tripp, and R. Berg. F4F: taint analysis of framework-based web applications. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2011, part of SPLASH 2011, Portland, OR, USA, October 22 - 27, 2011, pages 1053–1068, 2011.
[20]
T.J. Watson Libraries for Analysis (WALA). http://wala.sourceforge.net.
[21]
H. Zhu, T. Dillig, and I. Dillig. Automated inference of library specifications for source-sink property verification. In Programming Languages and Systems - 11th Asian Symposium, APLAS 2013, Melbourne, VIC, Australia, December 9-11, 2013. Proceedings, pages 290–306, 2013.

Cited By

View all
  • (2024)Programming-by-Demonstration for Long-Horizon Robot TasksProceedings of the ACM on Programming Languages10.1145/36328608:POPL(512-545)Online publication date: 5-Jan-2024
  • (2023)The role of program analysis in security vulnerability detection: Then and nowComputers & Security10.1016/j.cose.2023.103463135(103463)Online publication date: Dec-2023
  • (2022)Automatically deriving JavaScript static analyzers from specifications using Meta-level static analysisProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3540250.3549097(1022-1034)Online publication date: 7-Nov-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering
August 2015
1068 pages
ISBN:9781450336758
DOI:10.1145/2786805
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 August 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. JavaScript
  2. MCMC
  3. Opaque code
  4. model generation
  5. program synthesis

Qualifiers

  • Research-article

Conference

ESEC/FSE'15
Sponsor:

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)3
Reflects downloads up to 04 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Programming-by-Demonstration for Long-Horizon Robot TasksProceedings of the ACM on Programming Languages10.1145/36328608:POPL(512-545)Online publication date: 5-Jan-2024
  • (2023)The role of program analysis in security vulnerability detection: Then and nowComputers & Security10.1016/j.cose.2023.103463135(103463)Online publication date: Dec-2023
  • (2022)Automatically deriving JavaScript static analyzers from specifications using Meta-level static analysisProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3540250.3549097(1022-1034)Online publication date: 7-Nov-2022
  • (2022)Type-directed program synthesis for RESTful APIsProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523450(122-136)Online publication date: 9-Jun-2022
  • (2022)Bootstrapping Library-Based SynthesisStatic Analysis10.1007/978-3-031-22308-2_13(272-298)Online publication date: 2-Dec-2022
  • (2021)Accelerating JavaScript static analysis via dynamic shortcutsProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3468264.3468556(1129-1140)Online publication date: 20-Aug-2021
  • (2021)A Survey of Parametric Static AnalysisACM Computing Surveys10.1145/346445754:7(1-37)Online publication date: 18-Jul-2021
  • (2021)Adaptive restarts for stochastic synthesisProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454071(696-709)Online publication date: 19-Jun-2021
  • (2021)Active Learning for Inference and Regeneration of Applications that Access DatabasesACM Transactions on Programming Languages and Systems10.1145/343095242:4(1-119)Online publication date: 22-Jan-2021
  • (2020)Extracting taint specifications for JavaScript librariesProceedings of the ACM/IEEE 42nd International Conference on Software Engineering10.1145/3377811.3380390(198-209)Online publication date: 27-Jun-2020
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media