Research article | Open access
DOI: 10.1145/3589250.3596142

RaceInjector: Injecting Races to Evaluate and Learn Dynamic Race Detection Algorithms

Published: 06 June 2023

Abstract

There exist no sound, scalable methods for assembling comprehensive datasets of concurrent programs annotated with data races. As a consequence, it is unclear how well the many heuristic and SMT-based algorithms proposed over the last three decades to detect data races actually perform. To address this problem, we propose RaceInjector, an SMT-based approach which, for any given program, creates arbitrarily many traces of that program containing injected data races. The injected races are guaranteed to follow the given program's semantics. RaceInjector can therefore produce an arbitrarily large, labeled benchmark that is independent of how detection algorithms work. We demonstrate RaceInjector by injecting races into popular program benchmarks and generating a small dataset of racy traces. Among the traces RaceInjector generates, we already find counterexamples that four state-of-the-art race detection algorithms fail to detect. We thus demonstrate the utility of generating such datasets and recommend using them to train machine learning-based models that can potentially replace and improve upon existing race-detection heuristics.
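To make the SMT-based idea more concrete, below is a minimal, illustrative sketch (not the paper's actual encoding) of how a solver can search for a reordering of a recorded trace in which two conflicting accesses become adjacent while per-thread program order is preserved. It assumes the z3-solver Python package; the four-event toy trace, the event names, and the simplified constraints are hypothetical.

# Hypothetical sketch: use an SMT solver to find a reordering of a tiny
# recorded trace that exhibits a data race. This is an illustration of the
# general encoding style, not RaceInjector's implementation.
from z3 import Int, Solver, Distinct, And, Or, sat

# Toy trace: thread 1 acquires/releases a lock around a write to x,
# thread 2 writes x without holding the lock.
events = ["t1_acq", "t1_wr_x", "t1_rel", "t2_wr_x"]
pos = {e: Int(e) for e in events}          # position of each event in the reordered trace

s = Solver()
s.add([And(0 <= pos[e], pos[e] < len(events)) for e in events])
s.add(Distinct(*pos.values()))             # positions form a permutation of the trace

# Per-thread program order must be preserved (thread 1 here).
s.add(pos["t1_acq"] < pos["t1_wr_x"], pos["t1_wr_x"] < pos["t1_rel"])

# Race condition: the two conflicting writes to x are adjacent in the
# reordered trace, i.e. no synchronization event separates them.
s.add(Or(pos["t2_wr_x"] - pos["t1_wr_x"] == 1,
         pos["t1_wr_x"] - pos["t2_wr_x"] == 1))

if s.check() == sat:
    m = s.model()
    witness = sorted(events, key=lambda e: m[pos[e]].as_long())
    print("racy reordering:", witness)     # e.g. ['t1_acq', 't1_wr_x', 't2_wr_x', 't1_rel']

A realistic encoding must additionally model lock semantics, values read and written, and control-flow feasibility so that the injected race respects the program's semantics; this sketch only shows the shape of the constraint problem.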


Published In

SOAP 2023: Proceedings of the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis
June 2023, 70 pages
ISBN: 9798400701702
DOI: 10.1145/3589250
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States
Author Tags

  1. Dataset generation
  2. Dynamic race detection algorithms
  3. Race injection
  4. SMT-solvers

Qualifiers

  • Research-article

Conference

SOAP '23

Acceptance Rates

Overall Acceptance Rate 11 of 11 submissions, 100%
