DOI: 10.1109/ICSE48619.2023.00082
research-article

SemParser: A Semantic Parser for Log Analytics

Published: 26 July 2023

Abstract

Logs are run-time information automatically generated by software; they record system events and activities together with their timestamps. Before deeper insight into the run-time status of the software can be obtained, a fundamental step of log analysis, called log parsing, extracts structured templates and parameters from the semi-structured raw log messages. However, current log parsers are all syntax-based: they regard each message as a character string and ignore the semantic information carried by parameters and templates.
We therefore propose SemParser, the first semantic-based parser, to address the critical bottleneck of mining semantics from log messages. It consists of two steps: an end-to-end semantics miner and a joint parser. The first step identifies explicit semantics inside a single log, and the second step jointly infers implicit semantics and computes structural outputs according to the contextual knowledge base of the logs. To analyze the effectiveness of our semantic parser, we first demonstrate that it derives rich semantics from log messages collected from six widely applied systems, with an average F1 score of 0.985. We then conduct two representative downstream tasks, showing that current downstream models improve their performance with appropriately extracted semantics by 1.2%-11.7% on two anomaly detection datasets and by 8.65% on a failure identification dataset. We believe these findings provide insights into semantically understanding log messages for the log analysis community.
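
As a rough illustration of the syntax-based log parsing that the abstract contrasts against (this is not SemParser itself, nor any specific published parser), the sketch below splits a raw message into a template and a list of parameters by masking number-like tokens. The pattern set, the function name `parse_log`, and the sample message are all assumptions made for the example.

```python
import re

# Hypothetical parameter patterns; real syntax-based parsers use richer heuristics.
PARAM_PATTERN = re.compile(
    r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b"  # IPv4 address with optional port
    r"|\b0x[0-9a-fA-F]+\b"                   # hexadecimal value
    r"|\b\d+\b"                              # plain integer
)


def parse_log(message: str) -> tuple[str, list[str]]:
    """Return (template, parameters) for one raw log message."""
    # Collect every substring that looks like a variable value.
    params = PARAM_PATTERN.findall(message)
    # Replace each of them with a wildcard to obtain the constant template.
    template = PARAM_PATTERN.sub("<*>", message)
    return template, params


# Illustrative message, not taken from the paper's datasets.
template, params = parse_log("Received block 3587 of size 67108864 from 10.251.42.84")
print(template)  # Received block <*> of size <*> from <*>
print(params)    # ['3587', '67108864', '10.251.42.84']
```

The output treats `3587`, `67108864`, and `10.251.42.84` as anonymous strings: nothing records that the first is a block identifier, the second a size, and the third an address. That missing link between a parameter and what it denotes is the kind of semantic information the abstract says syntax-based parsers ignore.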



Published In

ICSE '23: Proceedings of the 45th International Conference on Software Engineering
May 2023
2713 pages
ISBN: 9781665457019
General Chair: John Grundy
Program Co-chairs: Lori Pollock, Massimiliano Di Penta

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Conference

ICSE '23: 45th International Conference on Software Engineering
May 14 - 20, 2023
Melbourne, Victoria, Australia

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
