Google Наука

A statistical semantic language model for source code

TT Nguyen, AT Nguyen, HA Nguyen… - Proceedings of the 2013 …, 2013 - dl.acm.org

TT Nguyen, AT Nguyen, HA Nguyen, TN Nguyen

Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013•dl.acm.org

Recent research has successfully applied the statistical n-gram language model to show that source code exhibits a good level of repetition. The n-gram model is shown to have good predictability in supporting code suggestion and completion. However, the state-of-the-art n-gram approach to capture source code regularities/patterns is based only on the lexical information in a local context of the code units. To improve predictability, we introduce SLAMC, a novel statistical semantic language model for source code. It incorporates semantic information into code tokens and models the regularities/patterns of such semantic annotations, called sememes, rather than their lexemes. It combines the local context in semantic n-grams with the global technical concerns/functionality into an n-gram topic model, together with pairwise associations of program elements. Based on SLAMC, we developed a new code suggestion method, which is empirically evaluated on several projects to have relatively 18-68% higher accuracy than the state-of-the-art approach.

ACM Digital Library

Показване на ощеПоказване на по-малко

Запазване Позоваване С позовавания в 296 Сродни статии Всички 2 версии

Позоваване

Разширено търсене

Запазено в „Моята библиотека“

A statistical semantic language model for source code