Authors
Alberto Bacchelli, Tommaso Dal Sasso, Marco D'Ambros, Michele Lanza
Publication date
2012/6/2
Conference
[ICSE 2012] 34th ACM/IEEE International Conference on Software Engineering
Pages
375-385
Publisher
IEEE
Description
Emails related to the development of a software system contain information about design choices and issues encountered during the development process. Exploiting the knowledge embedded in emails with automatic tools is challenging, due to the unstructured, noisy, and mixed language nature of this communication medium. Natural language text is often not well-formed and is interleaved with languages with other syntaxes, such as code or stack traces. We present an approach to classify email content at line level. Our technique classifies email lines in five categories (i.e., text, junk, code, patch, and stack trace) to allow one to subsequently apply ad hoc analysis techniques for each category. We evaluated our approach on a statistically significant set of emails gathered from mailing lists of four unrelated open source systems.
Total citations
[Bar chart: citations per year, 2012-2024]
Scholar articles
A Bacchelli, T Dal Sasso, M D'Ambros, M Lanza - 2012 34th International Conference on Software …, 2012