An empirical study of code smells in transformer-based code generation techniques

ML Siddiq, SH Majumder, MR Mim… - 2022 IEEE 22nd …, 2022 - ieeexplore.ieee.org
2022 IEEE 22nd International Working Conference on Source Code …, 2022
Prior work has developed transformer-based language learning models to automatically generate source code for a task without compilation errors. The datasets used to train these techniques include samples from open-source projects, which may not be free of security flaws, code smells, and violations of standard coding practices. Therefore, we investigate to what extent code smells are present in the datasets of code generation techniques and verify whether they leak into the output of these techniques. To conduct this study, we used Pylint and Bandit to detect code smells and security smells in three widely used training sets (CodeXGlue, APPS, and Code Clippy). We observed that Pylint caught 264 code smell types, whereas Bandit located 44 security smell types in these three datasets used for training code generation techniques. By analyzing the output from ten different configurations of the open-source, fine-tuned, transformer-based GPT-Neo model with 125M parameters, we observed that this model leaked the smells and non-standard practices into the generated source code. When analyzing the suggestions of GitHub Copilot, a closed-source code generation tool, we observed that they contained 18 types of code smells, including substandard coding patterns and 2 security smell types.
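The detection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that a "smell type" corresponds to a distinct Pylint message symbol or Bandit test ID, that both tools are installed and on the PATH, and that their JSON reports are read from standard output. The helper names (`run_tool`, `pylint_smell_types`, `bandit_smell_types`, `scan`) are hypothetical.

```python
import json
import subprocess
from collections import Counter


def run_tool(cmd):
    """Run a linter and return its stdout.

    Both tools exit non-zero when they report findings, so the
    return code is deliberately not checked here.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def pylint_smell_types(json_text):
    """Count Pylint messages per symbol (for example 'unused-import').

    Pylint's JSON reporter emits a list of message objects.
    """
    return Counter(msg["symbol"] for msg in json.loads(json_text))


def bandit_smell_types(json_text):
    """Count Bandit findings per test ID (for example 'B307').

    Bandit's JSON formatter emits an object with a 'results' list.
    """
    return Counter(item["test_id"] for item in json.loads(json_text)["results"])


def scan(path):
    """Assumed workflow: lint one file or package with both tools."""
    pylint_out = run_tool(["pylint", "--output-format=json", path])
    bandit_out = run_tool(["bandit", "-f", "json", "-r", path])
    return pylint_smell_types(pylint_out), bandit_smell_types(bandit_out)
```

Under these assumptions, `len(counter)` gives the number of distinct smell types found in a sample, and the counts show how often each one occurs. The sketch does no error handling, so a tool that produces no JSON (for instance, when it is not installed) will raise an exception.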