Evaluating large language models trained on code

M Chen, J Tworek, H Jun, Q Yuan, HPDO Pinto… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Codex, a GPT language model fine-tuned on publicly available code from
GitHub, and study its Python code-writing capabilities. A distinct production version of Codex …

Measuring coding challenge competence with APPS

D Hendrycks, S Basart, S Kadavath, M Mazeika… - arXiv preprint arXiv …, 2021 - arxiv.org
While programming is one of the most broadly applicable skills in modern society, modern
machine learning models still cannot code solutions to basic problems. Despite its …

Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation

J Liu, CS Xia, Y Wang, L Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
Program synthesis has been long studied with recent approaches focused on directly using
the power of Large Language Models (LLMs) to generate code. Programming benchmarks …

CodeXGLUE: A machine learning benchmark dataset for code understanding and generation

S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy… - arXiv preprint arXiv …, 2021 - arxiv.org
Benchmark datasets have a significant impact on accelerating research in programming
language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster …

Program synthesis with large language models

J Austin, A Odena, M Nye, M Bosma… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …

Unifying the perspectives of NLP and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong… - arXiv preprint arXiv …, 2023 - simg.baai.ac.cn
In this work we systematically review the recent advancements in code processing with
language models, covering 50+ models, 30+ evaluation tasks, 170+ datasets, and 700 …

CodeGen: An open large language model for code with multi-turn program synthesis

E Nijkamp, B Pang, H Hayashi, L Tu, H Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Program synthesis strives to generate a computer program as a solution to a given problem
specification, expressed with input-output examples or natural language descriptions. The …

Synchromesh: Reliable code generation from pre-trained language models

G Poesia, O Polozov, V Le, A Tiwari, G Soares… - arXiv preprint arXiv …, 2022 - arxiv.org
Large pre-trained language models have been used to generate code, providing a flexible
interface for synthesizing programs from natural language specifications. However, they …

PanGu-Coder: Program synthesis with function-level language modeling

F Christopoulou, G Lampouras, M Gritta… - arXiv preprint arXiv …, 2022 - arxiv.org
We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e., the synthesis of programming language …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …