CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow

N Beau, B Crabbé - Findings of the Association for Computational …, 2024 - aclanthology.org
We introduce a novel dataset tailored for code generation, aimed at aiding developers in
common tasks. Our dataset provides examples that include a clarified intent, code snippets …

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Q Chen, W Wang, Q Zhang, S Zheng, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The Transformer architecture has significantly advanced deep learning, particularly in natural
language processing, by effectively managing long-range dependencies. However, as the …

Qwen2 technical report

…, J Bai, J He, J Lin, K Dang, K Lu, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
This report introduces the Qwen2 series, the latest addition to our large language models
and large multimodal models. We release a comprehensive suite of foundational and …

A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks

S Hudson, S Jit, BC Hu, M Chechik - arXiv preprint arXiv:2406.08216, 2024 - arxiv.org
… Parish, Emy Parparita, Alex Passos, Mikhail Pavlov, Andrew Peng, Adam Perelman, Filipe
de Avila Belbute Peres, Michael Petrov, Henrique Ponde de Oliveira Pinto, Michal Pokorny, …

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B …

Data-Juicer: A one-stop data processing system for large language models

D Chen, Y Huang, Z Ma, H Chen, X Pan, C Ge… - Companion of the 2024 …, 2024 - dl.acm.org
… The statistical information can be generated and consumed by Data-Juicer’s other OPs
and tools; we describe them in more detail in later sections. This interface works at …
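
The snippet points at a concrete pattern: operators (OPs) that both produce and consume per-sample statistics through a shared interface. The sketch below illustrates that pattern only; it is not Data-Juicer's actual API, and the Sample and WordCountFilter names are hypothetical.

```python
# Hypothetical sketch of the stats-sharing pattern described above.
# Not Data-Juicer's real API: Sample and WordCountFilter are invented names.
from dataclasses import dataclass, field

@dataclass
class Sample:
    text: str
    stats: dict = field(default_factory=dict)  # statistics cache shared across OPs

class WordCountFilter:
    """An OP that produces a 'num_words' statistic, then filters on it."""
    def __init__(self, min_words: int):
        self.min_words = min_words

    def compute_stats(self, sample: Sample) -> Sample:
        # Produce the statistic once; downstream OPs and tools can consume it.
        sample.stats.setdefault("num_words", len(sample.text.split()))
        return sample

    def keep(self, sample: Sample) -> bool:
        return sample.stats["num_words"] >= self.min_words

samples = [Sample("a short line"), Sample("a considerably longer line of text")]
op = WordCountFilter(min_words=4)
kept = [s for s in map(op.compute_stats, samples) if op.keep(s)]
print([s.stats for s in kept])  # [{'num_words': 6}]
```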

Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

L Yu, B Yu, H Yu, F Huang, Y Li - arXiv preprint arXiv:2408.03092, 2024 - arxiv.org
… The Medium tier includes weights from the 1/3 mark to the 2/3 mark, and the High tier contains
weights from the 2/3 mark to the end. Table 6 quantitatively illustrates the adjustments of …
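
The tiering in this snippet is concrete: after ranking, the Medium tier spans the 1/3 to 2/3 marks and the High tier the final third. A minimal sketch of that partition follows; ranking by absolute magnitude is our assumption, since the excerpt only fixes the cut points.

```python
# Minimal sketch: split a flat weight vector into Low/Medium/High tiers by
# position after ranking. Ranking by |weight| is an assumption on our part;
# the excerpt above only specifies the 1/3 and 2/3 cut points.
import numpy as np

def tier_split(weights: np.ndarray) -> dict:
    order = np.argsort(np.abs(weights))       # ascending by magnitude (assumed)
    n = len(order)
    cut1, cut2 = n // 3, 2 * n // 3           # the 1/3 and 2/3 marks
    return {
        "low":    weights[order[:cut1]],      # start of ranking to the 1/3 mark
        "medium": weights[order[cut1:cut2]],  # 1/3 mark to 2/3 mark
        "high":   weights[order[cut2:]],      # 2/3 mark to the end
    }

tiers = tier_split(np.random.randn(9))
print({k: v.round(2) for k, v in tiers.items()})
```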

Symbolic Learning Enables Self-Evolving Agents

…, J Chen, S Wang, X Xu, N Zhang, H Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The AI community has been exploring a pathway to artificial general intelligence (AGI) by
developing "language agents", which are complex large language model (LLM) pipelines …

Towards Effective and Efficient Continual Pre-training of Large Language Models

J Chen, Z Chen, J Wang, K Zhou, Y Zhu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
… Note that we include commonly used math and code benchmarks in this group because
it is … We also report all benchmark results in Appendix D. Next, we introduce the detailed …

SatLM: Satisfiability-aided language models using declarative prompting

X Ye, Q Chen, I Dillig, G Durrett - Advances in Neural …, 2024 - proceedings.neurips.cc
Prior work has combined chain-of-thought prompting in large language models (LLMs) with
programmatic representations to perform effective and transparent reasoning. While such an …
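
The abstract's core idea, having the model emit a declarative specification that a solver then discharges, can be shown with an off-the-shelf SMT solver. The sketch below uses Z3's Python bindings (`pip install z3-solver`); the toy puzzle and variable names are ours, not from the paper.

```python
# Minimal sketch of satisfiability-aided reasoning: instead of asking the model
# for an answer, ask it for constraints, then let a solver derive the answer.
# The toy puzzle is ours; SatLM's actual prompts and pipeline are in the paper.
from z3 import Int, Solver, sat

# Constraints an LLM might emit for: "Alice is 3 years older than Bob,
# and their ages sum to 21. How old is each?"
alice, bob = Int("alice"), Int("bob")
s = Solver()
s.add(alice == bob + 3, alice + bob == 21, bob >= 0)

if s.check() == sat:
    m = s.model()
    print(f"alice = {m[alice]}, bob = {m[bob]}")  # alice = 12, bob = 9
```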