Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Large Language Models (LLMs) demonstrate strong proficiency in generating code for high-resource
programming languages (HRPLs) like Python but struggle significantly with low-…
Sampling Language from Latent System 2 Reasoning
… We infer the set of demonstrations D that maximizes our ELBO. These demonstrations should
… Once we find the D that maximizes our ELBO, we sample code programs from the resulting …
Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
The Transformer architecture has significantly advanced deep learning, particularly in natural
language processing, by effectively managing long-range dependencies. However, as the …
Qwen2 Technical Report
This report introduces the Qwen2 series, the latest addition to our large language models
and large multimodal models. We release a comprehensive suite of foundational and …
MoDEM: Mixture of Domain Expert Models
T Simonds, K Kurniawan, JH Lau - arXiv preprint arXiv:2410.07490, 2024 - arxiv.org
We propose a novel approach to enhancing the performance and efficiency of large language
models (LLMs) by combining domain prompt routing with domain-specialized models. We …
Selective Prompt Anchoring for Code Generation
Y Tian, T Zhang - arXiv preprint arXiv:2408.09121, 2024 - arxiv.org
Recent advances in large language models (LLMs) such as Copilot and ChatGPT have
transformed software development by automating coding tasks. Despite these advancements, …
CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions
J Rao, X Liu, L Lian, S Cheng, Y Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere
to commands. Diverging from most works focusing on data mixing, our study concentrates on …
A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks
… Parish, Emy Parparita, Alex Passos, Mikhail Pavlov, Andrew Peng, Adam Perelman, Filipe
de Avila Belbute Peres, Michael Petrov, Henrique Ponde de Oliveira Pinto, Michael Pokorny, …
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
… path than the human-annotated one, we use the pass@k (Chen et al.… “P No” is premise
number and “D” stands for derivation. … We conduct detailed analysis to show where most powerful …
Data-Juicer: A One-Stop Data Processing System for Large Language Models
… The statistical information can be generated and consumed by Data-Juicer’s other OPs
and tools, and we will describe more details of them in later sections. This interface works at …