Data-Juicer: A One-Stop Data Processing System for Large Language Models

D Chen, Y Huang, Z Ma, H Chen, X Pan, C Ge… - Companion of the 2024 …, 2024 - dl.acm.org
The immense evolution in Large Language Models (LLMs) has underscored the importance
of massive, heterogeneous, and high-quality data. A data recipe is a mixture of data from …

Data-Juicer: A One-Stop Data Processing System for Large Language Models

D Chen, Y Huang, Z Ma, H Chen, X Pan, C Ge… - arXiv e …, 2023 - ui.adsabs.harvard.edu
The immense evolution in Large Language Models (LLMs) has underscored the importance
of massive, heterogeneous, and high-quality data. A data recipe is a mixture of data from …

[PDF] Data-Juicer: A One-Stop Data Processing System for Large Language Models

D Chen, Y Huang, Z Ma, H Chen, X Pan, C Ge, D Gao… - bolinding.github.io
The immense evolution in Large Language Models (LLMs) has underscored the importance
of massive, heterogeneous, and high-quality data. A data recipe is a mixture of data of …