DSBench

This repo provides the source code for our paper, DSBench: How Far Are Data Science Agents from Becoming Data Science Experts? [PDF][Twitter] If you discuss or use DSBench in your research, please cite us!

@misc{jing2024dsbenchfardatascience,
      title={DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?}, 
      author={Liqiang Jing and Zhehui Huang and Xiaoyang Wang and Wenlin Yao and Wenhao Yu and Kaixin Ma and Hongming Zhang and Xinya Du and Dong Yu},
      year={2024},
      eprint={2409.07703},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2409.07703}, 
}

Overview

DSBench is a benchmark for evaluating data science agents on realistic data analysis and data modeling tasks collected from ModelOff and Kaggle. Given a task instruction (which may contain images and tables) and the accompanying data files, a data science agent must generate a solution that resolves the described task.

Set Up

To run the evaluation, install the Python packages listed in the requirments.txt file.
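After installing (for example with `pip install -r requirments.txt`), it can help to confirm that the environment actually contains every listed package. The following is a minimal sketch, not part of the DSBench code: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and it assumes the file uses plain pip-style lines such as `name==version`.

```python
from importlib import metadata
from pathlib import Path
import re


def parse_requirement_names(text: str) -> list[str]:
    """Extract distribution names from simple pip-style requirement lines."""
    names = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # File name as spelled in this repo; assumed to sit in the current directory.
    req_file = Path("requirments.txt")
    if req_file.exists():
        missing = missing_packages(parse_requirement_names(req_file.read_text()))
        print("Missing packages:", ", ".join(missing) if missing else "none")
    else:
        print(f"{req_file} not found; run this from the repository root.")
```

The check compares package names only and ignores version pins, so it catches a forgotten install but not a version mismatch.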

Usage

Clone this repo.
Install all the requirements listed in Set Up.
For evaluation on the data analysis tasks, refer to ./data_analysis/readme.md.
For evaluation on the data modeling tasks, refer to ./data_modeling/readme.md.

Results

Disclaimer

The dataset provided is intended solely for educational and research purposes, with the goal of fostering research in related areas. Users of this dataset are required to adhere to the following guidelines:

Data Source and Accuracy: While efforts have been made to curate and organize the data, we make no guarantees regarding the accuracy, completeness, or timeliness of the dataset. Users are encouraged to independently verify the data's accuracy and assume full responsibility for any conclusions drawn from it.
Usage Restrictions: This dataset is strictly for non-commercial use. Any commercial development or profit-driven activity requires explicit written permission from the dataset providers.
Privacy and Compliance: Users must ensure that their use of the dataset complies with all applicable laws and regulations, particularly those related to privacy and data security. The dataset providers are not responsible for any legal consequences arising from improper use of the data.
Non-Infringement: The pre-processed data we provide is intended solely for educational and research purposes. We do not claim ownership of the original data, and any use of this data should respect the rights of the original creators. Users are responsible for ensuring that their use of the data does not infringe on any copyrights or other intellectual property rights.
Disclaimer of Liability: The dataset providers shall not be held liable for any direct or indirect consequences resulting from the use of this dataset, including but not limited to losses, damages, or liabilities arising from reliance on the information contained within the dataset.
