Ting Liu, Xuyang Liu, Siteng Huang, Honggang Chen, Quanjun Yin, Long Qin, Donglin Wang, Yue Hu
In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer pre-trained vision-language knowledge to visual grounding (VG). Specifically, we propose DARA, a novel PETL method comprising Domain-aware Adapters (DA Adapters) and Relation-aware Adapters (RA Adapters) for VG. DA Adapters first transfer intra-modality representations to be more fine-grained for the VG domain. Then, RA Adapters share weights to bridge the relation between the two modalities, improving spatial reasoning. Empirical results on widely used benchmarks demonstrate that DARA achieves the best accuracy while updating far fewer parameters than full fine-tuning and other PETL methods. Notably, with only 2.13% tunable backbone parameters, DARA improves average accuracy by 0.81% across the three benchmarks compared to the baseline model. Note that the tunable parameter count is lower than reported in the paper due to further optimization.
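For intuition, the sketch below shows a generic bottleneck adapter in PyTorch: a down-projection, a non-linearity, and an up-projection merged back through a residual connection. This is the standard building block that adapter-based PETL methods insert into a frozen backbone. The module name, hidden dimension, initialization, and the way we mimic RA-style weight sharing are illustrative assumptions, not the exact DARA implementation; see the source code in this repository for the real modules.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project -> activation -> up-project,
    added back to the frozen features via a residual connection.
    Hypothetical sketch; not the exact DARA module."""

    def __init__(self, dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck_dim)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and training departs smoothly from the pre-trained model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# DA-style usage: one adapter per modality to specialize intra-modality features.
visual_da = BottleneckAdapter(dim=256)
text_da = BottleneckAdapter(dim=256)

# RA-style usage: a single instance reused on both streams, so the two
# modalities pass through shared weights (this assumes the features were
# already projected to a common dimension).
ra_shared = BottleneckAdapter(dim=256)
vis_feat, txt_feat = torch.randn(2, 100, 256), torch.randn(2, 20, 256)
vis_out, txt_out = ra_shared(vis_feat), ra_shared(txt_feat)
```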
- Clone this repository.
git clone https://github.com/liuting20/DARA.git
- Prepare the running environment.
conda env create -f environment.yaml
pip install -r requirements.txt
Please refer to GETTING_STARTED.md to learn how to prepare the datasets and pretrained checkpoints.
- Training
CUDA_VISIBLE_DEVICES=0 python -u train.py --batch_size 64 --lr_bert 0.00001 --aug_crop --aug_scale --aug_translate --backbone resnet50 --detr_model ./checkpoints/detr-r50-referit.pth --bert_enc_num 12 --detr_enc_num 6 --dataset unc --max_query_len 20 --output_dir outputs/referit_r50 --epochs 90 --lr_drop 60
We recommend setting --max_query_len 40 for RefCOCOg, and --max_query_len 20 for the other datasets.
We recommend setting --epochs 180 (with --lr_drop 120 accordingly) for RefCOCO+, and --epochs 90 (with --lr_drop 60 accordingly) for the other datasets.
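For example, a RefCOCOg run following these recommendations could look like the command below. The gref dataset identifier and the DETR checkpoint filename are assumptions based on the TransVG-style conventions this codebase follows; adjust them to match GETTING_STARTED.md.
CUDA_VISIBLE_DEVICES=0 python -u train.py --batch_size 64 --lr_bert 0.00001 --aug_crop --aug_scale --aug_translate --backbone resnet50 --detr_model ./checkpoints/detr-r50-gref.pth --bert_enc_num 12 --detr_enc_num 6 --dataset gref --max_query_len 40 --output_dir outputs/refcocog_r50 --epochs 90 --lr_drop 60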
- Evaluation
CUDA_VISIBLE_DEVICES=0 python -u eval.py --batch_size 64 --num_workers 4 --bert_enc_num 12 --detr_enc_num 6 --backbone resnet50 --dataset unc --max_query_len 20 --eval_set testA --eval_model ./outputs/referit_r50/best_checkpoint.pth --output_dir ./outputs/referit_r50
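To evaluate every RefCOCO split in one go, you can loop over the split names (val, testA, and testB are the standard RefCOCO splits; confirm they match what eval.py expects):
for split in val testA testB; do CUDA_VISIBLE_DEVICES=0 python -u eval.py --batch_size 64 --num_workers 4 --bert_enc_num 12 --detr_enc_num 6 --backbone resnet50 --dataset unc --max_query_len 20 --eval_set $split --eval_model ./outputs/referit_r50/best_checkpoint.pth --output_dir ./outputs/referit_r50; done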
This codebase is partially based on TransVG.
If our findings help your research, please consider citing our paper in your publications.
@article{liu2024dara,
title={DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding},
author={Liu, Ting and Liu, Xuyang and Huang, Siteng and Chen, Honggang and Yin, Quanjun and Qin, Long and Wang, Donglin and Hu, Yue},
journal={arXiv preprint arXiv:2405.06217},
year={2024}
}
For any questions about our paper or code, please contact Ting Liu or Xuyang Liu.