Multimodal Relations Auxiliary Network (MRAN)

Many news programs provide sign language for hearing-impaired viewers, and this sign language is closely related to the video content. We therefore propose a new video captioning task, termed Sign Language Assisted Video Captioning (SLAVC), and introduce the Multimodal Relations Auxiliary Network (MRAN) to handle it:

MRAN Architecture

MRAN models the relations between different modalities to help generate high-quality sentences.

In addition, we propose the China Focus On (CFO) dataset, which contains three modalities (i.e., visual, sign language, and audio), to explore the SLAVC task. The preprocessed features and the JSON annotation file (including the URL, start time, end time, category, and captions of each video) are available here.
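As a quick sanity check, the annotation file can be inspected with a few lines of Python. This is a minimal sketch: the file name and key names (url, start_time, end_time, category, captions) are assumptions, so adjust them to the actual structure of the downloaded JSON.

import json

# Hypothetical path and key names; check the downloaded file for the real ones.
with open("data/CFO/annotations.json", "r", encoding="utf-8") as f:
    annotations = json.load(f)

# Assuming a list of per-video entries.
for entry in annotations[:3]:
    print(entry.get("url"), entry.get("category"))
    print(entry.get("start_time"), entry.get("end_time"))
    print(entry.get("captions"))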

Environment

conda create -n MRAN python=3.6
conda activate MRAN
pip install torch torchvision torchaudio
pip install -r requirements.txt
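Before preprocessing, you may want to verify that PyTorch is installed correctly and can see a GPU. A minimal check:

import torch

# Print the installed PyTorch version and CUDA availability.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))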

Preprocess

Download the preprocessed features and corpus and place them in data/CFO.

If you want to use your own dataset, preprocess it with the following steps (a sketch for checking the extracted features follows the list):

  • Extract appearance features
python preprocess/extract_feat.py --dataset CFO --feature_type appearance --image_height 224 --image_width 224 --gpu_id 0
  • Extract motion features
    Download the ResNeXt-101 pretrained model (resnext-101-kinetics.pth) and place it in data/preprocess/pretrained.
python preprocess/extract_feat.py --dataset CFO --feature_type motion --image_height 112 --image_width 112 --gpu_id 0
  • Extract sign language features
python preprocess/extract_feat.py --dataset CFO --feature_type hand --image_height 112 --image_width 112 --gpu_id 0
  • Build the corpus
python preprocess/build_vocab.py --dataset CFO -H 2
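After extraction, it can be useful to verify the feature files before training. The sketch below assumes the features are stored as HDF5 files, as in HCRN-style pipelines; the file name and dataset key are assumptions, so check the actual output of preprocess/extract_feat.py.

import h5py

# Hypothetical file name and dataset key.
with h5py.File("data/CFO/CFO_appearance_feat.h5", "r") as f:
    print(list(f.keys()))                # inspect the available datasets
    feats = f["appearance_features"][:]  # assumed key name
    print(feats.shape)                   # e.g. (num_videos, num_clips, feat_dim)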

Training

Our pretrained model is available here. Download and save it for evaluation, or train a new model from scratch:

python train.py --cfg configs/CFO.yml

Evaluation

python evaluate.py --model_path {model_path} --save_path {save_path}
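For example, with hypothetical paths (replace the checkpoint and output directory with your own):

python evaluate.py --model_path data/save/MRAN_CFO.pth --save_path results/CFO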

Acknowledgement

  • Parts of the code are adapted from HCRN
