Many news programs provide sign language for hearing-impaired viewers, and the signing is closely related to the video content. We therefore propose a new video captioning task termed Sign Language Assisted Video Captioning (SLAVC). To handle the SLAVC task, we introduce the Multimodal Relations Auxiliary Network (MRAN):
MRAN models the relations between different modalities to help generate high-quality sentences.
In addition, we propose the China Focus On (CFO) dataset, which contains three modalities (i.e., visual, sign language, and audio), to explore the SLAVC task. The preprocessed features and the JSON annotation file (including the URL, start time, end time, category, and captions of each video) are available here.
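For reference, here is a minimal sketch of inspecting the annotation file with Python. The filename and field names below are assumptions based on the description above, not the actual schema; check the downloaded JSON for the real keys.

```python
import json

# Hypothetical path and schema: field names are assumed from the
# README description (URL, start/end time, category, captions).
with open("data/CFO/CFO_annotations.json") as f:
    annotations = json.load(f)

# Print the first few entries to verify the download.
for video_id, ann in list(annotations.items())[:3]:
    print(video_id, ann["url"], ann["start_time"], ann["end_time"])
    print("category:", ann["category"])
    print("captions:", ann["captions"])
```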
conda create -n MRAN python=3.6
conda activate MRAN
pip install torch torchvision torchaudio
pip install -r requirements.txt
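After installation, you can optionally sanity-check that PyTorch was installed correctly and can see a GPU:

```python
import torch

# Quick environment check: prints the installed PyTorch version and
# whether a CUDA-capable GPU is visible to it.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```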
Download the preprocessed features and corpus and place them in `data/CFO`.
If you want to use your own dataset, preprocess it with the following steps:
- Extract appearance feature
python preprocess/extract_feat.py --dataset CFO --feature_type appearance --image_height 224 --image_width 224 --gpu_id 0
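Conceptually, the appearance step samples frames from each video and encodes each frame with a 2D CNN. The sketch below illustrates this idea; the choice of an ImageNet-pretrained ResNet-101 backbone is an assumption for illustration and may differ from what `extract_feat.py` actually uses.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Assumed backbone: ImageNet-pretrained ResNet-101 with the final
# classification layer removed, giving one 2048-d vector per frame.
backbone = models.resnet101(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

transform = T.Compose([
    T.Resize((224, 224)),  # matches --image_height / --image_width
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def appearance_features(frames):
    """frames: list of PIL images sampled from one video."""
    batch = torch.stack([transform(f) for f in frames])
    return backbone(batch)  # shape: (num_frames, 2048)
```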
- Extract motion feature
Download the ResNeXt-101 pretrained model (resnext-101-kinetics.pth) and place it in `data/preprocess/pretrained`.
python preprocess/extract_feat.py --dataset CFO --feature_type motion --image_height 112 --image_width 112 --gpu_id 0
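For intuition, loading such a Kinetics-pretrained checkpoint typically looks like the sketch below. The `resnext101` import stands in for a 3D-ResNeXt model definition; the module path and constructor signature are hypothetical and will differ from this repository's actual code.

```python
import torch

# Hypothetical import: a 3D-ResNeXt definition assumed to ship with
# the preprocessing code; the real module path may differ.
from preprocess.models.resnext import resnext101

model = resnext101(sample_size=112, sample_duration=16)  # assumed signature

# Kinetics checkpoints from the original 3D-ResNets release wrap the
# weights in a 'state_dict' entry with 'module.' prefixes left over
# from DataParallel training, so they are stripped before loading.
ckpt = torch.load("data/preprocess/pretrained/resnext-101-kinetics.pth",
                  map_location="cpu")
state_dict = {k.replace("module.", ""): v
              for k, v in ckpt["state_dict"].items()}
model.load_state_dict(state_dict)
model.eval()
```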
- Extract sign language feature
python preprocess/extract_feat.py --dataset CFO --feature_type hand --image_height 112 --image_width 112 --gpu_id 0
- Build corpus
python preprocess/build_vocab.py --dataset CFO -H 2
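Conceptually, this step tokenizes all training captions and keeps words above a frequency cutoff. The sketch below assumes `-H` is such a cutoff (i.e., a minimum word count); check `build_vocab.py` for the flag's actual meaning.

```python
from collections import Counter

def build_vocab(captions, min_count=2):
    """Minimal sketch: map each frequent word to an integer index."""
    counter = Counter(word for cap in captions
                      for word in cap.lower().split())
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab

# Toy usage with made-up captions.
vocab = build_vocab(["a reporter signs the news",
                     "the anchor reads the news"])
print(len(vocab), vocab.get("news"))
```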
Our pretrained model is available here. Download and save it for evaluation, or train a new model:
python train.py --cfg configs/CFO.yml
python evaluate.py --model_path {model_path} --save_path {save_path}
- Some code is adapted from HCRN.