
ktrk115/lsmdc

 
 


A Joint Sequence Fusion Model for Video Question Answering and Retrieval

This project hosts the TensorFlow implementation for our ECCV 2018 paper, A Joint Sequence Fusion Model for Video Question Answering and Retrieval.

Reference

If you use this code or dataset as part of any published research, please cite the following paper.

@inproceedings{yu2018joint,
  author    = {Youngjae Yu and Jongseok Kim and Gunhee Kim},
  title     = "{A Joint Sequence Fusion Model for Video Question Answering and Retrieval}",
  booktitle = {ECCV},
  year      = 2018
}

Setup

Install dependencies

pip install -r requirements.txt

Set up Python paths

git submodule update --init --recursive
add2virtualenv .

Prepare Data

  • Video Feature

    1. Download LSMDC data.

    2. Extract RGB features using the pool5 layer of the pretrained ResNet-152 model.

    3. Extract audio features using VGGish.

    4. Concatenate the RGB and audio features and save them to an HDF5 file at 'dataset/LSMDC/LSMDC16_features/RESNET_pool5wav.hdf5'.

  • Dataset

    • We processed the raw DataFrame files of the LSMDC17 and MSR-VTT datasets.
    • Download the DataFrame files.
    • Save these files in "dataset/LSMDC/DataFrame".
  • Vocabulary
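The last Video Feature step (concatenate, then save to HDF5) can be sketched as follows. This is a minimal sketch, not the authors' extraction script. It assumes that the RGB and audio features of a clip have already been resampled to the same number of time steps, that each clip is stored as its own HDF5 dataset, and that clips are keyed by a clip id. The function names `concat_features` and `save_features` are hypothetical, and the layout that train.py actually expects should be checked against the released feature file.

```python
import numpy as np


def concat_features(rgb, audio):
    """Concatenate per-clip RGB and audio features along the feature axis.

    Assumption: both arrays share the same number of time steps T, so
    rgb has shape (T, D_rgb) and audio has shape (T, D_audio).
    """
    rgb = np.asarray(rgb, dtype=np.float32)
    audio = np.asarray(audio, dtype=np.float32)
    if rgb.ndim != 2 or audio.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (T, D)")
    if rgb.shape[0] != audio.shape[0]:
        raise ValueError("rgb and audio must have the same number of time steps")
    # Result has shape (T, D_rgb + D_audio).
    return np.concatenate([rgb, audio], axis=1)


def save_features(features_by_clip, path):
    """Write one HDF5 dataset per clip (hypothetical layout: key = clip id)."""
    import h5py  # imported lazily so concat_features works without h5py

    with h5py.File(path, "w") as f:
        for clip_id, feats in features_by_clip.items():
            f.create_dataset(clip_id, data=feats)
```

For example, with 2048-dimensional ResNet-152 pool5 features and 128-dimensional VGGish embeddings, `concat_features` returns an array of shape (T, 2176), which `save_features({"some_clip_id": fused}, "dataset/LSMDC/LSMDC16_features/RESNET_pool5wav.hdf5")` would then write to disk.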

Training

Modify configuration.py to suit your environment.

  • train_tag can be 'MC' or 'FIB'

Run train.py.

python train.py --tag="tag"

Pretrained Model

You can download the models and features from the gDrive Link.

Modify 'configuration.py' to load the checkpoints (self.load_from_ckpt = 'path/to/checkpoint/').

[RET] R@1: 93, R@5: 247, R@10: 348, medr : 29
[FIB] Accuracy: 45.1

Your results may be slightly lower or higher than these scores.
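For readers unfamiliar with the retrieval numbers, R@K counts how often the correct item is ranked within the top K, and medr is the median rank of the correct item. The sketch below computes both from a list of 1-based ranks. It is an illustration of the standard definitions, not the repository's evaluation code, and `retrieval_metrics` is a hypothetical name. The scores above do not say whether R@K is a count or a percentage, so the sketch returns both the hit counts and the corresponding fractions.

```python
import statistics


def retrieval_metrics(ranks, ks=(1, 5, 10)):
    """Summarize retrieval quality from the rank of the correct item per query.

    ranks: 1-based rank of the ground-truth item for each query.
    Returns (hits, recall, medr), where hits[k] is the number of queries with
    rank <= k, recall[k] is that count divided by the number of queries, and
    medr is the median rank.
    """
    ranks = list(ranks)
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = {k: sum(1 for r in ranks if r <= k) for k in ks}
    recall = {k: hits[k] / len(ranks) for k in ks}
    medr = statistics.median(ranks)
    return hits, recall, medr
```

Calling `retrieval_metrics([1, 3, 7, 20])` gives hit counts of 1, 2, and 3 for K = 1, 5, and 10, and a median rank of 5.0.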
