Norm-Regularized Token Compression in Vision Transformer Networks

Overview

This repository contains the official implementation of the ATAIT 2024 paper "Norm-Regularized Token Compression in Vision Transformer Networks". The paper proposes a method for compressing tokens in Vision Transformer networks using norm regularization, improving model efficiency while preserving accuracy.
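
As a rough sketch of the idea, assuming tokens are scored by the L2 norm of their embeddings (our reading of the title, not necessarily the repository's exact criterion), Top-K token compression can be written as:

import torch

def topk_norm_compress(tokens, keep):
    # tokens: (batch, num_tokens, dim) patch embeddings; keep: tokens to retain.
    scores = tokens.norm(dim=-1)                                # one score per token
    idx = scores.topk(keep, dim=1).indices.sort(dim=1).values   # top-K, original order
    b = torch.arange(tokens.size(0), device=tokens.device).unsqueeze(1)
    return tokens[b, idx]                                       # (batch, keep, dim)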

Prerequisites

This project requires the following libraries:

  • PyTorch
  • torchvision
  • timm
  • numpy
  • tqdm
  • thop (for measuring model complexity, e.g. MACs/FLOPs; see the example below)

Ensure you have Python 3.x installed along with the above libraries (e.g., pip install torch torchvision timm numpy tqdm thop).
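
Model complexity before and after compression can be compared with thop; a minimal sketch (the model name and input size here are illustrative, not the project's defaults):

import timm
import torch
from thop import profile

model = timm.create_model("vit_small_patch16_224", pretrained=False)
dummy = torch.randn(1, 3, 224, 224)             # one 224x224 RGB image
macs, params = profile(model, inputs=(dummy,))  # thop reports MACs and parameter count
print(f"{macs / 1e9:.2f} GMACs, {params / 1e6:.2f} M params")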

Installation

Clone this repository to your local machine to get started:

git clone https://github.com/maikimilk/ViT-NormReg-Compressor.git
cd ViT-NormReg-Compressor

Usage

To run an experiment, edit the configuration in main.py (a hypothetical example follows the list):

  • model: selects the token compression technique to apply.
  • batch_size: sets the desired batch size.
  • data_name: selects the dataset (e.g., STL10, CIFAR10).
  • reduce_token: controls the pruning level.
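
The variable names follow the description above, but the values (and the technique identifier) are illustrative assumptions:

# Hypothetical settings in main.py; values are illustrative, not project defaults.
model = "norm_topk"    # token compression technique to apply (hypothetical identifier)
batch_size = 64        # desired batch size
data_name = "CIFAR10"  # dataset: e.g. "STL10" or "CIFAR10"
reduce_token = 8       # pruning level (assumed: number of tokens removed per block)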

Code Structure

The project is structured as follows:

  • main.py: Main script where models are configured and training is initiated.
  • pruning/patch/timm: Implementations of the pruning methods we apply to Vision Transformer models from the timm library (see the sketch after this list).
  • data/: Dataset handling scripts.
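
As a rough illustration of how such a patch could hook into timm, the sketch below wraps every transformer block so the lowest-norm patch tokens are dropped after each block. The wrapper class, the per-block reduction, and the integration point are assumptions for illustration; the actual code in pruning/patch/timm may differ:

import timm
import torch
from torch import nn

class NormPruneBlock(nn.Module):
    # Hypothetical wrapper: drops the lowest-norm patch tokens after each block.
    def __init__(self, block, reduce_token):
        super().__init__()
        self.block = block
        self.reduce_token = reduce_token

    def forward(self, x):
        x = self.block(x)
        cls_tok, patches = x[:, :1], x[:, 1:]           # leave the CLS token intact
        k = max(patches.size(1) - self.reduce_token, 1) # tokens to keep this block
        idx = patches.norm(dim=-1).topk(k, dim=1).indices.sort(dim=1).values
        b = torch.arange(patches.size(0), device=x.device).unsqueeze(1)
        return torch.cat([cls_tok, patches[b, idx]], dim=1)

model = timm.create_model("vit_small_patch16_224", pretrained=False)
model.blocks = nn.Sequential(*(NormPruneBlock(blk, reduce_token=8) for blk in model.blocks))
out = model(torch.randn(1, 3, 224, 224))                # forward pass still runs end to end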

Experimental Results

We found that norm-regularized Top-K token pruning maintains accuracy relative to existing token compression methods.

Authors and Contributors

GitHub Username: maikimilk

Acknowledgments

This project is based in part on the code and concepts from the following research:

  • Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, and Judy Hoffman. "Token Merging: Your ViT but Faster." In International Conference on Learning Representations, 2023.

This project also makes use of third-party code:

  • "ToMe" by facebookresearch, available under a Creative Commons Attribution-NonCommercial CC-BY-NC 4.0. View Source

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{masayuki2024norm-pruning,
  title={Norm-Regularized Token Compression in Vision Transformer Networks},
  author={Ishikawa, Masayuki and Ishibashi, Ryuto and Meng, Lin},
  booktitle={International Symposium on Advanced Technologies and Applications in the Internet of Things (ATAIT)},
  year={2024}
}

@inproceedings{bolya2022tome,
  title={Token Merging: Your {ViT} but Faster},
  author={Bolya, Daniel and Fu, Cheng-Yang and Dai, Xiaoliang and Zhang, Peizhao and Feichtenhofer, Christoph and Hoffman, Judy},
  booktitle={International Conference on Learning Representations},
  year={2023}
}

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 license; see the CC-BY-NC 4.0 file for details.

Contact

For questions and feedback, please reach out to ri0146fe@ed.ritsumei.ac.jp
