This repository contains the official implementation of the paper "Norm-Regularized Token Compression in Vision Transformer Networks". The paper proposes a novel method for token compression in Vision Transformer networks using norm regularization to enhance model efficiency and performance.
This project requires the following libraries:
- PyTorch
- torchvision
- timm
- numpy
- tqdm
- thop (for calculating model complexity, i.e., FLOPs and parameter counts; see the sketch after this list)
Ensure you have Python 3.x installed along with the above libraries.
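As a rough illustration of how thop reports model complexity, the snippet below profiles a stock timm ViT; the model name and input size are generic examples, not this repository's exact configuration:

```python
import timm
import torch
from thop import profile

# Build a standard ViT from timm (illustrative choice of model).
model = timm.create_model("vit_base_patch16_224", pretrained=False)
dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image

# thop returns multiply-accumulate operations (MACs) and the parameter count.
macs, params = profile(model, inputs=(dummy_input,))
print(f"MACs: {macs / 1e9:.2f} G, Params: {params / 1e6:.2f} M")
```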
Clone this repository to your local machine to get started:
```bash
git clone https://github.com/yourgithubusername/ViT-NormReg-Compressor.git
cd ViT-NormReg-Compressor
```
To use the model, edit the settings in main.py: change the model argument to apply different token compression techniques, set the batch_size argument to the desired batch size, change the data_name argument to switch datasets (e.g., STL10, CIFAR10), and adjust the pruning level via the reduce_token variable.
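A configuration inside main.py might look like the following; the values and the model identifier here are hypothetical placeholders, so use the options actually defined in main.py:

```python
# Hypothetical example values; the accepted names and options are defined in main.py.
model = "norm_topk_vit"  # token compression technique to apply (placeholder name)
batch_size = 64          # batch size for training/evaluation
data_name = "CIFAR10"    # dataset to use, e.g., STL10 or CIFAR10
reduce_token = 8         # pruning level: how many tokens to reduce
```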
The project is structured as follows:
- main.py: Main script where models are configured and training is initiated.
- pruning/patch/timm: This directory contains implementations of the pruning methods we have applied to Vision Transformer models using the TIMM library.
- data/: Dataset handling scripts.
Our experiments show that applying norm regularization with the Top-K selection method does not reduce accuracy compared to existing token compression methods.
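The core operation behind this result, ranking tokens by their norm and keeping only the top k, can be sketched as follows. This is a minimal, self-contained illustration, not the exact implementation in pruning/patch/timm:

```python
import torch

def topk_tokens_by_norm(x: torch.Tensor, keep: int) -> torch.Tensor:
    """Keep the `keep` highest-norm patch tokens; the class token is always kept.

    x: (batch, num_tokens, dim) with the class token at index 0.
    """
    cls_tok, patches = x[:, :1], x[:, 1:]          # split off the class token
    scores = patches.norm(dim=-1)                  # L2 norm of each patch token
    idx = scores.topk(keep, dim=1).indices         # indices of the highest-norm tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, patches.size(-1))
    kept = patches.gather(dim=1, index=idx)        # gather the selected tokens
    return torch.cat([cls_tok, kept], dim=1)

# Example: keep 98 of the 196 patch tokens of a ViT-B/16 feature map.
x = torch.randn(2, 197, 768)
print(topk_tokens_by_norm(x, keep=98).shape)       # torch.Size([2, 99, 768])
```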
GitHub Username: maikimilk
This project is based in part on the code and concepts from the following research:
- Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, and Judy Hoffman. "Token Merging: Your ViT but Faster." In International Conference on Learning Representations, 2023.
This project also makes use of the following third-party code:
- "ToMe" by facebookresearch, released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.
If you find this project useful in your research, please consider citing:
```bibtex
@inproceedings{masayuki2024norm-pruning,
  title={Norm-Regularized Token Compression in Vision Transformer Networks},
  author={Ishikawa, Masayuki and Ishibashi, Ryuto and Meng, Lin},
  year={2024}
}

@inproceedings{bolya2022tome,
  title={Token Merging: Your {ViT} but Faster},
  author={Bolya, Daniel and Fu, Cheng-Yang and Dai, Xiaoliang and Zhang, Peizhao and Feichtenhofer, Christoph and Hoffman, Judy},
  booktitle={International Conference on Learning Representations},
  year={2023}
}
```
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license; see the CC-BY-NC 4.0 file for details.
For questions and feedback, please reach out to ri0146fe@ed.ritsumei.ac.jp