Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Updated May 10, 2022 - Python
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
Code for journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
A non-JIT implementation/replication of OpenAI's CLIP in PyTorch
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”
Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval" (ACM TOMM 2024).
Implementation of the "Learn No to Say Yes Better" paper.
Easy wrapper for inserting LoRA layers in CLIP.
Text Query based Traffic Video Event Retrieval with Global-Local Fusion Embedding
A dead-simple image search and image-text matching system for Bangla using CLIP
[ICML 2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.
CLIP (Contrastive Language–Image Pre-training) for Bangla.
Unofficial code of paper "Improving description-based person re-identification by multi-granularity image-text alignment." by Niu et al. (partially implemented)
A list of research papers on knowledge-enhanced multimodal learning