From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Spot-the-difference self-supervised pre-training for anomaly detection and segmentation

Y Zou, J Jeong, L Pemula, D Zhang… - European Conference on …, 2022 - Springer
Visual anomaly detection is commonly used in industrial quality inspection. In this paper, we
present a new dataset as well as a new self-supervised learning method for ImageNet pre …

LLaVA-OneVision: Easy visual task transfer

B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and short textual description of how to modify the image. Existing methods have only been …

Fine-tuning multimodal llms to follow zero-shot demonstrative instructions

J Li, K Pan, Z Ge, M Gao, W Ji, W Zhang… - The Twelfth …, 2023 - openreview.net
Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing
Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can …

Evolution of visual data captioning methods, datasets, and evaluation metrics: A comprehensive survey

D Sharma, C Dhiman, D Kumar - Expert Systems with Applications, 2023 - Elsevier
Abstract Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …

VisIT-Bench: A benchmark for vision-language instruction following inspired by real-world use

Y Bitton, H Bansal, J Hessel, R Shao, W Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluation of
instruction-following vision-language models for real-world use. Our starting point is curating …

Remote sensing image change captioning with dual-branch transformers: A new method and a large-scale dataset

C Liu, R Zhao, H Chen, Z Zou… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Analyzing land cover changes with multitemporal remote sensing (RS) images is crucial for
environmental protection and land planning. In this article, we explore RS image change …

Visual instruction tuning with polite flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large
Language Models (LLMs) using an assortment of annotated downstream vision-language …

Fashion IQ: A new dataset towards retrieving images by natural language feedback

H Wu, Y Gao, X Guo, Z Al-Halah… - Proceedings of the …, 2021 - openaccess.thecvf.com
Conversational interfaces for the detail-oriented retail fashion domain are more natural,
expressive, and user-friendly than classical keyword-based search interfaces. In this paper …
expressive, and user friendly than classical keyword-based search interfaces. In this paper …