Skip to content

ssu-humane/news-images-acl24

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Assessing News Thumbnail Representativeness (ACL 2024)

This repository provides the dataset and code for our paper, "Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-matching ability," to be published at ACL 2024.

Task

Given a news thumbnail image I and its news text T, the task is to predict a binary label L indicating whether a news thumbnail image I portrays the actor of a news event, which can be identified from T.

Dataset: NewsTT

We introduce a dataset of 1,000 news thumbnail images and text for the task, along with high-quality labels. This dataset is intended for the zero-shot evaluation of vision language models.

Image Title Summary Label
image In their first call, Biden presses Putin on Navalny arrest, cyberattacks, bounties on U.S. troops President Biden confronted Russian leader Vladimir Putin about a range of issues, including the arrest of Alexei Navalny, the SolarWinds cyberattack, interference in the 2020 election, and the alleged plot to assassinate American soldiers, in their first call since Biden's inauguration. 1
image Pastor of “Progressive Church” in California works in Adult Entertainment Industry A "Progressive Church" in California is co-pastored by a husband and his wife, who is an ordained minister and actively works in the Adult Entertainment Industry. 0
  • Image: news thumbnail image
  • Title: news headline
  • Summary: summarized body text, done by ChatGPT
  • Label
    • 1: the image portrays at least one actor of the news event
    • 0: the image does not present any actor of the news event

The dataset is available upon request: [LINK]

Method: CFT-CLIP

We present CFT-CLIP, a contrastive learning framework that uses counterfactual text to update vision and language bi-encoders.

This figure illustrates the key idea of the proposed method. Given a pair of a news thumbnail image and an article, the method generates counterfactual news text and uses it as negative samples for contrastive learning. CFT-CLIP is a CLIP-like vision-language transformer encoder that represents the semantics of news thumbnails and news text. It aims to improve the vision and language bi-encoder by contrastive updates involving the counterfactual text generated from an input text.

Model usage

You can use the pretrained checkpoint available at HuggingFace Hub. [LINK]

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained("humane-lab/CFT-CLIP")
model = AutoModel.from_pretrained("humane-lab/CFT-CLIP")


image = "cat.jpg"
image = Image.open(image)
inputs = processor(text=["this is a cat"], images=image, return_tensors="pt")

outputs = model(**inputs)
text_embeds = outputs.text_embeds
image_embeds = outputs.image_embeds

Code for replication

Counterfactual text generation

The pretraining corpus is not provided through the repository due to copyright issues.

python utils/save_pixel_values.py # Extract pixel values ​​in advance for learning speed
python utils/get_ntt.py --data_path 'train.pkl' --save_path 'train.pkl' --target_text 'summary' # Extract ntt from news text
python utils/image_text_cossine_similarity.py --data_path 'train.pkl' --save_path 'train.pkl' --target_text 'summary' # Extract CLIP cossine similarity between image-text pairs
python utils/counterfactual.py --data_path 'train.pkl' --save_path 'train.pkl' # counterfactual text generation 

Training

Set configure using config.py.

python train.py

Evaluation

python evaluation.py --pixel_path "data/pixel_values" ...

Attribution

The code and dataset are shared under CC BY-NC 4.0. You are free to use the resources for the non-commercial purpose.

@article{yoon2024assessing,
  title={Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability},
  author={Yoon, Yejun and Yoon, Seunghyun and Park, Kunwoo},
  journal={arXiv preprint arXiv:2402.11159},
  year={2024}
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages