This repository contains the code and pre-trained models for the paper "Zero-Shot Text Classification via Self-Supervised Tuning", which was accepted to Findings of ACL 2023.
[2023/9/25] zero-shot-classify-SSTuning-base and zero-shot-classify-SSTuning-XLM-R are available on ModelScope.
[2023/8/14] We release zero-shot-classify-SSTuning-XLM-R, a multilingual model for zero-shot text classification.
[2023/5/12] Initial release of the models.
The model is tuned with unlabeled data using a learning objective called first sentence prediction (FSP). The FSP task is designed by considering both the nature of the unlabeled corpus and the input/output format of classification tasks. The training and validation sets are constructed from the unlabeled corpus using FSP.
During tuning, BERT-like pre-trained masked language models such as RoBERTa and ALBERT are employed as the backbone, and an output layer for classification is added. The learning objective for FSP is to predict the index of the correct label. A cross-entropy loss is used for tuning the model.
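For intuition, below is a minimal, hypothetical sketch of how one FSP training instance could be constructed from unlabeled paragraphs: the first sentence of the target paragraph is the correct option, first sentences of other paragraphs serve as distractors, and the model must predict the index of the correct option. The function name, the naive sentence split, and the hard-coded `</s>` separator are illustrative assumptions, not the repository's actual pipeline:

```python
import random

def build_fsp_instance(paragraphs, idx, num_options=20):
    # Split the target paragraph into its first sentence (the correct option)
    # and the remaining text. The '. ' split is a naive stand-in for a real
    # sentence splitter and assumes the paragraph has at least two sentences.
    first_sent, rest = paragraphs[idx].split('. ', 1)
    # First sentences of other paragraphs act as distractors (assumes the
    # corpus provides at least num_options - 1 of them).
    pool = [p.split('. ', 1)[0] for i, p in enumerate(paragraphs) if i != idx]
    options = random.sample(pool, num_options - 1) + [first_sent]
    random.shuffle(options)
    # Format as "(A) option1. (B) option2. ... [SEP] remaining text".
    letters = [chr(ord('A') + i) for i in range(num_options)]
    s_option = ' '.join(f'({letters[i]}) {options[i]}.' for i in range(num_options))
    return f'{s_option} </s> {rest}', options.index(first_sent)
```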
Four versions of the model have been released:
Model | Backbone | #Params | Language | Accuracy | Speed | #Training samples |
---|---|---|---|---|---|---|
zero-shot-classify-SSTuning-base | roberta-base | 125M | En | Low | High | 20.48M |
zero-shot-classify-SSTuning-large | roberta-large | 355M | En | Medium | Medium | 5.12M |
zero-shot-classify-SSTuning-ALBERT | albert-xxlarge-v2 | 235M | En | High | Low | 5.12M |
zero-shot-classify-SSTuning-XLM-R | xlm-roberta-base | 278M | Multi | - | - | 20.48M |
Please note that zero-shot-classify-SSTuning-XLM-R was trained on 20.48M English samples only. However, it can also be used for any other language that xlm-roberta supports.
Zero-shot accuracy on 10 English evaluation datasets (yah: Yahoo Answers, agn: AG News, dbp: DBpedia, 20n: 20 Newsgroups, sst2: SST-2, imd: IMDB, ylp: Yelp, mr: Movie Review, amz: Amazon, sst5: SST-5):

Model | yah | agn | dbp | 20n | sst2 | imd | ylp | mr | amz | sst5 | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
SSTuning-base | 59.8 | 83.1 | 84.7 | 50.2 | 87.3 | 90.0 | 93.6 | 85.4 | 94.9 | 42.7 | 77.2 |
SSTuning-large | 62.5 | 84.8 | 86.6 | 55.0 | 90.5 | 92.1 | 95.9 | 87.7 | 95.5 | 48.6 | 79.9 |
SSTuning-ALBERT | 64.5 | 86.0 | 93.7 | 62.7 | 90.8 | 93.5 | 95.8 | 88.8 | 95.6 | 44.3 | 81.6 |
SSTuning-XLM-R | 57.9 | 77.3 | 71.3 | 45.1 | 76.3 | 79.7 | 93.0 | 73.1 | 92.4 | 33.8 | 70.0 |
For the multilingual evaluation, the test sets are constructed from amazon_reviews_multi for binary sentiment classification: samples with 1 or 2 stars are mapped to negative, and samples with 4 or 5 stars to positive.
Model | en | zh | de | es | fr | ja | Avg |
---|---|---|---|---|---|---|---|
zero-shot-classify-SSTuning-XLM-R | 91.6 | 83.7 | 89.3 | 85.6 | 86.6 | 88.0 | 87.4 |
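As a rough sketch of this test-set construction (the field names assume the Hugging Face amazon_reviews_multi schema, with a `stars` column):

```python
from datasets import load_dataset

# Sketch only: field names assume the Hugging Face amazon_reviews_multi schema.
ds = load_dataset("amazon_reviews_multi", "en", split="test")
ds = ds.filter(lambda x: x["stars"] != 3)  # 3-star reviews fit neither class
ds = ds.map(lambda x: {"label": "positive" if x["stars"] >= 4 else "negative"})
```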
We sample up to 1,000 examples from each evaluation dataset and use the released models for evaluation (a minimal sampling sketch is given after the tables below). The sampled data is here.
Model | yah | agn | dbp | 20n | sst2 | imd | ylp | mr | amz | sst5 | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|
SSTuning-base | 60.5 | 84.1 | 84.7 | 51.5 | 87.3 | 90.2 | 95.2 | 85.4 | 95.0 | 43.4 | 77.7 |
SSTuning-large | 61.4 | 84.7 | 88.3 | 55.7 | 90.5 | 93.2 | 96.7 | 87.7 | 95.9 | 48.9 | 80.3 |
SSTuning-ALBERT | 63.2 | 85.6 | 93.7 | 64.4 | 90.8 | 94.2 | 96.0 | 88.8 | 96.2 | 44.0 | 81.7 |
ChatGPT | 61.9 | 82.5 | 95.1 | 54.3 | 93.7 | 93.7 | 98.1 | 90.0 | 96.5 | 47.5 | 81.3 |
ChatGPT (with post-processing) | 62.1 | 82.5 | 95.1 | 55.0 | 93.7 | 93.7 | 98.1 | 90.0 | 96.5 | 47.5 | 81.4 |
Because the output of ChatGPT may not fall within the pre-defined label space, we also report the hit rate (the percentage of outputs that fall within the label space):
 | yah | agn | dbp | 20n | sst2 | imd | ylp | mr | amz | sst5 |
---|---|---|---|---|---|---|---|---|---|---|
Output hit rate (%) | 95.3 | 96.6 | 99.8 | 90.8 | 99.8 | 100 | 100 | 99.8 | 100 | 100 |
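A minimal sketch of the up-to-1,000 sampling described above (the dataset handle, split, and seed are illustrative assumptions):

```python
from datasets import load_dataset

def sample_eval_set(dataset_name, split="test", n=1000, seed=42):
    # Take up to n random examples from the evaluation split.
    ds = load_dataset(dataset_name, split=split)
    return ds.shuffle(seed=seed).select(range(min(n, len(ds))))
```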
The models can be used for zero-shot text classification tasks such as sentiment analysis and topic classification; no further fine-tuning is needed.
The number of labels should be between 2 and 20. (The model always scores 20 candidate slots; unused slots are filled with the pad token, as the code below shows.)
You can try the model with the Colab Notebook or the following code:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, string, random

tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/zero-shot-classify-SSTuning-base")
model = AutoModelForSequenceClassification.from_pretrained("DAMO-NLP-SG/zero-shot-classify-SSTuning-base")

text = "I love this place! The food is always so fresh and delicious."
list_label = ["negative", "positive"]

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
list_ABC = [x for x in string.ascii_uppercase]

def check_text(model, text, list_label, shuffle=False):
    # Make sure every label ends with a period, then pad the label list to 20
    # options with the pad token (the model always sees 20 candidate slots).
    list_label = [x + '.' if x[-1] != '.' else x for x in list_label]
    list_label_new = list_label + [tokenizer.pad_token] * (20 - len(list_label))
    if shuffle:
        random.shuffle(list_label_new)
    # Build the FSP-style input: "(A) option1 (B) option2 ... [SEP] text".
    s_option = ' '.join(['(' + list_ABC[i] + ') ' + list_label_new[i] for i in range(len(list_label_new))])
    text = f'{s_option} {tokenizer.sep_token} {text}'

    model.to(device).eval()
    encoding = tokenizer([text], truncation=True, max_length=512, return_tensors='pt')
    item = {key: val.to(device) for key, val in encoding.items()}
    logits = model(**item).logits

    # Without shuffling, the real labels occupy the first positions,
    # so restrict the logits to them before taking the argmax.
    logits = logits if shuffle else logits[:, 0:len(list_label)]
    probs = torch.nn.functional.softmax(logits, dim=-1).tolist()
    predictions = torch.argmax(logits, dim=-1).item()
    probabilities = [round(x, 5) for x in probs[0]]

    print(f'prediction: {predictions} => ({list_ABC[predictions]}) {list_label_new[predictions]}')
    print(f'probability: {round(probabilities[predictions] * 100, 2)}%')

check_text(model, text, list_label)
# prediction: 1 => (B) positive.
# probability: 99.92%
```
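The same function can be reused unchanged for topic classification, and, per the note above, with the DAMO-NLP-SG/zero-shot-classify-SSTuning-XLM-R checkpoint for non-English inputs. The text and candidate labels below are illustrative:

```python
# Illustrative topic-classification call; text and candidate labels are made up.
text = "The team scored in the final minute to win the championship."
list_label = ["sports", "politics", "technology", "business"]
check_text(model, text, list_label)
```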
Set up the environment with conda:

```bash
conda env create -f environment.yml
conda activate SSTuning
```
There are two steps for self-supervised tuning:
- training data generation from Wikipedia and the Amazon Product Review dataset
- self-supervised tuning

Please refer to the README for details. To run the tuning step:

```bash
source 01_tuning.sh
```
For evaluation:
- Put the datasets in the folder ./data_testing and modify ./data_testing/label_dict_classification.json (please refer to the README for details).
- Modify 02_evaluate.sh accordingly and run:

```bash
source 02_evaluate.sh
```
If you use the code or models, please cite:

```bibtex
@inproceedings{acl23/SSTuning,
  author    = {Chaoqun Liu and
               Wenxuan Zhang and
               Guizhen Chen and
               Xiaobao Wu and
               Anh Tuan Luu and
               Chip Hong Chang and
               Lidong Bing},
  title     = {Zero-Shot Text Classification via Self-Supervised Tuning},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  year      = {2023},
  url       = {https://arxiv.org/abs/2305.11442},
}
```