An open-source, efficient and extensible framework for enhancing safety🛡️ in text-to-image (T2I) generation🖼️, designed to prevent misuse and improve flexibility. It is built to be accessible, speedy, and reliable for the entire community.
- ⚡️Fast: Detects unsafe input prompts in approximately 1ms and can be trained on a single NVIDIA 3090 GPU.
- 🔧Extensible: Supports customized unsafe concepts to block; compatible with any T2I model built on a text encoder, such as SD and SDXL.
- 🏆State-of-the-art: Faster performance, higher accuracy, and better scalability than existing safety methods.
- 🌐Open: All processes—data generation, training, testing, and inference—are fully open-source.
📰 Latent Guard has been covered by tech media outlets such as TechXplore and MarkTechPost.
[2024/09/25 New]🚀🚀🚀: We released our code📝 and the model weights⚙️!
[2024/07/15]: We released our dataset CoPro in dataset/CoPro_v1.0.json.
[2024/07]: Our paper has been accepted by ECCV 2024.
This is the official repo of the ECCV 2024 paper Latent Guard: a Safety Framework for Text-to-image Generation (arXiv).
@article{liu2024latent,
title={Latent Guard: a Safety Framework for Text-to-image Generation},
author={Liu, Runtao and Khakzar, Ashkan and Gu, Jindong and Chen, Qifeng and Torr, Philip and Pizzati, Fabio},
journal={arXiv preprint arXiv:2404.08031},
year={2024}
}
The CoPro dataset is in this repository at dataset/CoPro_v1.0.json, and the model weights are stored in model_parameters.pth.
To set up the conda environment, run the following command (this process takes around 10 minutes depending on the network and server):
conda env create -f latentguard.yml
After installation, activate the environment with:
conda activate latentguard
To run the inference, execute the following command:
python inference.py
Or you can choose to specify the following parameters according to your requirements:
python inference.py --file_path FILEPATH --threshold VALUE
- --file_path: Specifies the path to the file of prompts to check. The default value is 'unsafe_sample.txt' (a file in this repo), which provides a sample format for the prompt data to be detected.
- --threshold: Sets the decision threshold for predicting whether a prompt is unsafe. It can be adjusted based on the data distribution.
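One way to pick a threshold for your own data is to sweep a few candidates over a small labeled validation set and keep the most accurate one. The sketch below illustrates that idea only; the scores, labels, and candidate values are made up, and it assumes that a higher score means "more likely unsafe", which may not match how inference.py defines its score.

```python
def accuracy(scores, labels, threshold):
    """Fraction of prompts classified correctly at a given threshold."""
    preds = [s >= threshold for s in scores]  # True means "predicted unsafe"
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy(scores, labels, t))


# Hypothetical validation data (assumption: higher score = more likely unsafe).
scores = [0.91, 0.12, 0.55, 0.78, 0.40, 0.66]
labels = [True, False, False, True, False, True]  # True = actually unsafe
candidates = [0.3, 0.5, 0.6, 0.8]

chosen = best_threshold(scores, labels, candidates)
print("chosen threshold:", chosen)
```

The chosen value could then be passed to the script via --threshold.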
To speed up testing and training, first generate the clip_cache.pt file, which stores the CLIP embeddings of the prompts. This step may take over 20 minutes and displays a progress bar.
Run python prepare.py to generate the clip_cache.pt file. Its path is already set in config.py, so no manual configuration is needed, and you can proceed with the subsequent commands.
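The idea behind an embedding cache can be sketched as follows: embed each unique prompt once, save the mapping to disk, and reload it in later runs instead of calling the encoder again. This is an illustration only, not the code in prepare.py. The toy_embed function is a hash-based stand-in rather than CLIP, and the cache is written with pickle to a temporary toy_cache.pkl file rather than to clip_cache.pt.

```python
import hashlib
import os
import pickle
import tempfile


def toy_embed(prompt):
    """Stand-in for an expensive text encoder (NOT CLIP): 4 floats from a hash."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:4]]


def build_cache(prompts, cache_path, embed=toy_embed):
    """Embed each unique prompt once and write the mapping to disk."""
    cache = {p: embed(p) for p in dict.fromkeys(prompts)}  # drops duplicates
    with open(cache_path, "wb") as f:
        pickle.dump(cache, f)
    return cache


def load_cache(cache_path):
    """Read the prompt-to-embedding mapping back from disk."""
    with open(cache_path, "rb") as f:
        return pickle.load(f)


# Hypothetical prompts and cache location, for illustration only.
cache_path = os.path.join(tempfile.mkdtemp(), "toy_cache.pkl")
prompts = ["a cat on a sofa", "a dog in a park", "a cat on a sofa"]
build_cache(prompts, cache_path)

cache = load_cache(cache_path)
print(len(cache), "unique prompts cached")
```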
Run python test.py to obtain the results for Table 1b.
Run python main.py to train Latent Guard on CoPro.
Our model takes only 30 minutes⚡️ to train on a single NVIDIA 3090 GPU.
Recent text-to-image generators are composed of a text encoder and a diffusion model. Their deployment without appropriate safety measures creates risks of misuse (left). We propose Latent Guard (right), a safety method designed to block malicious input prompts. Our idea is to detect the presence of blacklisted concepts in a learned latent space on top of the text encoder. This makes it possible to detect blacklisted concepts beyond their exact wording, extending to some adversarial attacks too ("<ADV>"). The blacklist is adaptable at test time: concepts can be added or removed without retraining. Blocked prompts are not processed by the diffusion model, saving computational costs.
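The test-time check described above can be sketched as a similarity comparison between a prompt embedding and each blacklisted concept embedding. The sketch below is a simplified illustration under stated assumptions: it uses hand-written 3-dimensional vectors in place of learned latent embeddings, plain cosine similarity as the score, and hypothetical concept names. It shows why editing the blacklist needs no retraining: only the dictionary of concept vectors changes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def is_blocked(prompt_vec, blacklist, threshold):
    """Block the prompt if it is close enough to any blacklisted concept."""
    return any(cosine(prompt_vec, c) >= threshold for c in blacklist.values())


# Toy embeddings (assumption: real ones come from the learned latent space).
blacklist = {"concept_a": [1.0, 0.0, 0.0]}
prompt_vec = [0.0, 0.9, 0.1]

before = is_blocked(prompt_vec, blacklist, threshold=0.8)

# Adding a concept at test time only extends the dictionary.
blacklist["concept_b"] = [0.0, 1.0, 0.0]
after = is_blocked(prompt_vec, blacklist, threshold=0.8)

print("blocked before:", before, "| blocked after:", after)
```

A blocked prompt would simply never be forwarded to the diffusion model.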
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited for creating inappropriate content. To prevent misuse, existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification, requiring large datasets for training and offering low flexibility. Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts in the input text embeddings. Our proposed framework is composed of a data generation pipeline specific to the task using large language models, ad-hoc architectural components, and a contrastive learning strategy to benefit from the generated data. The effectiveness of our method is verified on three datasets and against four baselines.
Overview of Latent Guard. We first generate a dataset of safe and unsafe prompts centered around blacklisted concepts (left). Then, we leverage pretrained textual encoders to extract features, and map them to a learned latent space with our Embedding Mapping Layer (center). Only the Embedding Mapping Layer is trained, while all other parameters are kept frozen. We train by imposing a contrastive loss on the extracted embedding, bringing closer the embeddings of unsafe prompts and concepts, while separating them from safe ones (right).
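To make the contrastive objective more concrete, here is a minimal sketch of a generic InfoNCE-style loss: the concept embedding is pulled toward the unsafe prompt embedding (positive) and pushed away from safe prompt embeddings (negatives). This is not necessarily the exact loss used in the paper. The function name, the temperature value, and the 2-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(concept, unsafe, safes, temperature=0.1):
    """InfoNCE-style loss: low when the concept is near `unsafe` and far from `safes`."""
    pos = math.exp(cosine(concept, unsafe) / temperature)
    neg = sum(math.exp(cosine(concept, s) / temperature) for s in safes)
    return -math.log(pos / (pos + neg))


concept = [1.0, 0.0]

# Unsafe prompt aligned with the concept, safe prompt orthogonal to it.
well_separated = contrastive_loss(concept, unsafe=[0.9, 0.1], safes=[[0.0, 1.0]])

# Roles swapped: the safe prompt sits next to the concept instead.
mixed = contrastive_loss(concept, unsafe=[0.1, 0.9], safes=[[0.9, 0.1]])

print(f"well separated: {well_separated:.4f}, mixed: {mixed:.4f}")
```

Minimizing such a loss rewards the well-separated configuration, which matches the intended effect of bringing unsafe prompts close to their concepts while keeping safe prompts apart.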
CoPro generation. For
Evaluation on CoPro. We provide accuracy (a) and AUC (b) for Latent Guard and baselines on CoPro. We rank either first or second in all setups, training only on Explicit ID training data. We show examples of CoPro prompts and the generated images in (c). The unsafe images generated attest to the quality of our dataset. Latent Guard is the only method blocking all the tested prompts.
Evaluation on Unseen Datasets. We test Latent Guard on two existing datasets, Unsafe Diffusion and I2P++. Although the distribution of input T2I prompts differs from the one in CoPro, we still outperform all baselines and achieve robust classification.
Computational cost. We measure processing times and memory usage for different batch sizes and concepts in
Feature space analysis. Training Latent Guard on CoPro makes safe/unsafe regions naturally emerge (right). In the CLIP latent space, safe/unsafe embeddings are mixed (left).