Can generated human faces be used to train a model? If a model can generate sufficiently realistic faces, how can we steer it to generate the faces we want? What is the mapping between prompts and varied human faces?
Besides, we often need a good test dataset to evaluate the robustness of a trained model. But carefully selecting images from search engines is laborious, and the results still need cleaning because of low data quality (unrelated, low-resolution, or unbalanced images, etc.). How can diffusion models help us solve this problem?
What's more, fake face detection is an essential topic in AI safety. We can generate target faces directly with deep generative models, but just as important is how we detect them. For example, our logo is A FAKE FACE!
In the past 3 years, we have seen the explosive growth of diffusion models, which can now generate brilliant pictures from a user's prompts. Here I test a diffusion model's capacity to generate realistic human faces. This project has two purposes:
- Construct a decent fake face dataset;
- Use these data to do something interesting. I don't know if it helps anyone, but it works for me 🤣🤣🤣
The principles I followed in generating faces are:
- Realistic (use Realistic_Vision_V2.0:1 by SG_161222)
- High-resolution (all images are 512×512)
- Varied & balanced (cover varied faces and keep a good data distribution)
Here are the generated face data and the corresponding descriptions:
| Attribute | Specific Features | Male | Female | Special Prompt |
|---|---|---|---|---|
| Age | Child (0-5) | 300 | 300 | 1 y.o., 3 y.o. |
| | Teenager (5-18) | 600 | 600 | 8 y.o., 15 y.o. |
| | Young people (18-40) | 300 | 300 | 25 y.o., 35 y.o. |
| | Middle aged (40-60) | 300 | 300 | 45 y.o., 55 y.o. |
| | Old people (60+) | 600 | 600 | 60,80,100 y.o., Grandma,Grandpa |
| Emotion | Smile | 600 | 600 | smiling, laughing |
| | Angry | 600 | 600 | Angry, pissed-off face, yelling |
| Occlusion | Only glasses | 600 | 600 | glasses,sunglasses,swimming goggles,skiing goggles |
| | Only mask | 300 | 300 | (masked:1.2), antigas mask |
| | Only make up | 300 | 300 | highly make up, eyeshadow,heavy black eyeliner, joker, Halloween makeup |
| | Only hands | 300 | 300 | put Hand in front of face,put Hand in front of hair |
| | Complex | 300 | 300 | glasses,(masked:1.2),Halloween makeup,put Hand in front of face,put Hand in front of hair |
| Illumination | Under water | 300 | 300 | under water |
| | Strong light | 300 | 300 | (sun behind:1.2), strong sun shine |
| | Dark | 300 | 300 | in the night, dark light, (very dark scene:1.2) |
| Large pose | Left & right | 2000 | 2000 | side view |
| | Up | 300 | 300 | (looking up:1.3) |
| Hair | Blond | 300 | 300 | blond hair |
| | Bangs | 300 | 300 | (Bangs:1.2) |
| | Bald | 300 | 300 | Bald |
| Others | Moustache | 300 | - | moustache,sideburns,goatee,front view |
| | Open Mouth | 600 | 600 | (talking loudly:1.4),smile,neutral |
| | Close Eyes | 600 | 600 | (sleepy,close eyes:1.4) |
| ALL | Origin | 20K | 20K | - |
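To make the "varied and balanced" principle concrete, the table can be read as a generation plan: each row gives a per-sex image count and a set of special prompts. The sketch below is only an illustration, not the script used to build the dataset. The names `ROWS`, `Job`, and `build_jobs` are hypothetical, and it assumes the comma-separated special prompts of a row are alternatives that are cycled evenly (for some rows, such as "Complex", the prompts are more likely used together). Only three rows are transcribed.

```python
from dataclasses import dataclass
from itertools import cycle, islice

# (attribute, feature) -> (male count, female count, special prompt variants).
# Counts and prompts are copied from three rows of the table; "-" is written as 0.
ROWS = {
    ("Age", "Child (0-5)"): (300, 300, ["1 y.o.", "3 y.o."]),
    ("Emotion", "Smile"): (600, 600, ["smiling", "laughing"]),
    # Assumption: the moustache prompts form one combined prompt, not alternatives.
    ("Others", "Moustache"): (300, 0, ["moustache,sideburns,goatee,front view"]),
}


@dataclass(frozen=True)
class Job:
    """One image to generate (hypothetical structure)."""
    attribute: str
    feature: str
    sex: str
    special_prompt: str


def build_jobs(rows):
    """Expand each table row into one Job per image, cycling through the variants."""
    jobs = []
    for (attribute, feature), (n_male, n_female, variants) in rows.items():
        for sex, count in (("man", n_male), ("woman", n_female)):
            # islice(cycle(...), count) yields exactly `count` prompts, spread evenly.
            for variant in islice(cycle(variants), count):
                jobs.append(Job(attribute, feature, sex, variant))
    return jobs


if __name__ == "__main__":
    jobs = build_jobs(ROWS)
    print(len(jobs), "images planned; first job:", jobs[0])
```

Keeping the plan as data makes it easy to check the male/female balance per feature before spending GPU time on generation.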
All the pictures can be downloaded from:
Link1: Baidu Disk [extract code: tqxd].
Link2: TeraBox
I use this template to get good generation results:
- Prompt:
RAW photo, a close up portrait photo of [year] y.o [sex], [human race],[special prompt],(high detailed skin:1.2), 8k uhd, dslr, high quality, film grain, Fujifilm XT3.
- Negative Prompt:
(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck.
- Hyperparameters:
  - Sampler: DPM++ 2M Karras with 20 steps
  - CFG Scale: 7
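The template and settings above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline actually used for the dataset. `build_prompt` and `generate` are hypothetical helpers, the Hugging Face model id `SG161222/Realistic_Vision_V2.0` is an assumption, and the sampler is approximated with the `diffusers` `DPMSolverMultistepScheduler` with Karras sigmas enabled. Plain `diffusers` does not interpret the `(text:1.2)` weighting syntax and truncates long prompts, so results may differ from a web UI that does both.

```python
# Prompt template transcribed from the list above; the bracketed slots became format fields.
PROMPT_TEMPLATE = (
    "RAW photo, a close up portrait photo of {age} y.o {sex}, {race},{special},"
    "(high detailed skin:1.2), 8k uhd, dslr, high quality, film grain, Fujifilm XT3"
)

NEGATIVE_PROMPT = (
    "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, "
    "cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, "
    "worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, "
    "mutilated, extra fingers, mutated hands, poorly drawn hands, "
    "poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, "
    "bad proportions, extra limbs, cloned face, disfigured, gross proportions, "
    "malformed limbs, missing arms, missing legs, extra arms, extra legs, "
    "fused fingers, too many fingers, long neck"
)


def build_prompt(age, sex, race, special):
    """Fill the four slots of the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(age=age, sex=sex, race=race, special=special)


def generate(prompt, model_id="SG161222/Realistic_Vision_V2.0", seed=0):
    """Generate one 512x512 image; needs torch, diffusers, and a CUDA GPU.

    The model id is an assumption, not something stated in this README.
    """
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    # DPM-Solver++ multistep with Karras sigmas, the closest match to "DPM++ 2M Karras".
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )
    pipe = pipe.to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    result = pipe(
        prompt,
        negative_prompt=NEGATIVE_PROMPT,
        num_inference_steps=20,  # 20 steps
        guidance_scale=7.0,      # CFG Scale 7
        height=512,
        width=512,
        generator=generator,
    )
    return result.images[0]


if __name__ == "__main__":
    print(build_prompt(25, "woman", "asian", "smiling"))
```

Calling `generate(build_prompt(25, "woman", "asian", "smiling")).save("face.png")` would write one image. The heavy imports live inside `generate` so that the prompt builder can be used without a GPU.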
A reference of portraits and keywords that you can use with Stable Diffusion.

🥸 Occlusion

| Man | Woman |
|---|---|
| glasses,sunglasses,swimming goggles,skiing goggles, man | glasses,sunglasses,swimming goggles,skiing goggles, woman |
| (masked:1.2), antigas mask, man | (masked:1.2), antigas mask, woman |
| highly make up, eyeshadow,heavy black eyeliner, joker, Halloween makeup, man | highly make up, eyeshadow,heavy black eyeliner, joker, Halloween makeup, woman |
| put Hand in front of face, put Hand in front of hair, man | put Hand in front of face, put Hand in front of hair, woman |
| glasses,(masked:1.2), Halloween makeup,put Hand in front of face, put Hand in front of hair, man | glasses,(masked:1.2), Halloween makeup,put Hand in front of face, put Hand in front of hair, woman |
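Several of these prompts use a `(text:weight)` form such as `(masked:1.2)`. This looks like the emphasis syntax of the AUTOMATIC1111 Stable Diffusion web UI, where the number scales how strongly the enclosed text is attended to. The README does not name the front end, so that reading is an assumption. The sketch below only shows how such a prompt splits into terms and weights. `parse_weights` is a hypothetical helper, not part of any library, and it ignores nested or escaped parentheses.

```python
import re

# Matches "(some text:1.2)": group 1 is the text, group 2 the numeric weight.
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def _terms(chunk, weight):
    """Split a comma-separated chunk into (term, weight) pairs, dropping empties."""
    return [(t.strip(), weight) for t in chunk.split(",") if t.strip()]


def parse_weights(prompt):
    """Return (term, weight) pairs; terms outside parentheses get weight 1.0."""
    pairs = []
    pos = 0
    for match in _WEIGHTED.finditer(prompt):
        pairs += _terms(prompt[pos:match.start()], 1.0)         # unweighted text before the group
        pairs += _terms(match.group(1), float(match.group(2)))  # every term inside shares the weight
        pos = match.end()
    pairs += _terms(prompt[pos:], 1.0)                          # unweighted text after the last group
    return pairs


if __name__ == "__main__":
    for term, weight in parse_weights("glasses,(masked:1.2), Halloween makeup, man"):
        print(f"{weight:.1f}  {term}")
```

A listing like this can help when tuning a prompt, for example to check which terms are boosted before raising a weight such as `(looking up:1.3)` further.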
Thanks to the creator of the model for his brilliant work, and thanks as well to the models he references.
Realistic_Vision_V2.0:1 by SG_161222
- Although this version of the model generates more realistic pictures, its generalization is weakened. As reported in MidJourney-Styles-and-Keywords, Mid-v5 handles finer facial details better. I will keep following generative models and use better ones to generate more controllable human faces.
@misc{2023agfd20k,
title={A Generated Face Dataset: AGFD-20K},
author={Zhongqi Wang},
howpublished = {\url{https://github.com/Robin-WZQ/AGFD-20K}},
year={2023}
}