Category & Year | Name | Authors | Implementation |
---|---|---|---|
Vision | |||
2014 | VAE | Kingma and Welling | [✓] Training on MNIST [✓] Encoder output visualization [✓] Decoder output visualization |
2015 | CAM | Zhou et al. | [✓] Application to GoogLeNet [✓] Bounding box generation from Class Activation Map |
2016 | Gatys et al., 2016 (image style transfer) | Gatys et al. | [✓] Application to VGGNet-19 |
| | YOLO | Redmon et al. | [✗] Training on VOC 2012 [✗] Class probability map [✗] Ground truth visualization on grid |
| | DCGAN | Radford et al. | [✓] Training on CelebA at 64 × 64 [✓] Sampling [✓] Latent space interpolation |
| | Noroozi et al., 2016 | Noroozi et al. | [✓] Model architecture [✓] Chromatic aberration [✓] Permutation set |
| | Zhang et al., 2016 (image colorization) | Zhang et al. | [✓] Empirical probability distribution visualization [✗] Color space |
2014, 2017 | Conditional GAN, WGAN-GP | Mirza et al., Gulrajani et al. | [✓] Training on MNIST |
2016, 2017 | VQ-VAE & PixelCNN | Oord et al., Oord et al. | [✓] Training on Fashion MNIST [✓] Training on CIFAR-10 |
2017 | Pix2Pix | Isola et al. | [✓] Training on Google Maps [✓] Training on Facades [✗] Inference on larger resolution |
| | CycleGAN | Zhu et al. | [✓] Training on monet2photo [✓] Training on vangogh2photo [✓] Training on cezanne2photo [✓] Training on ukiyoe2photo [✓] Training on horse2zebra [✓] Training on summer2winter_yosemite |
| | Noroozi et al., 2017 | Noroozi et al. | [✓] Contrastive loss |
2018 | PGGAN | Karras et al. | [✓] Training on CelebA-HQ at 512 × 512 |
| | DeepLab v3 | Chen et al. | [✓] Training on VOC 2012 [✓] Prediction on VOC 2012 validation set [✓] Average mIoU [✓] Model output visualization |
| | RotNet | Gidaris et al. | [✓] Attention map visualization |
| | StarGAN | Choi et al. | [✓] Model architecture |
2020 | STEFANN | Roy et al. | [✓] FANnet architecture [✓] Training FANnet on Google Fonts [✓] Custom Google Fonts dataset [✓] Average SSIM |
| | DDPM | Ho et al. | [✓] Training on CelebA at 32 × 32 [✓] Training on CelebA at 64 × 64 [✓] Denoising process visualization [✓] Sampling using linear interpolation [✓] Sampling using coarse-to-fine interpolation |
| | DDIM | Song et al. | [✓] Normal sampling [✓] Sampling using spherical linear interpolation [✓] Sampling using grid interpolation [✓] Truncated normal |
| | ViT | Dosovitskiy et al. | [✓] Training on CIFAR-10 [✓] Training on CIFAR-100 [✓] Attention map visualization using Attention Rollout [✓] Position embedding similarity visualization [✓] Position embedding interpolation [✓] CutOut [✓] CutMix [✓] Hide-and-Seek |
| | SimCLR | Chen et al. | [✓] Normalized temperature-scaled cross entropy loss [✓] Data augmentation [✓] Pixel intensity histogram |
| | DETR | Carion et al. | [✓] Model architecture [✗] Bipartite matching & loss [✗] Batch normalization freezing [✗] Data preparation [✗] Training on COCO 2017 |
2021 | Improved DDPM | Nichol and Dhariwal | [✓] Cosine diffusion schedule |
| | Classifier-Guidance | Dhariwal and Nichol | [✓] Training on CIFAR-10 [✗] AdaGN [✗] BigGAN Upsample/Downsample [✗] Improved DDPM sampling [✗] Conditional/Unconditional models [✗] Super-resolution model [✗] Interpolation |
| | ILVR | Choi et al. | [✓] Sampling using single reference [✓] Sampling using various downsampling factors [✓] Sampling using various conditioning ranges |
| | SDEdit | Meng et al. | [✓] User input stroke simulation [✓] Application to CelebA at 64 × 64 |
| | MAE | He et al. | [✓] Model architecture for pre-training [✗] Model architecture for self-supervised learning [✗] Training on ImageNet-1K [✗] Fine-tuning [✗] Linear probing |
| | Copy-Paste | Ghiasi et al. | [✓] COCO dataset processing [✓] Large scale jittering [✓] Copy-Paste (within mini-batch) [✓] Data visualization [✗] Gaussian filter |
| | ViViT | Arnab et al. | [✓] 'Spatio-temporal attention' architecture [✓] 'Factorised encoder' architecture [✓] 'Factorised self-attention' architecture |
2022 | CFG | Ho et al. | |
Language | |||
2017 | Transformer | Vaswani et al. | [✓] Model architecture [✓] Position encoding visualization |
2019 | BERT | Devlin et al. | [✓] Model architecture [✓] Masked language modeling [✓] BookCorpus data pre-processing [✓] SQuAD data pre-processing [✓] SWAG data pre-processing |
| | Sentence-BERT | Reimers and Gurevych | [✓] Classification loss [✓] Regression loss [✓] Contrastive loss [✓] STSb data pre-processing [✓] WikiSection data pre-processing [✗] NLI data pre-processing |
| | RoBERTa | Liu et al. | [✓] BookCorpus data pre-processing [✓] Masked language modeling [✗] BookCorpus data pre-processing (SEGMENT-PAIR + NSP) [✗] BookCorpus data pre-processing (SENTENCE-PAIR + NSP) [✓] BookCorpus data pre-processing (FULL-SENTENCES) [✗] BookCorpus data pre-processing (DOC-SENTENCES) |
2021 | Swin Transformer | Liu et al. | [✓] Patch partition [✓] Patch merging [✓] Relative position bias [✓] Feature map padding [✓] Self-attention in non-overlapped windows [✗] Shifted Window based Self-Attention |
2024 | RoPE | Su et al. | [✓] Rotary Positional Embedding |
Vision-Language | |||
2021 | CLIP | Radford et al. | [✓] Training on Flickr8k + Flickr30k [✓] Zero-shot classification on ImageNet1k (mini) [✓] Linear classification on ImageNet1k (mini) |
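Some of the checked items above reduce to short, self-contained formulas. As one illustration, here is a minimal sketch of the cosine diffusion schedule checked under Improved DDPM (Nichol and Dhariwal); the function names are illustrative and not taken from the actual repositories.

```python
import math

def cosine_alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Cumulative signal rate alpha_bar(t) = f(t) / f(0),
    where f(u) = cos^2(((u/T + s) / (1 + s)) * pi/2)."""
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def cosine_betas(T: int, s: float = 0.008, max_beta: float = 0.999) -> list:
    """Per-step noise rates beta_t = 1 - alpha_bar(t) / alpha_bar(t-1),
    clipped at 0.999 as in the paper to avoid singularities near t = T."""
    return [
        min(1.0 - cosine_alpha_bar(t, T, s) / cosine_alpha_bar(t - 1, T, s), max_beta)
        for t in range(1, T + 1)
    ]
```

Compared with a linear schedule, `alpha_bar` here decays more gently near the two ends of the trajectory, which is the paper's stated motivation for the cosine shape and for the clipping of `beta_t`.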