Stars
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"
Codebase for Aria - an Open Multimodal Native MoE
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Tr…
Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine. (See the usage sketch after this list.)
Code for NeurIPS 2023 paper "Restart Sampling for Improving Generative Processes"
A large-scale text-to-image prompt gallery dataset based on Stable Diffusion
Better Aligning Text-to-Image Models with Human Preference. ICCV 2023
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. (See the LoRA sketch after this list.)
LogAI - An open-source library for log analytics and intelligence
A Unified Semi-Supervised Learning Codebase (NeurIPS'22)
LAVIS - A One-stop Library for Language-Vision Intelligence
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
EVA Series: Visual Representation Fantasies from BAAI
BotSIM - a data-efficient end-to-end Bot SIMulation toolkit for evaluation, diagnosis, and improvement of commercial chatbots
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX. (See the text-to-image sketch after this list.)
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A latent text-to-image diffusion model
OmniXAI: A Library for eXplainable AI
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
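
The URL-to-dataset entry above is img2dataset, which also exposes a Python entry point. A minimal sketch, assuming the `download` function and the parameter names shown in its README; the input file name is hypothetical and exact arguments may differ by version:

```python
# Minimal sketch: turn a text file of image URLs into packaged shards
# with img2dataset. Parameter names follow its README; treat them as
# assumptions and check the installed version's docs.
from img2dataset import download

download(
    url_list="myimglist.txt",       # one image URL per line (hypothetical file)
    output_folder="image_dataset",  # where shards are written
    output_format="webdataset",     # package images as .tar shards
    image_size=256,                 # resize images on the fly
    processes_count=8,              # parallel download processes
    thread_count=32,                # download threads per process
)
```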
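For the 🤗 PEFT entry, a minimal LoRA sketch: wrap a base model with a low-rank adapter so only the adapter weights train. The `gpt2` checkpoint and the `c_attn` target module are illustrative choices, not anything prescribed by the list above:

```python
# Minimal sketch: attach a LoRA adapter to a causal LM via 🤗 PEFT.
# Model (gpt2) and target module (c_attn) are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only adapter weights are trainable
```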
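And for 🤗 Diffusers, a minimal text-to-image sketch; the `runwayml/stable-diffusion-v1-5` checkpoint and the prompt are placeholder choices:

```python
# Minimal sketch: generate one image from a text prompt with a
# 🤗 Diffusers pipeline. Checkpoint and prompt are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # move to GPU; use "cpu" (with float32) if none

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```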