Stars
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Official inference repo for FLUX.1 models
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
A better way to make GUIs for your python apps
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Realtime Voice Changer
Core Engine of Singing Voice Conversion & Singing Voice Clone
Instant voice cloning by MIT and MyShell.
Industry leading face manipulation platform
DeepFaceLab is the leading software for creating deepfakes.
Real-time face swap for PC streaming or video calls
hassan-sd / roop-unlocked
Forked from s0md3v/roop
one-click deepfake (face swap)
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
wlsdml1114 / diff-svc
Forked from prophesier/diff-svc
Singing Voice Conversion via diffusion model
The swiss army knife of lossless video/audio editing
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
Robust Speech Recognition via Large-Scale Weak Supervision
Large World Model -- Modeling Text and Video with Millions Context
Open-source simulator for autonomous driving research.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Build a RAG (Retrieval Augmented Generation) pipeline from scratch and have it all run locally.
This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024