Stars
LPIPS metric. pip install lpips
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
PyTorch implementation of the CREPE pitch tracker
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
CVPR 2023 talking-face implementation of "Identity-Preserving Talking Face Generation With Landmark and Appearance Priors"
Ongoing research training transformer models at scale
Example models using DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A series of large language models developed by Baichuan Intelligent Technology
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Code for the paper "Motion Representations for Articulated Animation"
[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.
Extracts essential Mediapipe face landmarks and arranges them in a sequenced order.
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Faster Whisper transcription with CTranslate2
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
An implementation of the GlowTTS model. Several modes are added: speaker embedding, prosody encoder (GST), and gradient reversal.
Easily train a good VC model with voice data <= 10 mins!
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Self-supervised speech representations
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
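A Bob plugin for text translation, text polishing, and grammar correction based on the OpenAI API. Let's welcome a new era that no longer needs the Tower of Babel! Licensed under CC BY-NC-SA 4.0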
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis