Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Machine Learning Engineering Open Book
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
High-Resolution Image Synthesis with Latent Diffusion Models
Official Code for DragGAN (SIGGRAPH 2023)
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Baseline model for nocaps benchmark, ICCV 2019 paper "nocaps: novel object captioning at scale".
A beautiful, simple, clean, and responsive Jekyll theme for academics
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
Optimized code based on M2 for faster image captioning training
Using pretrained encoder and language models to generate captions from multimedia inputs.
A collection of resources and papers on Diffusion Models
[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
yolov5 + csl_label. (Oriented Object Detection) (Rotation Detection) (Rotated BBox) Rotated object detection based on yolov5
Minimal PyTorch implementation of YOLOv3
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Code for ALBEF: a new vision-language pre-training method
PyTorch original implementation of Cross-lingual Language Model Pretraining.
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
ShivamShrirao / diffusers
Forked from huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
Pure Javascript OCR for more than 100 Languages 📖🎉🖥