[NeurIPS'24 Spotlight] To speed up long-context LLM inference, approximately and dynamically computes sparse attention, which reduces pre-filling latency by up to 10x on an A100 whil…

Python · 791 stars · 36 forks · Updated Nov 19, 2024

RUC-NLPIR / FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research

Python · 1,334 stars · 109 forks · Updated Nov 19, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 20,286 stars · 2,238 forks · Updated Aug 12, 2024

fc2869 / lo-fit

LoFiT: Localized Fine-tuning on LLM Representations

Python · 21 stars · 4 forks · Updated Jun 25, 2024

likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python · 466 stars · 37 forks · Updated Sep 29, 2024

OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python · 12,635 stars · 889 forks · Updated Oct 22, 2024

HSLiu-Initial / CtrlA

This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.

Jupyter Notebook · 66 stars · 9 forks · Updated Oct 9, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

12,703 stars · 809 forks · Updated Nov 10, 2024

gabriben / awesome-generative-information-retrieval

625 stars · 48 forks · Updated Oct 15, 2024

xufangzhi / ENVISIONS

A Neural-Symbolic Self-Training Framework

C · 99 stars · 3 forks · Updated Jul 23, 2024

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

Python · 2,672 stars · 250 forks · Updated Nov 17, 2024

NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python · 1,998 stars · 158 forks · Updated Oct 31, 2024

CoIR-team / coir

A Comprehensive Benchmark for Code Information Retrieval.

Python · 63 stars · 11 forks · Updated Oct 21, 2024

dvlab-research / Mr-Ben

This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"

Python · 43 stars · Updated Oct 31, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 30,418 stars · 4,602 forks · Updated Nov 19, 2024

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook · 13,742 stars · 1,110 forks · Updated May 23, 2024

facebookresearch / generative-recommenders

Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python · 755 stars · 142 forks · Updated Nov 19, 2024

stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models

Python · 18,891 stars · 1,445 forks · Updated Nov 19, 2024