Highlights
LLMs
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance and fast inference.
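A minimal sketch of the idea behind RWKV's time mixing, heavily simplified (the real model adds a per-token bonus term, token shifting, channel mixing, and numerical-stability tricks): each output is an exponentially decayed weighted average over past values, so inference runs step by step like an RNN while the same sums can be evaluated for all positions at once during training.

```python
import numpy as np

def wkv(k, v, w):
    """Simplified RWKV-style time mixing for one channel.
    k, v: (T,) keys and values; w > 0 is a learned decay rate.
    num/den carry exponentially decayed running sums, so each step
    costs O(1) at inference time (RNN-style)."""
    num, den = 0.0, 0.0
    out = np.empty(len(v), dtype=float)
    for t in range(len(v)):
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
        out[t] = num / den
    return out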
ChatRWKV is like ChatGPT, but powered by the RWKV (100% RNN) language model and open source.
Official inference library for Mistral models
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
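The core trick, roughly: keep the first few tokens (the "attention sinks") plus a sliding window of recent tokens in the KV cache, and evict everything in between, so generation length is no longer bounded by cache size. A sketch of that eviction policy (function name and defaults here are illustrative, not the repo's API):

```python
def kept_positions(seq_len, n_sink=4, window=1020):
    """Sketch of a StreamingLLM-style cache policy: keep the first
    n_sink "attention sink" tokens plus the most recent `window`
    tokens; evict the middle once the cache budget is exceeded."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

# e.g. kept_positions(5000) keeps positions 0-3 and 3980-4999
```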
🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).
An Autonomous LLM Agent for Complex Task Solving
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
An LLM-powered advanced RAG pipeline built from scratch
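A retrieve-then-generate loop is the heart of any RAG pipeline. A minimal sketch, where embed() and llm() are hypothetical stand-ins for an embedding model and a chat model (swap in whichever providers you use):

```python
import numpy as np

def top_k(query_vec, doc_vecs, docs, k=3):
    """Rank documents by cosine similarity to the query embedding."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_answer(question, docs, doc_vecs, embed, llm):
    """Retrieve the most relevant chunks, then ground the answer on them.
    embed() and llm() are hypothetical callables, not a specific API."""
    context = "\n\n".join(top_k(embed(question), doc_vecs, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return llm(prompt)
```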
Run Mixtral-8x7B models in Colab or on consumer desktops
Large World Model: modeling text and video with millions of tokens of context
Hackable and optimized Transformer building blocks with composable construction.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
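The algorithm itself fits in a few lines: repeatedly find the most frequent adjacent pair of token ids and replace it with a new id. A rough sketch of the training loop (not the repo's code; ids 256+ leave room for the 256 raw byte values):

```python
from collections import Counter

def bpe_train(ids, num_merges):
    """Toy BPE trainer: each round merges the most frequent adjacent
    pair of token ids into a freshly minted id."""
    merges = {}
    for new_id in range(256, 256 + num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)
        merges[pair] = new_id
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
    return ids, merges

ids, merges = bpe_train(list("aaabdaaabac".encode()), num_merges=3)
```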
The official PyTorch implementation of Google's Gemma models
A lightweight, standalone C++ inference engine for Google's Gemma models.
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0 licensed.
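For reference, the LoRA idea in miniature: freeze the pretrained weight and learn a low-rank additive update, so fine-tuning trains only r * (d_in + d_out) parameters per layer instead of d_in * d_out. A minimal sketch, not lit-llama's implementation:

```python
import numpy as np

class LoRALinear:
    """LoRA sketch: the frozen weight W is augmented with a trainable
    low-rank update scale * (B @ A). B starts at zero, so the layer
    initially behaves exactly like the frozen original."""
    def __init__(self, W, r=8, alpha=16):
        d_out, d_in = W.shape
        self.W = W                                # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, r))             # trainable up-projection
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```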
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A.
A minimal implementation of a GPT-like transformer using only NumPy (<650 lines).
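Remarkably little NumPy is needed for the core operation. A sketch of single-head causal self-attention (simplified relative to a full GPT block, which adds multiple heads, projections, and residual connections):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(q, k, v):
    """Single-head causal attention: position t attends only to
    positions <= t. q, k, v have shape (T, d)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -1e10  # mask future
    return softmax(scores) @ v
```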
BAML is a language that helps you get structured data from LLMs with the best DX possible. Works with all programming languages. Check out the promptfiddle.com playground.