snsten

Stars

LLMs

22 repositories

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference,…

Python 12,671 863 Updated Nov 17, 2024

ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

Python 9,427 696 Updated Jul 11, 2024

Official inference library for Mistral models

Jupyter Notebook 9,724 862 Updated Nov 12, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,673 366 Updated Jul 11, 2024

🚀🧠💬 Supercharged Custom Instructions for ChatGPT (non-coding) and ChatGPT Advanced Data Analysis (coding).

JavaScript 6,613 455 Updated Jan 17, 2024

An Autonomous LLM Agent for Complex Task Solving

Python 8,166 846 Updated Aug 12, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 8,679 991 Updated Nov 13, 2024

An LLM-powered advanced RAG pipeline built from scratch

Python 798 51 Updated Jan 26, 2024

Leaked prompts of GPTs

28,762 3,901 Updated Sep 27, 2024

Run Mixtral-8x7B models in Colab or consumer desktops

Python 2,294 226 Updated Apr 8, 2024

Large World Model -- Modeling Text and Video with Million-Token Context

Python 7,153 552 Updated Oct 19, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,659 616 Updated Nov 14, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,192 866 Updated Jul 1, 2024

LLM inference in C/C++

C++ 68,017 9,753 Updated Nov 19, 2024

The official PyTorch implementation of Google's Gemma models

Python 5,290 508 Updated Jul 31, 2024

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 5,991 509 Updated Nov 18, 2024

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 5,994 520 Updated Sep 6, 2024

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…

Jupyter Notebook 15,200 2,197 Updated Nov 17, 2024

Inference Llama 2 in one file of pure C

C 17,474 2,090 Updated Aug 6, 2024

An absolutely minimal implementation of a GPT-like transformer using only NumPy (<650 lines).

Python 250 12 Updated Nov 20, 2023

The Memory layer for your AI apps

Python 22,871 2,102 Updated Nov 18, 2024

BAML is a language that helps you get structured data from LLMs, with the best DX possible. Works with all languages. Check out the promptfiddle.com playground.

Rust 1,361 50 Updated Nov 19, 2024