@Lick (BUAA, Beijing, China)
Stars
🥇 A curated list of awesome large language models in finance (FinLLMs), including papers, models, datasets, and codebases. Covers financial LLMs, especially Chinese-English bilingual models.
Modeling, training, eval, and inference code for OLMo
Learning material for CMU10-714: Deep Learning System
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
TinySTL is both a subset of the STL (it drops some containers and algorithms) and a superset of it (it adds other containers and algorithms)
VideoSys: An easy and efficient system for video generation
Efficient Triton Kernels for LLM Training (a minimal Triton kernel is sketched after this list)
A curated list for Efficient Large Language Models
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and deploying LLM applications in production)
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS (the core online-softmax trick is sketched after this list)
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc. (a toy KV-cache paging sketch follows this list)
4-bit quantization of LLaMA using GPTQ (the round-to-nearest baseline it improves on is sketched after this list)
Development repository for the Triton language and compiler
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Universal LLM Deployment Engine with ML Compilation
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
Making large AI models cheaper, faster and more accessible
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
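Several entries above (the Triton compiler, Liger-Kernel's training kernels, the CUDA learn notes) revolve around writing small fused GPU kernels. As a reference point, here is a minimal sketch of a Triton kernel, the vector-add pattern from Triton's own tutorials; the wrapper function and block size are my choices, and it needs a CUDA GPU with the `triton` package installed.

```python
# Minimal Triton kernel sketch: one program instance handles one tile of elements.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # which tile this instance owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                     # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real training kernels like Liger's fuse much more per tile (e.g. RMSNorm or cross-entropy), but the load/compute/store-with-mask structure is the same.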
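The flash attention tutorial entry is ultimately about one trick: computing softmax(qKᵀ)V tile by tile with a running maximum and normalizer, so the full attention matrix never materializes. A plain-PyTorch sketch of that online softmax for a single query vector, assuming nothing beyond `torch` (the function name is mine):

```python
import math
import torch

def flash_attn_single_query(q, k, v, block: int = 64):
    """Online-softmax attention for one query: q is (d,), k and v are (n, d).
    Keeps a running max m, normalizer l, and unnormalized accumulator acc."""
    n, d = k.shape
    scale = 1.0 / math.sqrt(d)
    m, l = float("-inf"), 0.0
    acc = torch.zeros(d, dtype=v.dtype)
    for s in range(0, n, block):
        x = (k[s:s + block] @ q) * scale      # one tile of attention logits
        m_new = max(m, float(x.max()))
        alpha = math.exp(m - m_new)           # rescales previously accumulated state
        p = torch.exp(x - m_new)
        l = l * alpha + float(p.sum())
        acc = acc * alpha + p @ v[s:s + block]
        m = m_new
    return acc / l

# Sanity check against the naive reference:
q, k, v = torch.randn(8), torch.randn(100, 8), torch.randn(100, 8)
ref = torch.softmax((k @ q) / math.sqrt(8), dim=0) @ v
assert torch.allclose(flash_attn_single_query(q, k, v), ref, atol=1e-5)
```

The CUDA/Triton/CUTLASS versions in the tutorial parallelize this over queries and heads and keep the tiles in shared memory, but the rescale-by-alpha bookkeeping is identical.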
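The GPTQ entry (and the AWQ/SmoothQuant papers in the inference list) all improve on the naive baseline below: per-row symmetric round-to-nearest weight quantization. A plain-PyTorch sketch; the function names are mine, not GPTQ's API, and real 4-bit code would also pack two values per byte, which is elided here.

```python
import torch

def quantize_rtn(w: torch.Tensor, n_bits: int = 4):
    """Per-output-row symmetric round-to-nearest quantization of a weight matrix."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # one scale per row
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                     # int8 storage; packing elided

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale

w = torch.randn(4096, 4096)
q, s = quantize_rtn(w, n_bits=4)
print((w - dequantize(q, s)).abs().mean())             # mean quantization error
```

GPTQ's contribution is choosing the rounding per weight to minimize layer output error (using second-order information) rather than rounding each weight independently as above.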
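PagedAttention, mentioned in the inference list, manages the KV cache the way an OS manages virtual memory: a per-sequence block table maps logical token positions to fixed-size physical blocks in a shared pool, so sequences of different lengths don't fragment GPU memory. A toy, CPU-only sketch of just the bookkeeping; all names and sizes are illustrative, and K and V are collapsed into a single vector for brevity.

```python
import torch

BLOCK = 16                                         # tokens per physical KV block
NUM_BLOCKS, HEAD_DIM = 64, 128
pool = torch.zeros(NUM_BLOCKS, BLOCK, HEAD_DIM)    # shared pool of physical blocks
free = list(range(NUM_BLOCKS))                     # trivial free-list allocator
block_table: dict[int, list[int]] = {}             # seq_id -> its physical block ids

def append_kv(seq_id: int, pos: int, kv: torch.Tensor) -> None:
    """Store the KV entry for token `pos` of `seq_id`.
    Assumes positions arrive in order, as they do during decoding."""
    table = block_table.setdefault(seq_id, [])
    if pos // BLOCK == len(table):                 # crossed into a new logical block
        table.append(free.pop())                   # map it to any free physical block
    pool[table[pos // BLOCK], pos % BLOCK] = kv

def read_kv(seq_id: int, pos: int) -> torch.Tensor:
    """Translate a logical position through the block table, like a page table."""
    table = block_table[seq_id]
    return pool[table[pos // BLOCK], pos % BLOCK]
```

Because blocks are fixed-size and allocated on demand, freeing a finished sequence just returns its block ids to the free list; vLLM's attention kernels then gather keys and values through the same table during decoding.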