Leeon-K
  • BUAA
  • Beijing China
  • UTC +08:00
  • X @Lick



Learning material for CMU10-714: Deep Learning System

Jupyter Notebook · 211 stars · 35 forks · Updated May 12, 2024

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python · 31,679 stars · 4,711 forks · Updated Oct 2, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

3,984 stars · 215 forks · Updated Oct 1, 2024

TinySTL is a subset of the STL (with some containers and algorithms removed) and also a superset of it (with some other containers and algorithms added).

C++ · 2,283 stars · 626 forks · Updated Oct 27, 2018

From beginner to expert: this project aims to be the clearest and most systematic Chinese-language guide to prompting.

3 stars · Updated Aug 23, 2024

VideoSys: An easy and efficient system for video generation

Python · 1,678 stars · 112 forks · Updated Oct 3, 2024

Efficient Triton Kernels for LLM Training

Python · 3,114 stars · 159 forks · Updated Oct 3, 2024

PyQt5 implementation of a YOLOv5 GUI

Python · 1 star · Updated May 13, 2024

A curated list for Efficient Large Language Models

Python · 1,151 stars · 83 forks · Updated Oct 2, 2024

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.

Python · 969 stars · 91 forks · Updated Sep 22, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 686 stars · 34 forks · Updated Sep 19, 2024

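This project aims to share the technical principles behind large models, along with hands-on practical experience.

HTML · 9,409 stars · 918 forks · Updated Sep 22, 2024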

🎓 Automatically updates circult-eda-mlsys-tinyml papers daily using GitHub Actions (updated every 8 hours)

Python · 8 stars · 1 fork · Updated Oct 3, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 34,965 stars · 4,060 forks · Updated Oct 3, 2024

A guide to teaching yourself computer science

HTML · 56,294 stars · 6,780 forks · Updated Sep 13, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python · 2,403 stars · 194 forks · Updated Sep 30, 2024

FlashAttention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda · 171 stars · 13 forks · Updated Jun 18, 2024

📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,579 stars · 174 forks · Updated Oct 3, 2024

4-bit quantization of LLaMA using GPTQ

Python · 2,986 stars · 459 forks · Updated Jul 13, 2024

Development repository for the Triton language and compiler

C++ · 12,916 stars · 1,567 forks · Updated Oct 3, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,471 stars · 143 forks · Updated Sep 25, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 8,318 stars · 933 forks · Updated Oct 1, 2024

Universal LLM Deployment Engine with ML Compilation

Python · 18,796 stars · 1,535 forks · Updated Oct 2, 2024

Inference Llama 2 in one file of pure C

C · 17,244 stars · 2,056 forks · Updated Aug 6, 2024

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda · 271 stars · 64 forks · Updated Sep 8, 2024

Mamba SSM architecture

Python · 12,721 stars · 1,070 forks · Updated Sep 26, 2024

Making large AI models cheaper, faster and more accessible

Python · 38,690 stars · 4,337 forks · Updated Sep 30, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda · 1,223 stars · 132 forks · Updated Oct 2, 2024

Effective Modern C++: translation completed

7,710 stars · 1,162 forks · Updated Aug 19, 2024