@Lick (BUAA, Beijing, China)
Stars
🥇 A curated list of awesome large language models in finance (FinLLMs), including papers, models, datasets, and codebases. Covers financial LLMs, especially Chinese-English bilingual models.
Modeling, training, eval, and inference code for OLMo
Learning material for CMU10-714: Deep Learning System
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
TinySTL is both a subset of the STL (it drops some containers and algorithms) and a superset of it (it adds other containers and algorithms)
VideoSys: An easy and efficient system for video generation
Efficient Triton Kernels for LLM Training (a minimal Triton kernel is sketched after this list)
A curated list for Efficient Large Language Models
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and deploying LLM applications in production)
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS (the core online-softmax trick is sketched after this list)
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc. (a toy KV-cache paging sketch follows this list)
4-bit quantization of LLaMA using GPTQ (the round-to-nearest baseline it improves on is sketched after this list)
Development repository for the Triton language and compiler
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Universal LLM Deployment Engine with ML Compilation
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
Making large AI models cheaper, faster and more accessible
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
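Several entries above (the Triton compiler, Liger-Kernel's training kernels, the CUDA learn notes) revolve around writing small fused GPU kernels. As a reference point, here is a minimal sketch of a Triton kernel, the vector-add pattern from Triton's own tutorials; the wrapper function and block size are my choices, and it needs a CUDA GPU with the `triton` package installed.

```python
# Minimal Triton kernel sketch: one program instance handles one tile of elements.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # which tile this instance owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                     # one program per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real training kernels like Liger's fuse much more per tile (e.g. RMSNorm or cross-entropy), but the load/compute/store-with-mask structure is the same.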
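The flash attention tutorial entry is ultimately about one trick: computing softmax(qKᵀ)V tile by tile with a running maximum and normalizer, so the full attention matrix never materializes. A plain-PyTorch sketch of that online softmax for a single query vector, assuming nothing beyond `torch` (the function name is mine):

```python
import math
import torch

def flash_attn_single_query(q, k, v, block: int = 64):
    """Online-softmax attention for one query: q is (d,), k and v are (n, d).
    Keeps a running max m, normalizer l, and unnormalized accumulator acc."""
    n, d = k.shape
    scale = 1.0 / math.sqrt(d)
    m, l = float("-inf"), 0.0
    acc = torch.zeros(d, dtype=v.dtype)
    for s in range(0, n, block):
        x = (k[s:s + block] @ q) * scale      # one tile of attention logits
        m_new = max(m, float(x.max()))
        alpha = math.exp(m - m_new)           # rescales previously accumulated state
        p = torch.exp(x - m_new)
        l = l * alpha + float(p.sum())
        acc = acc * alpha + p @ v[s:s + block]
        m = m_new
    return acc / l

# Sanity check against the naive reference:
q, k, v = torch.randn(8), torch.randn(100, 8), torch.randn(100, 8)
ref = torch.softmax((k @ q) / math.sqrt(8), dim=0) @ v
assert torch.allclose(flash_attn_single_query(q, k, v), ref, atol=1e-5)
```

The CUDA/Triton/CUTLASS versions in the tutorial parallelize this over queries and heads and keep the tiles in shared memory, but the rescale-by-alpha bookkeeping is identical.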
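The GPTQ entry (and the AWQ/SmoothQuant papers in the inference list) all improve on the naive baseline below: per-row symmetric round-to-nearest weight quantization. A plain-PyTorch sketch; the function names are mine, not GPTQ's API, and real 4-bit code would also pack two values per byte, which is elided here.

```python
import torch

def quantize_rtn(w: torch.Tensor, n_bits: int = 4):
    """Per-output-row symmetric round-to-nearest quantization of a weight matrix."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # one scale per row
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                     # int8 storage; packing elided

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale

w = torch.randn(4096, 4096)
q, s = quantize_rtn(w, n_bits=4)
print((w - dequantize(q, s)).abs().mean())             # mean quantization error
```

GPTQ's contribution is choosing the rounding per weight to minimize layer output error (using second-order information) rather than rounding each weight independently as above.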
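PagedAttention, mentioned in the inference list, manages the KV cache the way an OS manages virtual memory: a per-sequence block table maps logical token positions to fixed-size physical blocks in a shared pool, so sequences of different lengths don't fragment GPU memory. A toy, CPU-only sketch of just the bookkeeping; all names and sizes are illustrative, and K and V are collapsed into a single vector for brevity.

```python
import torch

BLOCK = 16                                         # tokens per physical KV block
NUM_BLOCKS, HEAD_DIM = 64, 128
pool = torch.zeros(NUM_BLOCKS, BLOCK, HEAD_DIM)    # shared pool of physical blocks
free = list(range(NUM_BLOCKS))                     # trivial free-list allocator
block_table: dict[int, list[int]] = {}             # seq_id -> its physical block ids

def append_kv(seq_id: int, pos: int, kv: torch.Tensor) -> None:
    """Store the KV entry for token `pos` of `seq_id`.
    Assumes positions arrive in order, as they do during decoding."""
    table = block_table.setdefault(seq_id, [])
    if pos // BLOCK == len(table):                 # crossed into a new logical block
        table.append(free.pop())                   # map it to any free physical block
    pool[table[pos // BLOCK], pos % BLOCK] = kv

def read_kv(seq_id: int, pos: int) -> torch.Tensor:
    """Translate a logical position through the block table, like a page table."""
    table = block_table[seq_id]
    return pool[table[pos // BLOCK], pos % BLOCK]
```

Because blocks are fixed-size and allocated on demand, freeing a finished sequence just returns its block ids to the free list; vLLM's attention kernels then gather keys and values through the same table during decoding.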