📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
Efficient kernel for RMS normalization with fused operations; includes both forward and backward passes and PyTorch compatibility.
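For reference, this is what such kernels compute: a minimal PyTorch sketch of the RMSNorm forward pass (names and shapes are illustrative, not taken from any of the listed repositories).

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm: scale x by the reciprocal root-mean-square over the last dimension,
    # then apply a learned per-feature gain (no mean subtraction, unlike LayerNorm).
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * inv_rms * weight

# Example: normalize a (batch, seq, hidden) activation tensor.
x = torch.randn(2, 4, 8)
w = torch.ones(8)
y = rms_norm(x, w)
print(y.shape)  # torch.Size([2, 4, 8])
```

A fused CUDA kernel performs the reduction and the scaling in one pass over the data; the backward pass can be obtained from this sketch via autograd or implemented as a matching fused kernel.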
Simple, easy-to-understand PyTorch implementations of Large Language Models (GPT and LLaMA) from scratch, with detailed steps. Implemented: Byte-Pair Tokenizer, Rotary Positional Embedding (RoPE), SwishGLU, RMSNorm, Mixture of Experts (MoE). Tested on a Taylor Swift song-lyrics dataset.
Simple character-level Transformer.
Nano-scale generative models for fun. No SOTA here, nano first.