Starred repositories

Traditional U.S. electoral maps not only illustrate polarization, they can exacerbate it. No state is strictly red or blue, they are all shades of purple.

JavaScript 46 4 Updated Nov 9, 2024

NanoGPT (124M) in 5 minutes

Python 1,517 135 Updated Nov 25, 2024

A programming framework for agentic AI 🤖

Jupyter Notebook 34,804 5,037 Updated Nov 25, 2024

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 812 52 Updated Oct 28, 2024

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

Python 2,586 469 Updated Nov 25, 2024

🤖 Build voice-based LLM agents. Modular + open source.

Python 2,936 494 Updated Nov 15, 2024

PaddleSlim is an open-source library for deep model compression and architecture search.

Python 1,564 345 Updated Nov 20, 2024

A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

Python 609 58 Updated Mar 1, 2023

A voice chat app

Python 1,072 122 Updated Nov 15, 2024

A pair of tiny foundational models trained in Brazilian Portuguese.🦙🦙

Python 26 5 Updated Sep 27, 2024

[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Python 61 4 Updated Apr 24, 2024
Jupyter Notebook 199 41 Updated May 10, 2024

An unnecessarily tiny implementation of GPT-2 in NumPy.

Python 3,254 417 Updated Apr 24, 2023

A safetensors extension to efficiently store sparse quantized tensors on disk

Python 51 2 Updated Nov 25, 2024

Official implementation of Half-Quadratic Quantization (HQQ)

Python 705 70 Updated Nov 22, 2024

Reorder-based post-training quantization for large language models

Python 181 11 Updated May 17, 2023

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 30,786 4,675 Updated Nov 25, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 35,582 4,139 Updated Nov 25, 2024

Awesome LLM compression research papers and tools.

1,212 80 Updated Nov 25, 2024

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,543 206 Updated Oct 16, 2024

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,175 178 Updated Nov 23, 2024

Code, dataset, and analysis samples that utilize the OpenFEMA API.

Jupyter Notebook 29 8 Updated Sep 17, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 734 60 Updated Nov 25, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python 226 17 Updated Oct 8, 2024

Low-bit LLM inference on CPU with lookup table

C++ 593 45 Updated Nov 19, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 732 56 Updated Oct 8, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,059 149 Updated Nov 25, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,947 155 Updated Mar 27, 2024

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Python 652 43 Updated Aug 13, 2024

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Python 881 106 Updated Oct 7, 2024