- Open Source Big Data Dev
- San Francisco, CA, USA
- http://www.holdenkarau.com/resume.pdf?q=github
- @holdenkarau
Stars
Let's RAG it RAW without fancy frameworks
A collection of learning resources for curious software engineers
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
PySpark methods to enhance developer productivity 📣 👯 🎉
A Python library to support running data quality rules while the Spark job is running ⚡
A tool to validate data, built around Apache Spark.
8-bit CUDA functions for PyTorch, modified to build on Jetson Xavier
A modular implementation of timely dataflow in Rust
State of the Art Natural Language Processing
Your self-hosted, globally interconnected microblogging community
A POC for multilingual UDFs in KSQL
TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
Prototype implementation of Service-Level Fault Injection Testing in Python.
Replaces the factory firmware on the SwitchBot Plug Mini via OTA, enabling the use of Tasmota without disassembling the unit.
lakeFS - Data version control for your data lake | Git for data
Java IMAP NIO client designed to scale well to thousands of connections per machine and to reduce contention when using a large number of threads and CPUs.
Unofficial Qualcomm Firehose / Sahara / Streaming / Diag Tools :)
Reverse Engineering Furby Connect's Bluetooth Protocol and Update Format
Open source version of Arrow Connect Platform developed by Arrow Electronics
A PowerDNS pipe dynamic backend to serve dnswall style A, AAAA and PTR DNS records for any given CIDR ranges.
Main repository for the Howlr application
GitHub Action to build and deploy a Jekyll site to GitHub Pages 🧪