Stars
A library for building fast, reliable and evolvable network services.
The gateway component to make Spark on K8s much easier for Spark users.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
An orchestration platform for the development, production, and observation of data assets.
The Linux Kernel Module Programming Guide (updated for 5.0+ kernels)
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Vitess is a database clustering system for horizontal scaling of MySQL.
Titus is the Netflix Container Management Platform that manages containers and provides integrations to the infrastructure ecosystem.
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Cap'n Proto serialization/RPC system - core tools and C++ library
Protocol Buffers - Google's data interchange format
A Fast Key-Value Storage Engine Based on Hierarchical B+-Tree Trie
Notes talking about the design and implementation of Apache Spark
BTrace - a safe, dynamic tracing tool for the Java platform
Small set of tools for JVM troubleshooting, monitoring and profiling.
Syntax highlighting for Google Docs
🎓 Path to a free self-taught education in Computer Science!
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
An open-source unit test framework for Hive queries based on JUnit 4 and 5
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
The official AWS SDK for Java 1.x (In Maintenance Mode, End-of-Life on 12/31/2025). The AWS SDK for Java 2.x is available here: https://github.com/aws/aws-sdk-java-v2/
lakeFS - Data version control for your data lake | Git for data