Last Week in a Byte on Delta Lake | 2023-03-21

Delta Lake

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture.

Published Mar 21, 2023

You can watch or read the latest #DeltaLake news a week late (2023-03-21 edition)!

Recent releases and contributions

We're proud to announce the delta-rs rust-v0.8.0 release, which adds support for additional types of partition values, implements pruning on partition columns, adds typed commit info, enables passing storage options to the Delta table builder via DataFusion's CREATE EXTERNAL TABLE, and more! More information is available at https://github.com/delta-io/delta-rs/releases/tag/rust-v0.8.0. #rustlang
Gurunath Rajagopal recently released his Lakehouse Sharing project, which demonstrates a table-format-agnostic data sharing server (based on the delta-sharing protocol) implemented in Python for both #deltalake and #apacheiceberg formats. #deltasharing
Want to help with creating Delta Lake helper functions without Spark dependencies? Check out https://github.com/MrPowers/levi and chat with Matthew Powers, CFA, who created the levi, mack, and jodie Delta Lake helper function libraries.
Get the latest Delta table version using the mack helper functions.

from delta.tables import DeltaTable
import mack

delta_table = DeltaTable.forPath(spark, path)
mack.latest_version(delta_table)
>> 2
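
For readers curious about what "latest version" means on disk, here is a minimal sketch that needs neither Spark nor mack. It relies on the Delta transaction log convention that each commit is stored in the table's _delta_log directory as a 20-digit, zero-padded version number followed by .json. The function name latest_version_from_log is hypothetical, it is not how mack computes the value, and it only handles a local filesystem path (checkpoint files and object stores are ignored).

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Delta commit files are named as a 20-digit, zero-padded version plus ".json".
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_version_from_log(table_path: str) -> Optional[int]:
    """Return the highest commit version in <table_path>/_delta_log, or None.

    Hypothetical helper for illustration; not part of mack or delta-spark.
    """
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return None
    versions = [
        int(m.group(1))
        for entry in log_dir.iterdir()
        if (m := COMMIT_FILE.match(entry.name))
    ]
    return max(versions) if versions else None


# Demo against a throwaway directory that only mimics the log layout
# (empty commit files, no real table data).
with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / "_delta_log"
    demo_log.mkdir()
    for version in range(3):
        (demo_log / f"{version:020d}.json").touch()
    print(latest_version_from_log(tmp))  # prints 2
```

For real tables, the mack call above is the simpler choice, since it works wherever Spark can read the table.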

Joydeep Banik Roy published CHANGE DATA FEED — Time Travel — Failure Scenarios, Prevention & Recovery, which covers three scenarios where time travel queries fail and how to check for these errors easily.

// import the jodie library first
ChangeDataFeedHelper(deltaTablePath, 0, 25)
  .dryRun()
  .readCDF()

Upcoming events

We are happy to partner with Blueprint on their Velocity Tour to bring you demos, meet and greets, speaking sessions, and more! In March and April, they will be at Data Council Austin 2023, PyCon US 2023 in Salt Lake City, and PyData Seattle 2023. Check out the Velocity Tour for all of their dates!

Latest community blogs

Robert Kossendey published the fourth blog in his insightful series on his journey to the #lakehouse with the post Lakehouse - A resumé.

Overall, we are more than satisfied with the outcome of our Lakehouse migration. We reduced our overall costs by 80% while improving our developer experience drastically. We don’t have to maintain a Redshift cluster anymore. Instead, we store all the data in a single place, S3. Further, the core of our infrastructure is powered by open source, namely Apache Spark and Delta Lake. That empowers us to move away from Databricks if we are ever unhappy with the service.

For more information, check out the vidcast D3L2: The Journey Unifying Data Lake and Data Warehouse with Robert Kossendey at Claimsforce. cc claimsforce

Connect with us!