Delta Lake Blogs

99 posts

Delta Lake on S3

by Avril Aysha,



Learn how to use Delta Lake on S3

Delta Lake for ETL

by Avril Aysha,



Learn how to use Delta Lake for ETL workloads

Delta Lake 4.0 Preview

by Tathagata “TD” Das, Allison Portis, Scott Sandre, Susan Pierce, Carly Akerly,



We are pleased to announce the preview release of Delta Lake 4.0 (release notes) on Apache Spark™ 4.0 Preview.

Delta Lake Optimize

by Avril Aysha,



Learn how to optimize your Delta Lake tables

Unlocking the Power of Delta Lake 3.0+: Introducing the New StarTree Connector with Delta Kernel

by Vibhuti Bhushan,



In the rapidly evolving landscape of data management, staying up-to-date with the latest advancements is key to maintaining a competitive edge.

Unifying the open table formats with Delta Lake Universal Format (UniForm) and Apache XTable

by Jonathan Brito, Kyle Weller,



Delta Lake Universal Format (UniForm) enables Delta tables to be read by any engine that supports Delta, Iceberg, and now, through code contributed by Apache XTable, Hudi.

Delta Kernel - Building Delta Lake connectors, made simple

by Nick Lanham, Tathagata “TD” Das,



Delta Lake recently hit an impressive milestone of being downloaded more than 20M times per month!

Query Delta Lake natively using BigQuery

by Gaurav Saxena, Justin Levandoski,



Users working with Delta Lake tables can now easily integrate their workloads with BigQuery, ensuring secure and more managed interoperability.

A Guide to Delta Lake Sessions at Data+AI Summit

by Carly Akerly,



The Data+AI Summit returns to San Francisco from June 10-13, 2024.

Delta Lake without Spark

by Avril Aysha,



Learn how to use Delta Lake without Spark

Use Delta Lake from Jupyter Notebook

by Avril Aysha,



Learn how to use Delta Lake from a Jupyter Notebook

Scaling Graph Data Processing with Delta Lake: Lessons from a Real-World Use Case

by Yeshwanth Vijayakumar, Director of Engineering, Adobe,



The Adobe Experience Platform includes a set of analytics, social, advertising, media optimization, targeting, Web experience management, journey orchestration, and content management products.

Delta Lake vs Data Lake - What's the Difference?

by Avril Aysha,



Understand the difference between Delta Lake and a data lake

Delta Lake 3.2

by Carly Akerly,



We are pleased to announce the release of Delta Lake 3.2 (release notes) on Apache Spark 3.5, with features that improve the performance and interoperability of Delta Lake.

Efficient Delta Vacuum with File Inventory

by Arun Ravi M V (Grab),



Today, Delta Lake is rapidly making its mark as a highly popular hybrid data format, earning widespread adoption across various organizations.

Rivian expands the Delta Lake ecosystem with Delta-Go

by Chelsea Jones, Staff Data Engineer, Rivian; Rahul Madnawat, Software Engineer II, Rivian; Jason Shiverick, Director of AI Platforms, Rivian,



Real-time data ingestion for high-volume transactions, now available in open source

Pros and cons of Hive-style partitioning

by Matthew Powers, Martin Bode,



This post discusses the pros and cons of Hive-style partitioning.

Structured Spark Streaming with Delta Lake: A Comprehensive Guide

by Delta Lake,



The webinar demonstrates how to embrace structured streaming seamlessly from data emission to your final Delta table destination.

High-Performance Querying on Massive Delta Lake Tables with Daft

by Clark Zinzow, Jay Chia,



This post introduces the distributed + parallel Delta Lake reader in Daft.

Delta Lake - State of the Project - Part 2

by Tathagata "TD" Das, Susan Pierce, Carly Akerly,



Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.

Delta Lake Announces Pandas Enhancement: Real Pandas to Optimize Data Lakehouse Performance

by Carly Akerly,



The Delta Lake project is thrilled to announce its latest and most exciting collaboration with the Pandas community!

Delta Lake - State of the Project - Part 1

by Tathagata "TD" Das, Susan Pierce, Carly Akerly,



Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.

Delta Lake 3.1.0

by Carly Akerly,



This post describes the exciting features in the Delta Lake 3.1.0 release

Delta Lake replaceWhere

by Matthew Powers,



Selectively overriding rows or partitions of a Delta Lake table with replaceWhere.

Delta Lake Performance

by Joe Harris,



This post explains why Delta Lake is fast and describes improvements to Delta Lake performance over time.

Writing a Kafka Stream to Delta Lake with Spark Structured Streaming

by Bo Gao, Matthew Powers,



This blog post explains how to write a Kafka stream to a Delta table with Spark Structured Streaming.

Using Delta Lake with AWS Glue

by Keerthi Josyula, Matthew Powers,



This post shows how to register Delta tables in the AWS Glue Data Catalog with the AWS Glue Crawler.

New features in the Python deltalake 0.12.0 release

by Ion Koutsouris,



This post explains the new features in the Python deltalake 0.12.0 release

Delta Lake 3.0.0

by Carly Akerly,



This post describes the exciting features in the Delta Lake 3.0.0 release

Delta Lake vs. Parquet Comparison

by Matthew Powers,



This post compares the strengths and weaknesses of Delta Lake vs Parquet.

Unlock Delta Lakes for PyTorch Training with DeltaTorch

by Daniel Liden, Michael Shtelma,



This post demonstrates how to create PyTorch DataLoaders using Delta tables as data sources for training deep learning models.

Introducing Delta Lake Table Features

by Nick Karpov,



This post introduces Delta Lake Table Features, a discrete feature-based compatibility scheme that replaces the traditional integer protocol versioning for Delta Lake tables and clients.

Delta Lake Change Data Feed (CDF)

by Nick Karpov, Matthew Powers,



This blog shows how to enable and use the Delta Lake Change Data Feed.

Delta Lake’s transaction log protocol and its implementations

by Matthew Powers,



This blog explains the Delta Lake transaction log protocol and its various implementations.

Delta Lake Deletion Vectors

by Nick Karpov,



This blog introduces the new Deletion Vectors table feature for Delta Lake tables, and explains how Deletion Vectors speed up operations that modify existing data in your lakehouse.

Using Ibis with PySpark on Delta Lake tables

by Marlene Mhangami, Matthew Powers,



This post explains how to use Ibis to query Delta tables with PySpark

Delta Lake Z Order

by Matthew Powers,



This post explains how to use Delta Lake Z Order to make your queries run faster

Delta Lake 2.3.0 Released

by Allison Portis, Matthew Powers,



This post explains some of the key features in the Delta Lake 2.3.0 release

Open source self-hosted Delta Sharing server

by Shingo OKAWA,



This post gives basic instructions for the Kotosiro Delta Sharing server

How Delta Lake uses metadata to make certain aggregations much faster

by Matthew Powers, Scott Sandre,



This post explains Delta Lake performance optimizations that make some aggregations execute quicker

How to use Delta Lake generated columns

by Matthew Powers,



How to create Delta Lake tables with generated columns and the benefits of this feature

Introducing Support for Delta Lake Tables in AWS Lambda

by Nick Karpov,



How to use deltalake in AWS Lambda with AWS SDK for pandas

How to create and append to Delta Lake tables with pandas

by Matthew Powers,



This post explains how to create and append to Delta Lake tables with pandas

Running ML Workflows with Delta Lake and Ray

by Jim Hibbard,



This post explains how you can read Delta Lake with the Ray compute framework

How to Convert from CSV to Delta Lake

by Matthew Powers,



This post explains how to convert from a CSV data lake to Delta Lake, which offers much better features.

Getting started contributing to Delta Lake Spark

by Nick Karpov,



This post explains the full development loop with the Delta Lake Spark connector. You'll learn how to retrieve and navigate the codebase, make changes, and package and debug custom builds.

New features in the Python deltalake 0.7.0 release of delta-rs

by Will Jones, Matthew Powers,



This post explains the new features in the deltalake 0.7.0 release

Delta Lake Merge

by Nick Karpov,



This post shows how to use MERGE with Delta tables.

Delta Lake Schema Evolution

by Matthew Powers,



This post shows how to enable schema evolution in Delta tables and when this is a good option.

Delta Lake Time Travel

by Matthew Powers,



This post shows how to time travel between different versions of a Delta table.

Delta Lake Small File Compaction with OPTIMIZE

by Matthew Powers,



This post shows how to compact small files in Delta tables with OPTIMIZE.

Adding and Deleting Partitions in Delta Lake tables

by Matthew Powers, Ryan Zhu,



This post shows how to add partitions to and remove partitions from Delta Lake tables.

Remove old files with the Delta Lake Vacuum Command

by Matthew Powers, Nick Karpov,



This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.

Reading Delta Lake Tables into Polars DataFrames

by Matthew Powers, Chitral Verma,



This post shows how to read Delta Lake tables into Polars DataFrames.

Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR

by Vedant Jain, Denny Lee,



In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.

Data Sharing across Government Agencies using Delta Sharing

by Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin,



This post shows how government agencies are sharing data with Delta Sharing.

How to Delete Rows from a Delta Lake Table

by Matthew Powers,



This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.

Delta Lake Constraints and Checks

by Matthew Powers,



This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.

Delta Lake Schema Enforcement

by Matthew Powers,



This post teaches you about schema enforcement in Delta Lake and why it's better than what's offered by data lakes

Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables

by Matthew Powers,



This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.

How to Create Delta Lake tables

by Matthew Powers,



This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.

How to Version Your Data with pandas and Delta Lake

by Matthew Powers,



This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.

Sharing a Delta Table’s Change Data Feed with Delta Sharing 0.5.0

by Will Girten,



We are excited to announce the release of Delta Sharing 0.5.0.

How to Rollback a Delta Lake Table to a Previous Version with Restore

by Matthew Powers,



This post shows you how to rollback Delta Lake tables to previous versions with restore.

Converting from Parquet to Delta Lake

by Matthew Powers,



This post shows how to convert a Parquet table to a Delta Lake table.

Why we migrated to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team

by Robert Thompson, Geoff Freeman,



In this post, we will discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows reading and writing of data without blocking and scales out linearly....

How to drop columns from a Delta Lake table

by Matthew Powers,



This post shows you two ways to drop columns from Delta Lake tables.

Apache Flink Source Connector for Delta Lake tables

by Krzysztof Chmielewski, Scott Sandre, Denny Lee,



We are excited to announce the release of Delta Connectors 0.5.0, which introduces the new Flink/Delta Source Connector on Apache Flink™ 1.13 that can read directly from Delta tables using Flink’s DataStream API.

Delta 2.0 - The Foundation of your Data Lakehouse is Open

by Tathagata Das, Denny Lee,



We are happy to announce the release of Delta Lake 2.0 on Apache Spark™ 3.2! The significance of Delta Lake 2.0 is not just a number - though it is timed quite nicely with Delta Lake’s 3rd birthday....

Multi-cluster writes to Delta Lake Storage in S3

by Scott Sandre, Denny Lee, Mariusz Kryński (Samba TV),



While Delta Lake has supported concurrent reads from multiple clusters since its inception, there were limitations for multi-cluster writes specifically to Amazon S3. Note, this was not a limitation for Azure ADLSgen2 nor Google GCS, as S3 currently lacks...

Delta Lake 1.2 - More Speed, Efficiency and Extensibility Than Ever

by Venki Korukanti, Scott Sandre, Tathagata Das, Allison Portis, Denny Lee, Vini Jaiswal,



Introducing performance optimizations that will supercharge your data pipelines at any scale.

Writing to Delta Lake from Apache Flink

by Fabian Paul, Pawel Kubit, Scott Sandre, Tathagata Das, Denny Lee,



Learn more about how you can write from Apache Flink to Delta Lake.

Extending Delta Sharing to Google Cloud Storage

by Will Girten, Shixiong Zhu,



Learn more about the latest release of the open-source project Delta Sharing and how it enables sharing on Google Cloud Storage, among other enhancements.

Delta Connectors 0.3.0 Released

by Allison Portis,



We are excited to announce the release of Delta Connectors 0.3.0.

Delta Lake 1.1.0 Released

by Scott Sandre,



We are excited to announce the release of Delta Lake 1.1.0.

Delta Sharing 0.3.0 Released

by Lin Zhou,



We are excited to announce the release of Delta Sharing 0.3.0.

Power BI Delta Sharing Connector

by Denny Lee,



We are excited about the recently announced preview of the Power BI Delta Sharing connector

Delta Lake User Survey (2021 H2)

by Denny Lee,



We would like to invite you to provide your feedback on Delta Lake OSS.

Delta Lake 1.0.0 Released

by Tathagata Das,



We are excited to announce the release of Delta Lake 1.0.0 on Apache Spark 3.1.

AMA: Growing the Delta Lake ecosystem

by Denny Lee,



On March 11th, 2021 at 9:00 am PT, join us for this fun Delta Lake AMA session where we discuss growing the Delta Lake open-source ecosystem with QP Hou, Christian Williams, and Alexander Kushnir from Scribd.

Salesforce Engineering: Delta Lake Tech Talk Series

by Denny Lee,



We are happy to announce the Salesforce Engineering Delta Lake Tech Talk Series for March and April 2021.

Delta Lake 0.8.0 Released

by Denny Lee,



We are excited to announce the release of Delta Lake 0.8.0.

Salesforce Engineering: Delta Lake Blog Series

by Denny Lee,



Salesforce Engineering has published a series of blogs on how they use Delta Lake.

Salesforce Engineering: Global Synchronousness and Ordering in Delta Lake

by Denny Lee,



At Salesforce, we maintain a platform to capture customer activity — various kinds of sales events such as emails, meetings, and videos. These events are either consumed by downstream products in real time or stored in our data lake, which...

Getting Started with Delta Lake

by Denny Lee,



Want to learn more about Delta Lake? Check out this series of Delta Lake videos.

Delta Lake Sessions at Spark+AI Summit North America 2020

by Denny Lee,



We're really excited for the numerous Delta Lake training and conference sessions that will be showcased throughout Spark+AI Summit NA 2020.

Delta Lake 0.7.0 Released

by Denny Lee,



We are excited to announce the release of Delta Lake 0.7.0 on Apache Spark 3.0. This is the first release on Spark 3.x and adds support for metastore-defined tables and SQL DDLs.

Delta Lake 0.6.1 Released

by Denny Lee,



We are excited to announce the release of Delta Lake 0.6.1, which fixes a few critical bugs in merge operation and operation metrics. If you are using version 0.6.0, it is strongly recommended that you upgrade to version 0.6.1.

Delta Lake 0.6.0 Released

by Denny Lee,



We are excited to announce the release of Delta Lake 0.6.0, which introduces schema evolution and performance improvements in merge, and operation metrics in table history.

Delta Lake Newsletter: 2020-03-20 Edition

by Denny Lee,



For this edition of the Delta Lake Newsletter, find out more about the latest and upcoming tech talks and videos.

Diving into Delta Lake Online Tech Talk Series

by Denny Lee,



For our next series of Delta Lake online tech talks, we're excited to dive into the internals with our Diving into Delta Lake series. This will be a fun set of tech talks with live demos and Q&A. Check them...

Delta Lake Online Tech Talks

by Denny Lee,



We’re excited to announce the next series of Delta Lake online tech talks over the next few weeks. This will be a fun set of tech talks with live demos and Q&A. Check them out!

Delta Lake 0.5.0 Released

by Denny Lee,



We are excited to announce the release of Delta Lake 0.5.0, which introduces Presto/Athena support and improved concurrency.

Delta Lake Newsletter: 2019-10-03 Edition (incl. SAIS EU 2019 Sessions)

by Denny Lee,



In this edition of the Delta Lake Newsletter, find out more about the latest and upcoming webinars, meetups, and publications. For this edition, we will also focus on the many sessions at Spark+AI Summit EU 2019 in Amsterdam.

Delta Lake 0.4.0 Released

by Denny Lee,



We are excited to announce the release of Delta Lake 0.4.0 which introduces Python APIs for manipulating and managing data in Delta tables.

Delta Lake 0.3.0 Released

by Denny Lee,



We are happy to announce the availability of Delta Lake 0.3.0! Features include: Scala/Java APIs for DML commands, Scala/Java APIs for query commit history, and Scala/Java APIs for vacuuming old files.

Delta Lake 0.2.0 Released

by Denny Lee,



We are happy to announce the availability of Delta Lake 0.2.0! It brings support for cloud storage (e.g. Amazon S3 and Azure Blob Storage) and improved concurrency.

Delta Lake 0.1.0 Released

by Denny Lee,



We are happy to announce the availability of Delta Lake 0.1.0! This is the initial version of the open source Delta Lake.