
Delta Lake without Spark

This post shows you how to use Delta Lake without Spark.

You might want to use Delta Lake without Spark because:

  • You don’t want to learn Spark
  • Your team doesn’t use Spark
  • You don’t want to use the Java Virtual Machine (JVM)
  • You are working with relatively small datasets

You can use Delta Lake without Spark using many other languages, like SQL, Python, and Rust. This post will show you examples of the most popular ways of using Delta Lake without Spark.

Let’s jump in! 🪂

How to use Delta Lake without Spark

There are many ways to use Delta Lake without Spark.

Let’s group them into two categories for clarity:

  • dedicated Delta Connectors let you use Delta Lake from engines like Flink, Hive, Trino, PrestoDB, and many others
  • the delta-rs package lets you use Delta Lake in Rust or Python, e.g. with pandas, polars, Dask, Daft, DuckDB and many others

This post will show you a brief code example for each of these options to use Delta Lake without Spark. You can also find the full list of integrations on the Delta Lake website.

Delta Lake without Spark: Dedicated Connectors

Many non-Spark query engines have dedicated connectors to use Delta Lake. These are all based on Delta Standalone: a JVM library for Java / Scala that can be used to read from and write to Delta tables. You can use Delta Standalone to build your own Delta connector for services that are not listed on the Integrations page.

Note: if you want to avoid the JVM entirely, refer to the delta-rs section below

You can use Delta Lake without Spark with a dedicated Delta connector from:

  • Apache Flink
  • Apache Hive
  • PrestoDB
  • Trino
  • Amazon Athena
  • Snowflake
  • Google BigQuery
  • Microsoft Fabric

Some of these connectors support limited Delta Lake functionality. Make sure to check the “Known Limitations” section for each connector to learn more.

Delta Lake without Spark: Apache Flink

You can use the Flink/Delta connector to use Delta Lake from Apache Flink. The connector supports data writes in both batch and streaming mode.

The connector includes:

  • DeltaSink for writing data from Apache Flink to a Delta table.
  • DeltaSource for reading Delta tables using Apache Flink.

You can use Delta Lake with the Flink Python or SQL API.

The code below is an example of how you can write data to a partitioned table using one partition column, surname.

    import io.delta.flink.sink.DeltaSink;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.types.logical.RowType;
    import org.apache.hadoop.conf.Configuration;

    public DataStream<RowData> createDeltaSink(
            DataStream<RowData> stream,
            String deltaTablePath,
            RowType rowType) {
        // partition the Delta table on the "surname" column
        String[] partitionCols = { "surname" };
        DeltaSink<RowData> deltaSink = DeltaSink
            .forRowData(
                new Path(deltaTablePath),
                new Configuration(),
                rowType)
            .withPartitionColumns(partitionCols)
            .build();
        stream.sinkTo(deltaSink);
        return stream;
    }

You can also use Delta Lake’s time-travel functionality from Apache Flink. For example like this:

    public DataStream<RowData> createBoundedDeltaSourceWithTimeTravel(
            StreamExecutionEnvironment env,
            String deltaTablePath) {

        DeltaSource<RowData> deltaSource = DeltaSource
            .forBoundedRowData(
                new Path(deltaTablePath),
                new Configuration())
            // could also use `.versionAsOf(314159)`
            .timestampAsOf("2022-06-28 04:55:00")
            .build();

        return env.fromSource(deltaSource, WatermarkStrategy.noWatermarks(), "delta-source");
    }

Starting from version 3.0.0 of the Flink/Delta connector, it can be used for Flink SQL jobs. Both the Delta Source and Delta Sink can be used as Flink tables for SELECT and INSERT queries.

For example, you can load an entire Delta table into another Delta table using:

    INSERT INTO sinkTable SELECT * FROM sourceTable;

In the SQL query above, sourceTable and sinkTable both refer to Delta tables configured using the Flink/Delta connector. The table schemas must match.

Or to create a new, partitioned table:

    CREATE TABLE testTable (
        id BIGINT,
        data STRING,
        part_a STRING,
        part_b STRING
      )
      PARTITIONED BY (part_a, part_b)
      WITH (
        'connector' = 'delta',
        'table-path' = '<path-to-table>',
        '<arbitrary-user-defined-table-property>' = '<value>',
        '<delta.*-properties>' = '<value>'
    );

You do not need to write any Spark to use Delta Lake with Apache Flink.

Note that the Flink/Delta SQL connector must be used together with a Delta Catalog. Trying to execute SQL queries on a Delta table using the Flink SQL API without a Delta Catalog configured will cause the SQL job to fail.

Known Limitations

  • Only append write operations are currently supported; no overwrite or upsert.
  • Azure Blob Storage currently only supports reading. Writing to Azure Blob Storage is not supported by Flink due to an issue with class shading.
  • For AWS S3 storage, to ensure concurrent transactional writes from different clusters, follow the multi-cluster configuration guidelines. See the connector documentation for an example of how to use this configuration with the Flink Delta Sink.
  • The Delta SQL connector currently supports only physical columns; metadata and computed columns are not yet supported. See the connector documentation for details.

Delta Lake without Spark: Apache Hive


You can use the Hive connector to use Delta Lake from Apache Hive. You can use this connector to query data from Delta tables in Hive. You cannot use it to write data from Hive to Delta tables.

To work with Delta Lake, you will need to define an external Hive table pointing to a Delta table, for example on S3 like this:

    CREATE EXTERNAL TABLE deltaTable(col1 INT, col2 STRING)
    STORED BY 'io.delta.hive.DeltaStorageHandler'
    LOCATION '/delta/table/path'

The table schema in the CREATE TABLE statement should match the schema of the Delta table you are reading.

Once you have defined your external Hive table, you can then query it as follows:

    select * from deltaTable;

You do not need to write any Spark to use Delta Lake with Apache Hive.

Known Limitations

  • This connector is READ ONLY. No write operations are supported.
  • Only EXTERNAL Hive tables are supported. The Delta table must be created using Spark before an external Hive table can reference it.

Delta Lake without Spark: PrestoDB


You can use the PrestoDB connector to use Delta Lake from Presto. This connector is based on the Hive connector and shares a lot of the same configuration options.

Your Delta table will need to be registered in a Hive metastore.

You can create a Presto table from an existing Delta table on S3 as follows:

    CREATE TABLE sales.apac.sales_data_new (sampleColumn INT)
    WITH (external_location = 's3://db-sa-datasets/presto/sales_data_new');

To register a table in the Hive metastore, you don’t need to pass the full schema of the table, because the Delta Lake connector gets the schema from the metadata at the Delta table location. To get around the “no columns” error in the Hive metastore, provide a sample column as the schema of the Delta table being registered.

To access a Delta table sales_data that is already registered in the Hive metastore as part of the apac database in the sales catalog, you can simply run:

    SELECT * FROM sales.apac.sales_data LIMIT 200;

You can also query the Delta table directly from S3 by passing the path:

    SELECT * FROM sales."$path$"."s3://db-sa-datasets/presto/sales_data" LIMIT 200;

You can travel back to specific versions of your Delta table by adding the version to the table name as follows:

    SELECT * FROM sales.apac."sales_data@v4" LIMIT 200;

Known Limitations

  • This connector is READ ONLY. No write operations are supported.
  • This connector reuses many of the modules of the Hive connector, e.g. for connectivity and security with S3, Azure Data Lake, the AWS Glue metastore, etc. The configurations for these modules are the same as those available in the Hive connector documentation.

You do not need to write any Spark to use Delta Lake with PrestoDB.

Delta Lake without Spark: Trino


You can use the Trino connector to use Delta Lake from Trino. You can then use SQL to query and transform your Delta tables. Note that your Delta tables must be registered with a metastore, e.g. Hive metastore or AWS Glue.

The Trino connector supports reading and writing operations. You can append, overwrite and merge your Delta tables.

For example, you can create a Delta Lake table, add some data, modify some data and add more data like this:

    CREATE TABLE users(id int, name varchar) WITH (column_mapping_mode = 'name');
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Mallory');
    ALTER TABLE users DROP COLUMN name;
    INSERT INTO users VALUES 4;

Use the following statement to look at all data in the table:

    > SELECT * FROM users ORDER BY id;

    id
    ----
      1
      2
      3
      4

Use the $history metadata table to see a record of past operations:

    > SELECT version, timestamp, operation
    > FROM "users$history";

    version |             timestamp              |  operation
    ---------+------------------------------------+--------------
           0 | 2024-04-10 17:49:18.528 Asia/Tokyo | CREATE TABLE
           1 | 2024-04-10 17:49:18.755 Asia/Tokyo | WRITE
           2 | 2024-04-10 17:49:18.929 Asia/Tokyo | DROP COLUMNS
           3 | 2024-04-10 17:49:19.137 Asia/Tokyo | WRITE

Then travel back in time to version 1:

    > SELECT *
    > FROM users FOR VERSION AS OF 1;

    id |  name
    ----+---------
      1 | Alice
      2 | Bob
      3 | Mallory

You do not need to write any Spark to use Delta Lake with Trino.

Writing to Cloud storage

  • Writes to Azure ADLS Gen2 and Google Cloud Storage are enabled by default. Trino detects write collisions on these storage systems when writing from multiple Trino clusters, or from other query engines.
  • Writes to Amazon S3 and S3-compatible storage must be enabled with the delta.enable-non-concurrent-writes property. Writes to S3 can safely be made from multiple Trino clusters; however, write collisions are not detected when writing concurrently from other Delta Lake engines. You must make sure that no concurrent data modifications are run to avoid data corruption.

Data Type Mapping

  • Trino and Delta Lake each support data types that the other does not. For this reason, the connector modifies some types when reading or writing data. Refer to the connector documentation on type mapping to learn more.

Delta Lake without Spark: Amazon Athena


You can use the Athena connector to use Delta Lake from Amazon Athena.

Note that your Delta Lake table must be registered with an AWS Glue metastore.

If your table is in Amazon S3 but not in AWS Glue, run a CREATE EXTERNAL TABLE statement first:

    CREATE EXTERNAL TABLE
      [db_name.]table_name
      LOCATION 's3://DOC-EXAMPLE-BUCKET/your-folder/'
      TBLPROPERTIES ('table_type' = 'DELTA')

Delta Lake table metadata are inferred from the Delta Lake transaction log and synchronized directly to AWS Glue. You do not need to provide column or schema definitions.

You can then query your Delta tables with standard SQL syntax. For example:

    SELECT * FROM delta_table_users ORDER BY id;

You do not need to write any Spark to use Delta Lake with Amazon Athena.

Known Limitations

  • This connector is READ ONLY. No write operations or time travel are supported.
  • For an example of handling UPSERTS with Athena and AWS Glue, check out this article.
  • Only certain data types can be used for partition columns, see the documentation.

Delta Lake without Spark: Snowflake


You can use the Snowflake connector to use Delta Lake from Snowflake.

You will need to create a Snowflake external table that points to your Delta Lake stored in cloud storage. Supported cloud storage services are: Amazon S3, Google Cloud Storage, and Microsoft Azure.

You can then query your Delta tables using standard SQL syntax.

For example, you can create an external table backed by a Delta Lake as follows:

    CREATE EXTERNAL TABLE twitter_feed(
     -- define the partition column as a virtual column derived from the file
     -- path metadata; this particular expression is just an illustrative example
     date_part date AS TO_DATE(SUBSTR(metadata$filename, 1, 10), 'YYYY/MM/DD'))
     PARTITION BY (date_part)
     LOCATION=@mystage/daily/
     FILE_FORMAT = (TYPE = PARQUET)
     TABLE_FORMAT = DELTA;

Note the FILE_FORMAT = (TYPE = PARQUET) and the TABLE_FORMAT = DELTA. These values must be set in this way.

For optimal performance, it is recommended to define partition columns for the external table. In this example, we’ve defined date_part as the partition column.

You do not need to write any Spark to use Delta Lake with Snowflake.

Known Limitations

  • This connector is currently a preview feature.
  • The ability to automatically refresh the metadata is not available for external tables that reference Delta Lake files. Instead, periodically execute an ALTER EXTERNAL TABLE … REFRESH statement to register any added or removed files.
  • The following Snowflake parameters are not supported when referencing a Delta Lake:
    • AWS_SNS_TOPIC = 'string'
    • PATTERN = 'regex_pattern'

Delta Lake without Spark: Google BigQuery


You can use the BigQuery connector to use Delta Lake from Google BigQuery.

You will need to define your existing Delta table as an External table in BigQuery. This is referred to as a Delta Lake BigLake.

You can do so as follows:

    CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.DELTALAKE_TABLE_NAME`
    WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
    OPTIONS (
      format ="DELTA_LAKE",
      uris=['DELTA_TABLE_GCS_BASE_PATH']);

After creating your Delta Lake BigLake, you can query it using GoogleSQL. For example:

    SELECT field1, field2 FROM mydataset.my_cloud_storage_table;

You do not need to write any Spark to use Delta Lake with Google BigQuery.

Read more in the dedicated blog post.

Known Limitations

  • This connector is available as a pre-GA feature. Pre-GA features are available "as is" and might have limited support.
  • Supports Delta Lake reader version 3 with deletion vectors and column mapping.
  • You must list the reader version in the last log entry file. For example, new tables must include 00000..0.json.
  • Change data capture (CDC) operations aren't supported. Any existing CDC operations are ignored.
  • The schema is autodetected. Modifying the schema by using BigQuery isn't supported.
  • Table column names must adhere to BigQuery column name restrictions.
  • Materialized views aren't supported.
  • Data types may be converted according to the type mapping matrix.

Delta Lake without Spark with delta-rs

The delta-rs library lets you read, write, and manage Delta Lake tables with Python or Rust without Spark or Java. It uses Apache Arrow under the hood, so it is compatible with other Arrow-native or integrated libraries such as pandas, DuckDB, and Polars.

Using Delta Lake with delta-rs avoids the JVM entirely.
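For example, here is a minimal sketch of writing and reading a Delta table using only the deltalake Python package (installable with pip install deltalake) and pyarrow. The tmp/arrow-table path is just an illustrative choice:

    # A minimal sketch: write and read a Delta table with deltalake + pyarrow only,
    # no Spark and no JVM. The path "tmp/arrow-table" is an illustrative example.
    import pyarrow as pa
    from deltalake import DeltaTable, write_deltalake

    data = pa.table({"first_name": ["bob", "li", "leah"], "age": [47, 23, 51]})
    write_deltalake("tmp/arrow-table", data)   # creates the table and its _delta_log

    dt = DeltaTable("tmp/arrow-table")
    print(dt.version())             # 0
    print(dt.to_pyarrow_table())    # read the data back as an Arrow table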

delta-rs has two public APIs:

  1. “rust deltalake” refers to the Rust API of delta-rs
  2. “python deltalake” refers to the Python API of delta-rs

The python deltalake API lets you use Delta Lake from many OSS query engines, including:

  • pandas
  • polars
  • Dask
  • Daft
  • DuckDB
  • Datafusion

Take a look at the Integrations page in the delta-rs documentation for more information.

Delta Lake without Spark: pandas


You can use delta-rs to use Delta Lake with pandas. Let’s look at an example.

Start by importing delta-rs and pandas as follows:

    import pandas as pd
    from deltalake import write_deltalake, DeltaTable

Define two dictionaries to store some data:

    data = {'first_name': ['bob', 'li', 'leah'], 'age': [47, 23, 51]}
    data_2 = {"first_name": ["suh", "anais"], "age": [33, 68]}

Create a DataFrame with the first dictionary and write it to a Delta Lake table:

    df = pd.DataFrame.from_dict(data)
    write_deltalake("tmp/pandas-table", df)

Load the Delta table to check the results:

    > DeltaTable("tmp/pandas-table/").to_pandas()

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51

Let’s append the rest of the data:

    df2 = pd.DataFrame(data_2)
    write_deltalake("tmp/pandas-table", df2, mode="append")

Read it back in to double-check:

    > DeltaTable("tmp/pandas-table/").to_pandas()

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51
    3        suh   33
    4      anais   68

You can time travel to a previous version using the version keyword:

    > DeltaTable("tmp/pandas-table/", version=0).to_pandas()

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51

Refer to the Pandas integration page in the delta-rs documentation for more information.

Delta Lake without Spark: Polars


You can use delta-rs to use Delta Lake with Polars. Let’s look at an example.

Start by importing polars:

    import polars as pl

Define two dictionaries to store some data:

    data = {'first_name': ['bob', 'li', 'leah'], 'age': [47, 23, 51]}
    data_2 = {"first_name": ["suh", "anais"], "age": [33, 68]}

Create a DataFrame with the first dictionary and write it to a Delta Lake table:

    df = pl.DataFrame(data)
    df.write_delta("tmp/polars_table")

Read the Delta table and print the DataFrame to visualize it:

    > print(pl.read_delta("tmp/polars_table"))

    ┌────────────┬─────┐
    │ first_name ┆ age │
    │ ---        ┆ --- │
    │ str        ┆ i64 │
    ╞════════════╪═════╡
    │ bob        ┆ 47  │
    │ li         ┆ 23  │
    │ leah       ┆ 51  │
    └────────────┴─────┘

Create another DataFrame with the second dictionary and append it to the first:

    df = pl.DataFrame(data_2)
    df.write_delta("tmp/polars_table", mode="append")

Read and visualize:

    > print(pl.read_delta("tmp/polars_table"))

    ┌────────────┬─────┐
    │ first_name ┆ age │
    │ ---        ┆ --- │
    │ str        ┆ i64 │
    ╞════════════╪═════╡
    │ suh        ┆ 33  │
    │ anais      ┆ 68  │
    │ bob        ┆ 47  │
    │ li         ┆ 23  │
    │ leah       ┆ 51  │
    └────────────┴─────┘

Use time-travel functionality to travel back to an earlier version:

    > print(pl.read_delta("tmp/polars_table", version=0))

    ┌────────────┬─────┐
    │ first_name ┆ age │
    │ ---        ┆ --- │
    │ str        ┆ i64 │
    ╞════════════╪═════╡
    │ bob        ┆ 47  │
    │ li         ┆ 23  │
    │ leah       ┆ 51  │
    └────────────┴─────┘

Note that unlike pandas, polars has its own read_delta and write_delta methods. This means you don’t need to import deltalake explicitly; it is used under the hood by Polars as a dependency.
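Polars can also scan Delta tables lazily. Here is a minimal sketch, reusing the tmp/polars_table created above, that only materializes the filtered result:

    import polars as pl

    # lazily scan the Delta table; nothing is read until .collect() is called
    lazy_df = pl.scan_delta("tmp/polars_table")
    print(lazy_df.filter(pl.col("age") > 40).collect())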

Refer to the Polars integration page in the delta-rs documentation for more information.

Delta Lake without Spark: Dask


You can use delta-rs to use Delta Lake with Dask. This functionality is available through the dask-deltatable library.

Note that dask-deltatable only works with deltalake==0.13.0

Let’s look at an example.

If you are running dask >= 2024.3.0, you will have to disable Dask’s new query planner to work with dask-deltatable. You can do so by setting:

    import dask

    dask.config.set({'dataframe.query-planning': False})

This only works if you set the config before importing dask-deltatable. See the Dask docs for more information on the query planner.

Next, import dask-deltatable and dask.dataframe:

    import dask_deltatable as ddt
    import dask.dataframe as dd

Define two dictionaries with toy data:

    data = {'first_name': ['bob', 'li', 'leah'], 'age': [47, 23, 51]}
    data_2 = {"first_name": ["suh", "anais"], "age": [33, 68]}

Let’s create a Dask DataFrame from the first dictionary:

    > ddf = dd.from_dict(data, npartitions=1)
    > ddf.compute()

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51

Now, write this Dask DataFrame to a Delta table:

    ddt.to_deltalake("tmp/dask-table", ddf)

And read it back in to confirm:

    > delta_path = "tmp/dask-table/"
    > ddf = ddt.read_deltalake(delta_path)
    > ddf.compute()

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51

Let’s create a second DataFrame with data to append to our Delta table:

    > ddf_2 = dd.from_dict(data_2, npartitions=1)
    > ddf_2.compute()

     first_name  age
    0        suh   33
    1      anais   68

And perform a write in append mode to add this to our existing Delta table:

    ddt.to_deltalake("tmp/dask-table", ddf_2, mode="append")

Read it back in to confirm:

    > delta_path = "tmp/dask-table/"
    > ddf = ddt.read_deltalake(delta_path)
    > ddf.compute()

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51
    0        suh   33
    1      anais   68

Excellent.

You can also time travel to earlier versions of your Delta table using the version kwarg:

    > delta_path = "tmp/dask-table/"
    > ddf = ddt.read_deltalake(delta_path, version=0)
    > print(ddf.compute())

     first_name  age
    0        bob   47
    1         li   23
    2       leah   51

Refer to the Dask integration page in the delta-rs documentation for more information.

Delta Lake without Spark: Daft


You can use delta-rs to use Delta Lake with Daft.

Daft currently supports read operations without time travel. Write operations are available as a preview feature.

Let’s take a look at an example.

You can read an existing Delta table into a Daft DataFrame as follows:

    > df = daft.read_delta_lake("tmp/pandas-table")
    > df.collect()

    ╭────────────┬───────╮
    │ first_name ┆ age   │
    │ ---        ┆ ---   │
    │ Utf8       ┆ Int64 │
    ╞════════════╪═══════╡
    │ bob        ┆ 47    │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
    │ li         ┆ 23    │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
    │ leah       ┆ 51    │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
    │ suh        ┆ 33    │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
    │ anais      ┆ 68    │
    ╰────────────┴───────╯

You can then query or transform your data. For example:

    > df.where(df["age"] > 40).collect()

    ╭────────────┬───────╮
    │ first_name ┆ age   │
    │ ---        ┆ ---   │
    │ Utf8       ┆ Int64 │
    ╞════════════╪═══════╡
    │ bob        ┆ 47    │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
    │ leah       ┆ 51    │
    ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
    │ anais      ┆ 68    │
    ╰────────────┴───────╯

You can then use write_deltalake to write the data to a Delta table:

    df.write_deltalake("tmp/daft-table", mode="overwrite")

Daft supports multiple write modes.

Refer to the Daft integration page in the delta-rs documentation for more information.

Delta Lake without Spark: DuckDB


You can use delta-rs to use Delta Lake with DuckDB.

Delta Lake tables can be exposed as Arrow tables and Arrow datasets, which allows for interoperability with a variety of query engines. You can use Arrow as a go-between step to use Delta Lake with DuckDB.

Let’s look at an example.

Start by importing duckdb and deltalake:

    import duckdb
    from deltalake import write_deltalake, DeltaTable

Now load in an existing Delta table, for example the one we created earlier with pandas:

    dt = DeltaTable("tmp/pandas-table/")

Convert this Delta table to a DuckDB dataset, using Arrow dataset as the go-between step:

    arrow_data = dt.to_pyarrow_dataset()
    duck_data = duckdb.arrow(arrow_data)

Now you can query your DuckDB dataset:

    > query = """
    > select
    >  age
    > from duck_data
    > order by 1 desc
    > """

    > duckdb.query(query)

    ┌───────┐
    │  age  │
    │ int64 │
    ├───────┤
    │    68 │
    │    51 │
    │    47 │
    │    33 │
    │    23 │
    └───────┘

To write this data to a Delta table, convert it to an Arrow table first. Then use write_deltalake to write it to a Delta table:

    arrow_table = duckdb.query(query).to_arrow_table()

    write_deltalake(
        data=arrow_table,
        table_or_uri="tmp/duckdb-table",
        mode="overwrite",
    )

Read it back in to confirm:

    > dt = DeltaTable("tmp/duckdb-table/")
    > dt.to_pandas()

      age
    0   68
    1   51
    2   47
    3   33
    4   23

Excellent.

Now let’s update our query to limit it to only 3 records:

    query = """
    select
      age
    from duck_data
    order by 1 desc
    limit 3
    """

And overwrite the existing Delta table:

    arrow_table = duckdb.query(query).to_arrow_table()

    write_deltalake(
        data=arrow_table,
        table_or_uri="tmp/duckdb-table",
        mode="overwrite",
    )

Read it back in to confirm:

    > dt = DeltaTable("tmp/duckdb-table/")
    > dt.to_pandas()

      age
    0   68
    1   51
    2   47

You can time travel between different versions of your Delta table using the version keyword argument:

    > dt = DeltaTable("tmp/duckdb-table/", version=0)
    > dt.to_pandas()

      age
    0   68
    1   51
    2   47
    3   33
    4   23

There is also an experimental DuckDB extension for Delta Lake. This extension is maintained by DuckDB and you can read about it in the GitHub repo. The extension currently only supports read operations.
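Here is a rough sketch of what querying a Delta table with the DuckDB extension looks like, assuming a recent DuckDB version with the delta extension available and reusing the tmp/pandas-table from earlier:

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL delta")   # install the DuckDB delta extension
    con.sql("LOAD delta")

    # delta_scan reads the Delta table directly, without going through delta-rs
    print(con.sql("SELECT * FROM delta_scan('tmp/pandas-table') ORDER BY age"))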

Delta Lake without Spark: Datafusion


You can use delta-rs to use Delta Lake with Datafusion, with Arrow as a go-between step.

Let’s look at an example.

Start by importing datafusion and deltalake:

    from datafusion import SessionContext
    from deltalake import write_deltalake, DeltaTable

Initialize a Datafusion session context:

    ctx = SessionContext()

Now load in an existing Delta table, for example the one we created earlier with pandas:

    table = DeltaTable("tmp/pandas-table/")

Convert this Delta table to a PyArrow dataset and register it as a Datafusion table:

    arrow_data = table.to_pyarrow_dataset()
    ctx.register_dataset("my_delta_table", arrow_data)

Now you can query your Datafusion dataset:

    > query = "select age from my_delta_table order by 1 desc"
    > ctx.sql(query)

    DataFrame()
    +-----+
    | age |
    +-----+
    | 68  |
    | 51  |
    | 47  |
    | 33  |
    | 23  |
    +-----+

To write this data to a Delta table, convert it to an Arrow table first. Then use write_deltalake to write it to a Delta table:

    arrow_table = ctx.sql(query).to_arrow_table()

    write_deltalake(
        data=arrow_table,
        table_or_uri="tmp/datafusion-table",
    )

Read it back in to confirm:

    > dt = DeltaTable("tmp/datafusion-table/")
    > dt.to_pandas()

      age
    0   68
    1   51
    2   47
    3   33
    4   23

Excellent.

Now let’s update our query to limit it to only 3 records:

    query = "select age from my_delta_table order by 1 desc limit 3"

And overwrite the existing Delta table:

    arrow_table = ctx.sql(query).to_arrow_table()

    write_deltalake(
        data=arrow_table,
        table_or_uri="tmp/datafusion-table",
        mode="overwrite",
    )

Read it back in to confirm:

    > dt = DeltaTable("tmp/datafusion-table/")
    > dt.to_pandas()

      age
    0   68
    1   51
    2   47

You can time travel between different versions of your Delta table using the version keyword argument:

    > dt = DeltaTable("tmp/datafusion-table/", version=0)
    > dt.to_pandas()

      age
    0   68
    1   51
    2   47
    3   33
    4   23

Refer to the Datafusion integration page in the delta-rs documentation for more information.

Delta Lake without Spark: Conclusion

There are many ways to use Delta Lake without Spark.

Dedicated Delta Connectors let you use Delta Lake from engines like Flink, Hive, Trino, PrestoDB, and many others.

The delta-rs package lets you use Delta Lake in Rust or Python, e.g. with pandas, polars, Dask, Daft, DuckDB and many others.

Take a look at the example notebook to run the delta-rs code for yourself.

Follow our authors on LinkedIn.