Last Week in a Byte | 2023-07-18
This is the first part of a 2-part series recapping all of the Delta Lake sessions at this year's Data+AI Summit.

You can watch or read the latest #deltalake news, a week late (2023-07-18 edition)!

In this 2-part series, we'll recap all of the great Delta Lake sessions at this year's Data+AI Summit in San Francisco, CA.


Why Delta Lake is the Best Storage Format for Pandas Analyses

Matthew Powers, CFA, kicked off the breakout sessions with an awesome talk on why Delta Lake is the fastest storage format for Pandas analyses. Matt weaves Delta Lake's features into common data wrangling tasks performed directly in Pandas. As an added bonus, the high-level recap at the end shows how the Lakehouse architecture allows different teams to interoperate using the tools they love.


Is Rust the Future of Analytics?

Next, Oz Katz, CTO and co-founder at lakeFS, delivered an insightful talk on the future of data analytics. In the talk, Oz points out some of the shortcomings of popular data analytics languages, like Java and Python, and how Rust bridges those gaps. Oz also highlights recent innovations in the Rust data ecosystem and how well Rust fits compiling to WebAssembly, resulting in a tiny binary that can execute directly in your browser!


Rapidly Implementing Major Retailer API at the Hershey Company

One of my favorite sessions comes from Simon Whiteley and Zach Stagers from Advancing Analytics, who partnered with Jordan Donmoyer and Chad Deremus at The Hershey Company to build a major retailer API. The team describes their year-long journey building the Hershey Commercial Data Store, which delivers real-time data and actionable insights. At the heart of their data pipeline is #deltalake, which easily scales to their billion-row tables and powers a variety of reports and dashboards for their SQL analysts.


Scaling Deep Learning Using Delta Lake

Michael Shtelma presented a great talk on how to scale deep learning pipelines using Delta Lake and the Python library DeltaTorch. Michael explains that DeltaTorch is a really small library built on the delta-rs implementation, which uses PyArrow to read the data files during distributed training. Even more impressive was Michael's benchmark, which showed that DeltaTorch was almost 2x faster at distributed training than Petastorm!


Building Data Sharing Apps Using Node.js

In case you missed it, Will Girten presented a short lightning talk on building data sharing applications using the Node.js connector for Delta Sharing. In his talk, Will looks back at the last decade to see how JavaScript has remained a top programming language for repos created on GitHub. Lastly, Will demonstrates how the Node.js connector for Delta Sharing can simplify frontend and API architectures by removing the need to copy data into key-value stores.


The Next Part of this 2-Part Series

Stay tuned for the second part of this series recap!


Connect with us!

Want to learn more about Delta Lake and chat with other users and contributors? Join us at delta.io, Slack, LinkedIn, and GitHub.

Special shoutout to Will Girten for hosting this week's LWIAB! 😁 Subscribe to the Last Week in a Byte newsletter here: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7005016077193687040
