Move Your Database To The Data And Speed Up Your Analytics With DuckDB

Data Engineering Podcast

The Hive Protocol Is Not a Good Idea, Maybe?

In order to pull this from Spark to your Python shell, Spark, in its infinite wisdom, serializes the data using the Hive protocol. The Hive protocol uses Thrift [unclear], which is not a good idea and means that you have to use [unclear] instead of Spark. You know, I think that's more true than people want to believe. Absolutely, how much performance we leave on the table.
