Move Your Database To The Data And Speed Up Your Analytics With DuckDB

Data Engineering Podcast

The Hive Protocol Is Not a Good Idea, Maybe?

In order to pull this from Spark to your Python shell, Spark, in its infinite wisdom, serializes the data using the Hive protocol. The Hive protocol uses Thrift [unclear], which is not a good idea, and means that you have to use [unclear] instead of Spark. You know, I think that's more true than people want to believe. Absolutely. How much performance we leave on the table.
