The Hive Protocol Is Not a Good Idea, Maybe?

In order to pull this from Spark into your Python shell, Spark, in its infinite wisdom, serializes the data using the Hive protocol. The Hive protocol uses Thrift [unclear], which is not a good idea, and means that you have to use [unclear] instead of Spark. You know, I think that's more true than people want to believe, absolutely, how much performance we leave on the table.
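
As a rough illustration of the serialization cost being described, the sketch below encodes the same column of integers two ways: one value at a time with a small per-value tag, and as a single bulk buffer. This is not the Hive or Thrift wire format. The function names and the 1-byte tag are hypothetical and chosen only to show how per-value framing adds bytes and Python-level work.

```python
import struct
from array import array


def encode_per_value(values):
    """Encode each integer separately: 1-byte tag + 64-bit int (hypothetical framing)."""
    out = bytearray()
    for v in values:
        out += struct.pack("<bq", 1, v)  # tag byte, then little-endian int64
    return bytes(out)


def decode_per_value(buf):
    """Decode the tagged stream back into a list, dropping the tag bytes."""
    return [v for _tag, v in struct.iter_unpack("<bq", buf)]


def encode_bulk(values):
    """Encode the whole column at once as a contiguous int64 buffer."""
    return array("q", values).tobytes()


def decode_bulk(buf):
    """Decode a contiguous int64 buffer back into a list."""
    column = array("q")
    column.frombytes(buf)
    return column.tolist()


values = list(range(1000))
per_value = encode_per_value(values)
bulk = encode_bulk(values)
print(len(per_value), len(bulk))
```

Both encodings round-trip to the same values. The per-value form carries an extra byte per element and runs a Python loop on both ends, which is the kind of overhead that grows with result size.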

Move Your Database To The Data And Speed Up Your Analytics With DuckDB

Data Engineering Podcast
