Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

Data Engineering Podcast

The Impact of Arrow in Spark on Engineering Productivity and Compute Efficiency

The Arrow project aims to make Python on Spark a lot faster. One of the initial motivations for Arrow was to cut down on some of the inefficiencies of data interchange between the JVM and Python. By defining a column-oriented data format that could be constructed on the JVM side inside the Spark runtime and then sent over to the Python side, we were able to make custom code running in PySpark 10 to 100 times faster in some cases.
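
To make the idea of column-oriented interchange a little more concrete, here is a minimal sketch using only the Python standard library. It is not Arrow's actual format and not how PySpark is implemented. The sample data and all function names (`send_row_oriented`, `send_column_oriented`, and so on) are hypothetical and chosen purely for illustration. It contrasts serializing one record at a time with packing each column into a single contiguous typed buffer.

```python
import pickle
from array import array

# Hypothetical sample data: three rows of (id, score).
rows = [(1, 0.5), (2, 1.5), (3, 2.5)]


def send_row_oriented(rows):
    """Serialize each record separately, as a per-row interchange would."""
    return [pickle.dumps(row) for row in rows]


def receive_row_oriented(messages):
    """Deserialize each record separately on the receiving side."""
    return [pickle.loads(message) for message in messages]


def send_column_oriented(rows):
    """Pack each column into one contiguous, fixed-width buffer."""
    ids = array("q", (row[0] for row in rows))     # signed 64-bit integers
    scores = array("d", (row[1] for row in rows))  # 64-bit floats
    return {"id": ids.tobytes(), "score": scores.tobytes()}


def receive_column_oriented(buffers):
    """Rebuild typed columns from raw bytes without per-value parsing."""
    ids = array("q")
    ids.frombytes(buffers["id"])
    scores = array("d")
    scores.frombytes(buffers["score"])
    return {"id": ids, "score": scores}


if __name__ == "__main__":
    columns = receive_column_oriented(send_column_oriented(rows))
    print(list(columns["id"]), list(columns["score"]))
```

In the row-oriented path, every record pays its own serialization and deserialization cost. In the column-oriented path, each column crosses the boundary as one buffer that the receiver can load in bulk, which is the kind of saving the summary above attributes to a column-oriented format.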