
Distributed Time Series in Machine Learning - ML 088
Adventures in Machine Learning
NumPy Data Generation Is a Lot Faster Than Using Pandas
Stick with NumPy and Numba, which converts simple loops to machine code, I think, or at least some lower-level code. You can get a lot more performance. Yeah. Then parallelize it using something like... right. But pandas will kill you with data generation. Another very useful tip.

Yeah, if you're doing array manipulations in pandas, that's all stupidly optimized to the particular hardware that it's on, so everything's faster. But when we're talking about extreme-scale stuff, or we're like, "Hey, we need to generate hourly-level data over a 30-year period," and NumPy can generate that array in less than a second, that's completely trivial. It's the difference between 17 minutes and 17 hours.
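To make the "hourly data over a 30-year period" point concrete, here is a minimal sketch of vectorized generation in NumPy. The function name `generate_hourly_series`, the 1990 start year, and the trend, daily-cycle, and noise shape of the signal are all illustrative assumptions, not anything described in the episode.

```python
import numpy as np


def generate_hourly_series(start_year=1990, years=30, seed=0):
    """Build hourly timestamps and synthetic values without a Python loop.

    Hypothetical helper: the signal (trend + daily cycle + noise) is made up
    purely for illustration.
    """
    start = np.datetime64(f"{start_year}-01-01T00", "h")
    end = np.datetime64(f"{start_year + years}-01-01T00", "h")

    # One timestamp per hour over the whole span (end is exclusive).
    timestamps = np.arange(start, end, np.timedelta64(1, "h"))
    n = timestamps.size

    rng = np.random.default_rng(seed)
    hours = np.arange(n)

    trend = 0.001 * hours                          # slow upward drift
    daily = 10.0 * np.sin(2 * np.pi * hours / 24)  # 24-hour cycle
    noise = rng.normal(0.0, 1.0, size=n)           # Gaussian noise

    values = 100.0 + trend + daily + noise
    return timestamps, values
```

Every step operates on whole arrays, so the roughly 263,000 rows come out of a handful of array operations rather than a row-by-row loop. The result can still be wrapped in a DataFrame afterwards if downstream code expects one.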