
Open||Source||Data
What can we learn about AI-native development through stimulating conversations with the developers, regulators, academics, and people like you who drive development forward, seek to understand its impact, and work to mitigate risk in this new world?
Join Charna Parkey and the community shaping the future of open source data, open source software, data in AI, and much more.
Latest episodes

Nov 9, 2022 • 46min
Healthcare Infrastructure, ALS Research and Reliable Data with Indu Navar
This episode features an interview with Indu Navar, CEO and Founder of EverythingALS, a patient-driven non-profit bringing technological innovations and data science to support efforts from care to cure for people with ALS. Indu’s impressive career includes being an original member of the WebMD engineering team, where she was instrumental in using emerging technologies to achieve application scalability and performance.

In this episode, Sam sits down with Indu to discuss healthcare infrastructure applications, her strategies for providing reliable patient data, and the future of ALS research.

-------------------

“We said, ‘Okay, we're going to make this a citizen-driven research.’ That means patients are going to come and enroll because it's their project and it's patient-driven. So, it's a patient-driven, open innovation. So, once you do open patient-driven, open innovation, now we are the custodians of the data. Patients own the data, so all the data is shared with the patient. That was not done before in any of the research. And so, we give all the data back to the patients. And of course, we give them metrics as well. What was the rate of the speed of their speech? And if they don't want to see it, it's fine, at least they have it. And that data, we are the custodians and as custodians we share the data. So, once we did this model, we got almost close to one thousand people enrolled, consented, within 16 months. As opposed to about 25 people in one year or 50 people in one to two years.” – Indu Navar

-------------------

Episode Timestamps:
(01:19): What’s changed for Indu in the last year
(05:46): What data infrastructure was like 25 years ago to solve for health outcomes
(13:00): Indu’s personal experience with healthcare data
(16:47): What Indu is looking forward to in ALS research
(20:43): How regulatory establishments have shifted in healthcare
(30:31): Where Indu wants to see EverythingALS go in the next year
(36:28): One question Indu wishes to be asked
(38:28): Indu’s advice for people inspired by EverythingALS

-------------------

Links:
LinkedIn - Connect with Indu
Twitter - Follow Indu
Twitter - Follow EverythingALS
Visit EverythingALS

Nov 2, 2022 • 3min
Shifting Left on Data with DeVaris Brown, Tomer Shiran, and Erica Brescia
This bonus episode features conversations from season 3 of the Open||Source||Data podcast. In this episode, you’ll hear from DeVaris Brown, CEO & Co-founder of Meroxa; Tomer Shiran, Founder & CPO of Dremio; and Erica Brescia, Managing Director at Redpoint Ventures.

Sam sat down with each guest to discuss how they’re making data more programmable by shifting left.

You can listen to the full episodes from DeVaris Brown, Tomer Shiran, and Erica Brescia by clicking the links below.

-------------------

Episode Timestamps:
(00:12): DeVaris Brown
(00:42): Tomer Shiran
(01:32): Erica Brescia

-------------------

Links:
Listen to DeVaris’ episode
Listen to Tomer’s episode
Listen to Erica’s episode

Oct 26, 2022 • 34min
Serial Entrepreneurship, Metadata Capture Systems, and Osquery with Tony Gauda
This episode features an interview with Tony Gauda, Head of Customer Engineering at Fleet Device Management, an open core company powered by Osquery. Tony is a serial entrepreneur and inventor with a profound history in fraud, security, and SaaS business. He holds several issued patents and his companies have raised over $40 million in venture funding. Tony is also the founder of ThinAir, a Y Combinator-backed SaaS service that tackles the insider threat problem for enterprises and government agencies.

In this episode, Sam and Tony discuss calculating data usage at scale, the creativity of attackers, and how to evolve as threats increase.

-------------------

“The great thing about Osquery is that since it is a sensor-based system that is queryable, it literally gives you the ability to discover new indicators of compromise and then use those when doing security investigations. And Osquery allows you to create these extremely interesting queries that would find things that you would never be able to find with a traditionally static functionality agent. And, that to me, is extremely exciting. The fact that you have this agent that is extendable and it's configurable and it's deployable across multiple different platforms, at the end of the day, it feels like it's almost a superpower for visibility.” – Tony Gauda

-------------------

Episode Timestamps:
(01:17): What Tony is curious about these days
(04:39): What problems Tony is trying to solve
(05:47): How Tony got into the tech world
(11:09): Tony’s inspiration behind ThinAir
(15:25): What open source data means to Tony
(17:06): What led Tony to being an early adopter of Osquery
(20:31): What’s ahead for building next level applications with open and secure data
(25:37): One question Tony’s always wanted to be asked
(29:24): Tony’s advice for inventors

-------------------

Links:
LinkedIn - Connect with Tony
Twitter - Follow Tony
Twitter - Follow Fleetdm
Fleetdm
Fleetdm GitHub Platform

Oct 12, 2022 • 35min
Code Intelligence, GraphQL, and Closing the Remediation Gap with Beyang Liu
This episode features an interview with Beyang Liu, CTO and Co-founder of Sourcegraph, a code intelligence platform. Prior to Sourcegraph, Beyang was a software engineer at Palantir Technologies, where he developed new data analysis software on a customer-facing team working with Fortune 500 companies. Beyang studied Computer Science at Stanford, where he published research in probabilistic graphical models and computer vision at the Stanford AI Lab.

In this episode, Sam sits down with Beyang to discuss the power of intelligence and visualization, GraphQL versus REST APIs, and how Sourcegraph is drawing inspiration from Google.

-------------------

“When I think about the future of Sourcegraph, it's really the future of this global human knowledge base that we're constructing. Similar to the worldwide web, the internet, where that was an amazing thing that came along. We're starting to see something like that emerge in the world of code. The open source ecosystem is this amazing, decentralized, distributed store of human knowledge that encapsulates all these algorithms and data structures and systems that are then pulled into all these systems that we rely on in our lives. And, so far, no one has really tried to map that web of knowledge in the same way that Google has mapped the internet and we want to do that. [...] You just open up a web browser, open up Google, type a query and you're good to go. We want to make exploring code as easy as that experience.” – Beyang Liu

-------------------

Episode Timestamps:
(01:21): What open source data means to Beyang
(02:59): Beyang’s inspiration to create Sourcegraph
(09:13): What Beyang sees in the future of the power of intelligence and visualization
(14:37): How Sourcegraph works
(24:11): GraphQL versus REST APIs
(27:10): What Sourcegraph’s open source community looks like
(30:29): Beyang’s advice for people wanting to build new companies

-------------------

Links:
LinkedIn - Connect with Beyang
Twitter - Follow Beyang
Twitter - Follow Sourcegraph
Sourcegraph
Sourcegraph Discord Channel

Sep 28, 2022 • 43min
Stream Processing, Observability, and the User Experience with Eric Sammer
This episode features an interview with Eric Sammer, CEO of Decodable. Eric has been in the tech industry for over 20 years, holding various roles, including as an early Cloudera employee. He also was the co-founder and CTO of Rocana, which was acquired by Splunk in 2017. During his time at Splunk, Eric served as the VP and Senior Distinguished Engineer responsible for cloud platform services.

In this episode, Sam and Eric discuss the gap between operating infrastructure and the analytical world, stream processing innovations, and why it’s important to work with people who are smarter than you.

-------------------

"The thing about Decodable was just like, let's connect systems, let's process the data between them. Apache Flink is the right engine and SQL is the language for programming the engine. It doesn't need to be any more complicated. The trick is getting it right, so that people can think about that part of the data infrastructure the way they think about the network. They don't question whether the packet makes it to the other side because that infrastructure is so burned in and it scales reasonably well these days. You don't even think about it, especially in the cloud." – Eric Sammer

-------------------

Episode Timestamps:
(01:09): What open source data means to Eric
(06:57): What led Eric to Cloudera and Hadoop
(12:48): What inspired Eric to create Rocana
(20:29): The problem Eric is trying to solve with Flink
(29:54): What problems in stream processing we’ll have to solve in the next 5 years
(36:58): Eric’s advice for advancing your career

-------------------

Links:
LinkedIn - Connect with Eric
Twitter - Follow Eric
Twitter - Follow Decodable
Decodable

Jul 20, 2022 • 16min
Season 3 Compressed Edition with Sam and Audra
Join Open||Source||Data executive producer Audra Montenegro as she and Sam discuss his learnings and takeaways from this season and what the future of open source data looks like.

-------------------

“There's such an open conversation about, ‘Yeah, open source.’ We usually think about open source software. How can we cross-apply more of what we think about in software in general into data, and then what is it that's totally new about this domain? So, the answers cluster into three groups. It's either about the source of the data itself being open, meaning this is government data or data that's been made public and it's openly accessible. Or it could be that open source data is how the data is actually produced. Is it using open source tooling? Is it on an open source architecture? And finally, how do you trust that open source data? If it's just a whole bunch of data but it hasn't been labeled, if it hasn't been managed and produced, turned into a product, how do you understand its heritage? How do you understand the lineage of the data so that you can produce trustworthy models and trustworthy results based on it? So it's a big open field, but those are the general responses that people have when we explore that topic.” – Sam Ramji

-------------------

Episode Timestamps:
(01:29): What open source data means to our guests
(02:57): Sam discusses the themes of season 3
(10:38): What Sam is looking forward to in the future of open source data

-------------------

Links:
LinkedIn - Connect with Sam
LinkedIn - Connect with Audra
Twitter - Follow Sam
Twitter - Follow Audra

Jul 6, 2022 • 39min
Accelerating Computation, Machine Learning, and Data Mesh with Sophie Watson
This episode features an interview with Sophie Watson, Technical Product Marketing Manager at NVIDIA. Previously, Sophie served as a software engineer and principal data scientist at Red Hat, where she used machine learning to solve business problems in the hybrid cloud. Sophie has a PhD in Bayesian statistics and frequently speaks about machine learning workflows on Kubernetes, recommendation engines, and machine learning for search.

In this episode, Sam and Sophie discuss Principal Component Analysis, computational acceleration, and MLOps.

-------------------

“We all start when we get hold of a data set by visualizing it to try to understand it. So that usually for me involves starting with a simple technique, something like PCA, Principal Component Analysis. It's been around since the eighties, probably longer, maybe the sixties. Don't quote me on that. With Principal Component Analysis, we can map our high-dimensional data down to a smaller number of dimensions. Let's map it down to two so that we can visualize it. So we can go ahead and visualize it. But Principal Component Analysis is quite a simple technique in what it's doing, and it's just mapping onto key components of our data. We might not be able to see, perhaps, separation of classes if we're working with data that's from a set of classes. Maybe we're looking at transactions: are they fraudulent or are they legitimate? And we might not be able to see that distinction. So that makes us think, ‘Is there something interesting in my data? Am I going to be able to train a machine learning model?’ I don't know. Back in the day, I think the next step would've been, ‘Oh, let's train a model in C,’ but now with accelerated compute, within a really reasonable amount of time, we can go ahead and use a more sophisticated technique, so we can use something like UMAP that's leaning on differential manifolds to do that projection to lower dimensions. And because this technique is slightly more sophisticated, what we find in general is that within the same amount of time, we're able to get more insight into the data. We're able to see the distinction in classes between our data sets. It keeps you in that loop. It keeps you in that productivity state.” – Sophie Watson

-------------------

Episode Timestamps:
(01:22): What open source data means to Sophie
(02:47): How Sophie is spending her time
(07:52): What excites Sophie about the data science community
(10:13): What Sophie is most excited about in data visibility
(16:29): Data on servers versus data in the cloud
(18:09): Accelerated computation on machine learning
(22:27): Sophie breaks down probabilistic programming
(24:21): What problem Sophie was trying to solve in her career
(32:12): Sophie’s dream job of working for Taylor Swift
(34:48): Sophie’s advice for those interested in open source

-------------------

Links:
LinkedIn - Connect with Sophie
Twitter - Follow Sophie
Twitter - Follow NVIDIA
NVIDIA
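Sophie’s walkthrough above describes the core move: center your high-dimensional data, project it onto its top two principal components, and eyeball the result for class separation. As a rough illustration of that idea (not code from the episode — the synthetic two-class data set here is purely hypothetical), a minimal PCA projection can be sketched with NumPy:

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by the variance they explain
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T     # coordinates in the top-k subspace

# Hypothetical data: two classes of 50-dimensional points, where class b
# is offset from class a in only 5 of the 50 dimensions
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(100, 50))
b = rng.normal(0.0, 1.0, size=(100, 50))
b[:, :5] += 4.0

X2 = pca_project(np.vstack([a, b]), k=2)
print(X2.shape)  # (200, 2) -- two columns, ready to scatter-plot
```

If the classes separate along a high-variance direction, as here, the 2-D scatter shows two clear clusters; when they don’t, that is exactly the point where Sophie reaches for a more sophisticated (and, with accelerated compute, still fast) technique like UMAP.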

Jun 29, 2022 • 6min
Democratization and Cognition with Margot Gerritsen, Rachel Chalmers, and Patricia Boswell
This bonus episode features conversations from season 1 of the Open||Source||Data podcast. In this episode, you’ll hear from Margot Gerritsen, Stanford Professor and Co-Founder/Director of WiDS; Rachel Chalmers, Partner at Alchemist Accelerator; and Patricia Boswell, Staff Technical Writer at Google.

Sam sat down with each guest to discuss cognition and democratization in data. You can listen to the full episodes from Margot Gerritsen, Rachel Chalmers, and Patricia Boswell by clicking the links below.

-------------------

Episode Timestamps:
(00:18): Margot Gerritsen
(02:07): Rachel Chalmers
(03:46): Patricia Boswell

-------------------

Links:
Listen to Margot’s episode
Listen to Rachel’s episode
Listen to Patricia's episode

Jun 22, 2022 • 36min
Vector Search, the AI Stack and more with Bob van Luijt
This episode features an interview with Bob van Luijt, CEO and Co-Founder of SeMI Technologies and co-creator of Weaviate, an open source vector search engine. At just 15 years of age, Bob started his own software company in the Netherlands. He went on to study music at ArtEZ University of the Arts and Berklee College of Music, and completed the Harvard Business School Program of Management Excellence. Bob is also a TEDx speaker, discussing the relationship between software and language.

In this episode, Sam sits down with Bob to break down vector search, the AI-first ecosystem, and how music and software relate to one another.

-------------------

“I dare to argue that from the two big waves in database technology that we've seen, so first, in the seventies and eighties with SQL, and then the whole NoSQL wave that we have seen and the big winners that are in there, I dare to argue that we see a third wave coming up. And the third wave, I simply call it AI-first. And what I mean with that is that these models play an important role. So we do it from the perspective of the models first. And in that new segment, you see four niches. So the first niche that we see are what I like to call the embedding providers. The Hugging Faces of this world, the OpenAIs of this world, etc. Those who bring us the embeddings that we need to do the vectorization. Then secondly, we have so-called neural search frameworks. So we see frameworks like Haystack and Jina. Then third, we have the feature stores. So the feature stores take care of storing large chunks of features that we later can use to do vectorization, those kinds of things. And then we have the search engines. And Weaviate is an example of such a search engine that takes care of searching through data on a large scale that is vectorized. It might be a bold statement, but I really believe that we see this third wave of database technology happening.” – Bob van Luijt

-------------------

Episode Timestamps:
(01:45): How Bob defines open source data
(04:09): What is a vector database and why do we need them?
(07:55): How data is different before and after vectorization
(13:58): Orders of magnitude faster or personal
(16:09): How music and software relate to each other for Bob
(19:33): Bob’s inspiration behind Weaviate
(25:02): The AI-first ecosystem
(27:38): The distinction between vector search engines, feature stores, neural search frameworks, and embedding providers
(32:28): Bob’s advice for folks on the OSS startup journey

-------------------

Links:
LinkedIn - Connect with Bob
Twitter - Follow Bob
Twitter - Follow Weaviate
Weaviate
SeMI Technologies
Bob’s TEDx Talk
Bob's Forbes Article on the AI-First Database Ecosystem
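The fourth niche in Bob’s taxonomy, the vector search engine, reduces at its core to one operation: given a query embedding, find the stored documents whose embeddings point in the most similar direction. As a toy sketch of that operation (the 3-dimensional "embeddings" below are made-up illustrative numbers, not output from a real embedding provider, and real engines like Weaviate use approximate nearest-neighbor indexes rather than this brute-force scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_search(query, index, top_k=2):
    """Return the top_k document names whose vectors best match the query."""
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc for doc, _ in scored[:top_k]]

# Hypothetical document embeddings -- in practice these would come from an
# embedding provider (the "Hugging Faces and OpenAIs of this world")
index = {
    "wine pairing guide": [0.9, 0.1, 0.0],
    "grape growing tips": [0.8, 0.3, 0.1],
    "motorcycle repair":  [0.0, 0.1, 0.9],
}

print(vector_search([0.85, 0.2, 0.05], index))
# → ['wine pairing guide', 'grape growing tips']
```

Because similarity is computed in the embedding space rather than on keywords, semantically related documents rank together even when they share no words with the query; that is the property the whole AI-first stack Bob describes is built around.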

Jun 8, 2022 • 40min
Open Source Innovation, The GPL for Data, and The Data In to Data Out Ratio with Larry Augustin
This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces. Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. One of the group who coined the term “open source,” Larry has sat on the boards of several open source and Linux organizations.

In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.

-------------------

"People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, ‘Well, you're taking all this personal information.’ But then most people look at that and say, ‘But I get a lot of value back out of that.’ And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is: if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think, is an incredibly important one. It's a principle that I drive into application development. If you put a user in front of an app and they start using the app, you're going to ask them for things. And my principle is always, ‘How do you figure out how to never ask them and only give them?’ And you can't get 100% of the way there, but every time it's like, ‘Why did you ask them for that? Couldn't you figure it out?’ And it gets everyone in the mindset of, ‘How do I provide more and more and take less and less?’ It's a principle of application development that I like a lot. And I think there's a similar concept here around open source data. Are there models or structures that we can come up with where people can contribute small amounts of data and, as a result of that, get back a lot of value?” – Larry Augustin

-------------------

Episode Timestamps:
(02:14): How Larry is spending his time after AWS
(06:01): What drove Larry to open source
(18:04): What is the GPL for data?
(23:51): Areas of progress in open source data
(28:37): The data in to data out ratio
(36:02): Larry’s advice for folks in open source

-------------------

Links:
LinkedIn - Connect with Larry
Twitter - Follow Larry