Maxime Beauchemin

Founder & CEO of Preset and original creator of Apache Airflow and Apache Superset. Advises Tecton on ML infrastructure.

Top 5 podcasts with Maxime Beauchemin

Ranked by the Snipd community

Dec 8, 2024 • 52min

An Exploration Of The Impediments To Reusable Data Pipelines

Max Beauchemin, a data engineer with two decades of experience and founder of Preset, dives into the complexities of reusable data pipelines. He discusses the "write everything twice" problem, emphasizing the need for collaboration and shared reference implementations. Max explores the challenges of managing diverse SQL dialects and the evolving role of data engineers, likening it to front-end development. He envisions generative AI aiding knowledge distribution and encourages the community to engage in sharing templates to drive innovation in the field.

Aug 28, 2022 • 1h 10min

Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

Summary
AirBnB pioneered a number of the organizational practices that have become the goal of modern data teams. Out of that culture a number of successful businesses were created to provide the tools and methods to a broader audience. In this episode several alumni of AirBnB’s formative years who have gone on to found their own companies join the show to reflect on their shared successes, missed opportunities, and lessons learned.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Data stacks are becoming more and more complex. This brings infinite possibilities for data pipelines to break and a host of other issues, severely deteriorating the quality of the data and causing teams to lose trust. Sifflet solves this problem by acting as an overseeing layer to the data stack – observing data and ensuring it’s reliable from ingestion all the way to consumption. Whether the data is in transit or at rest, Sifflet can detect data quality anomalies, assess business impact, identify the root cause, and alert data teams on their preferred channels. All thanks to 50+ quality checks, extensive column-level lineage, and 20+ connectors across the Data Stack. In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses. Listeners of the podcast will get $2000 to use as platform credits when signing up to use Sifflet. Sifflet also offers a 2-week free trial. Find out more at dataengineeringpodcast.com/sifflet today!
The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
Your host is Tobias Macey and today I’m interviewing Lindsay Pettingill, Chetan Sharma, Swaroop Jagadish, Maxime Beauchemin, and Nick Handel about the lessons that they learned in their time at AirBnB and how they are carrying that forward to their respective companies.

Interview
Introduction
How did you get involved in the area of data management?
You all worked at AirBnB in similar time frames and then went on to found data-focused companies that are finding success in their respective categories. Do you consider it an outgrowth of the specific company culture/work involved or a curiosity of the moment in time for the data industry that led you each in that direction?
What are the elements of AirBnB’s data culture that you feel were done right?
What do you see as the critical decisions/inflection points in the company’s growth that led you down that path?
Every journey has its detours and dead-ends. What are the mistakes that were made (individual and collective) that were most instructive for you?
What about that experience and other experiences led you each to go your respective directions with data startups?
Was your motivation to start a company addressing the work that you did at AirBnB due to the desire to build on existing success, or the need to fix a nagging frustration?
What are the critical lessons for data teams that you are focused on teaching to engineers inside and outside your company?
What are your predictions for the next 5 years of data?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on translating your experiences at AirBnB into successful products?

Contact Info
Lindsay: LinkedIn, @lpettingill on Twitter
Chetan: LinkedIn, @chesharma87 on Twitter
Maxime: mistercrunch on GitHub, LinkedIn, @mistercrunch on Twitter
Swaroop: swaroopjagadish on GitHub, LinkedIn, @arudis on Twitter
Nick: LinkedIn, @NicholasHandel on Twitter, nhandel on GitHub

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links
Iggy, Eppo (Podcast Episode), Acryl (Podcast Episode), DataHub, Preset, Superset (Podcast Episode), Airflow, Transform (Podcast Episode), Deutsche Bank, Ubisoft, BlackRock, Kafka, Pinot, Stata, R, Knowledge-Repo (Podcast.__init__ Episode), AirBnB, Almond Flour Cookie Recipe

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast

Jul 25, 2023 • 1h 14min

Treating Prompt Engineering More Like Code // Maxime Beauchemin // #167

MLOps Coffee Sessions #167 with Maxime Beauchemin, Treating Prompt Engineering More Like Code.

// Abstract
Promptimize is an innovative tool designed to scientifically evaluate the effectiveness of prompts. Discover the advantages of open-sourcing the tool and its relevance, drawing parallels with test suites in software engineering. Uncover the increasing interest in this domain and the necessity for transparent interactions with language models. Delve into the world of prompt optimization, deterministic evaluation, and the unique challenges in AI prompt engineering.

// Bio
Maxime Beauchemin is the founder and CEO of Preset, a Series B startup supporting and commercializing the Apache Superset project. Max was the original creator of Apache Airflow and Apache Superset when he was at Airbnb. Max has over a decade of experience in data engineering at companies like Lyft, Airbnb, Facebook, and Ubisoft.

// MLOps Jobs board
jobs.mlops.community

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Max's first MLOps Podcast episode: https://go.mlops.community/KBnOgN
Test-Driven Prompt Engineering for LLMs with Promptimize blog: https://maximebeauchemin.medium.com/mastering-ai-powered-product-development-introducing-promptimize-for-test-driven-prompt-bffbbca91535
Test-Driven Prompt Engineering for LLMs with Promptimize podcast: https://talkpython.fm/episodes/show/417/test-driven-prompt-engineering-for-llms-with-promptimize
Taming AI Product Development Through Test-driven Prompt Engineering // Maxime Beauchemin // LLMs in Production Conference lightning talk: https://home.mlops.community/home/videos/taming-ai-product-development-through-test-driven-prompt-engineering

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Max on LinkedIn: https://www.linkedin.com/in/maximebeauchemin/

Timestamps:
[00:00] Max introduces the Apache Superset project at Preset
[01:04] Max's preferred coffee
[01:16] Airflow creator
[01:45] Takeaways
[03:53] Please like, share, and subscribe to our MLOps channels!
[04:31] Check Max's first MLOps Podcast episode
[05:20] Promptimize
[06:10] Interaction with API
[08:27] Deterministic evaluation of SQL queries and AI
[12:40] Figuring out the right edge cases
[14:17] Reaction with Vector Database
[15:55] Promptimize Test Suite
[18:48] Promptimize vision
[20:47] The open-source blood
[23:04] Impact of open source
[23:18] Dangers of open source
[25:25] AI-Language Models Revolution
[27:36] Test-driven design
[29:46] Prompt tracking
[33:41] Building Test Suites as Assets
[36:49] Adding new prompt cases to new capabilities
[39:32] Monitoring speed and cost
[44:07] Creating own benchmarks
[46:19] AI feature adding more value to the end users
[49:39] Perceived value of the feature
[50:53] LLMs costs
[52:15] Specialized model versus Generalized model
[56:58] Fine-tuning LLMs use cases
[1:02:30] Classic Engineer's Dilemma
[1:03:46] Build exciting tech that's available
[1:05:02] Catastrophic forgetting
[1:10:28] Prompt-driven development
[1:13:23] Wrap up

Nov 25, 2025 • 53min

Ep. #3, Building Tools That Shape Data with Maxime Beauchemin

In this engaging conversation, guest Maxime Beauchemin, the creator of Apache Airflow and Superset, shares his journey from building data warehouses at Ubisoft to revolutionizing data tooling. He reveals the inspiration behind Airflow, the challenges of scalable BI, and his mission with Preset to disrupt conventional analytics. Maxime discusses the implications of AI on data practices, the future of data roles, and the importance of open-source governance in fostering healthy data teams. A treasure trove of insights for data enthusiasts!

Nov 30, 2020 • 56min

Feature Stores for Accelerating AI Development - #432

In this discussion, Kevin Stumpf, co-founder and CTO of Tecton; Willem Pienaar, engineering lead at Gojek and Feast Project founder; and Maxime Beauchemin, founder of Preset and creator of Apache Airflow, dive deep into feature stores. They explore how feature stores can accelerate AI development, streamline data management, and address operational challenges. The conversation highlights the evolution of these stores, their importance in automating workflows, and the collaboration needed between data engineers and scientists to maximize efficiency.
