
The Future of Dev Experience: Spotify’s Playbook for Organization‑Scale AI
AI Engineering Podcast
Risks: correctness and cost controls
Niklas stresses human judgment in the loop, validation of generated SQL, and behind-the-scenes cost guardrails.
Episode notes
Summary
In this episode of the AI Engineering Podcast, Niklas Gustavsson, Chief Architect at Spotify, talks about scaling AI across engineering and product. He explores how Spotify's highly distributed architecture, enabled by standardization and Backstage, supported rapid adoption of coding agents like Copilot, Cursor, and Claude Code. The conversation covers the tension between bottom-up experimentation and platform standardization, and how Spotify is moving toward monorepos and fleet management. Niklas discusses the emergence of "fleet-wide agents" that can execute complex code changes with robust testing and LLM-as-judge loops to ensure quality. He also touches on the shift in engineering workflows as code generation accelerates, the growing use of agents beyond coding, and lessons learned in sandboxing, agent skills/rules, and shared evaluation frameworks. Niklas highlights Spotify's decade-long experience with ML product work and shares his vision for deeper end-to-end integration of agentic capabilities across the full product lifecycle and making collaborative "team-level memory" for agents a reality.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Niklas Gustavsson about how Spotify is scaling AI usage in engineering and product work
Interview
- Introduction
- How did you get involved in machine learning?
- Can you start by giving an overview of your engineering practices independent of AI?
- What was your process for introducing AI into the developer experience? (e.g. pioneers doing early work (bottom-up) vs. top-down)
- There are countless agentic coding tools on the market now. How do you balance organizational standardization vs. exploration?
- Beyond the toolchain, what are your methods for sharing best practices and upskilling engineers on use of agentic toolchains for software/product engineering?
- Spotify has been operationalizing ML/AI features since before the introduction of LLMs and transformer models. How has that history helped inform your adoption of generative AI in your overall engineering organization?
- As you use these generative and agentic AI utilities in your day-to-day, how have those lessons learned fed back into your AI-powered product features?
- What are some of the platform capabilities/developer experience investments that you have made to improve the overall effectiveness of agentic coding in your engineering organization?
- What are some examples of guardrails/speedbumps that you have introduced to avoid injecting unreliable or untested work into production?
- As the (time/money/cognitive) cost of writing code drops, the burden of reviewing that code increases. What are some of the ways that you are working to scale that side of the equation?
- What are some of the ways that agentic coding/CLI utilities have bled into other areas of engineering/operations/product development beyond just writing code?
- What are the most interesting, innovative, or unexpected ways that you have seen your team applying AI/agentic engineering practices?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on operationalizing and scaling agentic engineering patterns in your teams?
- When is agentic code generation the wrong choice?
- What do you have planned for the future of AI and agentic coding patterns and practices in your organization?
Contact Info
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- Spotify
- Developer Experience
- LLM == Large Language Model
- Transformers
- Backstage
- GitHub Copilot
- Cursor
- Claude Skills
- Monorepo
- MCP == Model Context Protocol
- Claude Code
- Product Manager
- DORA Metrics
- Type Annotations
- BigQuery
- PRD == Product Requirements Document
- AI Evals
- LLM-as-a-Judge
- Agentic Memory
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0