The New Stack Podcast

How Fal.ai Went From Inference Optimization to Hosting Image and Video Models

Jul 25, 2025
Burkay Gur, Co-founder and CEO of Fal.ai, and Glenn Solomon, Managing Partner at Notable Capital, dive into the evolution of generative media. They discuss Fal.ai's shift from optimizing machine learning infrastructure to hosting diverse media models, emphasizing the competitive edge of speed. Gur highlights how generative AI transforms creativity, noting that while creation costs have dropped, true creative value remains. Solomon reflects on the changing landscape of venture capital in AI, focusing on model customization and the balance between technology and creativity.
INSIGHT

Fal.ai's Strategic Pivot to Media Models

  • Fal.ai evolved from optimizing Python runtimes to hosting generative media models for images and video, prompted by the generative AI boom.
  • The key market shift was that inference requires GPU power, which created large demand for faster media model hosting.
ADVICE

Workload-Focused Model Optimization

  • Focus engineering efforts on profiling specific workloads to find targeted optimization opportunities.
  • Combine existing open-source kernels and incremental improvements rather than building all optimization frameworks from scratch.
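As a rough illustration of workload-focused profiling, the sketch below uses Python's standard cProfile and pstats modules to find which function in a workload consumes the most time. The stage names (preprocess, denoise) and the workload itself are hypothetical stand-ins, not anything described in the episode.

```python
import cProfile
import pstats


def preprocess(n):
    # Hypothetical cheap stage: build a small list of inputs.
    return [i % 7 for i in range(n)]


def denoise(values):
    # Hypothetical expensive stage: repeated passes over the data.
    total = 0
    for _ in range(200):
        for v in values:
            total += v * v
    return total


def run_workload():
    # The specific workload being profiled (an assumption for this sketch).
    return denoise(preprocess(2_000))


def hottest_function(func):
    """Profile func and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers).
    name, _ = max(
        ((key[2], entry[2]) for key, entry in stats.stats.items()),
        key=lambda pair: pair[1],
    )
    return name


if __name__ == "__main__":
    print("Optimize first:", hottest_function(run_workload))
```

Ranking by self time rather than cumulative time points at the function doing the work itself, which is usually where a targeted kernel or incremental fix pays off.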
INSIGHT

Full-Stack Speed Optimization

  • To achieve speed, Fal.ai optimizes across the full request lifecycle, including regional routing to GPUs and multi-cloud content delivery systems.
  • This multi-layer approach requires people with diverse skills in cloud engineering, low-level systems, and GPU optimization to collaborate tightly.
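One way to picture regional routing is a selector that sends each request to the lowest-latency region that still has free GPUs. The episode does not describe Fal.ai's actual routing logic, so the Region type, its fields, and the region names below are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    # Hypothetical description of a GPU region as seen from the caller.
    name: str
    latency_ms: float
    free_gpus: int


def pick_region(regions: list[Region]) -> Optional[Region]:
    """Return the lowest-latency region with free GPU capacity, or None."""
    candidates = [r for r in regions if r.free_gpus > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    regions = [
        Region("us-east", latency_ms=20.0, free_gpus=0),  # closest, but full
        Region("eu-west", latency_ms=85.0, free_gpus=4),
        Region("ap-south", latency_ms=140.0, free_gpus=9),
    ]
    chosen = pick_region(regions)
    print(chosen.name if chosen else "no capacity")
```

A production router would likely also weigh queue depth, model warm state, and cost, none of which are modeled here.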