Software Huddle

Fast Inference with Hassan El Mghari

Apr 8, 2025
Hassan El Mghari, an AI expert from Together AI, dives into the exciting world of inference optimization. He discusses the rapid growth of Together AI and its hefty series B funding. Listeners will learn about customer applications of AI, the challenges and best practices in building AI apps, and the importance of speed in inference engines. Hassan also explores model fine-tuning techniques, serverless architectures, and common pitfalls in AI app development. This episode is a treasure trove for anyone interested in cutting-edge AI innovations!
INSIGHT

Open-Source Model Challenges

  • Serving open-source models requires expertise with LLM serving frameworks such as vLLM or TensorRT-LLM.
  • Users must verify that the framework supports their model architecture and GPU, then test the deployment.
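A first deployment with one of these frameworks is typically short. The sketch below uses vLLM's OpenAI-compatible server; the model name, port, and request body are illustrative, and whether vLLM supports a given architecture and GPU must be checked against its documentation.

```shell
# Install vLLM and serve a model behind an OpenAI-compatible HTTP API.
# Model name and port are illustrative; confirm your model architecture
# and GPU are supported before relying on this setup.
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --port 8000

# Smoke-test the running server with a completion request:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-7B-Instruct-v0.2", "prompt": "Hello", "max_tokens": 16}'
```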
ADVICE

Importance of Inference Speed

  • Prioritize speed for inference in AI applications, as it directly impacts user experience and cost-effectiveness.
  • Together AI achieves speed through a custom inference stack, kernel optimization, and speculative decoding.
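The core idea behind speculative decoding can be shown with a toy sketch: a cheap "draft" model proposes several tokens, and the expensive "target" model verifies them in a single pass, keeping the longest agreeing prefix. The functions below are stand-ins, not Together AI's implementation; tokens are integers and both "models" are deterministic rules, chosen only so the speedup is visible.

```python
def target_next(prefix):
    # Stand-in for an expensive large model: deterministic next-token rule.
    return (prefix[-1] * 2 + 1) % 97

def draft_next(prefix):
    # Stand-in for a cheap draft model: agrees with the target most of the time.
    t = target_next(prefix)
    return t if prefix[-1] % 5 else (t + 1) % 97  # deliberately wrong sometimes

def greedy_decode(prefix, n_tokens):
    # Baseline: one target-model pass per generated token.
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out[len(prefix):]

def speculative_decode(prefix, n_tokens, k=4):
    """Generate n_tokens, verifying k draft tokens per target pass."""
    out = list(prefix)
    target_passes = 0
    while len(out) < len(prefix) + n_tokens:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal = list(out)
        for _ in range(k):
            proposal.append(draft_next(proposal))
        # 2) One (conceptually batched) target pass verifies the proposals.
        target_passes += 1
        for _ in range(k):
            expected = target_next(out)
            accepted = proposal[len(out)]
            out.append(expected)  # on mismatch we still gain the target's token
            if accepted != expected or len(out) == len(prefix) + n_tokens:
                break
    return out[len(prefix):], target_passes
```

Because verification is greedy, the speculative output matches plain greedy decoding token for token; the win is that each target pass can accept up to `k` tokens, so far fewer expensive passes are needed.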
INSIGHT

Fine-Tuning Challenges and Vision

  • Fine-tuning is less common than inference due to the higher barrier to entry, requiring high-quality, labeled data.
  • Together AI aims to simplify fine-tuning by automating the process and lowering the data requirements.
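Since the barrier described above is largely about data, a common first step is converting labeled examples into the chat-style JSONL format that fine-tuning APIs typically ingest. The sketch below is a generic illustration: the example pairs are invented, and the `{"messages": [...]}` schema mirrors common provider formats, so check your provider's documentation for the exact schema it expects.

```python
import json

# Hypothetical labeled examples: (user prompt, desired completion) pairs.
labeled = [
    ("Classify sentiment: 'Great product!'", "positive"),
    ("Classify sentiment: 'Arrived broken.'", "negative"),
]

def to_chat_jsonl(pairs):
    """Serialize labeled pairs as chat-style JSONL, one training example
    per line. Schema is illustrative of common fine-tuning APIs."""
    lines = []
    for prompt, completion in pairs:
        record = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_chat_jsonl(labeled))
```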