Software Huddle

Fast Inference with Hassan El Mghari

Apr 8, 2025
Hassan El Mghari of Together AI dives into the world of inference optimization. He discusses Together AI's rapid growth and its Series B funding. Listeners will learn about customer applications of AI, challenges and best practices in building AI apps, and why speed matters in inference engines. Hassan also explores model fine-tuning techniques, serverless architectures, and common pitfalls in AI app development. This episode is a treasure trove for anyone interested in cutting-edge AI innovations!
53:06

Podcast summary created with Snipd AI

Quick takeaways

  • Together AI makes AI computing resources more accessible, helping users overcome the challenges of running open-source models effectively.
  • Fine-tuning AI models requires high-quality data and technical expertise, but lets businesses customize models for better performance in specific applications.

Deep dives

Challenges of Open Source Models

Using open-source models presents several challenges. Running them on GPUs demands real expertise: users must choose among numerous LLM serving frameworks, such as vLLM or TensorRT-LLM, verify compatibility with specific models and architectures, set everything up on GPUs, and test it rigorously. Many users find themselves overwhelmed by these requirements, which leads to frustration and keeps them from using these powerful tools effectively.
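To make the setup burden concrete, here is a minimal sketch of self-hosting an open-source model with vLLM's OpenAI-compatible server. The model name and port are illustrative assumptions, and this requires a CUDA-capable GPU with enough VRAM for the chosen model.

```shell
# Install vLLM (assumes a machine with a suitable NVIDIA GPU and CUDA drivers).
pip install vllm

# Launch an OpenAI-compatible HTTP server for an example model.
# Model name and port are illustrative, not prescribed by the episode.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# In another shell: query the server via the OpenAI-style chat API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Even this "happy path" hides the work the episode alludes to: picking a framework, matching it to the model architecture, sizing GPU memory, and load-testing, which is the gap managed platforms like Together AI aim to close.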
