Hassan El Mghari, an AI expert at Together AI, dives into the world of inference optimization. He discusses Together AI's rapid growth and its substantial Series B funding. Listeners will learn about customer applications of AI, the challenges and best practices of building AI apps, and why speed matters in inference engines. Hassan also explores model fine-tuning techniques, serverless architectures, and common pitfalls in AI app development. This episode is a treasure trove for anyone interested in cutting-edge AI!
Together AI makes AI computing resources more accessible, helping users overcome the challenges of running open source models effectively.
Fine-tuning AI models requires high-quality data and technical expertise, but lets businesses customize models for better performance in specific applications.
Optimizing inference speed through a custom-built stack is crucial for user satisfaction and for reliable AI-driven operations.
Deep dives
Challenges of Open Source Models
Using open source models presents several challenges. Running them on GPUs takes real expertise: users must choose among LLM serving frameworks such as vLLM or TensorRT-LLM, ensure compatibility with specific models and architectures, set everything up on GPUs, and test it rigorously. Many users find themselves overwhelmed by these requirements, which leads to frustration and keeps them from using these powerful tools effectively.
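As a rough illustration of what self-hosting involves, here is a minimal sketch of running an open source model with vLLM's offline inference API. The model ID and sampling settings are placeholder choices; in practice you would still need a compatible GPU, the right driver and CUDA setup, and thorough load testing.

```python
# Minimal sketch: serving an open source model yourself with vLLM.
# Assumes a CUDA-capable GPU with enough VRAM and `pip install vllm`.
from vllm import LLM, SamplingParams

# Placeholder model ID -- any Hugging Face model with a vLLM-supported
# architecture works, subject to your GPU's memory.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)

for out in outputs:
    print(out.outputs[0].text)
```

Managed inference providers exist precisely to hide this setup behind an API call.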
Evolution and Vision of Together AI
Together AI's roots are in crypto: it leveraged excess compute from GPU owners to tap into an evolving market. The company identified a critical need for more accessible AI computing, particularly around open source AI, where users often struggle with implementation. Since its inception, Together AI has expanded rapidly, especially following a substantial Series B funding round that underscores the growing demand for its services. This expansion reflects both increased customer need and the growth potential of the AI infrastructure market.
Fine-Tuning and Model Customization
Fine-tuning models can be a complex task because it requires high-quality data and the technical know-how to reach optimal performance. While most users currently rely on Together AI's inference services, many are beginning to explore fine-tuning as well. Customization lets businesses adapt AI models to their specific use cases, improving effectiveness and efficiency. The preliminary steps, however, demand careful data preparation and understanding to yield significant results.
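To make the data-preparation point concrete, here is a minimal sketch of formatting chat examples as JSONL in the widely used "messages" convention. The exact schema varies by provider, so the field names below are a common-case assumption to check against your fine-tuning service's documentation.

```python
# Sketch: preparing a fine-tuning dataset as JSONL in the common chat
# "messages" format. Field names follow the widely used convention;
# verify the exact schema against your provider's fine-tuning docs.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support agent."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
        ]
    },
    # ...more high-quality, deduplicated examples...
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Most of the work is upstream of this step: collecting, cleaning, and deduplicating examples that actually reflect the target task.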
Importance of Speed and Reliability in Inference
Speed is a vital factor in inference performance, as users require rapid responses from AI models to maintain efficient workflows. Together AI emphasizes its ability to optimize inference speed through a custom-built stack and specialized kernel development aimed at reducing latency. Enhancing speed not only improves user satisfaction but also allows developers to focus on refining their applications. Reliability in these systems ensures that users can count on consistent performance, which is crucial for businesses relying on AI-driven operations.
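One concrete way to reason about inference speed is to measure time to first token (TTFT) on a streaming request. The sketch below points the OpenAI-compatible Python client at Together AI's endpoint; the base URL and model ID are assumptions to verify against Together's current docs.

```python
# Sketch: measuring time-to-first-token (TTFT) on a streaming request.
# Uses the OpenAI Python client against Together AI's OpenAI-compatible
# endpoint; base URL and model ID are assumptions -- verify against docs.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # assumed Together endpoint
    api_key=os.environ["TOGETHER_API_KEY"],
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # placeholder model ID
    messages=[{"role": "user", "content": "Say hello in five words."}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()
        print(f"TTFT: {first_token_at - start:.3f}s")
    if delta:
        print(delta, end="", flush=True)
print()
```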
Trends in AI Application Development
Current trends in AI application development show a shift toward practical, efficient uses of the technology rather than overly ambitious projects. Many developers have found success by starting simple, launching an MVP (Minimum Viable Product) quickly, and iterating based on user feedback. This approach builds a more realistic understanding of AI capabilities while minimizing the risks and complexity of over-built applications. As more developers adopt it, the focus on highly specialized, nuanced applications will likely keep growing, enriching the AI landscape.
Today we have Hassan back on the show. Hassan was one of our first guests for Huddle when he was working at Vercel, but since then, he's joined Together AI, one of the hottest companies in the world. They just raised a massive Series B round.
Hassan joins us to talk about Together AI, inference optimization and building AI applications. We touch on a bunch of topics like customer uses of AI, best practices for building apps, and what's next for Together AI.
Timestamps
01:42 Opportunity at Together AI
04:26 Together raised a big round
06:06 Vision Behind Together AI
08:32 Problems in running Open Source Models
11:40 Speed For Inference
14:24 Fine Tuning
19:23 One or Two Models or a Combination of Them
21:32 Serverless
22:21 Cold Start issues?
27:46 How much data do you need?
30:00 Balancing Reliability and Cost
34:07 How customers are using Together
42:36 Agent Recipes
47:03 Typical Mistakes Building AI Apps