Fast Inference with Hassan El Mghari

Software Huddle

Optimizing Inference Speed in AI

This chapter explores the critical role of speed in inference engines for AI and LLMs. It covers strategies for improving performance, including speculative decoding and the Together Kernels Collection, and discusses where fine-tuning and prompt engineering fit in. It closes with insights into how customers choose machine learning models, noting that combining several open-source models often yields better application outcomes.
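The episode mentions speculative decoding only at a high level, so here is a minimal, illustrative Python sketch of the greedy variant of the idea, not Together's actual implementation. The `draft_model` and `target_model` functions are hypothetical stand-ins for a small fast model and a large accurate one; in a real engine both would be LLM forward passes, and verification of all drafted tokens would happen in a single batched pass.

```python
import random

# Toy stand-ins for real models (hypothetical): each maps a context
# (a list of token ids) to a deterministic next-token id.
def draft_model(context):
    # Cheap, approximate next-token prediction.
    random.seed(sum(context) % 1000)
    return random.randrange(8)

def target_model(context):
    # Expensive, authoritative next-token prediction.
    random.seed(sum(context) % 997)
    return random.randrange(8)

def speculative_decode(prompt, num_tokens, k=4):
    """Greedy speculative decoding sketch.

    The draft model proposes k tokens; the target model checks them.
    The longest agreeing prefix is accepted; on the first disagreement,
    the target model's token is used instead and drafting restarts.
    """
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1. Draft k candidate tokens autoregressively with the cheap model.
        drafted, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            drafted.append(t)
            ctx.append(t)

        # 2. Verify: accept drafted tokens while they match what the
        #    target model would have produced at each position.
        ctx = list(out)
        for t in drafted:
            expected = target_model(ctx)
            if t == expected:
                out.append(t)         # draft token accepted "for free"
                ctx.append(t)
            else:
                out.append(expected)  # correct the mismatch, resume drafting
                break
        # Even on full rejection, each round yields one verified token.
    return out[: len(prompt) + num_tokens]

print(speculative_decode([1, 2, 3], num_tokens=12))
```

The speedup comes from the verify step: because the target model can score all k drafted positions in one parallel pass, every accepted draft token costs roughly a k-th of a full sequential decode step.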
