
Deep Dive into Inference Optimization for LLMs with Philip Kiely
Software Huddle
Optimizing Inference in Large Language Models
This chapter explores the phases of inference in large language models, focusing on efficiency gains from optimization techniques and inference engines such as vLLM and TensorRT-LLM. It also discusses the challenges and advancements around TensorRT-LLM's hardware dependencies and its usability improvements.
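For listeners who want to try one of the engines mentioned, here is a minimal sketch of offline text generation with vLLM's Python API. The model checkpoint and sampling settings are illustrative assumptions, not details from the episode.

```python
from vllm import LLM, SamplingParams

# Illustrative checkpoint (assumption); any Hugging Face-compatible
# model supported by vLLM can be substituted here.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

# Sampling settings chosen for the example, not taken from the episode.
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM schedules and batches requests internally, which is the kind of
# throughput optimization the chapter discusses.
outputs = llm.generate(["What happens during the prefill phase of LLM inference?"], params)
print(outputs[0].outputs[0].text)
```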