
Deep Dive into Inference Optimization for LLMs with Philip Kiely
Software Huddle
Optimizing Inference in Large Language Models
This chapter explores the phases of LLM inference (prompt prefill and token-by-token decoding) and how inference engines such as vLLM and TensorRT-LLM improve efficiency through a range of optimization techniques. It also discusses the challenges posed by TensorRT's hardware dependencies and the usability improvements that have addressed them.
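(For context on the engines mentioned here, below is a minimal sketch of offline generation with vLLM's Python API. It assumes vLLM is installed, and the small model `facebook/opt-125m` is only an illustrative choice, not one from the episode.)

```python
# Minimal vLLM offline-generation sketch (assumes `pip install vllm`;
# the model choice is illustrative, not from the episode).
from vllm import LLM, SamplingParams

# The engine handles continuous batching and KV-cache management internally.
llm = LLM(model="facebook/opt-125m")

# Sampling settings applied during the decode phase.
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() runs prefill on the prompt, then decodes tokens autoregressively.
outputs = llm.generate(["Explain the prefill phase of LLM inference."], params)
print(outputs[0].outputs[0].text)
```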