Deep Dive into Inference Optimization for LLMs with Philip Kiely

Software Huddle

CHAPTER

Optimizing Inference in Large Language Models

This chapter explores the phases of inference in large language models, focusing on efficiency gains from optimization techniques and inference engines such as vLLM and TensorRT-LLM. It also discusses the challenges of hardware dependencies and the usability improvements in TensorRT.
