How to Measure Efficiency in the API
On efficiency, let's talk about inference efficiency: how long does it take for me to get a result back? This is a useful thing to know if you want to use the API. It depends on a few things, including the hardware they have on the back end. So there's model efficiency, which is a property of the model itself. There are optimizations like caching and batching and various other tricks. And then there's hardware: if you add more GPUs, in many cases you can get things to go faster.
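As a rough illustration of "how long does it take for me to get a result back", here is a minimal sketch that times repeated calls and summarizes the latency. The `fake_api_call` function is a hypothetical stand-in that just sleeps; it is not any real API, and you would swap in your own request function. The number it reports is end-to-end wall-clock time, so it lumps together the model, any caching or batching, and the hardware.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Return the wall-clock latency in seconds for each of `runs` calls."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_api_call() -> str:
    # Hypothetical stand-in for a real API request:
    # sleeps briefly to simulate inference time.
    time.sleep(0.01)
    return "result"


latencies = measure_latency(fake_api_call, runs=5)
print(f"median: {statistics.median(latencies):.4f}s, max: {max(latencies):.4f}s")
```

Reporting the median alongside the maximum is one reasonable choice here, since a single slow request can skew an average.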