Advanced AI Accelerators and Processors with Andrew Feldman of Cerebras Systems

Gradient Dissent: Conversations on AI

How to Make Chips That Talk to Each Other

We can run sparse or dense. We harvest the sparsity as we read, and we get a performance boost because we're not wasting time multiplying by zero. We've published a series of blogs at NeurIPS showing that we could train models that were 90% sparse to state-of-the-art accuracy, including GPT models. And they took far fewer FLOPs to do it and could be trained in much less time.
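To make the arithmetic concrete, here is a minimal Python sketch of what "harvesting sparsity" means: skip the multiply-accumulate whenever a weight is zero, so a 90%-sparse matrix needs roughly 10% of the dense multiplies. This is an illustration of the idea only, not Cerebras's hardware implementation; the function name and matrix sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_matvec(weights, x):
    """Matrix-vector product that skips zero weights ("harvesting" sparsity).

    Returns the result and the number of multiply-accumulates actually done.
    """
    out = np.zeros(weights.shape[0])
    macs = 0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if w != 0.0:           # zero weight: no work performed
                out[i] += w * x[j]
                macs += 1
    return out, macs

# A 90%-sparse weight matrix, matching the sparsity level mentioned above.
W = rng.standard_normal((256, 256))
W[rng.random(W.shape) < 0.9] = 0.0
x = rng.standard_normal(256)

y, macs = sparse_matvec(W, x)
dense_macs = W.size
print(f"MACs: {macs} vs {dense_macs} dense ({macs / dense_macs:.0%} of dense work)")
assert np.allclose(y, W @ x)      # same answer with ~10% of the multiplies
```

On conventional hardware a per-element zero check like this rarely pays off; the point made in the episode is that skipping zeros as the data is read lets the FLOP savings translate into actual wall-clock speedup.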
