During training, a hardware-aware algorithm is used: the state needed to compute the output is materialized, used, and then immediately discarded to reduce memory consumption. Only the inputs and outputs are stored; the state exists only temporarily and is recomputed when it is needed again for the backward pass. During inference, however, the state cannot be discarded: access to the entire sequence is gone, so the state must carry all the information required to map inputs to outputs. This contrasts with models like transformers, which do not compress the sequence into such a state and instead map inputs to outputs directly over the full context.
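To make the training-time idea concrete, here is a minimal sketch of the store-inputs-and-outputs, recompute-the-state pattern using PyTorch's gradient checkpointing. The toy recurrence `recurrent_scan` and its dimensions are hypothetical illustrations, not the actual hardware-aware scan, which fuses this logic into a single kernel; the point is only that the intermediate states are not saved for the backward pass.

```python
import torch
from torch.utils.checkpoint import checkpoint

def recurrent_scan(x, h0, A, B):
    # Sequentially update the hidden state. The intermediate states h_t
    # are materialized here, but under checkpointing they are NOT saved
    # for the backward pass -- they are recomputed when gradients flow.
    h = h0
    outputs = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]   # toy state update (hypothetical)
        outputs.append(h)
    return torch.stack(outputs)

# Toy dimensions, for illustration only.
T, d_state, d_in = 16, 8, 4
x  = torch.randn(T, d_in, requires_grad=True)
h0 = torch.zeros(d_state)
A  = torch.randn(d_state, d_state, requires_grad=True)
B  = torch.randn(d_state, d_in, requires_grad=True)

# Training: checkpoint() stores only the inputs and the final output of
# recurrent_scan; the intermediate states are discarded and recomputed
# during backward(), trading extra compute for lower memory use.
y = checkpoint(recurrent_scan, x, h0, A, B, use_reentrant=False)
y.sum().backward()
```

At inference, by contrast, one would keep `h` and update it one token at a time, since the earlier inputs are no longer available and the state is the only record of them.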
