#002 AI Powered Data Transformation, Combining gen & trad AI, Semantic Validation

How AI Is Built

Optimal data feed for LLMs and examples of document tweaking

LLMs need data feeds that go beyond raw text: context and semantic co-location matter. Papers have highlighted the need to structure documents so that LLMs can follow complex information and its context. Approaches such as DocLLM have structured unstructured data by not relying solely on text extraction. Examples include converting HTML to Markdown before passing it to an LLM and identifying the semantics carried by HTML markup, especially in emails, which often contain tables. Orchestrating deterministic algorithms together with LLMs is crucial for chunking and structuring data effectively.
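The HTML-to-Markdown step and the deterministic chunking mentioned above can be illustrated with a minimal sketch. It is not the method discussed in the episode: the class `HtmlToMarkdown`, the helpers `html_to_markdown` and `chunk_by_heading`, and the sample email are all hypothetical, and only headings, paragraphs, and simple tables are handled. It uses Python's standard-library `html.parser` and assumes well-formed, non-nested markup.

```python
from html.parser import HTMLParser


class HtmlToMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter (hypothetical): headings, paragraphs, tables."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks
        self.text = []     # text fragments of the element being read
        self.row = None    # cells of the current table row, if any
        self.rows = []     # rows of the current table

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "p", "td", "th"):
            self.text = []
        elif tag == "tr":
            self.row = []
        elif tag == "table":
            self.rows = []

    def handle_data(self, data):
        self.text.append(data)

    def _flush(self):
        # Collapse runs of whitespace and reset the text buffer.
        content = " ".join("".join(self.text).split())
        self.text = []
        return content

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self.blocks.append("#" * int(tag[1]) + " " + self._flush())
        elif tag == "p":
            self.blocks.append(self._flush())
        elif tag in ("td", "th") and self.row is not None:
            self.row.append(self._flush())
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "table" and self.rows:
            # Treat the first row as the header so the table keeps its semantics.
            header, *body = self.rows
            lines = ["| " + " | ".join(header) + " |",
                     "| " + " | ".join("---" for _ in header) + " |"]
            lines += ["| " + " | ".join(r) + " |" for r in body]
            self.blocks.append("\n".join(lines))
            self.rows = []


def html_to_markdown(html):
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(b for b in parser.blocks if b)


def chunk_by_heading(markdown):
    """Deterministic chunking: start a new chunk at every Markdown heading."""
    chunks, current = [], []
    for block in markdown.split("\n\n"):
        if block.startswith("#") and current:
            chunks.append("\n\n".join(current))
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Hypothetical email body, used only for illustration.
EMAIL_HTML = """
<h2>Order update</h2>
<p>Your order has shipped.</p>
<table>
  <tr><th>Item</th><th>Qty</th></tr>
  <tr><td>Cable</td><td>2</td></tr>
</table>
<h2>Support</h2>
<p>Reply to this email for help.</p>
"""

markdown = html_to_markdown(EMAIL_HTML)
chunks = chunk_by_heading(markdown)
```

Each chunk keeps a heading together with the paragraph and table that belong to it, which is one simple way to preserve semantic co-location. In a fuller pipeline, each chunk would then be passed to an LLM for the parts that rules cannot handle.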

Play episode from 09:16