#017 Unlocking Value from Unstructured Data, Real-World Applications of Generative AI

Jul 16, 2024

Jonathan Yarkoni, founder of Reach Latent, discusses using generative AI to extract value from unstructured data in industries like legal and weather prediction. He delves into the challenges of AI projects, the impact of ChatGPT, and future AI trends. Topics include the reduced need for data cleaning in generative AI projects, optimized tech stacks, and the potential of synthetic data generation for training AI systems.

Chapters

Intro

00:00 • 3min

Tailoring AI Project Frameworks for Success

03:16 • 16min

Exploring the Reliability of Agents in AI Development

19:28 • 2min

Unlocking Value with Generative AI

21:40 • 6min

Dream Tech Stack and Future Wishes for Generative AI

28:08 • 2min

Developing a Security-oriented System with Synthetic Data

29:57 • 7min

In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.

Key Takeaways

Generative AI projects often require less data cleaning due to the models' tolerance for "dirty" data, allowing for faster implementation in some cases.
After delivery, the success of AI projects depends on monitoring, but automatic retraining of generative AI applications is not yet common because evaluating them remains difficult.
Industries ripe for AI disruption include text-heavy fields like legal, education, software engineering, and marketing, as well as biotech and entertainment.
The adoption of AI is expected to occur in waves, with 2024 likely focusing on internal use cases and 2025 potentially seeing more customer-facing applications as models improve.
Synthetic data generation, using models like GPT-4, can be a valuable approach for training AI systems when real data is scarce or sensitive.
Evaluation frameworks like RAGAS and custom metrics are essential for assessing the quality of synthetic data and AI model outputs.
Jonathan’s ideal tech stack for generative AI projects includes tools like Instructor, Guardrails, Semantic Routing, DSPy, LangChain, and LlamaIndex, with a growing emphasis on evaluation stacks.
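
The takeaways above mention custom metrics for assessing synthetic data. As a rough illustration only (this is not the method discussed in the episode, and it does not use RAGAS), here is a minimal sketch of two hypothetical custom metrics over a batch of generated text samples: a near-duplicate rate based on Jaccard similarity of token sets, and a distinct-token ratio as a crude diversity signal. The function names, the whitespace tokenizer, and the 0.8 threshold are all assumptions made for the example.

```python
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_rate(samples: list[str], threshold: float = 0.8) -> float:
    """Fraction of samples that have at least one near-duplicate partner.

    The threshold is an assumed value, not a recommended one.
    """
    token_sets = [tokens(s) for s in samples]
    flagged: set[int] = set()
    for i, j in combinations(range(len(samples)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            flagged.update((i, j))
    return len(flagged) / len(samples) if samples else 0.0


def distinct_token_ratio(samples: list[str]) -> float:
    """Unique tokens divided by total tokens across the whole batch."""
    all_tokens = [t for s in samples for t in s.lower().split()]
    return len(set(all_tokens)) / len(all_tokens) if all_tokens else 0.0
```

A high near-duplicate rate or a low distinct-token ratio would suggest the generator is repeating itself, which is one thing worth checking before using synthetic samples for training. Real projects would likely pair checks like these with task-specific or model-based evaluation.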

Key Quotes

"I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of opportunity to be had."

"To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology."

Jonathan Yarkoni

Nicolay Gerold

Chapters

00:00 Introduction: Extracting Value from Unstructured Data
03:16 Flexibly Tailoring Solutions to Client Needs
05:39 Monitoring and Retraining Models in the Evolving AI Landscape
09:15 Generative AI: Disrupting Industries and Unlocking New Possibilities
17:47 Balancing Immediate Results and Cutting-Edge Solutions in AI Development
28:29 Dream Tech Stack for Generative AI

unstructured data, textual data, automation, weather prediction, data cleaning, ChatGPT, AI disruption, legal, education, software engineering, marketing, biotech, immediate results, cutting-edge solutions, tech stack