#17 Jonathan Yarkoni on Unlocking Value from Unstructured Data, Real-World Applications of Generative AI
Jul 16, 2024
Jonathan Yarkoni, founder of Reach Latent, discusses using generative AI to extract value from unstructured data in fields such as legal work and weather prediction. He delves into the challenges of AI projects, the impact of ChatGPT, and future AI trends. Topics include the reduced need for data cleaning in generative AI projects, optimized tech stacks, and the potential of synthetic data generation for training AI systems.
Generative AI projects tolerate dirty data for faster implementation.
Automatic retraining of generative AI apps post-delivery needs evaluation improvements.
AI disruption targets text-heavy fields, biotech, and entertainment as models advance.
2024 focuses on internal AI use cases; 2025 may see customer-facing applications.
Deep dives
Value of AI in Textual Data Analysis for Productivity Increase
With generative AI, companies that hold large volumes of textual data, such as contracts and other documents, can raise productivity by automating processes. By tackling hard problems and delivering solutions that exceed internal attempts, AI models can also help in areas such as predicting rare weather events.
Adaptive Framework and Data Processing for AI Projects
While maintaining an overall project framework, AI projects are adapted based on client preferences and problem nature. Not all projects require extensive data collection and cleaning, as some generative AI projects can tolerate imperfect data.
Evolution of AI Models and Industry Disruption Opportunities
As new AI models like ChatGPT impact various industries, the potential for disruption expands. Text-heavy sectors such as legal, education, and marketing are ripe for AI utilization. Biotech also presents opportunities with advancements in generating new proteins, while future fusion models may impact entertainment and gaming industries.
Improving Validation and Frameworks in Generative AI Space
Enhancements in open-source and proprietary evaluation tools are desired to drive model understanding and performance refinement. Emphasis on smaller, task-specific models and effective planning frameworks can propel the generative AI landscape forward.
Practical Tips in Prompt Engineering and Project Management with AI
Prompt engineering tricks, such as adding emotional stimuli to a prompt, can improve AI model outputs. Using ChatGPT for tasks like drafting project management plans can streamline workflows, showcasing AI's practical benefits in diverse organizational contexts.
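As a rough illustration of the emotional-stimulus trick mentioned above, the sketch below appends a short motivational sentence to a task prompt. The function name `build_prompt` and the default stimulus sentence are assumptions chosen for illustration; the episode does not specify exact wording, and whether the trick helps depends on the model and the task.

```python
def build_prompt(task: str, stimulus: str = "This is very important to my career.") -> str:
    """Return the task text with an optional emotional stimulus appended.

    The default stimulus is only an illustrative example sentence,
    not wording taken from the episode.
    """
    task = task.strip()
    if not stimulus:
        # No stimulus requested: send the task unchanged.
        return task
    # Put the stimulus in its own paragraph so it reads as a closing remark.
    return f"{task}\n\n{stimulus.strip()}"


if __name__ == "__main__":
    print(build_prompt("Draft a project management plan for a three-month data migration."))
```

The resulting string can be passed to any chat model. Comparing outputs with and without the stimulus on a small set of tasks is one way to check whether it makes a difference.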
Creating Synthetic Data for Problem-Solving with AI
Generating synthetic data with strong models like GPT-4 lets a project start quickly, which is especially useful when real data is subject to privacy constraints. Synthetic examples are then filtered with metrics such as key data retention and summary length to ensure data quality and relevance.
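The filtering step described above could look roughly like the sketch below, which scores synthetic summaries on key data retention and relative summary length. Everything here is an assumption for illustration: the `SyntheticExample` structure, the `key_facts` field, the substring-based retention check, and the threshold values are not taken from the episode.

```python
from dataclasses import dataclass


@dataclass
class SyntheticExample:
    source: str           # generated source document
    summary: str          # generated summary of that document
    key_facts: list[str]  # hypothetical: phrases the summary should retain


def key_retention(example: SyntheticExample) -> float:
    """Fraction of key facts that appear (case-insensitively) in the summary."""
    if not example.key_facts:
        return 1.0
    summary = example.summary.lower()
    kept = sum(1 for fact in example.key_facts if fact.lower() in summary)
    return kept / len(example.key_facts)


def length_ratio(example: SyntheticExample) -> float:
    """Summary length divided by source length, both measured in words."""
    source_words = len(example.source.split())
    if source_words == 0:
        return 0.0
    return len(example.summary.split()) / source_words


def filter_examples(
    examples: list[SyntheticExample],
    min_retention: float = 0.8,     # assumed threshold
    max_length_ratio: float = 0.5,  # assumed threshold
) -> list[SyntheticExample]:
    """Keep examples whose summaries retain enough key facts and stay short."""
    return [
        e for e in examples
        if key_retention(e) >= min_retention and length_ratio(e) <= max_length_ratio
    ]


if __name__ == "__main__":
    source = (
        "The lease between Acme Corp and Beta LLC starts on 1 March 2024 "
        "and runs for five years with a monthly rent of 2000 euros"
    )
    facts = ["Acme Corp", "Beta LLC", "five years"]
    examples = [
        SyntheticExample(source, "Acme Corp leases from Beta LLC for five years", facts),
        SyntheticExample(source, "A lease was signed", facts),
    ]
    print(len(filter_examples(examples)))
```

Substring matching is a deliberately simple stand-in for retention. In practice an evaluation framework or an LLM-based judge may be a better fit for checking whether a fact survived.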
Engagement and Collaboration with AI Development Services
Prospective clients seeking AI solutions can connect with experts like Jonathan Yarkoni through LinkedIn or Reach Latent's website. Collaboration offers tailored development and consulting services for those aiming to leverage AI technologies effectively.
In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.
Key Takeaways
Generative AI projects often require less data cleaning due to the models' tolerance for "dirty" data, allowing for faster implementation in some cases.
The success of AI projects post-delivery is ensured through monitoring, but automatic retraining of generative AI applications is not yet common due to evaluation challenges.
Industries ripe for AI disruption include text-heavy fields like legal, education, software engineering, and marketing, as well as biotech and entertainment.
The adoption of AI is expected to occur in waves, with 2024 likely focusing on internal use cases and 2025 potentially seeing more customer-facing applications as models improve.
Synthetic data generation, using models like GPT-4, can be a valuable approach for training AI systems when real data is scarce or sensitive.
Evaluation frameworks like RAGAS and custom metrics are essential for assessing the quality of synthetic data and AI model outputs.
Jonathan’s ideal tech stack for generative AI projects includes tools like Instructor, Guardrails, Semantic Routing, DSPy, LangChain, and LlamaIndex, with a growing emphasis on evaluation stacks.
Key Quotes
"I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of opportunity to be had."
"To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology."
00:00 Introduction: Extracting Value from Unstructured Data
03:16 Flexible Tailoring Solutions to Client Needs
05:39 Monitoring and Retraining Models in the Evolving AI Landscape
09:15 Generative AI: Disrupting Industries and Unlocking New Possibilities
17:47 Balancing Immediate Results and Cutting-Edge Solutions in AI Development
28:29 Dream Tech Stack for Generative AI