
Brian Raymond
CEO / Founder of Unstructured providing technology for Weaviate
Top 5 podcasts with Brian Raymond
Ranked by the Snipd community

12 snips
May 23, 2023 • 43min
Unstructured with Brian Raymond - Weaviate Podcast #48!
Hey everyone, thank you so much for watching the 48th episode of the Weaviate Podcast!! This is a SUPER exciting one, welcoming Brian Raymond, the CEO / Founder of Unstructured! Unstructured is a perfect complementary technology for Weaviate, helping people get their unstructured data into Weaviate! The podcast dives into the nuances of this task, but it generally revolves around Unstructured's abstractions of Partitioning, Cleaning, and Staging! Unstructured is making groundbreaking innovations in using Visual Document Layout models for Partitioning, for example labeling which part of a PDF is the header, the body, an image caption, and so on. Cleaning then describes removing pesky details like extra whitespace or odd characters. Staging then describes transformations such as formatting a text chunk with its metadata into the JSON for a Weaviate object upload! I really hope you find this podcast interesting! We are also publishing a blog post showing an example of how to use Unstructured to get PDF data into Weaviate. Please check it out and let us know if it works for your data and how we can improve it! The blog post can be found on weaviate.io, and we will be managing discussions around it in the Weaviate Slack as well as Unstructured's! Thank you so much for listening!
Check out Unstructured here! https://www.unstructured.io/
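To make the Partitioning, Cleaning, and Staging steps from the description concrete, here is a minimal sketch in plain Python. It does not use the Unstructured library or the Weaviate client; the `Element` type, the `partition`, `clean`, and `stage` functions, and the title heuristic are all hypothetical stand-ins chosen for illustration, and real partitioning of PDFs relies on layout models rather than a rule like this one.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """One labeled piece of a document (hypothetical type, not Unstructured's)."""
    category: str  # e.g. "Title" or "NarrativeText"
    text: str
    metadata: dict = field(default_factory=dict)


def partition(raw_text: str, filename: str) -> list[Element]:
    """Toy partitioner: split on blank lines and label each block.

    Assumption: a short block without terminal punctuation is a title.
    """
    elements = []
    for block in re.split(r"\n\s*\n", raw_text):
        stripped = block.strip()
        if not stripped:
            continue
        is_title = len(stripped) < 60 and not stripped.endswith((".", "!", "?"))
        category = "Title" if is_title else "NarrativeText"
        elements.append(Element(category, block, {"filename": filename}))
    return elements


def clean(element: Element) -> Element:
    """Drop non-printable characters and collapse runs of whitespace."""
    text = "".join(ch for ch in element.text if ch.isprintable() or ch.isspace())
    text = re.sub(r"\s+", " ", text).strip()
    return Element(element.category, text, element.metadata)


def stage(elements: list[Element]) -> list[dict]:
    """Flatten each element and its metadata into a JSON-ready dict."""
    return [{"text": e.text, "category": e.category, **e.metadata} for e in elements]


raw = "Quarterly  Report\n\nRevenue\x00 grew\t in the  second quarter.\n\n"
objects = stage([clean(e) for e in partition(raw, "report.pdf")])
print(json.dumps(objects, indent=2))
```

Each resulting dict pairs a text chunk with its metadata, which is the general shape the description has in mind for an object upload; the property names used here are assumptions, not a Weaviate schema.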
Chapters
0:00 Welcome Brian!!
0:27 What is Unstructured?
5:42 Why now? New Advancements in Unstructured
8:02 Thoughts on Data Connectors Hub
10:55 PDFs to Weaviate with Unstructured
13:53 State-of-the-Art in OCR and Document Parsing
16:10 How to get the data from Weaviate.io?
18:06 Foundation Models from Unstructured
20:45 Evaporate-Code+
23:15 CSV, Parquet, JSON transformations in Staging
25:08 Cleaning Bricks
28:02 Visual Document Examples
30:45 Text Chunking with Metadata
33:25 Knowledge Graphs with Goldman Sachs example
39:10 LLM Hallucinations
42:10 Announcements from Brian!

May 20, 2024 • 37min
E134: Making Complex Data RAG-Ready with Unstructured
Brian Raymond, Founder & CEO of Unstructured, discusses the importance of data preparation in NLP, creating a single API endpoint for handling diverse data formats, transitioning from open source to commercial success, engaging with government design partners, and the value of world-class design & marketing for open source companies.

Feb 24, 2024 • 31min
Unlocking $25M: Unstructured's CEO Brian Raymond on Data Prep for LLMs
CEO Brian Raymond discusses data preparation for Large Language Models, challenges in preprocessing data for AI applications, developing a single API for data processing, handling different document types, transitioning from open source to a commercial API, monetization strategy, the influence of working with the government, and the importance of analytics.

Jan 28, 2024 • 48min
Episode 13: Open-source panel with Anton Troynikov, Brian Raymond, and Harrison Chase
Open-source leaders in AI, Anton Troynikov, Brian Raymond, and Harrison Chase, discuss topics such as chatbot development with LangSmith, usage of Chroma in AI applications, building open-source and commercial products, limitations of vector search in AI retrieval, favorite AI companies and TV portrayals, and the importance of agility in developing language models.

Aug 3, 2023 • 36min
ETL for LLMs
Founder of Unstructured, Brian Raymond, discusses challenges in data preprocessing for NLP solutions, efficient file processing architecture for data extraction, innovative data engineering solutions, a comparison of connector capabilities in Airbyte and Fivetran, and the evolution of ETL pipelines for Large Language Models.