

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and, of course, big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
Episodes

Dec 21, 2023 • 42min
Knowledge Graphs: Contextualizing Enterprise Data for More Accurate LLMs
Knowledge graph experts from data.world discuss their work on using knowledge graphs to improve the accuracy of language models for question answering on structured SQL databases. They explain the creation of a knowledge graph from a data warehouse, evaluate the effectiveness of knowledge graphs in improving question answering accuracy, and discuss how to convince organizations to adopt knowledge graphs for improved data exploration. They also highlight the benefits of knowledge graphs, compare RDF and property graphs, and emphasize the importance of improving knowledge graph accuracy and combining knowledge graphs with vector databases.

Dec 14, 2023 • 44min
TimeGPT: Machine Learning for Time Series, Made Accessible
Max Mergenthaler and Azul Garza Ramirez from Nixtla talk about TimeGPT, a simplified model for time series analysis. They discuss its simplicity, performance, and potential integration with other tools. They also explore the role of expert judgment and the future impact of TimeGPT on forecasting jobs.

Dec 7, 2023 • 54min
Best Practices for Building LLM-Backed Applications
Waleed Kadous, Chief Scientist at Anyscale, discusses best practices for building applications leveraging large language models. Topics include heuristics for working with open source models, differences between Code Llama and GitHub Copilot, challenges in deploying open source models, using spending data to save data, fine-tuning models in supervised machine learning, and exploring the potential of multimodal models.

Nov 30, 2023 • 49min
The Evolution of Crypto, Blockchain, and Web3
The CEO of BlockApps and Co-Chair of the Enterprise Ethereum Alliance discusses web3 technologies, Ethereum's transition from mining to proof of stake, the impact of proof of stake and NFTs on the crypto space, the intersection of blockchain, AI, and crypto, the creation and governance of base models for ML, current developments in crypto and web3, the advantages of blockchain, and the collapse of the VC market.

Nov 23, 2023 • 43min
Open Source Data and AI: Past, Present, Future
The podcast discusses the evolution of big data and AI technologies, the rise of open source data in the tech industry, the future of AI and machine learning in a decentralized world, simplifying workload and data movement across cloud and on-prem environments, challenges in data management, and the power of networking in open source data.

Nov 16, 2023 • 50min
Orchestration for LLM and RAG applications
Malte Pietsch, co-founder & CTO of deepset, discusses the importance of orchestration frameworks for LLM applications, the usage patterns of the Haystack framework, and optimizing RAG applications with metadata and knowledge graphs. The conversation also covers the evolution of data engineering pipelines, real-time indexing, and the highlights and features of Haystack 2.0.

Nov 9, 2023 • 49min
Reflections from the First AI Conference in San Francisco
The hosts analyze takeaways from the inaugural AI conference in San Francisco. They discuss the importance of empirical evidence and how experimenting and iterating in AI leads to improved results, and they explore the rise of open source and custom foundation models. They also cover the use of ensembles in machine learning and other conference highlights, including generative AI for speech.

Nov 2, 2023 • 51min
Kùzu: A simple, extremely fast, and embeddable graph database
Guest Semih Salihoglu, co-creator of Kùzu, discusses the concept of a property graph, differences between property graphs and RDF in graph databases, the need for switching databases, the design and storage techniques of Kùzu, integration with other programming languages, advantages of DuckDB, and compatibility and streaming in real time.

Oct 26, 2023 • 43min
Navigating the Nuances of Retrieval Augmented Generation
Philipp Moritz and Goku Mohandas of Anyscale discuss retrieval augmented generation (RAG) systems, challenges in evaluation, labeling and classification strategies, optimizing model inference, online software stack, and hyperparameter search in evaluation runs.

Oct 19, 2023 • 40min
The Rise of Generative AI-Powered Social Media Manipulation
Researchers Bill Marcellino and Nathan Beauchamp-Mustafaga discuss the rise of generative AI and its impact on social media manipulation. They explore the use of generative AI for political and security purposes, motivations of nation-state actors, technology asymmetry, scale and control of information propagation, and combatting manipulative content on social media.


