

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
Episodes

Nov 30, 2023 • 49min
The Evolution of Crypto, Blockchain, and Web3
The CEO of BlockApps and Co-Chair of the Enterprise Ethereum Alliance discusses web3 technologies, Ethereum's transition from mining to proof of stake, the impact of proof of stake and NFTs on the crypto space, the intersection of blockchain, AI, and crypto, the creation and governance of base models for ML, current happenings in crypto and web3, the advantages of blockchain, and the collapse of the VC market.

Nov 23, 2023 • 43min
Open Source Data and AI: Past, Present, Future
The podcast discusses the evolution of big data and AI technologies, the rise of open source data in the tech industry, the future of AI and machine learning in a decentralized world, simplifying workload and data movement across cloud and on-prem environments, challenges in data management, and the power of networking in open source data.

Nov 16, 2023 • 50min
Orchestration for LLM and RAG applications
Malte Pietsch, co-founder & CTO of Deepset, discusses the importance of orchestration frameworks for LLM applications, the usage patterns of the Haystack framework, and optimizing RAG applications with metadata and knowledge graphs. They also explore the evolution of data engineering pipelines, real-time indexing, and the highlights and features of Haystack 2.0.

Nov 9, 2023 • 49min
Reflections from the First AI Conference in San Francisco
The hosts analyze takeaways from the inaugural AI conference in San Francisco, discussing the importance of empirical evidence. Experimenting and iterating in AI leads to improved results. The rise of open source and custom foundation models in AI is explored. The use of ensembles in machine learning and highlights from the AI conference are discussed, including generative AI for speech.

Nov 2, 2023 • 51min
Kùzu: A simple, extremely fast, and embeddable graph database
Guest Semih Salihoglu, co-creator of Kùzu, discusses the concept of a property graph, differences between property graphs and RDF in graph databases, the need for switching databases, the design and storage techniques of Kùzu, integration with other programming languages, advantages of DuckDB, and compatibility and streaming in real time.

Oct 26, 2023 • 43min
Navigating the Nuances of Retrieval Augmented Generation
Philipp Moritz and Goku Mohandas of Anyscale discuss retrieval augmented generation (RAG) systems, challenges in evaluation, labeling and classification strategies, optimizing model inference, online software stack, and hyperparameter search in evaluation runs.

Oct 19, 2023 • 40min
The Rise of Generative AI-Powered Social Media Manipulation
Researchers Bill Marcellino and Nathan Beauchamp-Mustafaga discuss the rise of generative AI and its impact on social media manipulation. They explore the use of generative AI for political and security purposes, motivations of nation-state actors, technology asymmetry, scale and control of information propagation, and combatting manipulative content on social media.

Oct 12, 2023 • 39min
Versioning and MLOps for Generative AI
Yucheng Low, CEO of XetHub, talks about managing large-scale ML assets and the challenges of data management. They discuss the need for version control, data reproducibility, and efficient solutions. The podcast covers topics such as GDPR's impact on data teams, the benefits of openness in data management, and the distinguishing features of their tool. They also discuss the importance of deduplication, summaries, and visualization tools, and the unique features of XetHub's user interface for data versioning and collaboration.

Oct 5, 2023 • 41min
Navigating the Generative AI Landscape
The podcast discusses the push to include domain expertise in industrial AI, addressing hallucination in generative AI, retrieval augmented generation for enhancing language models, the challenges of industrial AI and knowledge transfer, the impact of generative AI on jobs, and a call to action to explore the influence of, and concerns surrounding, generative AI.

Sep 28, 2023 • 48min
Trends in Data Management: From Source to BI and Generative AI
The podcast discusses the use of graph databases in data management systems, the future of vector search in database systems, analytic databases such as SQLite, DocDB, and DuckDB, the concept of "lakehouses" in data management, multimodal data in databases, challenges in building HTAP systems, and use cases and challenges with graph databases and SQLite.