
Data Skeptic
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.
Latest episodes

Dec 9, 2024 • 33min
Graph Transformations
Adam Machowczyk, a PhD student at the University of Leicester, specializes in graph rewriting and machine learning. He reveals how graph rewriting can enhance model adaptability, particularly in guiding machine learning for complex tasks. Topics include the transformation of graph structures for improved recommendations in social networks and its applications in chemistry and IoT analysis. Adam illustrates the shift from traditional data representation to dynamic graph systems, showcasing real-world implications and the future of scalable adaptive models.

Nov 25, 2024 • 37min
Networks for AB Testing
Wentao Su, a data scientist at ByteDance, specializes in A/B testing for social media platforms. He dives into the challenges of A/B testing in dynamic networks, highlighting the spillover effects that can distort results. Wentao introduces innovative strategies like one-degree label propagation to optimize test accuracy. He also explores how user interconnectedness impacts both experimental design and user experience. Finally, he discusses the significance of robust data processing techniques to effectively manage large-scale experiments.

Nov 18, 2024 • 38min
Lessons from eGamer Networks
In this engaging discussion, Alex Bisberg, a PhD candidate at USC, dives into the fascinating world of network science and game analytics. He unveils how generosity spreads like a contagious virus within gaming communities and reveals the power of weak ties in fostering new connections. Bisberg explores the innovative use of candles as a unique in-game currency promoting collaborative play. Listeners will discover insights on social mechanics in games like 'Sky: Children of the Light' and how these dynamics can enhance player engagement and retention.

Nov 11, 2024 • 42min
GitHub Collaboration Network
Behnaz Moradi-Jamei, an assistant professor at James Madison University specializing in network data science, delves into the intricate web of GitHub contributors. She unveils her groundbreaking analysis of a sprawling network connecting 700,000 developers through shared contributions. The conversation touches on community detection algorithms, ethical considerations in network analysis, and innovative methodologies for enhancing collaboration insights. Behnaz emphasizes the importance of adapting algorithms to reflect real-world developer interactions, pushing the boundaries of open-source community understanding.

Nov 4, 2024 • 42min
Graphs and ML for Robotics
Abhishek Paudel, a PhD student at George Mason University, specializes in robotics and machine learning. He shares fascinating insights into how graph neural networks can classify rooms and enhance robotic navigation. Explore the evolution of machine learning in robotics, and the impact of deep learning on perception and motion control. Abhishek discusses the integration of natural language processing and innovative graph-based methods for decision-making, highlighting their role in improving spatial awareness and learning from mistakes.

Oct 29, 2024 • 52min
Graphs for HPC and LLMs
Maciej Besta, a senior researcher at the Scalable Parallel Computing Lab, discusses the cutting-edge intersection of graph theory and high-performance computing. He explores how graph structures enhance large language models through APIs and hypergraphs. The conversation covers challenges in graph databases, the LPG2Vec encoder for data embedding, and advancements in prompt engineering to optimize problem-solving capabilities. Besta also dives into methodologies like chain of thought and the complexities of graph theory in developing efficient language models.

Oct 21, 2024 • 36min
Graph Databases and AI
Yuanyuan Tian, Principal Scientist Manager at Microsoft Gray Systems Lab, dives into the world of graph databases and their applications. She discusses overcoming the hurdles small enterprises face in adopting this technology and the importance of the GQL project for standardization. The conversation highlights how graph databases enhance fraud detection in finance, optimize supply chains, and improve healthcare analytics. Yuanyuan also explains the role of large language models and specialized query languages in making these powerful databases more accessible.

Oct 14, 2024 • 30min
Network Analysis in Practice
Asaf Shapira, a network analysis consultant and the host of NETfrix, dives into the intricacies of network science. He discusses how network analysis techniques can identify malicious activities, including bot farms on social media. The conversation touches on the impact of social networks on elections and the importance of community detection algorithms in understanding organizational dynamics. From the historical roots of network analysis to its modern applications in areas like COVID-19 contact tracing, Asaf sheds light on its widespread yet underutilized potential.

Oct 7, 2024 • 30min
Animal Intelligence Final Exam
This discussion wraps up the Animal Intelligence season with reflections on production challenges and guest engagements. The speakers emphasize how seamless editing enhances listener experience, especially for non-native English speakers. They delve into the intersection of machine learning with biology and the significance of integrating these technologies for species identification. The ethics of animal research and the complexities of animal consciousness are pondered, alongside personal experiences that highlight ongoing education in data science.

Sep 24, 2024 • 26min
Process Mining with LLMs
David Obembe, a recent graduate from the University of Tartu, dives into his master's thesis on blending large language models with process mining tools. He explains how process mining uses event logs to map out inefficiencies in business processes. He describes how these techniques have evolved since LLMs were integrated, improving data retrieval and insight. David shares his experiments with Retrieval Augmented Generation and discusses the challenges of prompt engineering, highlighting the balance between accuracy and model reliability.