

Data Skeptic
Kyle Polich
The Data Skeptic Podcast features interviews and discussions on data science, statistics, machine learning, artificial intelligence, and related topics, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and the efficacy of approaches.
Episodes

Oct 29, 2024 • 52min
Graphs for HPC and LLMs
Maciej Besta, a senior researcher at the Scalable Parallel Computing Lab, discusses the intersection of graph theory and high-performance computing. He explores how graph structures, including hypergraphs and graph-oriented APIs, can enhance large language models. The conversation covers challenges in graph databases, the LPG2Vec encoder for embedding graph data, and advances in prompt engineering that improve problem-solving. Besta also discusses prompting methods such as chain of thought and the role of graph theory in building efficient language models.

Oct 21, 2024 • 36min
Graph Databases and AI
Yuanyuan Tian, Principal Scientist Manager at the Microsoft Gray Systems Lab, explores graph databases and their applications. She discusses the hurdles small enterprises face in adopting this technology and the importance of the GQL project for standardization. The conversation highlights how graph databases improve fraud detection in finance, supply chain optimization, and healthcare analytics. Yuanyuan also explains how large language models and specialized query languages can make these databases more accessible.

Oct 14, 2024 • 30min
Network Analysis in Practice
Asaf Shapira, a network analysis consultant and the host of NETfrix, explores the intricacies of network science. He explains how network analysis techniques can identify malicious activity, including bot farms on social media. The conversation touches on the impact of social networks on elections and on how community detection algorithms help reveal organizational dynamics. From the historical roots of network analysis to modern applications such as COVID-19 contact tracing, Asaf highlights its broad but still underused potential.

Oct 7, 2024 • 30min
Animal Intelligence Final Exam
This episode wraps up the Animal Intelligence season with reflections on production challenges and working with guests. The speakers discuss how careful editing improves the listening experience, especially for non-native English speakers. They explore the intersection of machine learning and biology and the value of these technologies for species identification. They also consider the ethics of animal research and the complexities of animal consciousness, and share personal experiences of continuing to learn data science.

Sep 24, 2024 • 26min
Process Mining with LLMs
David Obembe, a recent graduate of the University of Tartu, discusses his master's thesis on combining large language models with process mining tools. He explains how process mining uses event logs to map business processes and uncover their inefficiencies. He also describes how these techniques change once LLMs are integrated, improving data retrieval and the insights users can draw. David shares his experiments with Retrieval Augmented Generation and discusses the challenges of prompt engineering, including the balance between accuracy and model reliability.

Sep 17, 2024 • 23min
Open Animal Tracks
Risa Shinoda, a PhD student at Kyoto University focusing on computer vision, discusses animal tracking. She presents the OpenAnimalTracks dataset, designed for identifying animals from their footprints, and discusses her models and their accuracy. Risa explores how computer vision is transforming agriculture, improving farming practices and animal welfare. She also addresses the difficulty of capturing precise photographs of tracks and the role that understanding animal tracks plays in wildlife conservation.

Sep 10, 2024 • 40min
Bird Distribution Modeling with Satbird
Mélisande Teng, a PhD candidate at Université de Montréal, discusses her research on biodiversity monitoring using remote sensing and computer vision. She describes the Satbird project, which improves bird distribution modeling by combining satellite data with citizen science observations. The conversation highlights challenges such as data imbalance across regions and the importance of acoustic monitoring. Mélisande also explains joint species distribution modeling and advocates for collaboration between machine learning and ecology to advance conservation.

Aug 26, 2024 • 31min
Ant Encounters
Deborah Gordon, an author who studies ant colony dynamics, explains how these tiny creatures exhibit complex behaviors. She introduces the concept of collective intelligence, showing how simple interactions among ants lead to adaptive solutions. Deborah also discusses the diversity of ant species and the unexpected habitats they occupy. The episode covers how ants' navigation strategies and social interactions can inform models in artificial intelligence, illustrating the capabilities of decentralized systems.

Aug 19, 2024 • 39min
Computing Toolbox
Madlen Wilmes, co-author of 'Computing Skills for Biologists,' discusses the computing skills biologists need most. She describes her transition from academia to finance and how transferable data analysis skills can open unexpected career paths. Madlen emphasizes the importance of programming languages such as R and Python in data science. She also covers the challenges and advantages of moving into industry, including the value of networking, collaboration, and other soft skills for professional success.

Aug 14, 2024 • 32min
Biodiversity Monitoring
Hager Radi, a specialist in biodiversity monitoring, discusses species distribution modeling. She covers the challenges posed by incomplete data and by biases in presence-only datasets. Hager highlights how machine learning and remote sensing can help predict species distributions even from limited observations. She also describes developments such as drones and citizen science platforms, emphasizing the role of technology in conservation.