Data Skeptic

Kyle Polich

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Episodes

Dec 23, 2023 • 24min

I LLM and You Can Too

This episode explores the use of large language models in daily life and work processes. It discusses the challenges and risks of using them as a service, the concept of retrieval-augmented generation, and the use of embeddings and LLMs in text analysis and product development. It also covers the applications of text embeddings in similarity, search, and classification tasks, along with their limitations and potential risks.

Dec 19, 2023 • 40min

Q&A with Kyle

In this Q&A episode, the host discusses finding guests algorithmically, exploring impactful technologies and tools, data annotation as remote work, the QBasic programming language, programming experiences and hacker culture, the 'grep' command-line utility, and the importance of Git for source control.

Dec 12, 2023 • 29min

LLMs for Data Analysis

Amir Netz, Technical Fellow at Microsoft and CTO of Microsoft Fabric, discusses how business intelligence has evolved, Power BI and Fabric, building and deploying ML models, benefits of Fabric's auto-integration and auto-optimization, Copilot capabilities, and future developments.

Dec 4, 2023 • 34min

AI Platforms

Eric Boyd, Corporate Vice President of AI at Microsoft, shares how organizations can leverage AI for faster development. He discusses the benefits of using natural language to build products and the future of version control. Eric mentions some foundational models in Azure AI and their capabilities.

Nov 27, 2023 • 35min

Deploying LLMs

Joining us on this episode are Aaron Reich, CTO at Avanade, and Priyanka Shah, MVP for Microsoft AI. They discuss implementing generative AI for productivity gains, AI model evolution, hardware changes, designing new products and services, the current state of AI strategy, and building a custom co-pilot.

Nov 20, 2023 • 26min

A Survey Assessing GitHub Copilot

Jenny Liang, a PhD student at Carnegie Mellon University, discusses her recent survey on the usability of AI programming assistants. She shares some questions and takeaways from the survey, as well as the major reasons developers don't want to use code-generation tools. Concerns about intellectual property and the access code-generation tools have to in-house code are discussed.

Nov 13, 2023 • 32min

Program Aided Language Models

PhD students Aman Madaan and Shuyan Zhou discuss their paper on Program-Aided Language Models. They talk about the evolution and performance of LLMs on arithmetic tasks. Aman introduces PAL and its improvement on arithmetic tasks. Shuyan explains how PAL's performance was evaluated and the limitations of LLMs. They discuss the potential impact of PAL on math education and future research steps.

Nov 6, 2023 • 40min

Which Programming Language is ChatGPT Best At

Alessio Buscemi, software engineer at Lifeware SA, discusses the impact of ChatGPT on software engineers and the efficiency of code generation. He presents a comparative study of code generation across 10 programming languages using ChatGPT 3.5, highlighting unexpected results. The performance of different programming languages is analyzed, with discussion of language popularity and the implications for industry practices. Alessio also shares insights on current projects, including sentiment analysis and investigating plagiarism.

Oct 31, 2023 • 31min

GraphText

Jianan Zhao, a computer science student, joins to discuss using graphs with LLMs efficiently. They explore graph inductive bias, graph machine learning, the limitations of natural language models for graphs, GraphText as a preprocessing step, information loss in the translation process, and a comparison with graph neural networks.

Oct 23, 2023 • 28min

arXiv Publication Patterns

Rajiv Movva, a PhD student in Computer Science at Cornell Tech, discusses the findings of his research on arXiv publication patterns for LLMs. He shares insights on the increase in LLM research and the proportions of papers published by universities, organizations, and industry leaders. He highlights the focus on the social impact of LLMs and explores exciting applications in education.
