

Data Skeptic
Kyle Polich
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

May 8, 2015 • 35min
Using Data to Help Those in Crisis
This week Noelle Sio Saldana discusses her volunteer work at Crisis Text Line - a 24/7 service that connects anyone with crisis counselors. In the episode we discuss Noelle's career and how, as a participant in the Pivotal for Good program (a partnership with DataKind), she spent three months helping find insights in the messaging data collected by Crisis Text Line. These insights helped give visibility into a number of different aspects of Crisis Text Line's services. Listen to this episode to find out how! If you or someone you know is in a moment of crisis, there's someone ready to talk to you by texting the shortcode 741741.

May 1, 2015 • 35min
The Ghost in the MP3
Have you ever wondered what is lost when you compress a song into an MP3? This week's guest Ryan Maguire did more than that. He worked on software to isolate the sounds that are lost when you convert a lossless digital audio recording into a compressed MP3 file. To complete his project, Ryan worked primarily in Python using the pyo library as well as the Bregman Toolkit. Ryan mentioned that humans have a hearing range from 20 Hz to 20,000 Hz; if you'd like to hear those tones, check the previous link. If you'd like to know more about our guest Ryan Maguire, you can find his website at the previous link. To follow The Ghost in the MP3 project, please check out its Facebook page or the site theghostinthemp3.com. A PDF of Ryan's publication-quality write-up can be found at this link: The Ghost in the MP3. It is definitely worth the read if you'd like to know more of the technical details.

Apr 28, 2015 • 27min
Data Fest 2015
This episode contains coverage of the 2015 Data Fest hosted at UCLA. Data Fest is an analysis competition that gives teams of students 48 hours to explore a new dataset and present novel findings. This year, data from Edmunds.com was provided, and students competed in three categories: best recommendation, best use of external data, and best visualization.

Apr 24, 2015 • 16min
[MINI] Cornbread and Overdispersion
For our 50th episode we indulge a bit by cooking Linhda's previously mentioned "healthy" cornbread. This leads to a discussion of the statistical topic of overdispersion, in which the variance of the observed data is larger than what one's underlying model can account for.
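
As a rough illustration of overdispersion (not something taken from the episode), one common check for count data is to compare the sample variance with the sample mean. Under a Poisson model the two should be about equal, so a ratio well above 1 hints that the model is too tight. The counts and the function name `dispersion_ratio` below are made up for this sketch.

```python
import statistics


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean.

    Under a Poisson model this ratio should be close to 1;
    values well above 1 suggest overdispersion.
    """
    mean = statistics.mean(counts)
    return statistics.variance(counts) / mean


# Made-up counts: mostly zeros with a few large values.
counts = [0, 0, 1, 0, 12, 0, 9, 0, 1, 15]
ratio = dispersion_ratio(counts)
print(f"variance / mean = {ratio:.2f}")
```

A ratio far above 1, as with these invented counts, is the kind of signal that would prompt a look at a more flexible model.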

Apr 17, 2015 • 13min
[MINI] Natural Language Processing
This podcast explores the concepts and techniques of natural language processing, including stemming, n-grams, part of speech tagging, and the bag of words approach. It discusses the challenges and applications of training computers to understand and recognize words in sentences and emphasizes the importance of word context and sequences in extracting meaning. The limitations of the 'bag of words' approach are highlighted, and examples are given to demonstrate how word frequency counts can be used to detect similarities between books.
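
To make the bag of words limitation concrete, here is a minimal sketch using only the Python standard library. The tokenizer is a naive whitespace split, and the helper names (`bag_of_words`, `ngrams`, `cosine_similarity`) are chosen for this example rather than taken from the episode. It shows two sentences with opposite meanings that share an identical bag of words, while their bigrams differ.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count word frequencies, ignoring word order."""
    return Counter(text.lower().split())


def ngrams(text, n=2):
    """Count runs of n consecutive words, which keeps local word order."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two frequency counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


first = "dog bites man"
second = "man bites dog"

# Same words, so the bags are identical even though the meaning is reversed.
print(bag_of_words(first) == bag_of_words(second))
# The bigrams capture the order and therefore differ.
print(ngrams(first) == ngrams(second))
```

The same `cosine_similarity` function could be applied to word counts from two longer texts as one simple way of scoring how similar they are.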

Apr 10, 2015 • 32min
Computer-based Personality Judgments
Guest Youyou Wu discusses the work she and her collaborators did to measure the accuracy of computer-based personality judgments. Using Facebook "like" data, they found that machine learning approaches could be used to estimate users' self-assessments of the "big five" personality traits: openness, agreeableness, extraversion, conscientiousness, and neuroticism. Interestingly, the computer-based assessments outperformed the assessments made by certain groups of human beings. Listen to the episode to learn more. The original paper, Computer-based personality judgements are more accurate than those made by humans, appeared in the January 2015 volume of the Proceedings of the National Academy of Sciences (PNAS). For her benevolent recommendation, Youyou suggests Private traits and attributes are predictable from digital records of human behavior by Michal Kosinski, David Stillwell, and Thore Graepel. It's a similar paper by her co-authors which looks at demographic traits rather than personality traits. And for her self-serving recommendation, Youyou has a link that I'm very excited about. You can visit ApplyMagicSauce.com to see how this model evaluates your personality based on your Facebook like information. I'd love it if listeners participated in this research and shared their perspective on the results via The Data Skeptic Podcast Facebook page. I'm going to be posting mine there for everyone to see.

Apr 3, 2015 • 16min
[MINI] Markov Chain Monte Carlo
Explore how Markov Chain Monte Carlo (MCMC) algorithms can be used to model complex systems and track movement probability. Learn about the application of MCMC in winery popularity and understanding likelihood of visiting wineries. Discover the real-life applications of MCMC in determining probability distributions, advertising placement, and popular routes.
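
For a flavor of how this can work, below is a minimal Metropolis sampler, one of the simplest MCMC algorithms. It is a sketch under assumptions, not the method described in the episode: the wineries, their popularity scores, and the function name `metropolis_visits` are all hypothetical. The walker proposes a winery uniformly at random and accepts the move with a probability based on the ratio of popularity scores, so over many steps the share of time spent at each winery approaches its share of total popularity.

```python
import random
from collections import Counter

# Hypothetical wineries with made-up, unnormalized popularity scores.
POPULARITY = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}


def metropolis_visits(n_steps, seed=0):
    """Estimate each winery's share of visits with a Metropolis sampler."""
    rng = random.Random(seed)
    names = list(POPULARITY)
    current = rng.choice(names)
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick any winery uniformly at random.
        proposal = rng.choice(names)
        # Always accept moves to a more popular winery; accept moves to a
        # less popular one with probability equal to the popularity ratio.
        accept_prob = min(1.0, POPULARITY[proposal] / POPULARITY[current])
        if rng.random() < accept_prob:
            current = proposal
        visits[current] += 1
    return {name: visits[name] / n_steps for name in names}


shares = metropolis_visits(50_000)
for name, share in shares.items():
    print(name, round(share, 3))
```

Note that the sampler only ever uses ratios of scores, which is why the scores never need to be normalized into probabilities first.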

Mar 20, 2015 • 11min
[MINI] Markov Chains
This podcast discusses Markov Chains and their applications in various systems including stop lights, text prediction, and bowling. The hosts explore the concept of Markov Chains in daily life and technology, as well as their impact on partially observable state spaces.
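
As a small illustration of the stop light example, a Markov chain can be written as a table of transition probabilities where the next state depends only on the current one. The probabilities below are invented for this sketch, and `step` is a hypothetical helper that pushes a probability distribution over states forward by one tick.

```python
# Made-up transition probabilities for a stop light observed once per tick.
# Each row gives the probability of the next state given the current state.
TRANSITIONS = {
    "green": {"green": 0.8, "yellow": 0.2},
    "yellow": {"red": 1.0},
    "red": {"red": 0.7, "green": 0.3},
}


def step(distribution):
    """Advance a probability distribution over states by one tick."""
    next_distribution = {}
    for state, prob in distribution.items():
        for target, transition_prob in TRANSITIONS[state].items():
            next_distribution[target] = (
                next_distribution.get(target, 0.0) + prob * transition_prob
            )
    return next_distribution


# Start with the light certainly green and look three ticks ahead.
distribution = {"green": 1.0}
for _ in range(3):
    distribution = step(distribution)
print(distribution)
```

Because each row of the table sums to 1, the resulting distribution always sums to 1 as well.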

Mar 13, 2015 • 33min
Oceanography and Data Science
Nicole Goebel joins us this week to share her experiences in oceanography, studying phytoplankton and other aspects of the ocean, and how data plays a role in that science. We also discuss Thinkful, where Nicole and I are both mentors for the Introduction to Data Science course. Last but not least, check out Nicole's blog Data Science Girl and the videos Kyle mentioned on her YouTube channel, including one on the diversity of phytoplankton and how that changes in time and space.

Mar 6, 2015 • 18min
[MINI] Ordinary Least Squares Regression
The podcast explores Ordinary Least Squares regression, discussing the concept of regression and fitting models, making a YouTube video for a healthy cornbread recipe and discussing an ice cream recipe, controlling variables in an ice cream experiment, and exploring linear relationships in regression analysis.
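
For readers who want to see the fitting step itself, here is a minimal sketch of ordinary least squares for a single predictor, using the closed-form formulas for slope and intercept. The function name `ols_fit` and the data are made up for illustration and do not come from the episode.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    The slope is the sum of cross-deviations divided by the sum of squared
    deviations of x; the intercept makes the line pass through the means.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
slope, intercept = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

With noisy real-world data the points would not sit exactly on a line, and the same formulas would return the line that minimizes the sum of squared vertical errors.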