Data Skeptic

Latest episodes

Dec 11, 2015 • 10min

[MINI] Term Frequency - Inverse Document Frequency

Today's topic is term frequency inverse document frequency, which is a statistic for estimating the importance of words and phrases in a set of documents.
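As a rough illustration of the statistic described above, here is a minimal sketch in Python. It assumes the common textbook form (term frequency as a share of the document's tokens, inverse document frequency as log(N / document frequency)) and naive whitespace tokenization; the episode may use a different variant, and the function name tf_idf and the sample documents are made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, for illustration only.
docs = [
    "the parrot ate the seed",
    "the cat chased the parrot",
    "the dog slept",
]
scores = tf_idf(docs)
```

In this toy corpus "the" appears in every document, so its score is zero everywhere, while "seed" (found in only one document) outranks "parrot" (found in two) within the first document.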
Dec 4, 2015 • 42min

The Hunt for Vulcan

Early astronomers could see several of the planets with the naked eye. The invention of the telescope allowed for further understanding of our solar system. The work of Isaac Newton allowed later scientists to predict the existence and position of Neptune, which was later confirmed by observation where predicted. It seemed only natural that a similar unknown body might explain anomalies in the orbit of Mercury, and thus began the search for the hypothesized planet Vulcan. Thomas Levenson's book "The Hunt for Vulcan" is a narrative of the key scientific minds involved in the search for, and eventual refutation of, an unobserved planet between Mercury and the sun. Thomas joins me in this episode to discuss his book and the fascinating story of the quest to find this planet. During the discussion, we mention one of the contributions made by Urbain-Jean-Joseph Le Verrier: a set of complex calculations that enabled him to predict where to find the planet that would eventually be called Neptune. The calculus behind this work is difficult, and some of it is demonstrated in a Jupyter notebook I recently discovered from Paulo Marques titled The-Body Problem. Thomas Levenson is a professor at MIT and head of its science writing program. He is the author of several books, including Einstein in Berlin and Newton and the Counterfeiter: The Unknown Detective Career of the World’s Greatest Scientist. He has also made ten feature-length documentaries (including a two-hour Nova program on Einstein) for which he has won numerous awards. His most recent book, "The Hunt for Vulcan", explores the century-spanning quest to explain the movement of the cosmos via theory and the role the hypothesized planet Vulcan played in the story. Follow Thomas on Twitter @tomlevenson and check out his blog at https://inversesquare.wordpress.com/. Pick up your copy of The Hunt for Vulcan at your local bookstore, preferred book buying place, or at the Penguin Random House site.
Nov 27, 2015 • 17min

[MINI] The Accuracy Paradox

Today's episode discusses the accuracy paradox. There are cases when one might prefer a less accurate model because it yields more predictive power or better captures the underlying causal factors describing the outcome variable you are interested in. This is especially relevant in machine learning when trying to predict rare events. We discuss how the accuracy paradox might apply if you were trying to predict the likelihood a person was a bird owner.
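To make the paradox concrete, here is a small sketch with invented numbers: 1,000 people of whom 20 own birds. The counts, the two "models", and the helper names are hypothetical and are not taken from the episode.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (bird owners) that the model finds."""
    positives = sum(y_true)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / positives


# Hypothetical population: 20 bird owners (1) followed by 980 non-owners (0).
y_true = [1] * 20 + [0] * 980

# Model A: always predicts "not a bird owner".
always_no = [0] * 1000

# Model B (hypothetical): finds 15 of the 20 owners but raises 60 false alarms.
useful = [1] * 15 + [0] * 5 + [1] * 60 + [0] * 920

acc_a, rec_a = accuracy(y_true, always_no), recall(y_true, always_no)
acc_b, rec_b = accuracy(y_true, useful), recall(y_true, useful)
```

Under these made-up numbers Model A is more accurate (98% versus 93.5%) yet never identifies a single bird owner, while the less accurate Model B finds three quarters of them.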
Nov 20, 2015 • 40min

Neuroscience from a Data Scientist's Perspective

... or should this have been called data science from a neuroscientist's perspective? Either way, I'm sure you'll enjoy this discussion with Laurie Skelly. Laurie earned a PhD in Integrative Neuroscience from the Department of Psychology at the University of Chicago. In her life as a social neuroscientist, using fMRI to study the neural processes behind empathy and psychopathy, she learned the ropes of zooming in and out between the macroscopic and the microscopic -- how millions of data points come together to tell us something meaningful about human nature. She's currently at Metis Data Science, an organization that helps people learn the skills of data science to transition into industry. In this episode, we discuss fMRI technology, Laurie's research studying empathy and psychopathy, and the skills and tools shared by neuroscientists and data scientists. For listeners interested in more on this subject, Laurie recommended the blogs Neuroskeptic, Neurocritic, and Neuroecology. We conclude the episode with a mention of the upcoming Metis Data Science San Francisco cohort which Laurie will be teaching. If anyone is interested in applying to participate, they can do so here.
Nov 13, 2015 • 14min

[MINI] Bias Variance Tradeoff

A discussion of the expected number of cars at a stoplight frames today's discussion of the bias variance tradeoff. The central idea of this concept relates to model complexity. A very simple model will likely behave consistently from training to testing data, but will have high bias since its simplicity can prevent it from capturing the relationship between the covariates and the output. As a model grows more and more complex, it may capture more of the underlying structure in the data, but the risk that it overfits the training data and therefore does not generalize (has high variance) increases. The tradeoff between minimizing bias and minimizing variance is an ongoing challenge for data scientists, and an important discussion for skeptics around how much we should trust models.
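For reference, the tradeoff is often written as a decomposition of expected squared error. The setup below is the standard textbook one and is an assumption here, not something stated in the episode: the outcome is y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a randomly drawn training set.

```latex
% Assumed setup: y = f(x) + \varepsilon, E[\varepsilon] = 0, Var(\varepsilon) = \sigma^2.
% Expectations are taken over training sets and the noise.
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term measures how far the average fitted model sits from the true relationship, the second how much the fitted model moves from one training set to the next, and the third cannot be reduced by any choice of model.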
Nov 6, 2015 • 32min

Big Data Doesn't Exist

The recent opinion piece Big Data Doesn't Exist on TechCrunch by Slater Victoroff is an interesting discussion about the usefulness of data both big and small. Slater joins me this episode to expand on that discussion. Slater Victoroff is CEO of indico Data Solutions, a company whose services turn raw text and image data into human insight. He and his co-founders studied at Olin College of Engineering, where indico was born. indico was then accepted into the "Techstars Accelerator Program" in the Fall of 2014 and went on to raise $3M in seed funding. His recent essay "Big Data Doesn't Exist" received a lot of traction on TechCrunch, and I have invited Slater to join me today to discuss his perspective and touch on a few topics in the machine learning space as well.
Oct 30, 2015 • 14min

[MINI] Covariance and Correlation

The degree to which two variables change together can be calculated in the form of their covariance. This value can be normalized to the correlation coefficient, which has the advantage of being a unitless measure bounded between -1 and 1 (inclusive). This episode discusses how we arrive at these values and why they are important.
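The two quantities described above can be sketched in a few lines of Python. This assumes the sample (n - 1) form of covariance and the Pearson correlation coefficient; the data and variable names are invented for illustration.

```python
import math


def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: hours studied and test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]

# The same durations expressed in minutes instead of hours.
minutes = [60.0 * h for h in hours]

cov_hours = covariance(hours, score)
cov_minutes = covariance(minutes, score)
corr_hours = correlation(hours, score)
corr_minutes = correlation(minutes, score)
```

Changing the unit from hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged, which is the practical benefit of the normalization.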
Oct 23, 2015 • 30min

Bayesian A/B Testing

Today's guest is Cameron Davidson-Pilon. Cameron has a master's degree in quantitative finance from the University of Waterloo. Think of it as statistics on stock markets. For the last two years he's been the team lead of data science at Shopify. He's the founder of dataorigami.net, which produces screencasts teaching methods and techniques of applied data science. He's also the author of the book Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference, just released in print, which you can also get in digital form. This episode focuses on the topic of Bayesian A/B Testing, which spans just one chapter of the book. Related to today's discussion is the Data Origami post The class imbalance problem in A/B testing. Lastly, Data Skeptic will be giving away a copy of the print version of the book to one lucky listener who has a US-based delivery address. To participate, you'll need to write a review of any site, book, course, or podcast of your choice on datasciguide.com. After it goes live, tweet a link to it with the hashtag #WinDSBook to be given an entry in the contest. This contest will end November 20th, 2015, at which time I'll draw a single winner at random and contact them for delivery details via direct message on Twitter.
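For readers new to Bayesian A/B testing, here is one common minimal formulation, sketched with the Python standard library. It is not taken from the book or the episode: it assumes a uniform Beta(1, 1) prior on each conversion rate, so each posterior is again a Beta distribution, and it estimates the probability that variant B beats variant A by sampling. The function name and the visitor counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta posteriors.

    With a Beta(1, 1) prior and `conv` conversions out of `n` visitors,
    the posterior for the conversion rate is Beta(1 + conv, 1 + n - conv).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical experiment: 50/1000 conversions for A, 80/1000 for B.
p_clear = prob_b_beats_a(50, 1000, 80, 1000)

# Identical data for both variants should give a probability near one half.
p_tie = prob_b_beats_a(50, 1000, 50, 1000)
```

The output is a direct probability statement about the two variants rather than a p-value, which is a large part of the appeal of the Bayesian framing.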
Oct 16, 2015 • 13min

[MINI] The Central Limit Theorem

The central limit theorem is an important statistical result which states that, under typical conditions, the mean of a large enough set of independent trials is approximately normally distributed. This episode explores how this might be used to determine if an Amazon parrot like Yoshi produces more or less waste than an African Grey, under the assumption that the individual distributions are not normal.
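A quick simulation can show the theorem at work. The sketch below draws from an exponential distribution, chosen here only because it is clearly not normal; it has nothing to do with the parrot data in the episode, and the function name and sample sizes are arbitrary.

```python
import random
import statistics


def sample_means(n_per_sample, n_samples, seed=0):
    """Means of repeated samples drawn from an exponential distribution (mean 1, sd 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_per_sample))
        for _ in range(n_samples)
    ]


means = sample_means(n_per_sample=50, n_samples=2000)

# The theorem suggests the sample means cluster around the true mean (1.0)
# with a spread close to sd / sqrt(n) = 1 / sqrt(50), roughly 0.141.
center = statistics.fmean(means)
spread = statistics.stdev(means)
```

Plotting a histogram of means would show a roughly bell-shaped curve even though each individual draw comes from a heavily skewed distribution.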
Oct 9, 2015 • 39min

Accessible Technology

Today's guest is Chris Hofstader (@gonz_blinko), an accessibility researcher and advocate, as well as an activist for causes such as improving access to information for blind and vision-impaired people. His background in computer programming enabled him to be the leader of JAWS, a Windows program that allowed people with a visual impairment to read their screen either through text-to-speech or a refreshable braille display. He's the Managing Member of 3 Mouse Technology. He's also a frequent blogger, primarily at chrishofstader.com. For web developers and site owners, Chris recommends two tools to help test for accessibility issues: tenon.io and dqtech.co. A guest post from Chris appeared on the Skepchick blog titled Skepticism and Disability, which led to the formation of the sister site Skeptibility. In a discussion of skepticism and favorite podcasts, Chris mentioned a number of great shows, most notably The Pod Delusion, to which he was a contributor. Chris has also appeared on The Atheist Nomads. Lastly, a shout out from Chris to musician Shelley Segal, whom he hosted just before the date of recording of this episode. Her music can be found on her site or via bandcamp.
