Data Skeptic cover image

Data Skeptic

Latest episodes

undefined
Jul 8, 2016 • 30min

Multiple Comparisons and Conversion Optimization

I'm joined by Chris Stucchio this week to discuss how deliberate or uninformed statistical practitioners can derive spurious and arbitrary results via multiple comparisons. We discuss p-hacking and a variety of other important lessons and tips for proper analysis. You can enjoy Chris's writing on his blog at chrisstucchio.com and you may also like his recent talk Multiple Comparisons: Make Your Boss Happy with False Positives, Guarenteed.
undefined
Jul 1, 2016 • 12min

[MINI] Leakage

If you'd like to make a good prediction, your best bet is to invent a time machine, visit the future, observe the value, and return to the past. For those without access to time travel technology, we need to avoid including information about the future in our training data when building machine learning models. Similarly, if any other feature whose value would not actually be available in practice at the time you'd want to use the model to make a prediction, is a feature that can introduce leakage to your model.
undefined
Jun 24, 2016 • 36min

Predictive Policing

Kristian Lum (@KLdivergence) joins me this week to discuss her work at @hrdag on predictive policing. We also discuss Multiple Systems Estimation, a technique for inferring statistical information about a population from separate sources of observation. If you enjoy this discussion, check out the panel Tyranny of the Algorithm? Predictive Analytics & Human Rights which was mentioned in the episode.
undefined
Jun 17, 2016 • 11min

[MINI] The CAP Theorem

Distributed computing cannot guarantee consistency, accuracy, and partition tolerance. Most system architects need to think carefully about how they should appropriately balance the needs of their application across these competing objectives. Linh Da and Kyle discuss the CAP Theorem using the analogy of a phone tree for alerting people about a school snow day.
undefined
Jun 10, 2016 • 33min

Detecting Terrorists with Facial Recognition?

A startup is claiming that they can detect terrorists purely through facial recognition. In this solo episode, Kyle explores the plausibility of these claims.
undefined
Jun 3, 2016 • 11min

[MINI] Goodhart's Law

Goodhart's law states that "When a measure becomes a target, it ceases to be a good measure". In this mini-episode we discuss how this affects SEO, call centers, and Scrum.
undefined
May 27, 2016 • 43min

Data Science at eHarmony

I'm joined this week by Jon Morra, director of data science at eHarmony to discuss a variety of ways in which machine learning and data science are being applied to help connect people for successful long term relationships. Interesting open source projects mentioned in the interview include Face-parts, a web service for detecting faces and extracting a robust set of fiducial markers (features) from the image, and Aloha, a Scala based machine learning library. You can learn more about these and other interesting projects at the eHarmony github page. In the wrap up, Jon mentioned the LA Machine Learning meetup which he runs. This is a great resource for LA residents separate and complementary to datascience.la groups, so consider signing up for all of the above and I hope to see you there in the future.
undefined
May 20, 2016 • 14min

[MINI] Stationarity and Differencing

Mystery shoppers and fruit cultivation help us discuss stationarity - a property of some time serieses that are invariant to time in several ways. Differencing is one approach that can often convert a non-stationary process into a stationary one. If you have a stationary process, you get the benefits of many known statistical properties that can enable you to do a significant amount of inferencing and prediction.
undefined
May 13, 2016 • 23min

Feather

I'm joined by Wes McKinney (@wesmckinn) and Hadley Wickham (@hadleywickham) on this episode to discuss their joint project Feather. Feather is a file format for storing data frames along with some metadata, to help with interoperability between languages. At the time of recording, libraries are available for R and Python, making it easy for data scientists working in these languages to quickly and effectively share datasets and collaborate.
undefined
May 6, 2016 • 15min

[MINI] Bargaining

Bargaining is the process of two (or more) parties attempting to agree on the price for a transaction.  Game theoretic approaches attempt to find two strategies from which neither party is motivated to deviate.  These strategies are said to be in equilibrium with one another.  The equilibriums available in bargaining depend on the the transaction mechanism and the information of the parties.  Discounting (how long parties are willing to wait) has a significant effect in this process.  This episode discusses some of the choices Kyle and Linh Da made in deciding what offer to make on a house.

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode