Data Skeptic

Kyle Polich
Jan 20, 2017 • 34min

Studying Competition and Gender Through Chess

Prior work has shown that people's response to competition is in part predicted by their gender. Understanding why and when this occurs is important in areas such as labor market outcomes. A well-structured study is challenging to design due to numerous confounding factors. Peter Backus and his colleagues have identified competitive chess as an ideal arena to study the topic. Find out why and what conclusions they reached. Our discussion centers around "Gender, Competition and Performance: Evidence from Real Tournaments" by Backus, Cubel, Guid, Sanchez-Pages, and Mañas. A summary of their paper can also be found here.
Jan 13, 2017 • 16min

[MINI] Dropout

Deep learning models can be prone to overfitting a given problem. This is especially frustrating given how much time and computational resources are often required to converge. One technique for fighting overfitting is to use dropout. Dropout is the method of randomly selecting some neurons in one's network and setting them to zero during iterations of learning. The core idea is that each particular input to a given layer is not always available and is therefore not a signal that can be relied on too heavily.
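
For readers who want to see the idea concretely, here is a minimal sketch of dropout applied to one layer's outputs, using only the Python standard library. The function name `dropout`, the `drop_prob` parameter, and the sample values are assumptions made for illustration and do not come from the episode. The sketch also rescales the surviving values by `1 / (1 - drop_prob)` (often called inverted dropout), which is a common convention rather than something the description above specifies.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero each activation with probability drop_prob.

    Surviving values are scaled by 1 / (1 - drop_prob) so the expected
    sum of the layer stays the same (an assumed convention, see lead-in).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At inference time, or with drop_prob == 0, nothing is dropped.
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


# Hypothetical outputs of one hidden layer.
layer_output = [0.5, 1.2, -0.3, 0.8, 2.0, -1.1]
rng = random.Random(0)

dropped = dropout(layer_output, drop_prob=0.5, rng=rng)
print(dropped)
```

Because a different random subset is zeroed on each training iteration, the next layer cannot count on any single input always being present.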
Jan 6, 2017 • 49min

The Police Data and the Data Driven Justice Initiatives

In this episode I speak with Clarence Wardell and Kelly Jin about their mutual service as part of the White House's Police Data Initiative and Data Driven Justice Initiative respectively. The Police Data Initiative was organized to use open data to increase transparency and community trust as well as to help police agencies use data for internal accountability. The PDI emerged from recommendations made by the Task Force on 21st Century Policing. The Data Driven Justice Initiative was organized to help city, county, and state governments use data-driven strategies to help low-level offenders with mental illness get directed to the right services rather than into the criminal justice system.
Dec 30, 2016 • 35min

The Library Problem

We close out 2016 with a discussion of a basic interview question which might get asked when applying for a data science job. Specifically, how a library might build a model to predict if a book will be returned late or not.
Dec 23, 2016 • 40min

2016 Holiday Special

Today's episode is a reading of Isaac Asimov's Franchise.  As mentioned on the show, this is just a work of fiction to be enjoyed and not in any way some obfuscated political statement.  Enjoy, and happy holidays!
Dec 16, 2016 • 17min

[MINI] Entropy

Classically, entropy is a measure of disorder in a system. From a statistical perspective, it is more useful to say it's a measure of the unpredictability of the system. In this episode we discuss how information reduces the entropy in deciding whether or not Yoshi the parrot will like a new chew toy. A few other everyday examples help us examine why entropy is a nice metric for constructing a decision tree.
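
As a rough illustration of why entropy is useful when building a decision tree, the sketch below computes Shannon entropy in bits and the information gain from splitting on a feature. The toy observations about chew toys (the `liked`, `material`, and `color` lists) are invented for this example and are not data from the episode.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature_values):
    """Reduction in entropy after grouping labels by a feature's value."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical record of whether the parrot liked four earlier toys.
liked = ["yes", "yes", "no", "no"]
material = ["wood", "wood", "plastic", "plastic"]
color = ["red", "blue", "red", "blue"]

print(entropy(liked))
print(information_gain(liked, material))
print(information_gain(liked, color))
```

In this toy data, knowing the material removes all uncertainty about the outcome while knowing the color removes none, so a tree builder using this criterion would split on material first.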
Dec 9, 2016 • 42min

MS Connect Conference

Cloud services are now ubiquitous in data science and more broadly in technology as well. This week, I speak to Mark Souza, Tobias Ternström, and Corey Sanders about various aspects of data at scale. We discuss the embedding of R into SQL Server, SQL Server on Linux, open source, and a few other cloud topics.
Dec 2, 2016 • 34min

Causal Impact

Today's episode is all about Causal Impact, a technique for estimating the impact of a particular event on a time series. We talk to William Martin about his research into the impact releases have on apps, and we also chat with Karen Blakemore about a project she helped us build to explore the impact of a Saturday Night Live appearance on a musician's career. Martin's work culminated in the paper "Causal Impact for App Store Analysis". A shorter summary version can be found here. His company, which helps app developers do this sort of analysis, can be found at crestweb.cs.ucl.ac.uk/appredict/.
Nov 25, 2016 • 11min

[MINI] The Bootstrap

The bootstrap is a method of resampling a dataset, with replacement, to estimate the accuracy of a statistic computed from it and to produce useful metrics on the result. It is a useful statistical technique and is leveraged in bagging (bootstrap aggregation) algorithms such as Random Forest. We discuss this technique as it relates to polling and surveys.
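
To make the polling connection concrete, here is a small sketch that bootstraps a survey sample to get a rough percentile interval for the share of supporters. The poll data (56 of 100 respondents in favor), the function names, and the choice of 2000 resamples are all assumptions for illustration, and the percentile interval is only one of several ways to summarize bootstrap results.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples, rng):
    """Mean of each resample drawn with replacement, same size as sample."""
    n = len(sample)
    return [
        statistics.fmean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    ]


def percentile_interval(values, confidence=0.95):
    """Approximate percentile interval from a list of bootstrap estimates."""
    ordered = sorted(values)
    alpha = (1.0 - confidence) / 2.0
    last = len(ordered) - 1
    lo_idx = min(last, max(0, int(alpha * len(ordered))))
    hi_idx = min(last, max(lo_idx, int((1.0 - alpha) * len(ordered)) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical poll: 1 = supports the measure, 0 = does not.
poll = [1] * 56 + [0] * 44
rng = random.Random(42)

means = bootstrap_means(poll, n_resamples=2000, rng=rng)
low, high = percentile_interval(means)
print(f"observed share: {statistics.fmean(poll):.2f}")
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

The width of the interval gives a sense of how much the reported share could move if the same survey were repeated on a different sample of the same size.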
Nov 18, 2016 • 16min

[MINI] Gini Coefficients

This episode explores the Gini coefficient and its application to measuring income inequality. It also covers the factors that influence travel destination choices, using machine learning to predict those preferences, building decision trees for the prediction, and picking the first feature to use in a decision-making model.
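
As a quick sketch of the income inequality use, the function below computes the Gini coefficient of a list of incomes with the standard rank-weighted formula. The function name and the sample incomes are made up for illustration, and the sketch assumes non-negative incomes. It covers only the inequality measure, not the related Gini impurity used when splitting decision trees.

```python
def gini_coefficient(incomes):
    """Gini coefficient of non-negative incomes (0 = perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks 1..n assigned in ascending order of income.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical populations of four people each.
equal_incomes = [10, 10, 10, 10]
one_earner = [0, 0, 0, 100]

print(gini_coefficient(equal_incomes))
print(gini_coefficient(one_earner))
```

With this formula a population of n people where one person earns everything scores (n - 1) / n, which approaches 1 as the population grows.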
