Data Skeptic cover image

Data Skeptic

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Latest episodes

May 12, 2017 • 29min

Multi-Agent Diverse Generative Adversarial Networks

Despite the success of GANs in imaging, one of its major drawbacks is the problem of 'mode collapse,' where the generator learns to produce samples with extremely low variety. To address this issue, today's guests Arnab Ghosh and Viveka Kulharia proposed two different extensions. The first involves tweaking the generator's objective function with a diversity enforcing term that would assess similarities between the different samples generated by different generators. The second comprises modifying the discriminator objective function, pushing generations corresponding to different generators towards different identifiable modes.

May 5, 2017 • 10min

[MINI] Generative Adversarial Networks

GANs are an unsupervised learning method involving two neural networks iteratively competing. The discriminator is a typical learning system. It attempts to develop the ability to recognize members of a certain class, such as all photos which have birds in them. The generator attempts to create false examples which the discriminator incorrectly classifies. In successive training rounds, the networks examine each and play a mini-max game of trying to harm the performance of the other. In addition to being a useful way of training networks in the absence of a large body of labeled data, there are additional benefits. The discriminator may end up learning more about edge cases than it otherwise would be given typical examples. Also, the generator's false images can be novel and interesting on their own. The concept was first introduced in the paper Generative Adversarial Networks.

Apr 28, 2017 • 53min

Opinion Polls for Presidential Elections

Recently, we've seen opinion polls come under some skepticism. But is that skepticism truly justified? The recent Brexit referendum and US 2016 Presidential Election are examples where some claims the polls "got it wrong". This episode explores this idea.

Apr 21, 2017 • 26min

OpenHouse

No reliable, complete database cataloging home sales data at a transaction level is available for the average person to access. To a data scientist interesting in studying this data, our hands are complete tied. Opportunities like testing sociological theories, exploring economic impacts, study market forces, or simply research the value of an investment when buying a home are all blocked by the lack of easy access to this dataset. OpenHouse seeks to correct that by centralizing and standardizing all publicly available home sales transactional data. In this episode, we discuss the achievements of OpenHouse to date, and what plans exist for the future. Check out the OpenHouse gallery. I also encourage everyone to check out the project Zareen mentioned which was her Harry Potter word2vec webapp and Joy's project doing data visualization on Jawbone data. Guests Thanks again to @iamzareenf, @blueplastic, and @joytafty for coming on the show. Thanks to the numerous other volunteers who have helped with the project as well! Announcements and details If you're interested in getting involved in OpenHouse, check out the OpenHouse contributor's quickstart page. Kyle is giving a machine learning talk in Los Angeles on May 25th, 2017 at Zehr. Sponsor Thanks to our sponsor for this episode Periscope Data. The blog post demoing their maps option is on our blog titled Periscope Data Maps. To start a free trial of their dashboarding too, visit http://periscopedata.com/skeptics Kyle recently did a youtube video exploring the Data Skeptic podcast download numbers using Periscope Data. Check it out at https://youtu.be/aglpJrMp0M4. Supplemental music is Lee Rosevere's Let's Start at the Beginning.

Apr 14, 2017 • 11min

[MINI] GPU CPU

There's more than one type of computer processor. The central processing unit (CPU) is typically what one means when they say "processor". GPUs were introduced to be highly optimized for doing floating point computations in parallel. These types of operations were very useful for high end video games, but as it turns out, those same processors are extremely useful for machine learning. In this mini-episode we discuss why.

Apr 7, 2017 • 15min

[MINI] Backpropagation

Backpropagation is a common algorithm for training a neural network. It works by computing the gradient of each weight with respect to the overall error, and using stochastic gradient descent to iteratively fine tune the weights of the network. In this episode, we compare this concept to finding a location on a map, marble maze games, and golf.

Mar 31, 2017 • 32min

Data Science at Patreon

In this week's episode of Data Skeptic, host Kyle Polich talks with guest Maura Church, Patreon's data science manager. Patreon is a fast-growing crowdfunding platform that allows artists and creators of all kinds build their own subscription content service. The platform allows fans to become patrons of their favorite artists- an idea similar the Renaissance times, when musicians would rely on benefactors to become their patrons so they could make more art. At Patreon, Maura's data science team strives to provide creators with insight, information, and tools, so that creators can focus on what they do best-- making art. On the show, Maura talks about some of her projects with the data science team at Patreon. Among the several topics discussed during the episode include: optical music recognition (OMR) to translate musical scores to electronic format, network analysis to understand the connection between creators and patrons, growth forecasting and modeling in a new market, and churn modeling to determine predictors of long time support. A more detailed explanation of Patreon's A/B testing framework can be found here Other useful links to topics mentioned during the show: OMR research Patreon blog Patreon HQ blog Amanda Palmer Fran Meneses

Mar 24, 2017 • 16min

[MINI] Feed Forward Neural Networks

Feed Forward Neural Networks In a feed forward neural network, neurons cannot form a cycle. In this episode, we explore how such a network would be able to represent three common logical operators: OR, AND, and XOR. The XOR operation is the interesting case. Below are the truth tables that describe each of these functions. AND Truth Table Input 1 Input 2 Output 0 0 0 0 1 0 1 0 0 1 1 1 OR Truth Table Input 1 Input 2 Output 0 0 0 0 1 1 1 0 1 1 1 1 XOR Truth Table Input 1 Input 2 Output 0 0 0 0 1 1 1 0 1 1 1 0 The AND and OR functions should seem very intuitive. Exclusive or (XOR) if true if and only if exactly single input is 1. Could a neural network learn these mathematical functions? Let's consider the perceptron described below. First we see the visual representation, then the Activation function , followed by the formula for calculating the output. Can this perceptron learn the AND function? Sure. Let and What about OR? Yup. Let and An infinite number of possible solutions exist, I just picked values that hopefully seem intuitive. This is also a good example of why the bias term is important. Without it, the AND function could not be represented. How about XOR? No. It is not possible to represent XOR with a single layer. It requires two layers. The image below shows how it could be done with two laters. In the above example, the weights computed for the middle hidden node capture the essence of why this works. This node activates when recieving two positive inputs, thus contributing a heavy penalty to be summed by the output node. If a single input is 1, this node will not activate. Universal approximation theorem tells us that any continuous function can be tightly approximated using a neural network with only a single hidden layer and a finite number of neurons. With this in mind, a feed forward neural network should be adaquet for any applications. However, in practice, other network architectures and the allowance of more hidden layers are empirically motivated. Other types neural networks have less strict structal definitions. The various ways one might relax this constraint generate other classes of neural networks that often have interesting properties. We'll get into some of these in future mini-episodes. Check out our recent blog post on how we're using Periscope Data cohort charts. Thanks to Periscope Data for sponsoring this episode. More about them at periscopedata.com/skeptics

Mar 17, 2017 • 42min

Reinventing Sponsored Search Auctions

In this Data Skeptic episode, Kyle is joined by guest Ruggiero Cavallo to discuss his latest efforts to mitigate the problems presented in this new world of online advertising. Working with his collaborators, Ruggiero reconsiders the search ad allocation and pricing problems from the ground up and redesigns a search ad selling system. He discusses a mechanism that optimizes an entire page of ads globally based on efficiency-maximizing search allocation and a novel technical approach to computing prices.

Mar 10, 2017 • 15min

[MINI] The Perceptron

Today's episode overviews the perceptron algorithm. This rather simple approach is characterized by a few particular features. It updates its weights after seeing every example, rather than as a batch. It uses a step function as an activation function. It's only appropriate for linearly separable data, and it will converge to a solution if the data meets these criteria. Being a fairly simple algorithm, it can run very efficiently. Although we don't discuss it in this episode, multi-layer perceptron networks are what makes this technique most attractive.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

App store banner

Play store banner