Data Skeptic

Latest episodes

Mar 13, 2015 • 33min

Oceanography and Data Science

Nicole Goebel joins us this week to share her experiences in oceanography studying phytoplankton and other aspects of the ocean, and how data plays a role in that science. We also discuss Thinkful, where Nicole and I are both mentors for the Introduction to Data Science course. Last but not least, check out Nicole's blog Data Science Girl and the videos Kyle mentioned on her YouTube channel, including one on the diversity of phytoplankton and how it changes in time and space.
Mar 6, 2015 • 18min

[MINI] Ordinary Least Squares Regression

The podcast explores Ordinary Least Squares regression, discussing the concept of regression and fitting models, making a YouTube video for a healthy cornbread recipe and discussing an ice cream recipe, controlling variables in an ice cream experiment, and exploring linear relationships in regression analysis.
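As a rough illustration of what "fitting" means here, the sketch below computes the least-squares line for one predictor using the closed-form solution. It is not taken from the episode: the function name `fit_ols` and the sugar and rating numbers are hypothetical, chosen only to echo the recipe theme.

```python
def fit_ols(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: grams of sugar in a recipe vs. a taste rating.
sugar = [10, 20, 30, 40, 50]
rating = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_ols(sugar, rating)
print(f"rating ~ {slope:.3f} * sugar + {intercept:.3f}")
```

The fitted line only describes a linear association in the data at hand; as the episode's discussion of controlling variables suggests, it says nothing by itself about causation.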
Feb 27, 2015 • 17min

NYC Speed Camera Analysis with Tim Schmeier

New York State approved the use of automated speed cameras within a specific range of schools. Tim Schmeier did an analysis of publicly available data related to these cameras as part of a project at the NYC Data Science Academy. Tim's work leverages several open data sets to ask the question: are the speed cameras succeeding in their intended purpose of increasing public safety near schools? What he found using open data may surprise you. You can read Tim's write-up titled Speed Cameras: Revenue or Public Safety? on the NYC Data Science Academy blog. His original write-up, reproducible analysis, and figures are a great complement to this episode. For his benevolent recommendation, Tim suggests listeners visit Maddie's Fund - a data-driven charity devoted to helping achieve and sustain a no-kill pet nation. And for his self-serving recommendation, Tim Schmeier will very shortly be on the job market. If you, your employer, or someone you know is looking for data science talent, you can reach Tim at his gmail account, which is timothy.schmeier at gmail dot com.
Feb 20, 2015 • 14min

[MINI] k-means clustering

The podcast discusses the k-means clustering algorithm and its objective of grouping data points into clusters without guidance. It explores tracking animal movements and customer segmentation using k-means clustering. The concept of clusters and centroids is explained, along with classifying new data points. The episode covers accuracy, precision, and trade-offs in k-means clustering. Lastly, it explores clusters, head positioning, data visualization, and the application of k-means clustering in the workplace.
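The clusters-and-centroids idea can be sketched in a few lines. The code below is a minimal version of the standard iterative procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points), not anything presented in the episode. The function names, the starting centroids, and the sample coordinates are all assumptions made for illustration.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Refine the given starting centroids; returns the final centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep a centroid that attracted no points
        if new_centroids == centroids:
            break  # assignments are stable, so further passes change nothing
        centroids = new_centroids
    return centroids


# Hypothetical 2-D observations, e.g. positions where an animal was sighted.
sightings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(sightings, centroids=[(0, 0), (10, 10)])

# Classifying a new data point means finding its nearest centroid.
print(nearest((9, 9), centers))
```

The starting centroids are passed in explicitly to keep the sketch deterministic; in practice they are usually chosen at random, and different starts can yield different clusters.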
Feb 13, 2015 • 39min

Shadow Profiles on Social Networks

Emre Sarigol joins me this week to discuss his paper Online Privacy as a Collective Phenomenon. This paper studies data collected from social networks and how the sharing behaviors of individuals can unintentionally reveal private information about other people, including those that have not even joined the social network! For the specific test discussed, the researchers were able to accurately predict the sexual orientation of individuals, even when this information was withheld during the training of their algorithm. The research produces a surprisingly accurate predictor of this private piece of information, and was constructed only with publicly available data from myspace.com found on archive.org. As Emre points out, this is a small shadow of the potential information available to modern social networks. For example, users that install the Facebook app on their mobile phones are (perhaps unknowingly) sharing all their phone contacts. Should a social network like Facebook choose to do so, this information could be aggregated to assemble "shadow profiles" containing rich data on users who may not even have an account.
Feb 6, 2015 • 18min

[MINI] The Chi-Squared Test

The Chi-Squared test is a methodology for hypothesis testing. When one has categorical data, in the form of frequency counts or observations (e.g. Vegetarian, Pescetarian, and Omnivore), split into two or more categories (e.g. Male, Female), a question may arise such as "Are women more likely than men to be vegetarian?" or, put more accurately, "Does the frequency with which women report being vegetarian differ in a statistically significant way from the frequency with which men report it?"
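A small sketch may make the diet example concrete. It computes the Pearson chi-squared statistic for a table of counts by comparing each observed count with the count expected if diet were independent of gender. The function name `chi_squared` and the survey counts are hypothetical, and the sketch assumes every row and column total is nonzero.

```python
import math


def chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical survey counts (made up for illustration).
# Columns: Vegetarian, Pescetarian, Omnivore
counts = [
    [12, 8, 80],   # Male
    [25, 15, 60],  # Female
]
statistic, dof = chi_squared(counts)

# With exactly 2 degrees of freedom the p-value has the closed form exp(-x / 2).
# This shortcut is valid only for dof == 2; other cases need the chi-squared CDF.
p_value = math.exp(-statistic / 2) if dof == 2 else None
print(statistic, dof, p_value)
```

A small p-value suggests the observed differences would be unlikely if diet and gender were unrelated; it does not say how large or important the difference is.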
Jan 30, 2015 • 30min

Mapping Reddit Topics with Randy Olson

My guest this week is noteworthy AI researcher Randy Olson, who joins me to share his work creating the Reddit World Map - a visualization that illuminates clusters in the Reddit community based on user behavior. Randy's blog post on creating the Reddit World Map is well complemented by a more detailed write-up titled Navigating the massive world of reddit: using backbone networks to map user interests in social media. Last but not least, an interactive version of the results (which leverages Gephi) can be found here. For a benevolent recommendation, Randy suggests people check out Seaborn - a Python library for statistical data visualization. For a self-serving recommendation, Randy recommends listeners visit the Data is beautiful subreddit, where he's a moderator.
Jan 23, 2015 • 13min

[MINI] Partially Observable State Spaces

Exploring partially observable state spaces and their implications in chess, poker, and animal behavior. Understanding the concept of state models and their applications in analyzing dynamic systems. Tailoring content based on website visitors' behavior and needs. Exploring how probability distributions and Bayesian updating are used to represent uncertain states in data science.
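When the true state cannot be observed directly, one common way to represent it is as a probability distribution over the possible states, revised with Bayes' rule whenever new evidence arrives. The sketch below shows a single update step. The function name `update_belief`, the two poker states, and every probability in it are assumptions made for illustration.

```python
def update_belief(prior, likelihood):
    """Return the posterior over states after one observation.

    prior[s] is the current belief in state s; likelihood[s] is the
    probability of the observation given that the true state is s.
    """
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("observation is impossible under every state")
    return {s: value / evidence for s, value in unnormalized.items()}


# Hypothetical poker example: the opponent's hand is hidden, so we hold a belief.
belief = {"strong_hand": 0.3, "weak_hand": 0.7}

# Assumed chance of seeing a large raise under each hidden state.
raise_likelihood = {"strong_hand": 0.8, "weak_hand": 0.2}

belief = update_belief(belief, raise_likelihood)
print(belief)
```

Calling the function again with the next observation uses the returned posterior as the new prior, which is how a belief is tracked over time.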
Jan 16, 2015 • 28min

Easily Fooling Deep Neural Networks

My guest this week is Anh Nguyen, a PhD student at the University of Wyoming working in the Evolving AI lab. The episode discusses the paper Deep Neural Networks are Easily Fooled [pdf] by Anh Nguyen, Jason Yosinski, and Jeff Clune. It describes a process for creating images that a trained deep neural network will misclassify. If you have a deep neural network that has been trained to recognize certain types of objects in images, these "fooling" images can be constructed in such a way that the network will misclassify them. To a human observer, these fooling images often have no resemblance whatsoever to the assigned label. Previous work had shown that some images which appear to us to be unrecognizable white noise can fool a deep neural network. This paper extends the result, showing that abstract images of shapes and colors, many of which have form (just not the one the network thinks), can also trick the network.
Jan 9, 2015 • 11min

[MINI] Data Provenance

This episode introduces a high-level discussion on the topic of Data Provenance, with more MINI episodes to follow to get into specific topics. Thanks to listener Sara L, who wrote in to point out that the Data Skeptic Podcast has focused a lot on using data to be skeptical, but not necessarily on being skeptical of data. Data Provenance is the concept of knowing the full origin of your dataset. Where did it come from? Who collected it? How was it collected? Does it combine independent sources or one singular source? What are the error bounds on the way it was measured? These are just some of the questions one should ask to understand their data. After all, if the antecedent of an argument is built on dubious grounds, the consequent of the argument is equally dubious. For a more technical discussion than what we get into in this mini episode, I recommend A Survey of Data Provenance Techniques by authors Simmhan, Plale, and Gannon.
