Data Skeptic

Latest episodes

Aug 28, 2015 • 53min

ContentMine

ContentMine is a project that provides the tools and workflow to convert scientific literature into machine-readable and machine-interpretable data, in order to facilitate better and more effective access to the accumulated knowledge of humankind. The program's founder, Peter Murray-Rust, joins us this week to discuss ContentMine. Our discussion covers the project, the scientific publication process, copyright, and several other interesting topics.
Aug 21, 2015 • 13min

[MINI] Structured and Unstructured Data

Today's mini-episode explains the distinction between structured and unstructured data, and debates which of these categories best describes recipes.
Aug 14, 2015 • 25min

Measuring the Influence of Fashion Designers

Yusan Lin shares her research on using data science to explore the fashion industry in this episode. She has applied techniques from data mining, natural language processing, and social network analysis to explore who the innovators in the fashion world are and how their influence affects other designers. If you found this episode interesting and would like to read more, Yusan's papers "Text-Generated Fashion Influence Model: An Empirical Study on Style.com" and "The Hidden Influence Network in the Fashion Industry" are worth reading.
Aug 7, 2015 • 8min

[MINI] PageRank

PageRank is the algorithm most famous for being one of the original innovations that made Google stand out as a search engine. It was defined in the classic paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Larry Page. While this algorithm clearly impacted web searching, it has also been useful in a variety of other applications. This episode presents a high-level description of the algorithm and how it might apply when trying to establish who writes the most influential academic papers.
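As a rough illustration of the idea (not code from the episode), the sketch below runs a basic power-iteration PageRank over a tiny, made-up citation graph. The function name pagerank, the damping factor of 0.85, the iteration count, and the paper labels are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank.

    links maps each paper to the list of papers it cites.
    Returns a dict mapping each paper to its rank (ranks sum to 1).
    """
    # Collect every paper that appears as a citer or as a cited work.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every paper starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this paper's rank evenly among the papers it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its rank over all papers.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: A and B both cite C, and C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
ranks = pagerank(citations)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 3))
```

In this toy graph the papers that are cited (C and D) end up ranked above the papers that only cite others (A and B), which is the intuition behind using the algorithm to gauge academic influence.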
Jul 29, 2015 • 41min

Data Science at Work in LA County

In this episode, Benjamin Uminsky enlightens us about some of the ways the Los Angeles County Registrar-Recorder/County Clerk leverages data science and analysis to deliver its services to citizens more effectively and efficiently. Our topics range from forecasting to predicting the likelihood that people will volunteer to be poll workers. Benjamin recently spoke at Big Data Day LA. Videos have not yet been posted, but you can see the slides from his talk "Data Mining Forecasting and BI at the RRCC" if this episode has left you hungry to learn more. During the show, Benjamin encouraged any Los Angeles residents who have some time to serve their community to consider becoming a poll worker.
Jul 24, 2015 • 9min

[MINI] k-Nearest Neighbors

This episode explores the k-nearest neighbors algorithm, which is a supervised, non-parametric method that can be used for both classification and regression. The basic concept is that it uses some distance function on your dataset to find the $k$ closest other observations, then combines them (by averaging for regression, or by majority vote for classification) to impute an unknown value or label an unlabelled datapoint.
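A minimal sketch of the idea, assuming Euclidean distance and small made-up datasets; the helper names k_nearest, knn_classify, and knn_regress are hypothetical and not taken from the episode.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training items closest to query.

    train is a list of (features, target) pairs; distance is Euclidean.
    """
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """Label the query with the majority label among its k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Predict the query's value as the mean of its k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Made-up labelled points forming two clusters.
labelled = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_classify(labelled, (1.1, 1.0)))

# Made-up one-dimensional regression data.
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_regress(valued, (2.1,)))
```

Sorting the whole training set for every query is fine for a toy example; larger datasets usually call for a spatial index or an approximate nearest-neighbor search.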
Jul 17, 2015 • 1h 25min

Crypto

How do people think rationally about small probability events? What is the optimal statistical process by which one can update their beliefs in light of new evidence? This episode of Data Skeptic explores questions like these as Kyle consults a cast of previous guests and experts to try to answer the question "What is the probability, however small, that Bigfoot is real?"
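One common way to formalize updating a belief on new evidence is Bayes' rule. The description above does not name a method, so treat the following as an illustrative sketch only; the function name bayes_update and every number in it are invented for the example and are not estimates from the episode.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule.

    prior               -- P(hypothesis) before seeing the evidence
    p_evidence_if_true  -- P(evidence | hypothesis is true)
    p_evidence_if_false -- P(evidence | hypothesis is false)
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Purely hypothetical numbers: a one-in-a-million prior, and a piece of
# evidence that is nine times more likely if the hypothesis is true.
prior = 1e-6
posterior = bayes_update(prior, p_evidence_if_true=0.9, p_evidence_if_false=0.1)
print(posterior)
```

With these made-up inputs the posterior is roughly nine times the prior, so it remains very small: a single piece of moderately strong evidence moves a tiny prior only a little.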
Jul 10, 2015 • 13min

[MINI] MapReduce

This mini-episode is a high-level explanation of the basic idea behind MapReduce, which is a fundamental concept in big data. The idea originates from a Google paper titled "MapReduce: Simplified Data Processing on Large Clusters". This episode makes an analogy to tabulating paper voting ballots as a means of helping to explain how and why MapReduce is an important concept.
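The ballot analogy can be sketched in a few lines. This is a single-process toy, not a distributed implementation, and the function names (map_ballot, shuffle, reduce_votes, tally) and the precinct data are assumptions made for illustration.

```python
from collections import defaultdict


def map_ballot(ballot):
    """Map step: emit a (candidate, 1) pair for a single ballot."""
    yield (ballot, 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the candidate)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_votes(candidate, counts):
    """Reduce step: sum the counts collected for one candidate."""
    return candidate, sum(counts)


def tally(precincts):
    """Run map, shuffle, and reduce over ballots from several precincts."""
    pairs = [
        pair
        for ballots in precincts
        for ballot in ballots
        for pair in map_ballot(ballot)
    ]
    return dict(reduce_votes(c, counts) for c, counts in shuffle(pairs).items())


# Made-up ballots from three precincts; each string is one vote.
precincts = [["alice", "bob", "alice"], ["bob", "alice"], ["carol"]]
results = tally(precincts)
print(results)
```

In a real cluster, each precinct's ballots would be mapped on a separate machine and each candidate's pile would be summed by a separate reducer; here everything runs in one process to keep the three steps visible.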
Jul 3, 2015 • 35min

Genetically Engineered Food and Trends in Herbicide Usage

The Credible Hulk joins me in this episode to discuss a recent blog post he wrote about glyphosate and the data on how its introduction changed the historical usage trends of other herbicides. Links to all the sources and references can be found in the blog post. In this discussion, we also mention the Food Babe and Last Thursdayism, which may be worth some further reading. Kyle also mentioned the list of ingredients or chemical composition of a banana. Credible Hulk mentioned the Mommy PhD Facebook page. An interesting article about Mommy PhD can be found here. Lastly, if you enjoyed the show, please "Like" the Credible Hulk Facebook group.
Jun 26, 2015 • 11min

[MINI] The Curse of Dimensionality

This podcast explores the curse of dimensionality in machine learning, using the examples of gas station selection and buying a home. It discusses the challenges of high-dimensional data and the use of dimensionality reduction. The hosts also share their personal preferences in home buying.
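One standard way to see why high-dimensional data is hard (an illustration added here, not an example from the episode) is to ask how wide a sub-cube must be to capture a fixed fraction of uniformly spread points in a unit hypercube. The function name edge_length_for_fraction and the 10% figure are assumptions for the sketch.

```python
def edge_length_for_fraction(fraction, dimensions):
    """Edge length of a sub-cube holding `fraction` of a unit hypercube's volume.

    A sub-cube with edge length e has volume e ** dimensions, so
    e = fraction ** (1 / dimensions).
    """
    return fraction ** (1.0 / dimensions)


# How much of each axis must be covered to capture 10% of the data?
for d in (1, 2, 10, 100):
    print(d, round(edge_length_for_fraction(0.10, d), 3))
```

In one dimension, 10% of the range captures 10% of the data; in 10 dimensions the sub-cube must span about 79% of every axis, and in 100 dimensions about 98%, so "nearby" neighborhoods stop being local.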
