
DataFramed

Latest episodes

Oct 22, 2018 • 1h 6min

#45 Decision Intelligence and Data Science

In this episode of DataFramed, Hugo speaks with Cassie Kozyrkov, Chief Decision Scientist at Google Cloud. Cassie and Hugo will be talking about data science, decision making and decision intelligence, which Cassie thinks of as data science plus plus: data science augmented with the social and managerial sciences. They’ll talk about the different and evolving models for how the fruits of data science work can be used to inform robust decision making, along with the pros and cons of the various models for embedding data scientists in organizations relative to the decision function. They’ll tackle head-on why so many organizations fail at using data to robustly inform decision making, along with best practices for working with data, such as not verifying your results on the data that inspired your models. As Cassie says, “Split your damn data”.
Links from the show
FROM THE INTERVIEW
Cassie on Twitter
Is data science a bubble? (By Cassie Kozyrkov, Hackernoon)
Incompetence, delegation, and population (By Cassie Kozyrkov, Hackernoon)
Populations — You’re doing it wrong (By Cassie Kozyrkov, Hackernoon)
What on earth is data science? (By Cassie Kozyrkov, Hackernoon)
FROM THE SEGMENTS
Probability Distributions and their Stories (with Justin Bois at ~19:45)
Justin's Website at Caltech
Probability distributions and their stories (By Justin Bois)
Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs at ~43:45)
Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks
Multi-Task Learning for NLP, also by Sebastian Ruder
GANs for Fake Celebrity Images (Karras et al., Nvidia)
Adversarial Multi-Task Learning for Text Classification (Liu et al., arXiv.org)
Original music and sounds by The Sticks.
Oct 15, 2018 • 1h 5min

#44 Project Jupyter and Interactive Computing

In this episode of DataFramed, Hugo speaks with Brian Granger, co-founder and co-lead of Project Jupyter, physicist and co-creator of the Altair package for statistical visualization in Python. They’ll speak about data science, interactive computing, open source software and Project Jupyter. With over 2.5 million public Jupyter notebooks on GitHub alone, Project Jupyter is a force to be reckoned with. What is interactive computing and why is it important for data science work? What are all the moving parts of the Jupyter ecosystem, from notebooks to JupyterLab to JupyterHub and Binder, and why are they so relevant as more and more institutions adopt open source software for interactive computing and data science? From Netflix running around 100,000 Jupyter notebook batch jobs a day to the LIGO team behind the Nobel Prize-winning discovery of gravitational waves publishing all their results reproducibly in notebooks, Project Jupyter is everywhere.
Links from the show
FROM THE INTERVIEW
Brian on Twitter
Project Jupyter
Beyond Interactive: Notebook Innovation at Netflix (Ufford, Pacer, Seal, Kelley, Netflix Tech Blog)
Gravitational Wave Open Science Center (Tutorials)
JupyterCon YouTube Playlist
jupyterstream GitHub Repository
FROM THE SEGMENTS
Machines that Multi-Task (with Friederike Schüür of Fast Forward Labs)
Part 1 at ~24:40
Brief Introduction to Multi-Task Learning (By Friederike Schüür)
Overview of Multi-Task Learning Use Cases (By Manny Moss)
Multi-Task Learning for the Segmentation of Building Footprints (Bischke et al., arXiv.org)
Multi-Task as Question Answering (McCann et al., arXiv.org)
The Salesforce Natural Language Decathlon: A Multitask Challenge for NLP
Part 2 at ~44:00
Rich Caruana’s Awesome Overview of Multi-Task Learning and Why It Works
Sebastian Ruder’s Overview of Multi-Task Learning in Deep Neural Networks
Massively Multi-Task Network for Drug Discovery, 259 Tasks (!) (Ramsundar et al., arXiv.org)
Brief Overview of Multi-Task Learning with Video of Newsie, the Prototype (By Friederike Schüür)
Original music and sounds by The Sticks.
Oct 8, 2018 • 1h 5min

#43 Election Forecasting and Polling

Hugo speaks with Andrew Gelman about statistics, data science, polling, and election forecasting. Andy is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. This week we’ll be talking about the ins and outs of general polling and election forecasting, the biggest challenges in gauging public opinion, the ever-present challenge of getting representative samples in order to model the world, and the types of corrections statisticians can and do perform. "Chatting with Andy was an absolute delight and I cannot wait to share it with you!" - Hugo
Links from the show
FROM THE INTERVIEW
Andrew's Blog
Andrew on Twitter
We Need to Move Beyond Election-Focused Polling (Gelman and Rothschild, Slate)
We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results (Cohn, The New York Times)
19 things we learned from the 2016 election (Gelman and Azari, Science, 2017)
The best books on How Americans Vote (Gelman, Five Books)
The best books on Statistics (Gelman, Five Books)
Andrew's Research
FROM THE SEGMENTS
Statistical Lesson of the Week (with Emily Robinson at ~13:30)
The five Cs (Loukides, Mason, and Patil, O'Reilly)
Data Science Best Practices (with Ben Skrainka at ~40:40)
Oberkampf & Roy’s Verification and Validation in Scientific Computing provides a thorough yet very readable treatment
A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing (Roy and Oberkampf, Science Direct)
Original music and sounds by The Sticks.
Oct 1, 2018 • 51min

#42 Full Stack Data Science

Hugo speaks with Vicki Boykis about what full-stack, end-to-end data science actually is, how it works in a consulting setting across various industries, and why it’s so important in developing modern data-driven solutions to business problems. Vicki is a full-stack data scientist and senior manager at CapTech Consulting, working on projects in machine learning and data engineering. They'll also discuss the increasing adoption of cloud technologies in data science and the associated pitfalls, along with how to equip businesses with the skills to maintain the data products you develop for them. All this and more: Hugo is pumped!
Links from the show
FROM THE INTERVIEW
Vicki's Tech Blog
Vicki on Twitter
CapTech Consulting
Vicki's Tweet about Programming
Building a Twitter art bot with Python, AWS, and socialist realism art
FROM THE SEGMENTS
Data Science Best Practices (with Ben Skrainka at ~15:00)
Cross-industry standard process for data mining
Fundamentals of Machine Learning for Predictive Data Analytics
Statistical Lesson of the Week (with Emily Robinson at ~32:05)
Sex Bias in Graduate Admissions: Data from Berkeley (Bickel et al., Science, 1975)
Time Series Analysis Tutorial with Python
Original music and sounds by The Sticks.
Sep 24, 2018 • 59min

#41 Uncertainty in Data Science

Hugo speaks with Allen Downey about uncertainty in data science. Allen is a professor of Computer Science at Olin College and the author of a series of free, open-source textbooks related to software and data science. Allen and Hugo speak about how we, as humans, are not always good at thinking about uncertainty, something we need to be good at in such an uncertain world. Should we have been surprised at the outcome of the 2016 election? What approaches can we, as a data reporting community, take to communicate uncertainty better in the future? From election forecasting to health and safety, thinking about uncertainty and using data and data-oriented tools to communicate it are essential.
Links from the show
FROM THE INTERVIEW
Data Science Data Optimism
Allen's Twitter
List of cognitive biases
Why are we so surprised? (Allen's Blog)
Probably Overthinking It (Allen Downey's Blog)
Think Stats (Allen's Book)
There is only one test! (Allen's Blog)
FROM THE SEGMENT
Statistical Distributions and their Stories (with Justin Bois at ~27:00)
Justin's Website at Caltech
Probability distributions and their stories
LeBron James Field Goals
Original music and sounds by The Sticks.
Sep 17, 2018 • 1h 1min

#40 Becoming a Data Scientist

Hugo speaks with Renée Teate about the many paths to becoming a data scientist. Renée is a Data Scientist at higher-ed analytics start-up HelioCampus, and creator and host of the Becoming a Data Scientist Podcast. In addition to discussing the many possible ways of becoming a data scientist, they will discuss the common data scientist profiles and how to figure out which ones may be a fit for you. They’ll also dive into the fact that you need to figure out both where you are in terms of skills and knowledge and where you want to go in terms of your career. Renée has a bunch of great suggestions for aspiring data scientists and also flags several important pitfalls and warnings. On top of this, they'll dive into how much statistics, linear algebra and calculus you need to know in order to become an effective data scientist and/or data analyst.
Links from the show
FROM THE INTERVIEW
Becoming a Data Scientist (Renée's Blog)
Renée's Twitter
Data Sci Guide (Data Science Learning Directory)
FROM THE SEGMENTS
Statistical Distributions and their Stories (with Justin Bois at ~19:20)
Justin's Website at Caltech
Probability distributions and their stories
Programming Topic of the Week (with Emily Robinson at ~43:20)
Categorical Data in the Tidyverse, a DataCamp Course taught by Emily Robinson
R for Data Science Book by Hadley Wickham (Factors Chapter)
Inference for Categorical Data, a DataCamp Course taught by Andrew Bray
stringsAsFactors: An unauthorized biography (Roger Peng, July 24, 2015)
Wrangling categorical data in R (Amelia McNamara & Nicholas J Horton, August 30, 2017)
Original music and sounds by The Sticks.
Sep 10, 2018 • 60min

#39 Data Science at Stitch Fix

Hugo speaks with Eric Colson, Chief Algorithms Officer at Stitch Fix, an online personal styling service reinventing the shopping experience by delivering one-to-one personalization to its clients through the combination of data science and human judgment. Eric is responsible for the creation of dozens of algorithms at Stitch Fix that are pervasive in nearly every function of the company, from merchandise, inventory, and marketing to forecasting and demand, operations, and the styling recommender system. Join us for all of this and more.
Links from the show
FROM THE INTERVIEW
Stitch Fix Algorithm Tour
Warehouse Maps, Movie Recommendation, Structural Biology
Advice for Data Scientists on where to work
More Human Humans: how our work-life can be improved by ceding tasks to machines
Learning from Textual Feedback (Natural Language Processing)
Deep Style: Teaching machines about style from images
Hybrid Designs
You Can’t Make this stuff up … or can you?
The Blissful Ignorance of the Narrative Fallacy
FROM THE SEGMENTS
Blog Post of the Week (with Emily Robinson)
Doing Good Data Science by Mike Loukides, Hilary Mason and DJ Patil
Original music and sounds by The Sticks.
Sep 3, 2018 • 51min

#38 Data Products, Dashboards and Rapid Prototyping

Meet Tanya Cashorali, a founding partner of TCB Analytics, a Boston-based data consultancy. Tanya started her career in bioinformatics and has applied her experience to other industries such as healthcare, finance, retail, and sports. We’ll be talking about what it means to be a data consultant, the wide range of industries that Tanya works in, the impact of data products in her work, and the importance of rapid prototyping and getting MVPs (minimum viable products) out the door. How does Tanya balance the trade-off between rapid prototyping and building fully mature data products? How does this play out in particular cases in the healthcare and telecommunications spaces? How has her ability to do this evolved alongside open source software development? We’ll also dive into how general data literacy has evolved, how it can help decision making in business more generally, the data science skills gap, and why many data science hiring processes are broken and how to fix them.
Aug 27, 2018 • 60min

#37 Data Science and Insurance

Hugo speaks with JD Long, VP of risk management at Renaissance Reinsurance, about applications of data science techniques to the omnipresent worlds of insurance, reinsurance, risk management and uncertainty. What are the biggest challenges in insurance and reinsurance that data science can impact? How does JD go about building risk representations of every deal? How can thinking in a distributed fashion allow us to think about risk and uncertainty? What is the role of empathy in data science?
Aug 20, 2018 • 56min

#36 Data Science and Ecology

Hugo speaks with Christie Bahlai, Assistant Professor at Kent State University, about data science, ecology, and the adoption of techniques such as machine learning in academic research. What are the biggest challenges in ecology that data science can help to solve? What does the intersection of open science and data science look like? In scientific research, what is happening at the interface between data science and machine learning methods, which are pattern-based, and traditional research methods, which are classically hypothesis-driven? Is there a paradigm shift occurring here? Listen to find out!
Links from the show
The Bahlai Lab of applied quantitative ecology
Christie Bahlai on Twitter
Hugo's article on What Data Scientists Really Do in Harvard Business Review
Hugo's webinar on What Managers Need To Know About Machine Learning
