
DataFramed

Latest episodes

Mar 4, 2019 • 1h 9min

#55 Getting Your First Data Science Job

This week, Hugo speaks with Chris Albon about getting your first data science job. Chris is a Data Scientist at Devoted Health, where he uses data science and machine learning to help fix America's healthcare system. Chris is also doing a lot of hiring at Devoted, which is why he’s so excited to talk today about how to get your first data science job. You may know Chris as co-host of the podcast Partially Derivative, from his educational resources such as his blog and machine learning flashcards, or as one of the funniest data scientists on Twitter.

LINKS FROM THE SHOW

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on DataFramed?)

FROM THE INTERVIEW
Chris on Twitter
Chris's Website
Devoted Website
Machine Learning Flashcards (By Chris Albon)
Machine Learning with Python Cookbook (By Chris Albon)

FROM THE SEGMENTS

Guidelines for A/B Testing (with Emily Robinson ~26:50)
Guidelines for A/B Testing (By Emily Robinson)
10 Guidelines for A/B Testing Slides (By Emily Robinson)

Original music and sounds by The Sticks.
Feb 25, 2019 • 47min

#54 Women in Data Science

This week, Hugo speaks with Reshama Shaikh about women in machine learning and data science, inclusivity and diversity more generally, and how being intentional in what you do is essential. Reshama, a freelance data scientist and statistician, is also an organizer of the meetup groups Women in Machine Learning & Data Science (otherwise known as WiMLDS) and PyLadies. She has organized WiMLDS for 4 years and is a Board Member. They’ll discuss her work at WiMLDS and what you can do to support and promote women and gender minorities in data science. They’ll also delve into why women are flourishing in the R community but lagging in Python and discuss more generally how NumFOCUS thinks about diversity and inclusion, including their code of conduct. All this and more.

LINKS FROM THE SHOW

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on DataFramed?)

FROM THE INTERVIEW
Reshama’s Blog
Reshama on Twitter
List of Relevant Conferences (and Code of Conduct info)
NYC PyLadies meetup
Code of Conduct for NeurIPS and Other STEM Organizations
NumFOCUS Diversity & Inclusion in Scientific Computing (DISC)
NumFOCUS DISCOVER Cookbook (for inclusive events)
fastai deep learning notes

WiMLDS (Women in Machine Learning and Data Science)
NYC WiMLDS meetup
To start a WiMLDS chapter: email info@wimlds.org and more info at our starter kit.
WiMLDS Website
Global List of WiMLDS Meetup Chapters
WiMLDS Paris: They run their meetups in English, so knowledge of French is not required.

FROM THE SEGMENTS

DataCamp User Stories (with David Sudolsky ~17:27 & ~31:50)
Boldr Website

Original music and sounds by The Sticks.
Feb 18, 2019 • 54min

#53 Data Science, Gambling and Bookmaking

This week, Hugo speaks with Marco Blume, Trading Director at Pinnacle Sports. Marco and Hugo will talk about the role of data science in large-scale bets and bookmaking, how Marco is training an army of data scientists and much more. At Pinnacle, Marco uses tight risk management built on cutting-edge models to provide bets not only on sports but also on questions such as: Who will be the next pope? Who will be the world hot dog eating champion? Who will land on Mars first? And who will be on the Iron Throne at the end of Game of Thrones? They’ll discuss the relations between risk management and uncertainty, how great forecasters are necessarily good at updating their predictions in the light of new data and evidence, how you can model this using Bayesian inference and the future of biometric sensing in sports betting. And, as always, much, much more.

LINKS FROM THE SHOW

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on DataFramed?)

FROM THE INTERVIEW
Pinnacle Website
Training an army of new data scientists (Presentation by Marco Blume)

FROM THE SEGMENTS

Data Science Best Practices (with Ben Skrainka ~16:40)
Python Debugging With Pdb (By Nathan Jennings)
pdb Tutorial (GitHub)
The Visual Python Debugger for Jupyter Notebooks You’ve Always Wanted (By David Taieb)
Debugging with RStudio (By Jonathan McPherson)
Basics of Debugging

Statistical Distributions and their Stories (with Justin Bois at ~36:00)
Justin's Website at Caltech
Probability distributions and their stories (By Justin Bois)

Original music and sounds by The Sticks.
Feb 11, 2019 • 1h 2min

#52 Data Science at the BBC

This week on DataFramed, the DataCamp podcast, Hugo speaks with Gabriel Straub, the Head of Data Science and Architecture at the BBC, where his role is to help make the organization more data-informed and to make it easier for product teams to build data and machine learning powered products. They’ll be talking about data science and machine learning at the BBC and how they can impact content discoverability, understanding content and putting the right stuff in front of people. They’ll also cover how Gabriel and his team develop broader data science & machine learning architecture to make sure best practices are adopted, and what it means to apply machine learning in a sensible way. How does the BBC think about incorporating data science into its business, which has been around since 1922 and has historically been at the forefront of technological innovation such as in radio and television? Listen to find out!

LINKS FROM THE SHOW

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on DataFramed?)

FROM THE INTERVIEW
Gabriel Straub: It's bigger on the inside (Video)
BBC datalab

FROM THE SEGMENTS

DataCamp User Stories (with Krittika Patil ~16:10 & ~38:12)
Kespry (Drone Aerial Intelligence for Industry)

Original music and sounds by The Sticks.
Feb 4, 2019 • 1h 2min

#51 Inclusivity and Data Science

This week, Hugo speaks with Dr. Brandeis Marshall about people of color and under-represented groups in data science. They’ll talk about the biggest barriers to entry for people of color, initiatives that currently exist and what we as a community can do to be as diverse and inclusive as possible. Brandeis is an Associate Professor of Computer Science at Spelman College. Her interdisciplinary research lies in the areas of information retrieval, data science, and social media. Other research includes the BlackTwitter Project, which blends data analytics, social impact and race as a lens to understanding cultural sentiments. Brandeis is involved in a number of projects, workshops, and organizations that support data literacy and understanding, share best data practices and broaden participation in data science.

LINKS FROM THE SHOW

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on DataFramed?)

FROM THE INTERVIEW
Brandeis on Twitter
The BlackTwitter Project
The Impact of Live Tweeting on Social Movements (By Brandeis Marshall, Takeria Blunt, Tayloir Thompson)
EvergreenLP: Using a social network as a learning platform (By Brandeis Marshall, Jaye Nias, Tayloir Thompson, Takeria Blunt)
Journal of Computing Sciences in Colleges (By Brandeis Marshall)
DSX (Data Science eXtension Faculty development and undergraduate instruction in data science)
African American Women Computer Science PhDs
500 Women Scientists
Black in AI
Women in Machine Learning

FROM THE SEGMENTS

What Data Scientists Really Do (with Hugo Bowne-Anderson & Emily Robinson ~21:30 & ~41:40)
What Data Scientists Really Do, According to 35 Data Scientists (Harvard Business Review article by Hugo Bowne-Anderson)
What Data Scientists Really Do, According to 50 Data Scientists (Slides from a talk by Hugo Bowne-Anderson)

Original music and sounds by The Sticks.
Nov 26, 2018 • 56min

#50 Weapons of Math Destruction

In episode 50, our Season 1, 2018 finale of DataFramed, the DataCamp podcast, Hugo speaks with Cathy O’Neil, data scientist, investigative journalist, consultant, algorithmic auditor and author of the critically acclaimed book Weapons of Math Destruction. Cathy and Hugo discuss the ingredients that make up weapons of math destruction: algorithms and models that are important in society, secret and harmful. These range from models that decide whether you keep your job, a credit card or insurance to algorithms that decide how we’re policed, sentenced to prison or given parole. Cathy and Hugo discuss the current lack of fairness in artificial intelligence, how societal biases are perpetuated by algorithms and how both transparency and auditability of algorithms will be necessary for a fairer future. What does this mean in practice? Tune in to find out. As Cathy says, “Fairness is a statistical concept. It's a notion that we need to understand at an aggregate level.” And, moreover, “data science doesn't just predict the future. It causes the future.”

LINKS FROM THE SHOW

DATAFRAMED SURVEY
DataFramed Survey (take it so that we can make an even better podcast for you)

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on Season 2?)

FROM THE INTERVIEW
Cathy on Twitter
Cathy's Blog Mathbabe
Weapons of Math Destruction: How big data increases inequality and threatens democracy by Cathy O'Neil
Cathy's Opinion Column, Bloomberg
Doing Data Science (By Cathy O'Neil and Rachel Schutt)
Cathy O'Neil & Hanna Gunn's "Ethical Matrix" paper coming soon.

FROM THE SEGMENTS

Data Science Best Practices (with Heather Nolis ~20:30)
Using docker to deploy an R plumber API (By Jonathan Nolis and Heather Nolis)
Enterprise Web Services with Neural Networks Using R and TensorFlow (By Jonathan Nolis and Heather Nolis)

Data Science Best Practices (with Ben Skrainka ~39:35)
The Clean Coder Blog (By Robert C. Martin)
James Shore’s blog post on Red, Green, Refactor
Jeff Knupp’s Python Unittesting tutorial (general unit tests in Python)
John Myles White’s Intro to Unit Testing in R

Original music and sounds by The Sticks.
Nov 19, 2018 • 58min

#49 Data Science Tool Building

Hugo speaks with Wes McKinney, creator of the pandas project for data analysis tools in Python and author of Python for Data Analysis, among many other things. Wes and Hugo talk about data science tool building, what it took to get pandas off the ground and how he approaches building “human interfaces to data” to make individuals more productive. On top of this, they’ll talk about the future of data science tooling, including the Apache Arrow project and how it can facilitate this future, the importance of DataFrames that are portable between programming languages and building tools that facilitate data analysis work in the big data limit. Pandas initially arose from Wes noticing that people were nowhere near as productive as they could be due to lack of tooling. The projects he’s working on today, which they’ll discuss, arise from the same place and present a bold vision for the future.

LINKS FROM THE SHOW

DATAFRAMED SURVEY
DataFramed Survey (take it so that we can make an even better podcast for you)

DATAFRAMED GUEST SUGGESTIONS
DataFramed Guest Suggestions (who do you want to hear on Season 2?)

FROM THE INTERVIEW
Wes on Twitter
Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure by Nadia Eghbal
pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Ursa Labs

FROM THE SEGMENTS

Data Science Best Practices (with Ben Skrainka ~17:10)
To Explain or To Predict? (By Galit Shmueli)
Statistical Modeling: The Two Cultures (By Leo Breiman)
The Book of Why (By Judea Pearl & Dana Mackenzie)

Studies in Interpretability (with Peadar Coyle at ~39:00)
Modelling Loss Curves in Insurance with RStan (By Mick Cooney)
Lime: Explaining the predictions of any machine learning classifier
Probabilistic Programming Primer

Original music and sounds by The Sticks.
Nov 12, 2018 • 50min

#48 Managing Data Science Teams

In this episode of DataFramed, the DataCamp podcast, Hugo speaks with Angela Bassa about managing data science teams. Angela is Director of Data Science at iRobot, where she leads the team through development of machine learning algorithms, sentiment analysis, and anomaly detection processes. iRobot makes consumer robots that we all know and love, like the Roomba, a robotic vacuum cleaner, and the Braava, a robotic mop. Angela will talk about how to get into data science management, the most important strategies to ensure that your data science team delivers value to the organization, how to hire data scientists and key points to consider as your data science team grows over time. She’ll also cover the types of trade-offs you need to make as a data science manager and how you make the right ones. Along the way, you’ll see why a former marine biologist has the skills and ways of thinking to be a super data scientist at a company like iRobot and you’ll also see the importance of throwing data analysis parties.

LINKS FROM THE SHOW

FROM THE INTERVIEW
Angela on Twitter
HBR Newsletters
iRobot Careers
Data Science Internship

FROM THE SEGMENTS

Correcting Data Science Misconceptions (w/ Heather Nolis ~18:45)
Using docker to deploy an R plumber API (By Jonathan Nolis)
Enterprise Web Services with Neural Networks Using R and TensorFlow (By Jonathan Nolis and Heather Nolis)

Project of the Month (w/ David Venturi ~38:45)
Rise and Fall of Programming Languages (R Project by David Robinson)
Learn, Practice, Apply! (By Ramnath Vaidyanathan)
Apply to create a DataCamp project!

Original music and sounds by The Sticks.
Nov 5, 2018 • 1h 3min

#47 Human-centered Design in Data Science

Hugo speaks with Peter Bull about the importance of human-centered design in data science. Peter is a data scientist for social good and co-founder of DrivenData, a company that brings cutting-edge practices in data science and crowdsourcing to some of the world's biggest social challenges and the organizations taking them on, including machine learning competitions for social good. They’ll speak about the practice of considering how humans interact with data and data products and how important it is to consider them while designing your data projects. They’ll see how human-centered design provides a robust and reproducible framework for involving the end-user all through the data work, illuminated by examples such as DrivenData’s work in financial services and Mobile Money in Tanzania. Along the way, they’ll discuss the role of empathy in data science, the increasingly important conversation around data ethics and much, much more.

LINKS FROM THE SHOW

FROM THE INTERVIEW
Peter on Twitter
DrivenData
Deon (Ethics Checklist)
Cookiecutter Data Science
If you liked this interview, you might be interested in working with DrivenData! Currently, the team is looking for a software engineer who loves the idea of building Python applications for social impact. Apply Here!

FROM THE SEGMENTS

Probability Distributions and their Stories (with Justin Bois at ~24:00)
Justin's Website at Caltech
Probability distributions and their stories (By Justin Bois)

Studies in Interpretability (with Peadar Coyle at ~38:10)
Interpretable ML Symposium
How will the GDPR impact machine learning? (By Andrew Burt)
How to use Bayesian Stats in your daily job (Gates, Perry, Zorn (2002))
Fairness in Machine Learning (By Moritz Hardt)

Original music and sounds by The Sticks.
Oct 29, 2018 • 1h 2min

#46 AI in Healthcare, an Insider's Account

In this episode of DataFramed, a DataCamp podcast, Hugo speaks with Arnaub Chatterjee. Arnaub is a Senior Expert and Associate Partner in the Pharmaceutical and Medical Products group at McKinsey & Company. They’ll discuss cutting through the hype about artificial intelligence (AI) and machine learning (ML) in healthcare by looking at practical applications and how McKinsey & Company is helping the industry evolve. Tune in for an insider’s account of what has worked in healthcare, from ML models being used to predict nearly everything in clinical settings, to imaging analytics for disease diagnosis, to wound therapeutics. Will robots and AI replace disciplines such as radiology, ophthalmology, and dermatology? How have the moving parts of data science work evolved in healthcare? What does the future of data science, ML and AI in healthcare hold? Stick around to find out.

LINKS FROM THE SHOW

FROM THE INTERVIEW
McKinsey Analytics on Twitter
Hot off the press article for HBR’s Future of Healthcare online forum (By Arnaub Chatterjee)
Our latest piece on the promise & challenge of AI (By James Manyika and Jacques Bughin)
Are robots coming for our jobs? (mckinsey.com)
Analytics Careers page (mckinsey.com)
How we help clients in healthcare analytics (mckinsey.com)
AI analysis of 400+ use cases, including ones in healthcare (By Michael Chui et al. mckinsey.com)

FROM THE SEGMENTS

Machines that Multi-task (with Manny Moss)

Part 1 at ~21:05
Responsible AI in Consumer Enterprise
Hilary Mason, DJ Patil and Mike Loukides on Data Ethics
EthicalOS Toolkit

Part 2 at ~40:00
21 Definitions of Fairness Tutorial from FAT* (Arvind Narayanan)
Kate Crawford's keynote address "The Trouble with Bias" from NIPS 2017
The (im)possibility of Fairness (Sorelle et al. arXiv.org)
Learning from disparate data sources (Li Y et al. PubMed.gov)
Distributed Multi-task Learning (Liyang Xie et al. KDD.org)
The Cost of Fairness in Binary Classification (Aditya Krishna Menon et al. proceedings.mlr.press)

Original music and sounds by The Sticks.
