The Data Exchange with Ben Lorica

Ben Lorica
Aug 19, 2021 • 50min

An oscilloscope for deep learning

This week’s guest is Charles Martin, independent researcher and founder of Calculation Consulting, a boutique consultancy focused on data science and machine learning. Along with Michael Mahoney and Serena Peng, Charles is co-author of a recent Nature paper on new methods for evaluating and tuning deep learning models (“Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data”).
Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Gradient Flow Newsletter.
Aug 12, 2021 • 37min

What’s new in data engineering

This week our special correspondent and editor Jenn Webb organized a mini-panel made up of Jesse Anderson, Managing Director at the Big Data Institute, and me. Jesse is the author of a recent book entitled “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. The conversation focused on key areas in data engineering.
Aug 5, 2021 • 50min

The evolution of the data science role and of data science tools

This week our managing editor Jenn Webb and I speak with Sean Taylor, Data Science Manager at Lyft. Sean was previously a research scientist and manager at Facebook, where he was instrumental in the creation and release of Prophet, a very popular open source library for time-series forecasting.
Jul 29, 2021 • 52min

Data Augmentation in Natural Language Processing

This week’s guests are Steven Feng, Graduate Student, and Ed Hovy, Research Professor, both from the Language Technologies Institute of Carnegie Mellon University. We discussed their recent survey paper on Data Augmentation Approaches in NLP (GitHub). Data augmentation is an active field of research on techniques for increasing the diversity of training examples without explicitly collecting new data. One key reason such strategies are important is that augmented data can act as a regularizer, reducing overfitting when training models.
Jul 22, 2021 • 43min

Storage Technologies for a Multi-cloud World

This week’s guest is Brad King, CTO of Scality, a company that builds software-defined file and object storage systems for hybrid and multi-cloud settings. Storage and compute are the basic building blocks of (cloud) computing platforms, and this episode highlights important considerations and recent innovations in storage technologies that data engineers, architects, and machine learning professionals need to know about.
Jul 15, 2021 • 49min

Building a next-generation dataflow orchestration and automation system

In this episode, our managing editor Jenn Webb and I speak with Chris White, CTO of Prefect, a startup developing tools that help companies build, monitor, and manage dataflows. Prefect grew out of lessons Chris and his co-founder learned at Capital One, where they were early users of, and contributors to, related projects like Apache Airflow.
Jul 8, 2021 • 44min

Building a flexible, intuitive, and fast forecasting library

This week’s guests are Reza Hosseini, Staff Software Engineer, and Albert Chen, Staff Data Scientist, both at LinkedIn. Reza and Albert are part of the team behind Greykite, a new open source library for flexible and fast time-series forecasting.
Jul 1, 2021 • 44min

Neural Models for Tabular Data

This week’s guest is Sercan Arik, Research Scientist at Google Cloud AI. Sercan and his collaborators recently published a paper on TabNet, a deep neural network architecture for tabular data. TabNet uses sequential attention to select features and is explainable; based on tests that Sercan and his team ran across many domains, it outperforms or is on par with other models (e.g., XGBoost) on classification and regression problems.
Jun 24, 2021 • 51min

Training and Sharing Large Language Models

This week’s guest is Connor Leahy, AI Researcher at Aleph Alpha GmbH and founding member of EleutherAI (pronounced “ee-luther”), a collective of researchers and engineers building resources and models for researchers who work on natural language models. As NLP research becomes more computationally demanding and data intensive, researchers need to work together to develop tools and resources for the broader community. Although relatively new, EleutherAI has already released models and data that many researchers are benefiting from.
Jun 17, 2021 • 59min

Questioning the Efficacy of Neural Recommendation Systems

This week’s guests are leading researchers in recommendation systems: Paolo Cremonesi, Professor of Computer Science, and Maurizio Ferrari Dacrema, Postdoctoral Researcher, both at Politecnico di Milano, where they are part of the RecSys research group. Paolo is also the Reproducibility co-chair for the upcoming RecSys Conference.
