

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
Episodes

Jun 25, 2020 • 34min
Designing machine learning models for both consumer and industrial applications
In this episode of the Data Exchange I speak with Christopher Nguyen, CEO of Arimo (a Panasonic company). I first met Christopher in the early days of Apache Spark; Arimo was one of the first companies to embrace Spark and make it a central component of its data platform. He was also an early proponent of exploring deep learning for enterprise applications. A serial entrepreneur, Christopher was also an Engineering Director at Google, where he was responsible for Google Apps.
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Jun 18, 2020 • 44min
Building open source developer tools for language applications
In this episode of the Data Exchange I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and team are the creators of popular tools like spaCy (NLP), Thinc (lightweight deep learning library), and Prodigy (annotation and active learning).
Our conversation focused on a range of topics including:
- spaCy
- Thinc
- Explosion AI and Prodigy
- Distributed computing with Ray
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Jun 11, 2020 • 41min
Viewing machine learning and data science applications as sociotechnical systems
In this episode of the Data Exchange I speak with Chris Wiggins, Associate Professor at Columbia University, Chief Data Scientist at the New York Times, and co-founder of hackNY. He began his career in theoretical physics but he always had a strong interest in applying quantitative techniques to other disciplines. Early in his career he became interested in applications of machine learning to problems in biology and the health sciences.
Our conversation focused on a range of topics including:
- How he shifted his focus from physics to machine learning and data science.
- Applications of reinforcement learning.
- “Data scientist” as a job title, and data science training programs.
- Ethics in machine learning and data science, including training the next generation of data scientists.
- A 2015 essay written by Michael Jordan and Tom Mitchell.
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Jun 4, 2020 • 35min
Identifying and mitigating liabilities and risks associated with AI
In this episode of the Data Exchange I speak with Andrew Burt, Chief Legal Officer at Immuta and co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance and related topics. As AI and machine learning become more widely deployed, lawyers and technologists need to collaborate more closely so they can identify and mitigate liabilities and risks associated with AI. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate those risks.
Our conversation focused on a range of topics including:
- Why a law firm is the right vehicle for helping companies manage and mitigate risks associated with AI and machine learning.
- The legal profession’s long history in managing risk and regulatory frameworks.
- Model governance.
- Incident response and recovery.
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

May 28, 2020 • 40min
How machine learning is being used in quantitative finance
In this episode of the Data Exchange our special correspondent and editor Jenn Webb speaks with Arun Verma, Head of Quantitative Research Solutions at Bloomberg. My first job post-academia was as lead quant in a small hedge fund. Since then, I’ve followed the industry from afar and I’ve long been interested in the role of data and models in financial services. Arun and I discussed quantitative finance when we ran into each other at the O’Reilly AI conference in London last year. He was slated to give a talk on extracting trading signals from alternative data sets, an important subject among quants.
Jenn and Arun discussed a range of topics including:
- The quantitative finance landscape.
- The challenges in identifying and using alternative data sources.
- Applications of machine learning in finance, specifically deep learning and reinforcement learning.
- New natural language models and their applications in finance.
- Model explainability and model risk management.
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

May 21, 2020 • 35min
Understanding machine learning model governance
In this episode of the Data Exchange I speak with Harish Doddi, cofounder of Datatron, a startup focused on helping companies operationalize machine learning. Over the past two years, Harish has worked closely with enterprises to understand their needs in the areas of model operations and model governance. Last year Harish and I, along with David Talby, wrote two articles on these topics. In the first article, we described these emerging areas (“What are model governance and model operations?”), and in the second we listed lessons that ML engineers can draw from two highly regulated industries (“Managing machine learning in the enterprise: Lessons from banking and health care”).
As machine learning becomes widely deployed, organizations will need to develop processes and tools to ensure that models behave as intended. This means having the right set of controls and validation steps in place.
Our conversation focused on model governance and related topics:
- We discussed the three related areas of MLOps, model governance, and model observability.
- I asked Harish to describe how model governance is perceived and practiced in different industries.
- We discussed real-world examples of model governance, and organizational and staffing considerations that come into play.
- CI/CD for machine learning.
- Key enterprise features for model governance solutions.
Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

May 14, 2020 • 34min
Improving performance and scalability of data science libraries
In this episode of the Data Exchange I speak with Wes McKinney, Director of Ursa Labs and an Apache Arrow PMC Member. Wes is the creator of pandas, one of the most widely used Python libraries for data science. He is also the author of the best-selling book, “Python for Data Analysis” – a book that has become essential reading for both aspiring and experienced data scientists.
Our conversation focused on data science tools and other topics including:
- Two open source projects Wes has long been associated with: pandas and Apache Arrow.
- The need for a shared infrastructure for data science.
- Ursa Labs: its mission and structure.
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

May 7, 2020 • 37min
Why TinyML will be huge
In this episode of the Data Exchange I speak with Pete Warden, Staff Research Engineer at Google. Pete is a prolific author and teacher, and he has made many important contributions across many open source software projects. To name just a couple of his projects: he put together the Data Science toolkit (open data sets and open-source tools for data science) and he assembled tools to help developers get started using deep learning, long before TensorFlow and PyTorch were available. Most recently, Pete has been focused on implementing machine learning in ultra-low power systems (TinyML).
Our conversation focused on TinyML and other topics including:
- The early days of using deep learning for computer vision.
- TensorFlow – Pete was part of the team at Google that originated TF.
- What TinyML is and why it is going to be an important topic in the years ahead.
- Privacy and security in the context of TinyML.
- Pete’s new book and accompanying video series on YouTube, both designed to help developers get started building TinyML applications.
Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Apr 30, 2020 • 41min
An open source platform for training deep learning models
In this episode of the Data Exchange I speak with Evan Sparks, cofounder and CEO of Determined AI, a startup that recently open sourced a platform for training deep learning models. Many of the impressive results and applications of deep learning have happened at a handful of companies and research groups. As more companies use deep learning they are learning that infrastructure for training and transfer learning isn’t widely available.
Our conversation focused on deep learning and other topics including:
- Their decision to open source the Determined Training Platform (DTP).
- Enterprise use cases and applications of deep learning, and why Evan thinks more companies will need a platform for training DL models.
- The components that come with the DTP: Distributed Training and Hyperparameter Tuning, Experiment Tracking and tools for collaboration and governance, Scheduler specialized for DL workflows, and more.
- Some examples of how teams have been using DTP.
Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Apr 23, 2020 • 44min
Algorithms that continually invent both problems and solutions
In this episode of the Data Exchange I speak with Kenneth Stanley, a Senior Research Manager at Uber AI and a Professor at UCF. Ken just announced that in June he will start a new research group focused on open-endedness at OpenAI. He is a pioneer in the field of neuroevolution – a method for evolving and learning neural networks through evolutionary algorithms. Ken and his colleague, Joel Lehman, wrote one of my favorite books on AI aimed at a broad audience: Why Greatness Cannot Be Planned. In this episode we discuss his upcoming move to OpenAI, as well as his recent work on open-ended algorithms.
Our conversation covered:
- Ken’s new position at OpenAI.
- The transition from being a longtime academic researcher to founding and helping lead an industrial research team (Uber AI Labs).
- Open-ended algorithms, specifically his work on POET (Paired Open-Ended Trailblazer) and Enhanced POET.
- Generative Teaching Networks.
Detailed show notes can be found on The Data Exchange web site. Subscribe to The Data Exchange Newsletter.