

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow (https://gradientflow.com/).
Episodes

Nov 12, 2020 • 30min
Testing Natural Language Models
In this episode of the Data Exchange I speak with Marco Ribeiro, Senior Researcher at Microsoft Research and lead author of the award-winning paper “Beyond Accuracy: Behavioral Testing of NLP models with CheckList”. As machine learning gains importance across many application domains and industries, there is a growing need to formalize how ML models get built, deployed, and used. MLOps is an emerging set of practices, drawing on ideas from CI/CD, that focuses on productionizing the machine learning lifecycle. But even before we talk about deploying a model to production, how do we inject more rigor into the model development process? Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.

Nov 5, 2020 • 33min
Detecting Fake News
In this episode of the Data Exchange I speak with Xinyi Zhou, a graduate student in Computer and Information Science at Syracuse University. Xinyi and her advisor (Reza Zafarani) recently wrote a comprehensive survey paper entitled “A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities”. They set out to organize the many different methods and perspectives used to detect fake news. Their paper is a great resource for anyone who wants to understand the strengths and limitations of various state-of-the-art techniques and to get a feel for where the research community might be headed in the near future.

Oct 29, 2020 • 43min
The Computational Limits of Deep Learning
In this episode of the Data Exchange I speak with Neil Thompson, Research Scientist at the Computer Science and Artificial Intelligence Lab (CSAIL) and the Initiative on the Digital Economy, both at MIT. I wanted Neil on the podcast to discuss a recent paper he co-wrote entitled “The Computational Limits of Deep Learning” (a summary version is also available). This paper provides estimates of the amount of computation, economic costs, and environmental impact that come with increasingly large and more accurate deep learning models.

Oct 22, 2020 • 47min
Making deep learning accessible
In this episode of the Data Exchange I speak with Piero Molino, creator of Ludwig, a toolbox that allows users to train and test deep learning models through a declarative interface. Piero created Ludwig while serving as a Senior Research Scientist at Uber AI. He originally built it for his personal use, and it slowly garnered users within Uber. When it was open sourced in early 2019, the project immediately found a receptive audience at the conferences I was chairing at the time.

Oct 15, 2020 • 50min
Building and deploying knowledge graphs
In this episode of the Data Exchange I speak with Mayank Kejriwal, a Research Assistant Professor in the Department of Industrial and Systems Engineering, and a Research Lead at the USC Information Sciences Institute. The focus of our conversation is knowledge graphs: collections of linked entities (objects, events, concepts) that are used in many AI applications. For example, Google uses a knowledge graph to add infoboxes to some of its search results. Other areas where knowledge graphs are common include e-commerce, healthcare, and financial services.

Oct 8, 2020 • 37min
Financial Time Series Forecasting with Deep Learning
In this episode of the Data Exchange I speak with Murat Özbayoğlu, Chair of Artificial Intelligence Engineering at TOBB University of Economics and Technology in Ankara, Turkey. I wanted Murat on to discuss two survey papers he and his colleagues wrote on the use of deep learning in finance. I’ve long been fascinated with finance and trading. My first job after I left academia was as the lead quant in a hedge fund, and ever since, I’ve tried to stay abreast of the tools and techniques that quants and data scientists in finance are using. Forecasting in this setting usually means price prediction or price movement (trend) prediction. The outputs of forecasting models are used to inform investment decisions. What makes finance particularly challenging is that many people are using the same underlying data (time series of prices/values), and thus, as Murat notes, many firms use alternative data sources (such as text) as potential sources of additional signal.

Oct 1, 2020 • 50min
A programming language for scientific machine learning and differentiable programming
In this episode of the Data Exchange I speak with Viral Shah, co-founder and CEO of Julia Computing. Along with his Julia language co-creators, Viral was awarded the 2019 Wilkinson Prize for outstanding contributions in the field of numerical software. I first tweeted about Julia at the beginning of March 2012 after seeing Jeff Bezanson give a talk at Stanford. I’ve dabbled with it here and there, but have never used it for a major project. Over the past few years, Julia has continued to add packages at a steady pace, and the package manager is really quite impressive and solid. We spent much of the podcast discussing the state of Julia, Julia 1.5, and the Julia ecosystem and community.

Sep 24, 2020 • 33min
Using machine learning to modernize medical triage and monitoring systems
In this episode of the Data Exchange I speak with Kira Radinsky, Chairwoman & Chief Technology Officer at Diagnostic Robotics, a startup using AI to build a medical-grade triage and clinical-predictions platform. She is also a visiting Professor at Technion – Israel Institute of Technology. Kira has extensive experience using data science and machine learning in a variety of settings, and she was one of the pioneers in using alternative data sources to augment forecasting models. Her earlier work includes models to predict social unrest as well as disease outbreaks. The global pandemic has increased the need for experts in medical data mining, a field to which Kira has made many significant contributions.

Sep 17, 2020 • 53min
Connecting Reinforcement Learning to Simulation Software
In this episode of the Data Exchange I speak with Max Pumperla, deep learning engineer at Pathmind and a contributor to many open source projects in data science and machine learning. Max is speaking on applications of reinforcement learning to simulation problems at the upcoming Ray Summit, a free virtual conference scheduled for Sep 30th and Oct 1st. Earlier this year I had Pathmind’s CEO Chris Nicholson on this podcast, and he described how reinforcement learning might play a role in simulation problems. In this episode, Max provides an update and a technical description of how Pathmind uses reinforcement learning, RLlib, and Tune to help users of AnyLogic, a widely used simulation package for business applications.

Sep 10, 2020 • 43min
Using machine learning to detect shifts in government policy
In this episode of the Data Exchange I speak with Weifeng Zhong, Senior Research Fellow at the Mercatus Center at George Mason University. He is the core maintainer of the open source Policy Change Index (PCI), a framework that uses machine learning and NLP to “process and read” large amounts of text to discern government priorities and policies. The initial PCI focuses on major policy shifts in China and is built by processing and analyzing the People’s Daily.