

The Data Exchange with Ben Lorica
Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and, of course, big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
Episodes

Apr 16, 2020 • 35min
Computational Models and Simulations of Epidemic Infectious Diseases
In this episode of the Data Exchange I speak with Bruno Gonçalves, a data scientist working at the intersection of Data Science and Finance. I have known Bruno for several years; we met when I recruited him to teach several extremely popular conference tutorials and talks on machine learning and deep learning. Prior to shifting over to data science, he spent several years as a researcher focused on mathematical models in epidemiology, a field with a rich history dating as far back as the 1920s. This episode is devoted to tools and techniques for modeling epidemics.
Our conversation covered:
- Bruno’s background and his experience in modeling epidemics.
- The field of epidemic models: what techniques are used, the size of the community of researchers, and how models get evaluated.
- His two recent posts: “Epidemic Modeling 101 – Or why your CoVID-19 exponential fits are wrong” and “Epidemic Modeling 102 – All CoVID-19 models are wrong, but some are useful”.
- The role that epidemic models play in decision making.
Detailed show notes can be found on The Data Exchange web site.
Subscribe to The Data Exchange Newsletter.

Apr 9, 2020 • 44min
Human-in-the-loop machine learning
In this episode of the Data Exchange I speak with Rob Munro, CEO of Machine Learning Consulting and author of the forthcoming book, “Human-in-the-loop Machine Learning”. If you want a copy of Rob’s book, use the discount code podexchange20.
Our conversation covered:
- Rob’s experience building data and machine learning products at Powerset, Idibon, and AWS.
- Natural language processing: given Rob’s extensive experience as a researcher, practitioner, and entrepreneur in areas that touch on NLP, we discussed recent trends in language technologies.
- Human-in-the-loop machine learning.
Our goal in this podcast is to build a community of people interested in Data, Machine Learning and AI. If you have suggestions for what we should recommend (books, conferences, links) or for guests to book, please visit the TheDataExchange.media site and fill out the “contact” form. The first five people who fill out the form get a free book from Manning (see Manning’s catalog).
Detailed show notes can be found on The Data Exchange web site.

Apr 2, 2020 • 40min
Next-generation simulation software will incorporate deep reinforcement learning
In this episode of the Data Exchange I speak with Chris Nicholson, founder and CEO of Pathmind, a startup applying deep reinforcement learning (DRL) to simulation problems. In a recent post I highlighted two areas where companies can begin to add DRL to their suite of tools: personalization and recommendation engines, and simulation software. My interest in the interplay between DRL and simulation software began when I came across the work of Pathmind in this area.
Our conversation focused on deep reinforcement learning and its applications:
- We began with the basics: what is reinforcement learning and why should businesses pay attention to it?
- We discussed enterprise applications of DRL, with particular emphasis on areas where Chris and Pathmind have been focused of late: Business Process Simulation and Optimization.
- Pathmind have been early adopters of Ray and of RLlib, a popular open-source library for reinforcement learning built on top of Ray. I asked Chris why they chose to build on top of RLlib.
Detailed show notes can be found on The Data Exchange web site.

Mar 26, 2020 • 37min
Business at the speed of AI: Lessons from Shopify
In this episode of the Data Exchange I speak with Solmaz Shahalizadeh, VP and Head of Data Science and Data Platform Engineering at Shopify. Shopify is a powerhouse in ecommerce and their technology powers over a million businesses worldwide. Solmaz is a frequent speaker and presenter at conferences throughout the world and she has played a critical role in helping Shopify scale its data and machine learning infrastructure.
Our conversation covered many important technical and business topics including:
- Building and scaling machine learning data products.
- Building and scaling data teams.
- Data informed product building.
Detailed show notes can be found on The Data Exchange web site.

Mar 19, 2020 • 40min
How deep learning is being used in search and information retrieval
In this episode of the Data Exchange I speak with Edo Liberty, founder of Hypercube, a startup building tools for deploying deep learning models in search and information retrieval involving large collections. When I spoke at AI Week in Tel Aviv last November several friends encouraged me to learn more about Hypercube. I’m glad I took their advice!
Our conversation covered several topics including:
- Edo’s experience applying machine learning and building tools for ML at places like Yale, Yahoo's Research Lab in New York, and Amazon’s AI Lab.
- How deep learning is being used in search and information retrieval.
- Challenges one faces in building search and information retrieval applications when collections are large.
- Deep learning based search and information retrieval and what Edo describes as “enterprise end-to-end deep search platforms”.
Detailed show notes can be found on The Data Exchange web site.

Mar 12, 2020 • 39min
The responsible development, deployment and operation of machine learning systems
In this episode of the Data Exchange I speak with Alejandro Saucedo, Engineering Director at Seldon, a startup building tools for productionizing machine learning. Alejandro is also Chief Scientist at The Institute for Ethical AI & Machine Learning, a UK-based research center that conducts “research into processes and frameworks that support the responsible development, deployment and operation of machine learning systems”.
Our conversation covered Alejandro’s work at both Seldon and the Institute for Ethical AI & Machine Learning:
- We discussed topic areas that the Institute focuses on including explainability, MLOps, adversarial robustness, and privacy-preserving machine learning.
- We covered some of the recent output from the Institute including the machine learning maturity model, their open source explainable AI library, their AI-RFX Procurement Framework, and their list of Principles for Responsible AI.
- We also discussed his role at Seldon, and areas that Seldon has been focused on.
Detailed show notes can be found on The Data Exchange web site.

Mar 5, 2020 • 35min
Hyperscaling natural language processing
In this episode of the Data Exchange I speak with Edmon Begoli, Chief Data Architect at Oak Ridge National Laboratory (ORNL). Edmon has developed and implemented large-scale data applications on systems like Open MPI, Hadoop/MapReduce, Apache Calcite, Apache Spark, and Akka. Most recently he has been building large-scale machine learning and natural language applications with Ray, a distributed execution framework that makes it easy to scale machine learning and Python applications.
Our conversation covered a range of topics, including:
- Edmon’s role at ORNL and his experience building applications with Hadoop and Spark.
- What is distributed online learning?
- Why they started using Ray to build distributed online learning applications.
- Two important use cases: suicide prevention among US veterans and infectious disease surveillance.
Detailed show notes can be found on The Data Exchange web site.
Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.

Feb 27, 2020 • 36min
What businesses need to know about model explainability
In this episode of the Data Exchange I speak with Krishna Gade, founder and CEO at Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook.
Our conversation covered a range of topics, including:
- Krishna’s background as an engineering manager at Facebook and Pinterest.
- Why Krishna decided to start a company focused on explainability.
- Guidelines for companies that want to begin incorporating model explainability into their data products.
- The relationship between model explainability (transparency) and security (ML that can resist adversarial attacks).
Detailed show notes can be found on The Data Exchange web site.
Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.

Feb 20, 2020 • 36min
Scalable Machine Learning, Scalable Python, For Everyone
In this episode of the Data Exchange I speak with Dean Wampler, Head of Developer Relations at Anyscale, the startup founded by the creators of Ray. Ray is a distributed execution framework that makes it easy to scale machine learning and Python applications. It has a very simple API, and as someone who uses both Python and machine learning, I have found Ray a wonderful addition to my toolbox. Dean has long been one of my favorite architects, speakers and teachers, and we have known each other since the early days of Apache Spark. He has authored numerous books and is known for his interest in Scala and programming languages, as well as in software architecture.
Our conversation spanned many topics, including:
- What is Ray and why should someone consider using it?
- The first Ray Summit (May 27-28 in San Francisco).
- Dean’s first impressions of Ray, and his journey from Scala to Python.
- An update on Ray’s core libraries, Ray on Windows, and distributed training with Ray.
Detailed show notes can be found on The Data Exchange web site.
For more on Ray and scalable machine learning & Python, come hear from Dean Wampler, Michael Jordan, Ion Stoica, Manuela Veloso, Wes McKinney and many other leading developers and researchers at the first Ray Summit in San Francisco (May 27-28).

Feb 13, 2020 • 34min
Computational humanness, analogy and innovation, and soft concepts
In this episode of the Data Exchange I speak with Dafna Shahaf, Associate Professor at the School of Computer Science and Engineering, the Hebrew University of Jerusalem. She also runs the hyadata lab, a research group that consistently produces unique and interesting projects at the intersection of computer science, data, and the social sciences.
Our conversation covered a range of topics, including:
- Computational analogy: Dafna and her students mine online sources like patent filings, research papers, and data from crowdsourcing platforms focused on innovation, and in the process they produce tools that should be of interest to innovation officers and members of innovation labs.
- Soft Concepts: Dafna has continued her work on computational humor, and she and her students have new tools for automatically finding trivia facts in Wikipedia.
- An upcoming workshop on Innovative Ideas in Data Science (April 20th in Taipei; the deadline to submit proposals is 21 February 2020).
Detailed show notes can be found on The Data Exchange web site.