The Data Exchange with Ben Lorica

Ben Lorica
Feb 6, 2020 • 33min

Building domain specific natural language applications

In this episode of the Data Exchange I speak with David Talby, co-creator of Spark NLP, an open source, highly scalable, production grade natural language processing (NLP) library. Spark NLP has become one of the more popular NLP libraries and is available on PyPI, Conda, Maven, and Spark Packages. With recent advances in research in large-scale natural language models, there is strong interest in domain specific natural language applications. Besides their work on Spark NLP, David and his collaborators are building natural language models tuned specifically for healthcare applications.

Our conversation spanned many topics, including:
- Spark NLP: its current status and some common and surprising use cases.
- Recent developments in NLP research and their implications for companies.
- Spark NLP for Healthcare.

Detailed show notes can be found on The Data Exchange web site.
Jan 30, 2020 • 42min

The state of privacy-preserving machine learning

In this episode of the Data Exchange I speak with Morten Dahl, research scientist at Dropout Labs, a startup building a platform and tools for privacy-preserving machine learning. He is also behind TF Encrypted, an open source framework for encrypted machine learning in TensorFlow. The rise of privacy regulations like CCPA and GDPR combined with the growing importance of ML has led to a strong interest in tools and techniques for privacy-preserving machine learning among researchers and practitioners. Morten brings the unique perspective of being a longtime security researcher who has also worked as a data scientist in industry.

Our conversation spanned many topics, including:
- Morten’s unique background as an experienced security researcher, developer, and data scientist.
- The current state of TF Encrypted.
- Federated learning (FL) and secure aggregation for FL.
- Privacy-preserving ML solutions will employ a variety of techniques, and thus we also discussed related topics such as differential privacy, homomorphic encryption, and RISELab’s stack for coopetitive learning (MC2).

Detailed show notes can be found on The Data Exchange web site.
Jan 23, 2020 • 38min

Taking messaging and data ingestion systems to the next level

Sijie Guo on how Apache Pulsar is able to handle both queuing and streaming, and both online and offline applications.

In this episode of the Data Exchange I speak with Sijie Guo, founder of StreamNative, a new startup focused on making enterprise messaging technologies - specifically Apache Pulsar - easy to use on the cloud. Sijie was previously a cofounder of Streamlio (acquired by Splunk) and prior to that he led the messaging team at Twitter. He is also the main organizer behind the Pulsar Summit (April in San Francisco), a new conference whose Call for Speakers closes on January 31st.

Our conversation spanned many topics, including:
- The role of messaging in modern data applications and platforms.
- The two main types of messaging applications: queuing and streaming.
- Apache Pulsar as a unified messaging platform, able to handle both queuing and streaming, and both online and offline applications.
- A status update on Apache Pulsar.

Detailed show notes can be found on The Data Exchange web site.
Jan 16, 2020 • 41min

Business at the speed of AI: Lessons from Rakuten

The Data Exchange Podcast: Bahman Bahmani on attracting and retaining talent, and the importance of delivery-oriented teams.

In this episode of the Data Exchange I speak with Bahman Bahmani, VP of Data Science and Engineering at Rakuten, a large Japanese ecommerce and online retail company. When I first met Bahman several years ago, he was finishing up his Computer Science PhD at Stanford, and at the time he was giving technical talks on machine learning algorithms and their applications to computer security. Today he leads a large team at Rakuten, and in my opinion he has established an organizational structure, processes, and an AI practice that other companies should study.

Our conversation spanned many topics, including:
- The impact that AI, machine learning, and data have had on Rakuten’s businesses.
- Attracting, nurturing, and retaining talent in an environment where data scientists, data engineers, and analysts all have many other options.
- The trio of strategic options: operational excellence, product leadership, customer intimacy.
- Organization and culture, including key roles within an AI practice.
- The power of delivery-oriented teams with end-to-end responsibility.

Detailed show notes can be found on The Data Exchange web site.
Jan 9, 2020 • 30min

The combination of the right software and commodity hardware will prove capable of handling most machine learning tasks

In this episode of the Data Exchange I speak with Nir Shavit, Professor of EECS at MIT, and cofounder and CEO of Neural Magic, a startup that is creating software to enable deep neural networks to run on commodity CPUs (at GPU speeds or faster). Their initial products are focused on model inference, but they are also working on similar software for model training.

Our conversation spanned many topics, including:
- Neurobiology, in particular the combination of Nir’s research areas of multicore software and connectomics – a branch of neurobiology.
- Why he believes the combination of the right software and CPUs will prove capable of handling many deep learning tasks.
- Speed is not the only factor: the “unlimited memory” of CPUs is able to unlock larger problems and architectures.
- Neural Magic’s initial offering is in inference; model training using CPUs is also on the horizon.

Detailed show notes can be found on The Data Exchange web site.
Dec 26, 2019 • 36min

Key AI and Data Trends for 2020

In this episode of the Data Exchange, I speak with my podcast co-organizer Mikio Braun, data scientist at GetYourGuide, and a former machine learning researcher and data architect. Mikio and I go out on a limb and speculate about new trends in AI and Data that we think people should pay attention to in 2020.

Our conversation spanned many topics, and we listed trends in:
- Models: reinforcement learning, deep learning, language models, and related topics.
- Applications: including emerging use cases for reinforcement learning.
- Infrastructure and Tools: end-to-end machine learning platforms, the importance of distributed computing, etc.
- Managing risks: privacy, security, safety, fairness, etc.
- Emerging technologies to watch for in 2020.

Detailed show notes can be found on The Data Exchange web site.
Dec 12, 2019 • 36min

The evolution of TensorFlow and of machine learning infrastructure

In this episode of the Data Exchange I speak with Rajat Monga, one of the founding members of the TensorFlow Engineering team. Up until recently Rajat was the engineering manager for TensorFlow at Google.

Our conversation spanned many topics, including:
- TFX, a production scale machine learning platform based on TensorFlow.
- Distributed training.
- MLIR (Multi-Level Intermediate Representation), “a representation format and library of compiler utilities that sits between the model representation and low-level compilers/executors that generate hardware-specific code.”
- Deep learning in the enterprise.
- The state of machine learning infrastructure.

[Full show notes can be found on the Data Exchange web site.]
Nov 26, 2019 • 40min

Building large-scale, real-time computer vision applications

In this episode of the Data Exchange I speak with Reza Zadeh, founder and CEO of Matroid, a startup focused on making computer vision applications easy to build and deploy. Reza is also an adjunct professor at Stanford.

This particular conversation spanned many topics pertaining to computer vision, including:
- Challenges in building large-scale, real-time computer vision applications.
- Robustness of computer vision applications (adversarial attacks, deepfakes).
- Impact of computer vision technologies on society: security, privacy, and surveillance.

We also preview the upcoming 2020 edition of the ScaledML conference: Reza is the main organizer behind one of my favorite conferences in the SF Bay Area.

[Full show notes can be found on the Data Exchange site.]
Nov 12, 2019 • 45min

Taking stock of foundational tools for analytics and machine learning

In this episode of The Data Exchange, I speak with Paco Nathan, author, teacher, and founder of Derwen.ai, a boutique consulting firm specializing in Data, ML, and AI. Paco consults with companies and speaks before audiences all over the world, and I plan to have him as a frequent guest on this podcast to draw on his observations of diverse organizations.

This particular conversation spanned many topics, including:
- Data Governance: Paco’s talk on the topic
- AutoML: Paco’s talk on the topic
- PyTorch and TensorFlow: posts we discussed - [1], [2]
- Reproducibility and feature selection
- The Streamlit open source project for ML app development, and Grus Law (“if you can think up something crazy and/or dangerous to do with notebooks, people are doing it.”)

I want this to be more than just a podcast. I want to create a community to help people make better decisions. A key part of this is getting you involved. I have ideas on how this community will grow, but as a first step, I want to ask a question related to one of the topics that Paco and I discussed: PyTorch and TensorFlow. I'd love to have you weigh in by filling out the survey form. I'll report on results and key insights in a future episode of this podcast.

[Full show notes can be found on the Data Exchange site.]
