
Machine Learning Archives - Software Engineering Daily
Machine learning and data science episodes of Software Engineering Daily.
Latest episodes

Aug 17, 2016 • 40min
Data Validation with Dan Morris
Data Validation is the process of ensuring that data is accurate. In many software domains, an application is pulling in large quantities of data from external sources. That data will eventually be exposed to users, and it needs to be correct.
Radius Intelligence is a company that aggregates data on small businesses. To keep business addresses and phone numbers accurate, Radius uses human data validation to check its machine-gathered data. On today’s episode, Srini Kadamati interviews Dan Morris about human data validation, and how it fits into a machine learning pipeline.
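As a rough illustration of how human validation might sit inside a machine learning pipeline, the sketch below routes low-confidence records to a human review queue. It does not describe Radius's actual system: the `BusinessRecord` type, the `confidence` score, and the `REVIEW_THRESHOLD` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: records scored below this go to a human reviewer.
REVIEW_THRESHOLD = 0.8


@dataclass
class BusinessRecord:
    name: str
    phone: str
    confidence: float  # assumed model score in [0, 1]


def route(records, threshold=REVIEW_THRESHOLD):
    """Split records into auto-accepted and human-review queues."""
    accepted, needs_review = [], []
    for record in records:
        target = accepted if record.confidence >= threshold else needs_review
        target.append(record)
    return accepted, needs_review


# Made-up records, for illustration only.
records = [
    BusinessRecord("Acme Bakery", "555-0101", 0.97),
    BusinessRecord("Zed Plumbing", "555-0102", 0.42),
]
accepted, needs_review = route(records)
```

In a setup like this, the corrections that reviewers make could also be fed back as training labels, although the blurb does not say whether that is done.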
The post Data Validation with Dan Morris appeared first on Software Engineering Daily.

Aug 16, 2016 • 43min
Machine Learning for Sales with Per Harald Borgen
Machine learning has become simplified. Similar to how Ruby on Rails made web development approachable, scikit-learn takes away many of the frustrating aspects of machine learning, and lets the developer focus on building functionality with high-level APIs.
Per Harald Borgen is a developer at Xeneta. He started programming fairly recently, but has already built a machine learning application that cuts down on the time his sales team has to spend qualifying leads. What I found most interesting about this episode was that machine learning gets used by a single developer to solve a simple business problem and deliver solid value. This is in contrast to how many of us think about machine learning: as an intimidating domain that requires a large team to build anything meaningful.

Jun 8, 2016 • 53min
Phone Spam with Truecaller CTO Umut Alp
The war against spam has been going on for decades. Email spam blockers and ad blockers help protect us from unwanted messages in our communication and browsing experience. These spam prevention tools are powered by machine learning, which catches most of the emails and ads that we don’t want to see. Truecaller is bringing this quality of spam detection to phone calls.
Umut Alp is the CTO of Truecaller, and he joins the show today to break down the engineering problems of preventing telephone call spam. Users of Truecaller install it on their phones and report spam calls when they receive them. Using these reports, along with other learning algorithms, Truecaller learns which types of calls to block before they reach your phone. Today on Software Engineering Daily, we discuss cell phone spam prevention.
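The reporting mechanism described above can be sketched in a few lines. This is a minimal illustration and not Truecaller's implementation: the `SpamReportIndex` class, the `REPORT_THRESHOLD` value, and the rule of blocking once enough distinct users have reported a number are all assumptions.

```python
from collections import defaultdict

# Hypothetical threshold: distinct reporters needed before a number is blocked.
REPORT_THRESHOLD = 3


class SpamReportIndex:
    """Tracks which users have reported which caller numbers."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self._reporters = defaultdict(set)  # caller number -> set of reporter ids

    def report(self, caller, reporter_id):
        # A set ignores repeat reports from the same user.
        self._reporters[caller].add(reporter_id)

    def should_block(self, caller):
        # Use .get so lookups do not create empty entries in the defaultdict.
        return len(self._reporters.get(caller, ())) >= self.threshold


index = SpamReportIndex()
for user in ("u1", "u2", "u2", "u3"):
    index.report("+15550100", user)
index.report("+15550199", "u1")

blocked = index.should_block("+15550100")
not_blocked = index.should_block("+15550199")
```

Counting distinct reporters rather than raw reports keeps a single user from getting a number blocked alone. A real system would presumably combine such counts with the learned signals that the episode mentions.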

Mar 8, 2016 • 57min
Machine Learning in Healthcare with David Kale
“Building a model to predict disease and deploying that in the wild – the bar for success is much higher there than, say, deciding what ad to show you.”
Diagnosing illness today requires the trained eye of a doctor. With machine learning, we might someday be able to diagnose illness using only a data set. Today on Software Engineering Daily, we are joined by David Kale, a researcher at the intersection of machine learning and clinical data. We discuss the machine learning and research techniques he is using to diagnose illnesses using neural networks, and we also talk about the challenges of performing data science in hospitals, where the data is mostly confidential. David will also be presenting at Strata + Hadoop World in San Jose. We’re partnering with O’Reilly to support this conference. If you want to go to Strata, you can save 20% on a ticket with our code PCSED.
Questions
What kind of work does a data scientist at a children’s hospital do?
Where is machine learning actually improving healthcare?
What types of data are present in the intensive care unit?
Can you give me an example of how you used an LSTM to make a prediction?
What were the results of your recurrent neural network experiments?
Do you think that deep learning is overhyped right now?
Links
Learning to Diagnose with LSTM Recurrent Neural Networks
Strata+Hadoop World
Lasagne
Theano
Torch
Deep Learning for Java
Recurrent neural network
David’s research page

Feb 29, 2016 • 55min
Data Science at Monsanto with Tim Williamson
“Nothing’s cool unless you call it ‘as a service.’ ”
Monsanto is known for its chemical and biological engineering, and less well known for its data science and software engineering teams. Tim Williamson is a data scientist at Monsanto, and on today’s show he talks about how he and a small group of engineers dramatically shifted the company’s culture around data science-driven genetic engineering.
In this episode, Tim explains how useful graph databases are for modeling genetic lineages, and talks about how Monsanto manages simulations and experiments on its genomics software pipeline. Tim also talks about how just a few engineers can create a cultural shift within a large company like Monsanto using the leverage that software provides.
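To see why a graph is a natural fit for genetic lineages, consider a pedigree stored as a parent map and queried by traversal. The sketch below uses a plain Python dictionary, not a graph database. The `PARENTS` data and the `ancestors` and `common_ancestors` helpers are hypothetical and do not reflect Monsanto's schema or tooling.

```python
from collections import deque

# Hypothetical pedigree: each line maps to its parent lines.
PARENTS = {
    "hybrid_x": ["line_a", "line_b"],
    "line_a": ["founder_1", "founder_2"],
    "line_b": ["founder_2", "founder_3"],
}


def ancestors(line, parents=PARENTS):
    """Return every ancestor of `line` using a breadth-first traversal."""
    seen = set()
    queue = deque(parents.get(line, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # shared ancestors are visited only once
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def common_ancestors(a, b, parents=PARENTS):
    """Return the ancestors that two lines share."""
    return ancestors(a, parents) & ancestors(b, parents)
```

A graph database can express the same traversal as a single query without fixing the depth in advance. In a relational schema, ancestry of arbitrary depth generally needs recursive queries or repeated self-joins.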
Questions
Why is data science important to Monsanto?
How will data science be used in the future to improve food production?
What are a genomics pipeline and a breeding cycle?
Can you use simulations to improve genetic predictions?
Why are graph databases useful for Monsanto?
What is ancestry-as-a-service?
Are there any agri-tech companies or products that are really exciting to you?
Is it realistic or desirable to move to a meat-free nutrition model?
Links
Graphs Are Feeding The World
YHat Show on SEDaily
Monsanto
Genetic Imputation
In vitro meat
Tim on Twitter

Jan 29, 2016 • 52min
Deep Learning and Keras with François Chollet
“I definitely think we can try to abstract away the first principles of intelligence and then try to go from these principles to an intelligent machine that might look nothing like the brain.”
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. In this episode, François discusses the state of deep learning, and explains why the field is experiencing a Cambrian explosion that eventually may taper off. He explains the need for Keras and why its simplicity and ease of use make it a useful deep learning library for developers to experiment and build with.
François Chollet is the author of Keras and the founder of Wysp, a learning platform for artists. He currently works for Google as a deep learning engineer and researcher.
Questions
Do you try to design intelligent machines using the human brain as a blueprint?
How has the structure of software engineering teams changed to accommodate the addition of machine learning?
What are the best practices for deploying to production machine learning systems developed by data scientists?
Why do neural network developers need to be able to perform fast experimentation?
Why is modularity important to a deep learning library?
How does Keras interface with the GPU?
What are the interesting trends you notice in machine learning?
Links
Keras
Theano
TensorFlow
Directed Acyclic Graph
Lasagne
RDD
François on Twitter

Jan 19, 2016 • 56min
Machine Learning for Businesses with Joshua Bloom
“You’ve got software engineers who are interested in machine learning, and think what they need to do is just bring in another module and then that will solve their problem. It’s particularly important for those people to understand that this is a different type of beast.”
Machine learning is something that many businesses are starting to tack onto their existing processes. Yet adding machine learning capabilities after the fact is often a fool’s errand. Joshua argues that machine learning cannot be an afterthought, but rather must be custom developed to suit the specific problem or question that each company is trying to answer. His company, Wise.io, tackles this challenge by helping businesses build ground-up machine learning applications that generate accurate predictions for use in an array of business processes.
Joshua Bloom is the cofounder and CTO of Wise.io. He is also an astrophysicist, and a professor of astronomy at UC Berkeley.
Questions
What is a machine learning system?
What is the broader impact of this improved ease of use of machine learning algorithms?
How do you think data scientists are stratified?
What does Wise.io do?
How do you abstract away machine learning implementations at large organizations with enterprise software systems?
What is it about machine learning systems that gives rise to weak contracts between abstraction levels?
Links
Josh Bloom: Keynote – A Systems View of Machine Learning
Machine Learning: The High Interest Credit Card of Technical Debt
Machine Learning and Technical Debt with D. Sculley
Wise.io
Joshua on Wikipedia
Joshua on Twitter

Dec 15, 2015 • 40min
TensorFlow with Greg Corrado
“You don’t mind if failures slow things down, but it’s very important that failures do not stop forward progress.”
TensorFlow is an open source machine learning library intended to bring large-scale, distributed machine learning and deep learning to everyone. Google recently released the framework to the public as a second-generation API, having learned from the successes and failures of DistBelief.
Greg Corrado is a senior research scientist and tech lead at Google, where he focuses on the research areas of machine intelligence, machine perception and natural language processing.
Questions
From the end-user’s point of view, how does Smart Reply work?
How can teams blend research and engineering to make better products?
How did the DistBelief project shape TensorFlow?
How does TensorFlow differ from more general streaming frameworks like Spark or Storm?
Why would I want to do machine learning on my phone?
How is TensorFlow fault tolerant?
What are things the open source community should dive into in TensorFlow, to fix and improve it?
Links
TensorFlow
Computer, respond to this email.
Bridging Data Science and Engineering with Greg Lamp
Greg’s Research Page
Sponsors
Hired.com is the job marketplace for software engineers. Go to hired.com/softwareengineeringdaily to get a $600 bonus upon landing a job through Hired.
Digital Ocean is the simplest cloud hosting provider. Use promo code SEDAILY for $10 in free credit.

Dec 11, 2015 • 56min
Data Science at Spotify with Boxun Zhang
“I normally try to sit together or very close to a product team or engineering team. And by doing so, I get very close to the source of all kinds of challenging problems.”
Spotify is a streaming music service that uses data science and machine learning to implement product features such as recommendation systems and music categorization, but also to answer internal questions.
Boxun Zhang is a data scientist at Spotify where he focuses on understanding user behavior within the product.
Questions
What is the overlap between distributed systems and data science?
How has Spotify’s big data architecture evolved over time?
As a data scientist, do you need to understand this big data architecture well?
What were the benefits of starting to use Kafka?
What kinds of data science problems do you tackle at Spotify?
Could you describe what a random forest is?
Why are there so many streaming systems, and what do you use at Spotify?
How will data science change moving towards the future?
Links
The Evolution of Big Data at Spotify
Luigi
Project Jupyter
XGBoost
Automatic Statistician
Skytree

Dec 8, 2015 • 56min
Learning Machines with Richard Golden
“When I was a graduate student, I was sitting in the office of my advisor in electrical engineering and he said, ‘Look out that window – you see a Volkswagen, I see a realization of a random variable.’ ”
Richard Golden is the host of Learning Machines 101, a podcast that covers artificial intelligence and machine learning topics. Dr. Golden is also a full-time Professor of Cognitive Science and Electrical Engineering at UT Dallas.
Questions
What is machine learning?
What are the fundamental concepts for building artificial intelligence?
How do you define a rule in the domain of machine learning?
How can a machine learning system estimate the probability of something it has not seen?
Could you explain how ML could be applied to real world healthcare scenarios?
What is a neural network?
What is the difference between natural and artificial intelligence?
Links
Bayesian Model Averaging
Frequentist Model Averaging
McCulloch-Pitts Formal Neurons
Dr. Golden’s Professor Page