
MLOps.community

Latest episodes

Dec 8, 2021 • 1h 5min

Machine Learning at Reasonable Scale // Jacopo Tagliabue // MLOps Coffee Sessions #66

MLOps Coffee Sessions #66 with Jacopo Tagliabue, Machine Learning at Reasonable Scale. // Abstract We believe that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on ML: truth is, outside of Big Tech and advanced startups, ML systems are still far from producing the promised ROI. The good news is that times are changing: thanks to a growing ecosystem of tools and shared best practices, even small teams can be incredibly productive at a “reasonable scale”. Based on our experience as founders and researchers, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of a "PaaS-like" approach. // Bio Educated in several acronyms across the globe (UNISR, SFI, MIT), Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Director of AI at Coveo, shipping models to hundreds of customers and millions of users. When not busy building products, he is exploring topics at the intersection of language, reasoning, and learning: his research and industry work is often featured in the general press and premier A.I. venues. In previous lives, he managed to get a Ph.D., do sciency things for a pro basketball team, and simulate a pre-Columbian civilization. 
// Relevant Links Bigger boat repo: https://github.com/jacopotagliabue/you-dont-need-a-bigger-boat TDS series: https://towardsdatascience.com/tagged/mlops-without-much-ops (ep 3 and a NEW open-source contribution on data ingestion coming up) Open datasets for e-commerce and MLops experiments:  https://github.com/coveooss/SIGIR-ecom-data-challenge --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Jacopo on LinkedIn: https://www.linkedin.com/in/jacopotagliabue/
Nov 30, 2021 • 52min

The Future of Data Science Platforms is Accessibility // Skylar Payne // Coffee Session #65

MLOps Coffee Sessions #65 with Skylar Payne, The Future of Data Science Platforms is Accessibility. // Abstract The machine learning and data science space is blowing up -- new tools are popping up every day. While we seem to have every type of "Flow" and "Store" you could imagine, few people really understand how to glue this stuff together. Despite all the tools we have available, we still see companies failing to leverage data science effectively to drive business results. Instead of spending time driving business results, data scientists spend their time fiddling with Kubernetes, trying to debug that Spark serialization error, or figuring out how to map their code into the awkward "AI Pipeline" SDK. We have an industry filled with tools built by engineers... for engineers, rather than for data scientists. It's deeply disempowering. Meanwhile, data is still used effectively to drive decisions in many companies. Analysts have been solving very similar problems on the back of applications like Excel, Tableau, and Mode for literally decades. While there are still challenges in analytics, the MLOps space could learn something from analytics tools. Analytics tools better understand how to make their tools accessible. Analytics tools better understand the value of iterability. Analytics tools better understand that data problems are wicked problems:
- we have to iterate on the formulation and solution simultaneously
- they involve many stakeholders with different opinions
- there's no "right" answer
- the problems are never 100% solved.
If we're going to really drive the most business value from data science, we need to understand how to design our teams and tools to effectively work against such problems. The future of data science platforms is accessibility and iterability. // Bio Data is a superpower, and Skylar has been passionate about applying it to solve important problems across society.
For several years, Skylar worked on large-scale, personalized search and recommendation at LinkedIn -- leading teams to make step-function improvements in its machine learning systems to help people find the best-fit role. Since then, he has shifted his focus to applying machine learning to mental health care to ensure the best access and quality for all. To decompress from his workaholism, Skylar loves lifting weights, writing music, and hanging out at the beach! // Relevant Links --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Skylar on LinkedIn: https://www.linkedin.com/in/skylar-payne-766a1988/
Nov 29, 2021 • 56min

Impact of SWE in ML Projects // Laszlo Sragner and Tim Blazina // MLOps Reading Group

MLOps Reading Group meeting on November 20, 2021   --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Connect with us on LinkedIn: https://www.linkedin.com/company/mlopscommunity/ Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/
Nov 23, 2021 • 58min

The Future of AI and ML in Process Automation // Slater Victoroff // MLOps Coffee Sessions #64

MLOps Coffee Sessions #64 with Slater Victoroff, The Future of AI and ML in Process Automation. // Abstract The Unstructured Imperative: Recent advances in AI have dramatically advanced the state of the art around unstructured data, especially in the spaces of NLP and computer vision. Despite this, the adoption of unstructured technologies has remained low. Why do you think that is? How have the dynamics changed in the last five years? Multimodal AI: Historic AI approaches have generally been constrained to one data modality (i.e. text or image). Recently, a wide range of papers in image captioning and document understanding have emphasized the need for more sophisticated "multimodal" techniques which can fuse information from multiple modalities. What is multimodal learning, and why is it so promising? Why are we seeing such an explosion of activity? What is Indico doing in this space? Machine Teaching: As methods of supervision become more complex and multi-faceted, many researchers have begun investigating the inverse problem. That is, how do we design supervision systems that more naturally follow human processes? What are some interesting trends in "the space", and where can we expect this field to go in the next few years? // Bio Slater Victoroff is the Founder and CTO of Indico, an enterprise AI solution for unstructured content that emphasizes document understanding. Slater has been building machine learning solutions for startups, governments, and Fortune 100 companies for the past seven years and is a frequent speaker at AI conferences. Indico’s framework requires 1000x less data than traditional machine learning techniques, and they regularly beat the likes of AWS, Google, Microsoft, and IBM in head-to-head bake-offs.
// Relevant Links https://indico.io --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Slater on LinkedIn: https://www.linkedin.com/in/slatervictoroff
Nov 16, 2021 • 53min

PyTorch: Bridging AI Research and Production // Dmytro Dzhulgakov // Coffee Sessions #63

Dmytro Dzhulgakov, PyTorch: Bridging AI Research and Production.   Talking PyTorch is always interesting, as the Facebook ML OSS project is one of the most important parts of the machine learning tooling ecosystem. This week, we talked to Dmytro Dzhulgakov, a tech lead for PyTorch. We started off talking about Dmytro's journey to being an engineer and tech lead at Facebook, and what his role entails. Dmytro has been at Facebook for 10+ years, so he gave some very interesting advice on how to manage a career in software engineering for the machine learning world. After that, we got deep into the present and future of PyTorch and what improvements the project is making to support MLOps workflows. PyTorch is a large project, and Dmytro shared with us the valuable lessons he learned from confronting multifaceted scaling challenges while working on PyTorch. Finally, we talked about the future of machine learning engineering, especially as relates to how software engineers work by comparison. // Abstract Over the past few years, PyTorch became the tool of choice for many AI developers ranging from academia to industry. With the fast evolution of state-of-the-art in many AI domains, the key desired property of the software toolchain is to enable the swift transition of the latest research advances to practical applications. In this coffee session, Dmytro discusses some of the design principles that contributed to this popularity, how PyTorch navigates inherent tension between research and production requirements, and how AI developers can leverage PyTorch and PyTorch ecosystem projects for bringing AI models to their domain. // Bio Dmytro Dzhulgakov is a technical lead of PyTorch at Facebook where he focuses on the framework core development and building the toolchain for bringing AI from research to production. Previously he was one of the creators of ONNX, a joint initiative aimed at making AI development more interoperable. 
Before that Dmytro built several generations of large-scale machine learning infrastructure that powered products like Ads or News Feed. // Relevant Links https://pytorch.org/ https://pytorch.org/blog/ https://ai.facebook.com/blog/pytorch-builds-the-future-of-ai-and-machine-learning-at-facebook/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Dmytro on LinkedIn: https://www.linkedin.com/in/dzhulgakov/
Nov 9, 2021 • 56min

I Don't Like Jupyter Notebooks // Joel Grus // Coffee Sessions #62

MLOps Coffee Sessions #62 with Joel Grus, MLOps from Scratch. // Abstract In this talk, Joel Grus of “I don’t like notebooks” fame shares with us his 2021 perspective on notebooks, where he thinks MLOps is now, and what his hot takes in the data space are now. // Bio Joel Grus is a Principal Engineer at Capital Group, where he leads a team that builds search, data, and machine learning products for the investment group. He is the author of the bestselling O'Reilly book *Data Science from Scratch*, the not-bestselling self-published book *Ten Essays on Fizz Buzz*, and the controversial JupyterCon talk "I Don't Like Notebooks." He recently moved to Texas after living in Seattle for a very long time. // Relevant Links Data Science from Scratch book: https://www.oreilly.com/library/view/data-science-from/9781491901410/ Data Science from Scratch, 2nd Edition book: https://www.oreilly.com/library/view/data-science-from/9781492041122/ Ten Essays on Fizz Buzz: Meditations on Python, mathematics, science, engineering, and design book: https://www.amazon.com/Ten-Essays-Fizz-Buzz-Meditations/dp/0982481829 or https://leanpub.com/fizzbuzz/ I Don't Like Notebooks talk: https://www.youtube.com/watch?v=7jiPeIFXb6U I Don't Like Notebooks - #JupyterCon 2018 slides: https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_658 Fizz Buzz in Tensorflow: https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/ --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Joel on LinkedIn: https://www.linkedin.com/in/joelgrus/ 
Timestamps: [00:00] Introduction to Joel Grus [01:32] Joel's background in tech [07:47] Joel's I Don't Like Notebooks talk on Jupyter Con [13:42] Better tooling around notebooks   [16:48] Hex [17:20] Step function evolution [20:41] Kinds of professionals required in Joel's organization to practice MLOps [23:08] Evaluation process [25:51] Sagemaker bring your own algorithm [27:30] Flexibility of models [31:55] Hot takes on data science world [34:19] Current Overall Maturity of MLOps [37:23] Kinds of problem in NLP and search [39:52] Finding ways to put structures [40:50] Probabilistic nature of machine learning systems [43:10] Data scientists coping up on writing production code [46:33] Invaluability of code review [47:22] Common repo structure [47:57] Reviewing codes [49:15] Code pals [50:36] Readability and function [52:23] Leverage code review [53:10] Remote work
Nov 2, 2021 • 41min

ML Tests // Svet Penkov // Coffee Sessions #61

MLOps Coffee Sessions #61 with Svet Penkov, ML Tests. // Abstract How confident do you feel when you deploy a new model? Does improving an ML model feel like a game of "whack-a-mole"? ML is taking over all sorts of industries and yet ML testing tools are virtually non-existent. Drawing parallels from software engineering and electronic circuit board design to the aviation and semiconductor industries, the need for a principled quality assurance (QA) step in the MLOps pipeline is long overdue. Let's talk about why ML testing is hard, what we can do about it, and what place ML QA should take in the future. // Bio Svet has been building robots ever since he was a kid. At some point, Svet got interested in not just how to build them, but actually how to make them think, and so he did a Ph.D. in AI & Robotics at the University of Edinburgh, UK. Towards the end of Svet's Ph.D., he joined FiveAI as a Research Scientist and led the motion prediction team for 3 years. Throughout his career, Svet spent endless hours fixing model regressions and fighting with edge cases, and so at some point he had enough of it and decided it was time to do something about it. That's how Svet started Efemarai, where they are building a platform for testing and improving ML continuously. // Relevant Links --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Svet on LinkedIn: https://www.linkedin.com/in/svpenkov/
Timestamps: [00:00] Introduction to Svet Penkov [02:10] Svet's background in tech [04:34] Testing on robotics vs areas of machine learning [05:21] What's missing in testing right now? [08:56] Who should test? Step 1. Figuring out the requirements [12:04] Edge cases. Step 2. Access of variation [13:29] Step 3. Validation and Verification [16:15] New challenges that need to be addressed [18:25] Test-driven development viability argument [20:26] Software engineering tests vs machine learning engineering tests [23:23] Rule of tools in MLOps [26:15] Figuring out the difficulty in designing the API's [27:48] Svet's vision for the future [29:15] Moving goal post [31:00] 10 data points being realistic [31:27] Getting less [32:20] Efemarai: Where it came from and Why? [33:53] Efemarai - Functional Magnetic Resonance Imaging [35:21] A perfect world journey [36:22] Value of tests [37:55] Get ready for the MLOps Community Slack testing channel!
Oct 25, 2021 • 52min

Linkedin Job Recommendations // Alexandre Patry // Coffee Sessions #60

Coffee Sessions #60 with Alexandre Patry, Path to Productivity in Job Search and Job Recommendation AI at LinkedIn. // Abstract A year ago, LinkedIn job search and recommendation AI teams were at the end of a growth cycle. They were fighting many fires at once: a high number of user complaints, engineers spending a significant amount of their time keeping their machine learning pipelines running, online infrastructure that wasn't supporting their growth, and challenges ramping new models to experiment. In this talk, Alex discusses how they all came together to manage these challenges and set themselves up for their next phase of growth. // Bio Alex has been a machine learning engineer at LinkedIn for almost seven years. He has done tours of duty in LinkedIn Groups, content search and discovery, and feed, and has been tech leading in LinkedIn Talent Solutions and Careers for the last two years. Prior to working at LinkedIn, Alex lived in Montreal where he completed a Ph.D. in Statistical Machine Translation, then worked for five years on information extraction. // Relevant Links --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Skylar on LinkedIn: https://www.linkedin.com/in/skylar-payne-766a1988/ Connect with Alexandre on LinkedIn: https://www.linkedin.com/in/patry/
Oct 11, 2021 • 1h 11min

Data Selection for Data-Centric AI: Data Quality Over Quantity // Cody Coleman // Coffee Sessions #59

Coffee Sessions #59 with Cody Coleman, Data Quality Over Quantity or Data Selection for Data-Centric AI. // Abstract Big data has been critical to many of the successes in ML, but it brings its own problems. Working with massive datasets is cumbersome and expensive, especially with unstructured data like images, videos, and speech. Careful data selection can mitigate the pains of big data by focusing computational and labeling resources on the most valuable examples.   Cody Coleman, a recent Ph.D. from Stanford University and founding member of MLCommons, joins us to describe how a more data-centric approach that focuses on data quality rather than quantity can lower the AI/ML barrier. Instead of managing clusters of machines and setting up cumbersome labeling pipelines, you can spend more time tackling real problems. // Bio Cody Coleman recently finished his Ph.D. in CS at Stanford University, where he was advised by Professors Matei Zaharia and Peter Bailis. His research spans from performance benchmarking of hardware and software systems (i.e., DAWNBench and MLPerf) to computationally efficient methods for active learning and core-set selection. His work has been supported by the NSF GRFP, the Stanford DAWN Project, and the Open Phil AI Fellowship. 
// Relevant Links [preprint] Similarity Search for Efficient Active Learning and Search of Rare Concepts: [https://arxiv.org/abs/2007.00077](https://arxiv.org/abs/2007.00077) [video] Similarity Search for Efficient Active Learning and Search of Rare Concepts: [https://www.youtube.com/watch?v=vRVyOEK2JUU](https://www.youtube.com/watch?v=vRVyOEK2JUU) [blog post] Selection via Proxy: Efficient Data Selection for Deep Learning: [https://dawn.cs.stanford.edu/2020/04/23/selection-via-proxy/](https://dawn.cs.stanford.edu/2020/04/23/selection-via-proxy/) [slides] The DAWN of MLPerf: [https://drive.google.com/file/d/17ZpX0GOtOXG8QMn6KEc_Le8tUfDBlgDE/view](https://drive.google.com/file/d/17ZpX0GOtOXG8QMn6KEc_Le8tUfDBlgDE/view) [blog post] About Cody's research: [https://hai.stanford.edu/news/cody-coleman-lowering-machine-learnings-barriers-help-people-tackle-real-problems](https://hai.stanford.edu/news/cody-coleman-lowering-machine-learnings-barriers-help-people-tackle-real-problems) [video] About Cody: [https://www.youtube.com/watch?v=stxJMsxxxtA](https://www.youtube.com/watch?v=stxJMsxxxtA) --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/ Connect with Cody on LinkedIn: https://www.linkedin.com/in/codyaustun/ Timestamps: [00:00] Introduction to Cody Coleman [03:10] Cody's life story [07:35] Cody's journey in tech [15:04] Interest in Machine Learning and work at Stanford came about [21:48] Data-centric Machine Learning Data Quality [28:56] Research and Industry being together [33:33] Advice to practitioners [38:03] Principles of Machine Learning in an academic setting [43:50] Data-centric promising techniques that stand out [53:51] Developing benchmarks [56:34] 
Guardrails for machine learning vs automated testing suites   [1:02:57] Creating something valuable and useful [1:07:05] Data collecting vs Data Hoarding
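As a rough illustration of the idea in the abstract (spend labeling effort on the most valuable examples), here is a minimal sketch of entropy-based uncertainty sampling, a generic active-learning heuristic. It is not the method from Cody's papers; the function names, the probability values, and the `budget` parameter are all assumptions made up for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, budget):
    """Return indices of the `budget` unlabeled examples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled examples (three classes each).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, little value in labeling
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # fairly confident
    [0.34, 0.33, 0.33],  # close to uniform, most uncertain
]

# Indices of the examples to send for labeling first.
to_label = select_most_uncertain(pool_probs, budget=2)
print(to_label)
```

With a fixed labeling budget, the confident examples are skipped and annotators only see the ones the current model is least sure about.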
Oct 7, 2021 • 56min

10 Types of Features your Location ML Model is Missing // Anne Cocos // Coffee Sessions #58

Coffee Sessions #58 with Anne Cocos, 10 Types of Features your Location ML Model is Missing. // Abstract Machine learning on geographic data is relatively under-studied in comparison to ML on other formats like images or graphs. But geographic data is prevalent across a wide variety of domains (although many practitioners may not think of it that way). Clearly, any dataset with `latitude` and `longitude` columns can be viewed as geographic data, but also any dataset with a `zipcode`, `city`, `address`, or `county` can be construed as geographic. Demographics, weather, foot traffic, points of interest, and topographic features can all be used to enrich a dataset with any of these types of keys. Incorporating relatively straightforward geographic features into models can yield substantial improvements; adding "distance to the beach" or "square mileage reachable within 10 min drive" to a real estate pricing model, for example, can lead to significant decreases in model error. Unfortunately, many ML teams find it difficult to incorporate these types of geographic data into their models because the process of ingesting from geographic formats (geojson or shapefiles), projecting, and properly joining with their existing data can be a large infrastructure lift. In this coffee session, Anne discusses ways to simplify the process of incorporating geographic or location data into the MLOps workflow, as well as interesting trends in the geographic ML research community that will ultimately make it easier for us to learn from geography just as we do with images or graphs today. // Bio Dr. Anne Cocos currently leads data science and machine learning at Ask Iggy, Inc., a venture-backed, seed round startup focused on location analytics. Her team builds tools that make it simple for data scientists to leverage location information in their models and analyses. 
Previously she was the Director and Head, NLP and Knowledge Graph at GlaxoSmithKline, where she built algorithms and infrastructure to enable GSK’s scientists to leverage all the world’s written biomedical knowledge for drug discovery. She also worked on applied natural language processing research at The Children’s Hospital of Philadelphia Department of Biomedical Informatics. Anne completed her Ph.D. in computer science at the University of Pennsylvania, where she was supported by the Google Ph.D. Fellowship and the Allen Institute for Artificial Intelligence Key Scientific Challenges award. Before shifting her career toward artificial intelligence, Anne spent several years as an end-user of early ML-powered technologies in the U.S. Navy and at HelloWallet. Her previous degrees are from the U.S. Naval Academy, Royal Holloway University of London, and Oxford University. She currently lives just outside Philadelphia with her husband and three boys. --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Anne on LinkedIn: https://www.linkedin.com/in/annecocos/
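To make the "distance to the beach" example from the abstract concrete, here is a minimal sketch that derives such a feature from `latitude` and `longitude` columns using the haversine great-circle formula. The listing records, the beach coordinates, and the helper names are illustrative assumptions, not data or code from the episode; a production pipeline would more likely use a geospatial library and real shoreline geometry.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_km(lat, lon, points):
    """Distance from (lat, lon) to the closest point in `points`."""
    return min(haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in points)


# Hypothetical beach access points (approximate, illustrative coordinates).
BEACH_POINTS = [(39.3643, -74.4229), (38.9351, -74.9060)]

# Hypothetical real estate listings keyed by latitude/longitude.
listings = [
    {"id": "a", "latitude": 39.9526, "longitude": -75.1652},
    {"id": "b", "latitude": 39.3700, "longitude": -74.4300},
]

# Enrich each listing with the new geographic feature.
for row in listings:
    row["distance_to_beach_km"] = distance_to_nearest_km(
        row["latitude"], row["longitude"], BEACH_POINTS
    )
```

The new `distance_to_beach_km` column can then be passed to a pricing model alongside the existing features.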
