
MLOps.community

Latest episodes

Nov 26, 2020 • 23min

When Machine Learning meets privacy - Episode 4

**Are Privacy Enhancing Technologies a myth?** Data privacy and machine learning are here to stay, and there’s no doubt they’re hot trends to follow. But do they need to clash with each other? Can these titans co-exist? It seems that 2020 and 2021 will finally be the years of Privacy Enhancing Technologies. But what exactly are they? How are these technologies being used and leveraged by organizations?
Useful links:
- https://medium.com/@francis_49362/differential-privacy-not-a-complete-disaster-i-guess-d0345a76a5af
- Facebook and Differential Privacy
- Opacus
- Synthetic data generation
Nov 24, 2020 • 1h 1min

Introducing Data Downtime: From Firefighting to Winning // Barr Moses // MLOps Coffee Sessions #19

Coffee Sessions #19 with Barr Moses of Monte Carlo, "Introducing Data Downtime: How to Prevent Broken Data Pipelines with Observability", co-hosted by Vishnu Rachakonda.

// Bio
Barr Moses is CEO & Co-Founder of Monte Carlo, a data observability company backed by Accel and other top Silicon Valley investors. Previously, she was VP of Customer Operations at customer success company Gainsight, where she helped scale the company 10x in revenue and, among other functions, built the data/analytics team. Prior to that, she was a management consultant at Bain & Company and a research assistant in the Statistics Department at Stanford. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science.

// Talk Takeaways
As companies become increasingly data-driven, the technologies underlying these rich insights have grown more and more nuanced and complex. While our ability to collect, store, aggregate, and visualize this data has largely kept up with the needs of modern data teams (think: domain-oriented data meshes, cloud warehouses, data visualization tools, and data modelling solutions), the mechanics behind data quality and integrity have lagged. To keep pace with data’s clock speed of innovation, data engineers need to invest not only in the latest modelling and analytics tools but also in technologies that can increase data accuracy and prevent broken pipelines. The solution? Data observability: the next frontier of data engineering, a pillar of the emerging Data Reliability category, and the fix for eliminating data downtime.

During this talk, listeners will learn about:
- The rise (and threat) of data downtime
- The relationship between DevOps Observability and Data Observability
- Data Observability and its five key pillars
- How the best data teams are leveraging Data Observability to prevent broken pipelines

// About Monte Carlo
As businesses increasingly rely on data to drive better decision making, it’s mission-critical that this data is accurate and reliable. Billed by Forbes as "the New Relic for data teams" and backed by Accel and GGV, Monte Carlo solves the costly problem of broken data through its fully automated, end-to-end data reliability platform. Data teams spend north of 30% of their time tackling data quality issues, distracting data engineers, data scientists, and data analysts from working on revenue-generating projects. Providing full coverage of your data stack, all the way from data lake and warehouse to analytics dashboard, Monte Carlo’s platform empowers companies such as Eventbrite, Compass, Vimeo, and other enterprises to trust their data, saving time and money and unlocking the potential of data.

// Other links
- Learn more about Monte Carlo: https://www.montecarlodata.com
- What is data downtime? https://www.montecarlodata.com/the-rise-of-data-downtime/
- What is data observability? https://www.montecarlodata.com/data-observability-the-next-frontier-of-data-engineering/
- How data observability prevents broken data pipelines: https://www.montecarlodata.com/data-observability-how-to-prevent-your-data-pipelines-from-breaking/
Nov 23, 2020 • 59min

The Current MLOps Landscape // Nathan Benaich & Timothy Chen // MLOps Meetup #43

MLOps community meetup #43! Last Wednesday, we talked to Nathan Benaich, General Partner at Air Street Capital, and Timothy Chen, Managing Partner at Essence VC, about the MLOps landscape.

// Abstract:
In this session, we explored the MLOps landscape through the eyes of two accomplished investors. Tim and Nathan shared their experience of looking at hundreds of ML and MLOps companies each year and highlighted the major insights they have gained. What does the ML infrastructure and tooling landscape look like at the moment? Where have they seen patterns emerge? What do they expect to happen in the market over the next couple of years? Which current tools are the most interesting to them? And last but not least, how do they select which companies to invest in?

// Bio:
Nathan Benaich is the Founder and General Partner of Air Street Capital, a venture capital firm investing in early-stage AI-first technology and life science companies. The team’s investments include Mapillary (Acq. Facebook), Graphcore, Thought Machine, Tractable, and LabGenius. Nathan is Managing Trustee of The RAAIS Foundation, a non-profit with a mission to advance education and open-source research in common good AI. This includes running the annual RAAIS summit and funding fellowships at OpenMined. Nathan is also co-author of the annual State of AI Report. He holds a PhD in cancer biology from the University of Cambridge and a BA from Williams College.

Timothy Chen is the Managing Partner at Essence VC, with a decade of experience leading engineering in enterprise infra and open-source communities/companies. Prior to Essence, Tim was the SVP of Engineering at Cosmos, a popular open-source blockchain SDK. Prior to Cosmos, Tim co-founded Hyperpilot, which later exited to Cloudera, with Stanford Professor Christos Kozyrakis. Prior to Hyperpilot, Tim was an early employee at Mesosphere and CloudFoundry. Tim is also active in the open-source space as an Apache member.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Nathan on LinkedIn: https://www.linkedin.com/in/nathanbenaich/
Connect with Tim on LinkedIn: https://www.linkedin.com/in/timchen

Timestamps:
0:00 - Nathan Benaich & Timothy Chen
1:36 - Tim's background
4:07 - Nathan's background
8:08 - To Nathan: What's your take on the lay of the land in the MLOps sphere or space?
10:20 - To Tim: Can you give us your rundown of what you've been seeing? The greater landscape that you look at.
14:35 - To Tim: Which companies really excite you right now? Which ones are doing something that has a future?
19:36 - To Nathan: What kinds of companies are you looking at right now that are doing interesting things?
22:37 - MLOps tools mature as the companies mature.
23:45 - There's no tool that looks exactly the same from an MLOps perspective.
25:44 - Sometimes MLOps tools are not chosen by data scientists at all.
28:10 - What MLOps needs are not being addressed by the market right now?
35:00 - What is the annotation stack?
37:28 - How do you think about this in the context of federated learning?
41:24 - Will MLOps tools eventually become idiomatic? Would that be desirable?
47:55 - How do you switch from the open-source model to a money-making model?
52:30 - Should we focus only on open source at first and think about monetization later? If so, are investors prepared to invest in companies with no revenue?
Nov 19, 2020 • 52min

When Machine Learning meets privacy - Episode 3 with Charles Radclyffe

**AI and ethical dilemmas** Artificial Intelligence is seen by many as a vehicle for great transformation, but for others it remains a mystery, and many questions are still unanswered: will AI systems rule us one day? Can we trust AI to govern our criminal justice systems? To create political campaigns and dominate political advertising? Or, less ominously, to do our laundry? Some of these questions may sound absurd, but they are pushing people to look beyond purely functional AI capabilities and consider the ethics of creating such powerful solutions. For this episode, we are joined by Charles Radclyffe, the data philosopher, to discuss some of these dilemmas. You can reach Charles through LinkedIn or at ethicsgrade.io
Useful links:
- MLOps.Community slack
- TEDx talk: Surviving the Robot Revolution
- Digital Ethics whitepaper
Nov 16, 2020 • 59min

UN Global Platform // Mark Craddock // Co-Founder & CTO, Global Certification and Training Ltd // MLOps Meetup #42

MLOps community meetup #42! Last Wednesday, we talked to Mark Craddock, Co-Founder & CTO of Global Certification and Training Ltd (GCATI), about the UN Global Platform.

// Abstract:
Building a global big data platform for the UN, streaming 600,000,000+ records/day into the platform. The strategy was developed using Wardley Maps and the Platform Design Toolkit.

// Bio:
Mark contributed to the Cloud First policy for the UK public sector and was one of the founding architects of the UK Government's G-Cloud programme. Mark developed the initial CloudStore, which enabled the UK public sector to procure cloud services from over 2,500 suppliers. The UK public sector has now purchased over £6.3Bn of cloud services, £3.6Bn of it from small and medium enterprises in the UK. Mark led the development of the United Nations Global Platform, a multi-cloud platform for building capacity within national statistics offices in the use of Big Data and its integration with administrative sources, geospatial information, and traditional survey and census data. Mark is now building a non-profit training and certification organization.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Mark on LinkedIn: https://www.linkedin.com/in/markcraddock/

Timestamps:
[0:00] - Intro to Mark Craddock
[03:35] - Mark's background
[05:05] - UN Global Platform
[05:18] - Vision: A global collaboration to harness the power of data for better lives
[05:37] - UN GWG (Big) Data Membership
[05:49] - Sustainable Development Goals
[06:21] - Using the platform
[06:30] - Approach
[06:44] - Principles
[07:29] - How big was the team who put this together?
[08:09] - Leave no one behind. Endeavour to reach the furthest behind first.
[08:24] - Platform Business Model
[10:06] - Six distinct aspects of a platform and its ecosystem
[10:46] - The platform is the only business model able to orchestrate the wide range of products and services in an ecosystem
[11:09] - Through the means of a platform organization, ecosystems are capable of providing an improbable combination of attributes
[11:55] - Platforms and business models are also one of the best organizational structures for enabling rapid evolution
[13:22] - Technology Strategy
[13:23] - Wardley Maps
[14:50] - Is this where Machine Learning tools would fit in?
[20:35] - Are you looking at how fast these are moving across to the right? How can you gauge that?
[26:57] - Is the value fluid?
[28:43] - How did you factor in the different personas?
[30:34] - How do you enable loosely coupled teams?
[35:44] - Data also moves from left to right
[42:00] - Technology Strategy Handbook
[42:20] - Achievements - July '19
[42:31] - Global Billing Intelligence
[43:15] - Privacy-Preserving Techniques Handbook
[43:26] - Cryptographic Techniques
[44:12] - Global Big Datasets
[44:55] - Big Data
[47:41] - Automatic Identification System (AIS)
[48:14] - Automatic Dependent Surveillance-Broadcast (ADS-B)
[48:41] - Satellite Imagery
[49:11] - Services in the platform
[49:16] - Location Analytics Service
[50:06] - Stack Sample
[50:37] - Data Sources
[51:50] - NiFi Dataflow
[52:20] - Is this how you enabled reproducibility?
[53:47] - Location Analytics Service
[55:31] - Shanghai - Flights
[55:45] - Shanghai - Cargo Ships
[56:00] - UN Global Platform
Nov 12, 2020 • 36min

When Machine Learning meets Data Privacy - Episode 2 with Cat Coode

What do regulations say about data privacy? We already know how important Machine Learning is for improving businesses. However, Machine Learning needs data, and in many cases that data may be considered sensitive information. So, does this mean that, with new privacy regulations, access to data will become more and more difficult? Are the days of ML and Data Science numbered? Or will the machines beat privacy? To answer these questions, I’ve invited Cat Coode, an expert on data privacy regulations, to join me in this episode. Don’t forget to subscribe to the MLOps.community Slack, and if you’re looking for privacy-preserving solutions, show us some love and give a star to the Synthetic data open-source repo (https://github.com/ydataai/ydata-synthetic).
Useful links:
- For more on Cat's work, have a look at catcoode.com or connect through LinkedIn.
- Original Privacy by Design definition: https://www.ipc.on.ca/wp-content/uploads/resources/7foundationalprinciples.pdf
Nov 10, 2020 • 1h 1min

When You Say Data Scientist Do You Mean Data Engineer? Lessons Learned From Start Up Life // Elizabeth Chabot

In this episode, we talked to Elizabeth Chabot, Consultant at Deloitte, about "When You Say Data Scientist Do You Mean Data Engineer? Lessons Learned From StartUp Life".

// Key takeaways:
- If you have a data product that you want to function in production, you need MLOps.
- Education needs to happen about the data product life cycle, noting that ML is just one part of the equation.
- Titles need to be defined to help outside users understand the differences between roles.

// Abstract:
ML and AI may sound sexy to investors, but if you work in the field, you've probably spent late nights reviewing outputs manually, pored over logs, and run root cause analyses until your eyes hurt. If you've created data products at a company where analytics and data science held no meaning before your arrival, you've probably spent many a late night explaining the basics of data collection, why ETL cannot be half-baked, and that when you create a supervised model, it needs to be supervised. Companies hoping to create a data product can have a data scientist show them how ML/AI can further their product, help them scale, or create better recommendations than their competitors. What companies are not always aware of is that once the algorithm is created, the data scientist is usually handicapped until more data hires are made to build the necessary pipelines and frontend to put the algorithm in production. With the number of unique data titles growing each year, how should the first data-evangelist-wrangler-wizard navigate title assignment?

// Bio:
Elizabeth is a researcher turned data nerd. With a background in social and clinical sciences, Elizabeth focuses on developing data solutions that add value while allowing users to make more intelligent decisions.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Nov 10, 2020 • 1h

Metaflow: Supercharging Our Data Scientist Productivity // Ravi Kiran Chirravuri // MLOps Meetup #41

MLOps community meetup #41! Last Wednesday's episode was so exciting that some attendees couldn't help asking when the next season of their favorite series would come out! The conversation was about Metaflow: Supercharging Data Scientist Productivity, with none other than Netflix’s very own Ravi Kiran Chirravuri.

// Abstract:
Netflix's unique culture affords its data scientists an extraordinary amount of freedom. They are expected to build, deploy, and operate large machine learning workflows autonomously, without needing significant experience in systems or data engineering. Metaflow, our ML framework (now open source at metaflow.org), provides them with delightful abstractions to manage their project's lifecycle end-to-end, leveraging the strengths of the cloud: elastic compute and high-throughput storage. In this talk, we begin with our experience working alongside data scientists, present our human-centric design principles for building Machine Learning Infrastructure, and show how you can easily adopt them yourself with open-source Metaflow.

// Bio:
Ravi is an individual contributor on the Machine Learning Infrastructure (MLI) team at Netflix. With almost a decade of industry experience, he has been building large-scale systems focused on performance, simplified user journeys, and intuitive APIs, in MLI and previously in Search Indexing and TensorFlow at Google.

----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Ravi on LinkedIn: https://www.linkedin.com/in/seeravikiran/

Timestamps:
[00:00] - Introduction to Ravi Kiran Chirravuri
[02:21] - Ravi's background
[05:19] - Metaflow: Supercharging Data Scientist Productivity
[05:31] - Why did we have to build Metaflow?
[06:14] - Infographic of a very simplified view of a machine learning workflow
[07:01] - "An idea is typically meaningless without execution."
[07:38] - Scheduling
[08:14] - Life is great!
[08:24] - Life happens and things are crashing and burning!
[09:04] - What is Metaflow?
[12:01] - How much data scientists care
[12:25] - How infrastructure is needed
[13:03] - What Metaflow does
[13:44] - How can you go about using Metaflow for your data science needs?
[14:20] - People love DAGs
[16:00] - Baseline
[16:16] - Architecture
[17:28] - Syntax
[19:00] - Vertical Scalability
[21:10] - Horizontal Scalability
[22:59] - Failures are a feature
[23:57] - State Transfer and Persistence
[27:05] - Dependencies
[30:57] - Model Ops: Versioning
[33:19] - Monitoring in Notebooks
[35:16] - Decouple Orchestration
[36:48] - AWS Step Functions
[37:16] - Export to AWS Step Functions
[38:10] - From Prototype to Production and Back
[42:07] - What are the prerequisites to use Metaflow?
[43:32] - Where does Metaflow store everything?
[45:10] - Are there any tutorials available?
[45:22] - Have the tutorials been updated?
[47:27] - How do you deploy Metaflow?
[49:02] - Do you see Metaflow becoming a tool to develop and support AutoML?
[50:34] - What were some of the biggest learnings from things you saw people doing that they're not doing at Netflix?
[52:19] - Does Metaflow exist to help data scientists orchestrate everything?
[54:30] - What do you version?
Nov 9, 2020 • 47min

Luigi in Production // MLOps Coffee Sessions #18 // Luigi Patruno ML in Production

Coffee Sessions #18 with Luigi Patruno of ML in Production, a centralized repository of best practices.

Summary:
- Luigi Patruno and ML in Production
- MLOps workflow: knowledge sharing and best practices
- Objective: learn!

Links:
- ML in Production: https://mlinproduction.com/
- Why I started MLinProduction: https://mlinproduction.com/why-i-started-mlinproduction/

Luigi Patruno's goal is to help data scientists, ML engineers, and AI product managers build and operate machine learning systems in production. Luigi shares why he started ML in Production: there was a lot of irrelevant content and a lot of low-quality clickbait. He had an entrepreneurial itch, and the solution was to start a weekly newsletter. From there, he started writing blog posts, and he has now teamed up with Sam Charrington of TWIML to create courses on SageMaker ML.

Applied ML best practices:
- Reading Google and Microsoft papers
- Analyzing the tools that are out there (e.g. SageMaker) and how they see the world
- Aimed at making you more effective and efficient at your job

Community questions: taking some time to answer some community questions!
- Who do you learn from? Favorite resources? Self-taught: papers, talks, and constructing the systems (e.g. Uber Michelangelo)

----------------- 📝 Rough notes 📝 ----------------
Any companies that stand out to you in terms of MLOps excellence?
- Google, Amazon, Stitch Fix: they've had to solve hard problems (serving ads, personalization at scale)
- Vertical problems within their verticals, motivated by real challenges
- Dropbox: great articles; a great machine learning company

Tools:
- SageMaker: he has a course on SageMaker; nice lessons are baked into the system

Dos and don'ts of MLOps:
- DO LOG!
- Monitor
- Automate: manual analysis leads to problems
- Do it manually first until you feel confident that you can automate it
- Tag and version
- Store your training, validation, and test sets!

What is his process for identifying use cases that are suitable for machine learning? How does he proceed methodically?
- Start with the business goal
- Potential number of users that the solution can benefit
- The ability to build a predictive model
- Performance x impact = score; rank problems by this score
- How developed are the datasets?

What part of the ML in Production process do people underestimate the most? What are the low-hanging fruits that many people don't take advantage of?
- Generate actual value without needing to build the most complex model possible
- In industry, performance is only one part of the equation

How has he seen ML in production evolve over the last few years, and where does he think it's headed next?
- More and more tools!
- Industry-specific tools taking advantage of ML
- The problem is that you must have industry knowledge

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
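Luigi's use-case ranking heuristic ("Performance x impact = score", then rank problems by score) is simple enough to sketch. The snippet below is a minimal illustration, not his actual method: the `UseCase` class and `rank_use_cases` function are hypothetical, "impact" is read as the potential number of users who benefit (one of the criteria in the notes), and "performance" is assumed to be an estimated achievable model quality between 0 and 1. The candidate names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """A candidate ML use case (hypothetical structure for illustration)."""
    name: str
    performance: float  # assumed: estimated achievable model quality, 0.0-1.0
    impact: float       # assumed: potential number of users who benefit

    @property
    def score(self) -> float:
        # Heuristic from the notes: performance x impact = score
        return self.performance * self.impact


def rank_use_cases(cases: list[UseCase]) -> list[UseCase]:
    """Return use cases ordered from highest to lowest score."""
    return sorted(cases, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    # Made-up candidates, purely to show the ranking mechanics.
    backlog = [
        UseCase("churn prediction", performance=0.8, impact=10_000),
        UseCase("search ranking", performance=0.6, impact=50_000),
        UseCase("invoice OCR", performance=0.9, impact=2_000),
    ]
    for case in rank_use_cases(backlog):
        print(f"{case.name}: {case.score:,.0f}")
```

Run as a script, this prints search ranking first (30,000), then churn prediction (8,000), then invoice OCR (1,800). Because the score is a product, a candidate that is weak on either axis sinks regardless of how strong the other is. The other criteria in the notes, such as how developed the datasets are, are not modeled here.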
Nov 5, 2020 • 19min

When Machine Learning meets Data Privacy

This is the first episode of a podcast series on Machine Learning and data privacy. Machine Learning is the key to a new revolution in many industries. Nevertheless, ML does not exist without data, and a lot of it, which in many cases means using sensitive information. With new privacy regulations, access to data has become much more difficult, but does that mean the days of ML and Data Science are numbered? Will the machines beat privacy? Don’t forget to subscribe to the mlops.community Slack (https://go.mlops.community/slack) and to give a star to the Synthetic data open-source repo (https://github.com/ydataai/ydata-synt...).
Useful links:
- Medium post with the podcast transcription - https://medium.com/@fabiana_clemente/...
- In case you’re curious about GDPR fines - enforcementtracker.com
- The Netflix Prize - https://www.nytimes.com/2010/03/13/technology/13netflix.html
- TensorFlow Privacy - https://github.com/tensorflow/privacy
