MLOps.community

Demetrios
Dec 18, 2020 • 56min

Deep in the heart of data // Carl Steinbach // MLOps Coffee Sessions #22

Coffee Sessions #22 with Carl Steinbach of LinkedIn, Deep in the Heart of Data. //Bio Carl is a Senior Staff Software Engineer and currently the Tech Lead for LinkedIn's Grid Development Team. He is a contributor to Emerging Architectures for Modern Data Infrastructure. //Other links referenced by Carl: https://rise.cs.berkeley.edu/wp-content/uploads/2017/03/CIDR17.pdf https://www.youtube.com/watch?v=-xIai_FvcSk&ab_channel=WePayEngineering https://softwareengineeringdaily.com/2019/10/23/linkedin-data-platform-with-carl-steinbach/ https://www.slideshare.net/linkedin/carl-steinbach-open-source https://dreamsongs.com/RiseOfWorseIsBetter.html https://engineering.linkedin.com/blog/2017/03/a-checkup-with-dr--elephant--one-year-later https://engineering.linkedin.com/ https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/ --------------- ✌️Connect With Us ✌️ ------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/ Connect with Carl on LinkedIn: https://www.linkedin.com/in/carlsteinbach/ Timestamps: [00:00] Introduction to Carl Steinbach [00:44] Carl's background [04:51] Breakdown of the Transpiler [10:55] Advantages of Decoupling the Execution Layer [15:25] Differences between UDFs (user-defined functions) and Views [18:45] How do you ensure the reproducibility of these Views? [23:58] Data structure evolution [27:55] Are Data Lakes and Data Warehouses fundamentally different things, or are they on a path towards convergence? 
[33:37] It's inevitable that people will start doing machine learning on databases [36:01] Who gets permission to access what, especially when it comes to data and how sensitive things can be? [41:27] Security aspect of data [43:40] Does it require a level of abstraction on top of the data or the file system? [45:48] Why do we go back and forth, and what sets this trend?
Dec 17, 2020 • 36min

When machine learning meets privacy - Episode 7

ML and Encryption - It's all about secure insights #7! In this episode, we've invited Théo Ryffel, Founder of Arkhn and a founding member of the OpenMined community. // Abstract: In this episode, Théo introduces us to the concept of encrypted Machine Learning, when to apply it and best practices for doing so when developing Machine Learning-based solutions, and the challenges of building a community. //Other links to check on Théo: https://twitter.com/theoryffel https://arkhn.com https://openmined.org https://arxiv.org/pdf/1811.04017.pdf https://arxiv.org/pdf/1905.10214.pdf //Final thoughts Feel free to drop some questions into our Slack channel (https://go.mlops.community/slack). Watch some of the other podcast episodes and old meetups on the channel: https://www.youtube.com/channel/UCG6qpjVnBTTT8wLGBygANOQ ----------- Connect With Us ✌️------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Fabiana on LinkedIn: https://www.linkedin.com/in/fabiana-clemente/ Connect with Théo on LinkedIn: https://www.linkedin.com/in/theo-ryffel
Dec 14, 2020 • 36min

When Machine Learning meets privacy - Episode 6

**Privacy-preserving ML with Differential Privacy** Differential privacy is without question one of the most innovative concepts to emerge in recent decades, with a variety of applications, including in Machine Learning. Many organizations are already leveraging this technology to access and make sense of their most sensitive data, but what is it? How does it work? And how can we make the most of it? To explain this and give us a brief intro to Differential Privacy, I've invited Christos Dimitrakakis, a university professor who already counts more than 1000 (!!!) publications in the areas of Machine Learning, Reinforcement Learning, and Privacy. Useful links: Christos Dimitrakakis's list of publications; Differential privacy for Bayesian inference through posterior sampling (authors: Christos Dimitrakakis, Blaine Nelson, Zuhe Zhang, Aikaterini Mitrokotsa, Benjamin IP Rubinstein); Differential privacy use cases; Open-source differential privacy projects; Open-source project for Differential Privacy in SQL databases
Dec 14, 2020 • 56min

Human-centric ML Infrastructure: A Netflix Original // Savin Goyal // MLOps Meetup #44

MLOps community meetup #44! Last Wednesday, we talked to Savin Goyal, Tech Lead for the ML Infra team at Netflix. // Abstract: In this conversation, Savin talked about some of the challenges encountered and choices made by the Netflix ML Infrastructure team while developing tooling for data scientists. // Bio: Savin is an engineer on the ML Infrastructure team at Netflix. He focuses on building generalizable infrastructure to accelerate the impact of data science at Netflix. // Other links to check on Savin: https://www.usenix.org/conference/opml20/presentation/cepoi https://www.youtube.com/watch?v=lakPlz8GJcA&ab_channel=RConsortium https://www.youtube.com/watch?v=-oMZAS9qfrE&ab_channel=AnalyticsIndiaMagazine https://www.youtube.com/watch?v=yyWirT279tY&ab_channel=FunctionalTV https://www.youtube.com/watch?v=QkRJ24Q0E-k&ab_channel=Matroid ----------- Connect With Us ✌️------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Savin on LinkedIn: https://www.linkedin.com/in/savingoyal/ Timestamps: [00:00] Background of Savin Goyal [02:41] Breakdown of Metaflow [05:44] In the stack, where does Metaflow stand? [13:23] Where does Metaflow start in Runway Project? [15:27] What tools or storage does Netflix use for DataOps, i.e., the front-end management of data sets, and how does that integrate with Metaflow? [18:56] Recommender Systems: Can you explain the other areas where you're using Machine Learning? [22:27] What do you feel is the hardest part of building an operational Machine Learning workflow? [28:45] 3 Pillars: Reproducibility, Scalability, Usability. [36:05] You give so much power to people. How do you keep them from going overboard? [37:47] Can you explain this Pillar of Usability? [41:09] Role-based access control has been coming up a lot recently. 
Does Metaflow do something specific for that? [44:49] What are some learnings that have come up since you open-sourced Metaflow that you didn't have when it was used only at Netflix? [48:10] What kinds of trends have you been seeing? Where do you feel the market is going? [50:33] Have you seen some companies really interested in Metaflow? How have you seen them combine it with other tools that are out there?
Dec 8, 2020 • 47min

A Conversation with Seattle Data Guy // Benjamin Rogojan // MLOps Coffee Sessions #21

Coffee Sessions #21 with Benjamin Rogojan of Seattle Data Guy, A Conversation with Seattle Data Guy //Bio Ben has spent his career focused on all forms of data. He has focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. He has also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. Ben privately consults on data science and engineering problems, both solo and with a company called Acheron Analytics. He has experience both working hands-on with technical problems and helping leadership teams develop strategies to get the most out of their data. //Other links you can check Ben on https://www.theseattledataguy.com/mlops-vs-aiops-what-is-the-difference/#page-content https://medium.com/@benrogojan https://www.kdnuggets.com/2020/01/data-science-interview-study-guide.html --------------- ✌️Connect With Us ✌️ ------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/ Connect with Ben on LinkedIn: https://www.linkedin.com/in/benjaminrogojan/ Timestamps [00:00] Intro to Benjamin Rogojan [01:22] Ben's background [03:30] What are some of your learnings/key things that jumped out at you? [08:15] Agile and Data Science [10:28] Likelihood of failure [13:05] Sometimes you have to wait [15:11] Defining your data science process [19:55] The layer of communication between data scientists and higher-ups is important [21:29] How do you navigate challenges? Are there any tools or processes you use when working with your clients? 
[24:30] How do you show the value of your work using monitoring and observability? [27:58] How can we be better communicators? [31:15] Have you seen other roles that really helped the team gel? [33:50] What are your interests? What are you passionate about at the moment? [34:29] Is there something new you're learning at the moment? [37:55] Do you have a process for figuring out whether data science or ML is even right for a company? [39:33] Do you have a blog about the process you follow? [41:24] What is one piece of negative wisdom that you want to share with the community? [44:35] How did you come up with the company name Seattle Data Guy? Links mentioned in this episode: https://medium.com/@benrogojan https://www.cprime.com/resources/blog/agile-methodologies-how-they-fit-into-data-science-processes/ https://www.coriers.com/the-data-science-interview-study-guide/ https://medium.com/@SeattleDataGuy/from-data-scientist-to-data-leader-workshop-c6be69698af https://towardsdatascience.com/4-must-have-skills-every-data-scientist-should-learn-8ab3f23bc325
Dec 7, 2020 • 1h 4min

Monzo Bank - An MLOps Case Study // Neal Lathia // MLOps Coffee Sessions #20

Coffee Sessions #20 with Neal Lathia of Monzo Bank, talking about Monzo Bank - An MLOps Case Study //Bio Neal is currently the Machine Learning Lead at Monzo in London, where his team focuses on building machine learning systems that optimise the app and help the company scale. Neal's work has always focused on applications that use machine learning; this has taken him from recommender systems to urban computing and travel information systems, digital health monitoring, smartphone sensors, and banking. //Talk Takeaways Monzo Bank has a small but very impactful team that is continuously learning new things. The team optimistically does its utmost to avoid “throwing problems over the wall,” so they build systems, iterate on machine learning models, and collaborate very closely with each other and with many folks across the business. Hopefully, all of that paints a picture of a team that aims to bring real and valuable machine learning systems to life. Monzo does not spend time trying to advance the state of the art in machine learning or tweaking models to absolute perfection. 
//Other links you can check Neal on Personal Website: http://nlathia.github.io/ Research: http://nlathia.github.io/research/ Press & Speaking: http://nlathia.github.io/public/ http://nlathia.github.io/2020/06/Customer-service-machine-learning.html http://nlathia.github.io/2020/10/ML-and-rule-engines.html http://nlathia.github.io/2020/10/Monzo-ML.html http://nlathia.github.io/2019/09/Large-NLP-in-prod.html http://nlathia.github.io/2020/07/Shadow-mode-deployments.html https://github.com/operatorai --------------- ✌️Connect With Us ✌️ ------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/ Connect with Neal on LinkedIn: https://www.linkedin.com/in/nlathia/ Timestamps: [00:00] Intro to Neal Lathia [02:48] Background of Monzo Bank [05:06] What problems are you solving with Machine Learning at Monzo? [08:36] Why do you think it's fairly easy to frame a lot of problems using Machine Learning? [11:56] How do you decide between rule-based approaches and Machine Learning? [15:33] Team Structure [19:18] What are some challenges, like size, latency, and the like? [21:52] How have you addressed learning skills/challenges in your team? [26:17] Do you have something that connects your team with all the metadata you have? [27:14] Are you also monitoring models in your dashboard, or is that something else? [28:51] Why should I bring in another tool that the company is not familiar with when we already have one? [31:43] Do you feel like there will be a point in time where you need to buy a tool because one problem is taking so much of your time? [38:30] Engineering optimization teams for machine learning? [40:34] Can you take us from idea to production? [46:29] How do you deal with reproducibility? 
[49:48] Do you have ethics people on the team? [54:12] Why are you using GCP and AWS? [56:09] What are these different use cases and how do they differ? [57:57] How do you address applications that don't work?
Dec 3, 2020 • 33min

When Machine Learning meets privacy - Episode 5

**The intersection between DataOps and privacy** DataOps is considered by many to be the new era of data management: a set of principles that emphasizes communication, collaboration, integration, and automation of cooperation between the different teams in an organization that have to deal with data, from data engineers and data scientists to data analysts. But is there any relation between DataOps and data privacy protection? Can organizations leverage DataOps to ensure that their data is privacy compliant? For this episode, we've invited Lars Albertsson, founder of Scling and former Data Engineer at Spotify. Lars has been educating organizations on how to get value from data and improve engineering efficiency! You can easily find him and reach out on Twitter and LinkedIn. Don't forget to join the MLOps.Community if you are not yet a member. Useful links: What is DataOps - https://www.ibm.com/blogs/journey-to-ai/2019/12/what-is-dataops/ Data engineering reading list - https://www.scling.com/reading-list/ Data engineering courses - https://www.scling.com/courses/
Nov 26, 2020 • 23min

When Machine Learning meets privacy - Episode 4

**Are Privacy Enhancing Technologies a myth?** Data privacy and machine learning are here to stay, and there’s no doubt they’re the hot trends to follow. But do they need to clash with each other? Can these titans co-exist? It seems like 2020 and 2021 will finally be the years of Privacy Enhancing Technologies. But what exactly are they? How are these technologies being used and leveraged by organizations? Useful links: https://medium.com/@francis_49362/differential-privacy-not-a-complete-disaster-i-guess-d0345a76a5af; Facebook and Differential Privacy; Opacus; Synthetic data generation
Nov 24, 2020 • 1h 1min

Introducing Data Downtime: From Firefighting to Winning // Barr Moses // MLOps Coffee Sessions #19

Coffee Sessions #19 with Barr Moses of Monte Carlo, Introducing Data Downtime: How to Prevent Broken Data Pipelines with Observability, co-hosted by Vishnu Rachakonda //Bio Barr Moses is CEO & Co-Founder of Monte Carlo, a data observability company backed by Accel and other top Silicon Valley investors. Previously, she was VP Customer Operations at customer success company Gainsight, where she helped scale the company 10x in revenue and, among other functions, built the data/analytics team. Prior to that, she was a management consultant at Bain & Company and a research assistant in the Statistics Department at Stanford. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science. //Talk Takeaways As companies become increasingly data-driven, the technologies underlying these rich insights have grown more and more nuanced and complex. While our ability to collect, store, aggregate, and visualize this data has largely kept up with the needs of modern data teams (think: domain-oriented data meshes, cloud warehouses, data visualization tools, and data modelling solutions), the mechanics behind data quality and integrity have lagged. To keep pace with data’s clock speed of innovation, data engineers need to invest not only in the latest modelling and analytics tools but also in technologies that can increase data accuracy and prevent broken pipelines. The solution? Data observability: the next frontier of data engineering, a pillar of the emerging Data Reliability category, and the fix for eliminating data downtime. 
During this talk, listeners will learn about: the rise (and threat) of data downtime; the relationship between DevOps Observability and Data Observability; Data Observability and its five key pillars; and how the best data teams are leveraging Data Observability to prevent broken pipelines. //About Monte Carlo As businesses increasingly rely on data to drive better decision making, it’s mission-critical that this data is accurate and reliable. Billed by Forbes as the New Relic for data teams and backed by Accel and GGV, Monte Carlo solves the costly problem of broken data through its fully automated, end-to-end data reliability platform. Data teams spend north of 30% of their time tackling data quality issues, distracting data engineers, data scientists, and data analysts from working on revenue-generating projects. Providing full coverage of your data stack, all the way from data lake and warehouse to analytics dashboard, Monte Carlo’s platform empowers companies such as Eventbrite, Compass, Vimeo, and other enterprises to trust their data, saving time and money and unlocking the potential of data. //Other links you can check Barr on Learn more about Monte Carlo: https://www.montecarlodata.com What is data downtime? https://www.montecarlodata.com/the-rise-of-data-downtime/ What is data observability? https://www.montecarlodata.com/data-observability-the-next-frontier-of-data-engineering/ How data observability prevents broken data pipelines: https://www.montecarlodata.com/data-observability-how-to-prevent-your-data-pipelines-from-breaking/
Nov 23, 2020 • 59min

The Current MLOps Landscape // Nathan Benaich & Timothy Chen // MLOps Meetup #43

MLOps community meetup #43! Last Wednesday, we talked to Nathan Benaich, General Partner at Air Street Capital, and Timothy Chen, Managing Partner at Essence VC, about The MLOps Landscape. // Abstract: In this session, we explored the MLOps landscape through the eyes of two accomplished investors. Tim and Nathan shared with us their experience of looking at hundreds of ML and MLOps companies each year to highlight major insights they have gained. What does the ML infrastructure and tooling landscape look like at the moment? Where have they been seeing patterns emerge? What do they expect to see happen within the market in the next couple of years? What current tools out there are the most interesting to them? And, last but not least, how do they go about selecting which companies to invest in? // Bio: Nathan Benaich is the Founder and General Partner of Air Street Capital, a venture capital firm investing in early-stage AI-first technology and life science companies. The team’s investments include Mapillary (Acq. Facebook), Graphcore, Thought Machine, Tractable, and LabGenius. Nathan is Managing Trustee of The RAAIS Foundation, a non-profit with a mission to advance education and open-source research in common good AI. This includes running the annual RAAIS summit and funding fellowships at OpenMined. Nathan is also co-author of the annual State of AI Report. He holds a PhD in cancer biology from the University of Cambridge and a BA from Williams College. Timothy Chen is the Managing Partner at Essence VC, with a decade of experience leading engineering in enterprise infra and open source communities/companies. Prior to Essence, Tim was the SVP of Engineering at Cosmos, a popular open-source blockchain SDK. Prior to Cosmos, Tim cofounded Hyperpilot with Stanford Professor Christos Kozyrakis, which later exited to Cloudera. Prior to Hyperpilot, Tim was an early employee at Mesosphere and CloudFoundry. Tim is also active in the open-source space as an Apache member. 
----------- Connect With Us ✌️------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Nathan on LinkedIn: https://www.linkedin.com/in/nathanbenaich/ Connect with Tim on LinkedIn: https://www.linkedin.com/in/timchen Timestamps: 0:00 - Nathan Benaich & Timothy Chen 1:36 - Tim's background 4:07 - Nathan's background 8:08 - To Nathan: What's your take on the lay of the land in the MLOps sphere or space? 10:20 - To Tim: Can you give us your rundown of what you've been seeing? The greater landscape that you look at. 14:35 - To Tim: What companies right now really excite you? What are some that are doing something that has a future? 19:36 - To Nathan: What kinds of companies are you looking at right now that are doing interesting things? 22:37 - MLOps tools mature as the companies mature. 23:45 - There's no tool that looks exactly the same from an MLOps perspective. 25:44 - Sometimes MLOps tools are not chosen by data scientists at all. 28:10 - What MLOps needs are not being addressed by the market right now? 35:00 - What is the annotation stack? 37:28 - How do you think about this in the context of federated learning? 41:24 - Will MLOps tools eventually become idiomatic? Would that be desirable? 47:55 - How do you switch from the open-source model to the money-making model? 52:30 - Should we focus only on open source at first and think about monetization later? If so, are investors prepared to invest in companies with no revenue?
