

Defining DataOps with Chris Bergh - Episode 26
Apr 8, 2018
54:31
Balancing Speed, Quality, and Innovation
- Chris Bergh's experience as COO of a healthcare analytics company taught him valuable lessons.
- He faced pressure to deliver analytics faster while ensuring high quality and fostering team innovation.
DataOps: Iteration and Innovation
- DataOps fosters iteration and innovation in data analytics.
- It addresses the challenge of balancing the need for speed with the demand for high-quality, reliable data.
The Problem of Data Issues and Burnout
- Chris Bergh describes the stressful experience of constantly fixing data issues, even on weekends.
- He emphasizes DataOps as a solution for a more sustainable and satisfying work life.
Introduction
00:00 • 5min
DataOps, Data Engineering - It's Relevant to People
05:30 • 2min
Data Analytics Teams, Data Engineers, and Data Scientists Need to Focus on Errors
07:15 • 2min
DevOps
08:48 • 2min
Data Engineering
10:22 • 2min
Data Engineering in DataOps
12:30 • 4min
Data Engineering - Code Is Complexity, Communication Is Complexity
16:18 • 3min
Soffer's Data Management Techniques
18:58 • 5min
Testing in Data Analytics
23:38 • 5min
Data Warehouse Testing
29:07 • 2min
Are You Living in Fear of Making Changes?
30:41 • 3min
Are You Delivering on Time?
33:23 • 2min
Managing Continuous Integration in Data Engineering and Data Science
35:05 • 2min
Automated Regression of Your Data Analysis Suite
37:02 • 4min
How to Make the Most Out of Your Data
40:55 • 2min
DataKitchen Platform - How to Scale a Data Analytics Workflow
43:10 • 3min
The Challenges of Data Management Platforms and Tooling
46:10 • 3min
Using Tableau to Deploy a Data Source
49:17 • 2min
Data Analytics
51:47 • 3min
Summary
Managing an analytics project can be difficult because of the number of systems involved and the need to deliver new information quickly and reliably. That challenge can be met by adopting practices and principles from lean manufacturing and agile software development, along with the cross-functional collaboration, feedback loops, and focus on automation of the DevOps movement. In this episode Christopher Bergh discusses ways that you can start adding reliability and speed to your workflow to deliver results with confidence and consistency.
Preamble
- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- When you’re ready to build your next pipeline, you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, Datadog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- Your host is Tobias Macey, and today I’m interviewing Christopher Bergh about DataKitchen and the rise of DataOps.
Interview
- Introduction
- How did you get involved in the area of data management?
- How do you define DataOps?
- How does it compare to the practices encouraged by the DevOps movement?
- How does it relate to or influence the role of a data engineer?
- How does a DataOps-oriented workflow differ from other existing approaches for building data platforms?
- One of the aspects of DataOps that you call out is the practice of maintaining multiple environments so that the various aspects of the analytics workflow can be tested in a non-production context. What are some of the techniques that are available for managing data in appropriate volumes across those deployments?
- The practice of testing logic as code is fairly well understood and has a large set of existing tools. What have you found to be some of the most effective methods for testing data as it flows through a system?
- One of the practices of DevOps is to create feedback loops that can be used to ensure that business needs are being met. What are the metrics that you track in your platform to define the value that is being created and how the various steps in the workflow are proceeding toward that goal?
- To keep feedback loops fast, tests need to run quickly. How do you balance the need for larger quantities of data to be used for verifying scalability/performance against optimizing for cost and speed in non-production environments?
- How does the DataKitchen platform simplify the process of operationalizing a data analytics workflow?
- As the need for rapid iteration and deployment of systems to capture, store, process, and analyze data becomes more prevalent, how do you foresee that feeding back into the way the landscape of data tools is designed and developed?
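One of the questions above asks about testing data as it flows through a system, as opposed to testing logic as code. As a rough illustration only, a check of that kind might look like the sketch below. The function name `check_batch` and the field names `record_id` and `amount` are hypothetical; they do not come from the episode or from the DataKitchen platform.

```python
from typing import Dict, List


def check_batch(rows: List[Dict], min_rows: int = 1) -> List[str]:
    """Run simple data tests on one batch of records.

    Returns a list of human-readable failures; an empty list means
    the batch passed. Field names are hypothetical placeholders.
    """
    failures = []

    # Volume test: an unexpectedly small batch often signals an upstream problem.
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")

    # Key tests: the identifier must be present and unique.
    ids = [row.get("record_id") for row in rows]
    if any(value is None for value in ids):
        failures.append("record_id contains nulls")
    if len(set(ids)) != len(ids):
        failures.append("record_id contains duplicates")

    # Range test: amounts must be present and non-negative.
    if any(row.get("amount") is None or row["amount"] < 0 for row in rows):
        failures.append("amount is missing or negative")

    return failures


if __name__ == "__main__":
    batch = [
        {"record_id": 1, "amount": 12.5},
        {"record_id": 1, "amount": -3.0},
    ]
    for failure in check_batch(batch):
        print("FAILED:", failure)
```

A pipeline step could call a check like this before passing the batch downstream and stop, or raise an alert, when the returned list is not empty.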
Contact Info
- @ChrisBergh on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- DataOps Manifesto
- DataKitchen
- 2017: The Year Of DataOps
- Air Traffic Control
- Chief Data Officer (CDO)
- Gartner
- W. Edwards Deming
- DevOps
- Total Quality Management (TQM)
- Informatica
- Talend
- Agile Development
- Cattle Not Pets
- IDE (Integrated Development Environment)
- Tableau
- Delphix
- Dremio
- Pachyderm
- Continuous Delivery by Jez Humble and Dave Farley
- SLAs (Service Level Agreements)
- XKCD Image Recognition Comic
- Airflow
- Luigi
- DataKitchen Documentation
- Continuous Integration
- Continuous Delivery
- Docker
- Version Control
- Git
- Looker
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA