Speaker 2
And you mentioned extreme programming: the notion of an automated build, and doing different levels of automated testing. There are certain practices like that which have been widely adopted, regardless of whether they meet that definition of a unanimously, ubiquitously applicable best practice. With your focus on DataOps, I'm thinking of the corollary: how do we make a change? What is the concept of a build, and how do these practices apply in DataOps? For listeners who aren't familiar with that, are there some practices that maybe you wouldn't label best practices, but that have been widely adopted and that you reach for quite often?
Speaker 1
Yeah, so they're not as widely adopted as I would hope, and I think it's due to a lack of tooling. The data community has really struggled here. If you look at the normal DevOps community, those people are programmers; there are a lot of programmers, and they've got the skills to build their own tools. So what happens with developers is, when you want to do something and the tool doesn't exist, or you have issues with the existing tools, what do you do? You build your own tools, right? The data folks have never really had that level of skill, up until recently with AI. Now they're doing some interesting stuff with large language models, but that's limited as well, because you still need some harder-core stuff. So the tooling in the data community has not been the greatest for a long time. It's getting a lot better now, but up until recently it wasn't. Having said that, take a technique like database refactoring. Code refactoring is built into every IDE now; it's a no-brainer. But back in the day, in the mid-to-late 90s, it was pretty rough. You'd do a global search and replace, you'd shoot yourself in the foot, and you'd have to go back. It was a mess. Now, if you want to rename an operation, you rename the operation. Done. Because all the tools have been built. But renaming a column in a database? I've been running with this example for 20 years now. It's embarrassing, and it's embarrassing because I can still get away with it. If I go into your organization and I say, I want you to rename a column in your customer table, in your production database, and I want you to roll that change into production by the end of the day...
Speaker 2
Not going to happen for most people.
Speaker 1
Yeah. Well, if you're a really small organization, sure. So I'll get half answers like that, but take Bank of America, a real organization. The answer is no, and they'd be deathly afraid of it. So imagine: these are professionals who are deathly afraid of making an absolutely trivial change. And it is trivial. It's the most trivial example I can come up with; I can't find anything simpler.
Speaker 2
I bet they'd be afraid of extending the length of the customer last-name column, too.
Speaker 1
Who knows what that's going to break, right? And so, for all the nonsense around data quality techniques over the last 40 or 50 years, if you're afraid of doing something absolutely trivial, you haven't achieved anything; you really aren't even in play. So the quick answer in the database world is: don't do that. But here's how you'd actually implement it. Assuming I had all the authority and access, I could go into Bank of America and rename a column in a production database, no problem at all, because I know how to do it. The way you do it is: you deprecate the old schema, you put the new schema in place, and you put something in place, a trigger, or brute force it if you have to, to keep things up to date. And I can roll that change by the end of the day, even if I'm just typing, let alone if I've got some tools to do it. I could build it and test it. It's a pattern, and it's a very well-documented pattern. It's really easy to do if you've got the skills, and if you've got the courage to do something like that. But like normal refactoring, it presumes you've got a regression test suite in place to identify what you broke. If you've got an automated regression test suite, that presumes you have some sort of CI strategy in place. And if you've got some sort of CI strategy in place, you should probably have some sort of continuous delivery or continuous deployment.
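The expand phase being described here, deprecate the old schema, add the new column, and use a trigger to keep the two in sync, can be sketched with Python's stdlib `sqlite3` module. The table and column names below are hypothetical, and the full pattern would also sync writes in the new-to-old direction; this shows only the old-to-new half.

```python
import sqlite3

# In-memory stand-in for a production database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,
        f_name TEXT,
        l_name TEXT   -- the poorly named column we want to rename
    );
""")

# Expand phase: add the well-named column, backfill it, and install
# triggers so rows written through the old column stay in sync.
db.executescript("""
    ALTER TABLE customer ADD COLUMN last_name TEXT;
    UPDATE customer SET last_name = l_name;

    CREATE TRIGGER customer_lname_insert AFTER INSERT ON customer
    BEGIN
        UPDATE customer SET last_name = NEW.l_name WHERE id = NEW.id;
    END;

    CREATE TRIGGER customer_lname_update AFTER UPDATE OF l_name ON customer
    BEGIN
        UPDATE customer SET last_name = NEW.l_name WHERE id = NEW.id;
    END;
""")

# A legacy application that knows nothing about the rename keeps working...
db.execute("INSERT INTO customer (id, f_name, l_name) VALUES (1, 'Ada', 'Lovelace')")
db.execute("UPDATE customer SET l_name = 'Byron' WHERE id = 1")

# ...while new code reads the new column and sees the synced value.
row = db.execute("SELECT l_name, last_name FROM customer WHERE id = 1").fetchone()
print(row)  # → ('Byron', 'Byron')
```

Old readers and writers touch `l_name`, new ones touch `last_name`, and neither notices the other during the deprecation window, which is what lets the change roll out in a day without touching the other applications.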
Speaker 2
And it presumes we understand the architecture, and there are ten other little applications running around with the connection string to that database.
Speaker 1
No, exactly. That's the fundamental challenge with database refactoring: that coupling. I assume that's the case. When we wrote the Refactoring Databases book, we did it under the assumption that there were 100 systems accessing that table. So whatever it is you're going to change, there are 100 things accessing it, and you have no control over 99 of them. They're totally out of your control; you might not even know they exist. There's always somebody who's written a spreadsheet that accesses it, a summer student or a consultant fooling around, guaranteed. So first of all, you have to assume a bunch of stuff is accessing it, and you can't assume you know the exact list. Even if you did, it wouldn't matter: you can't update 100 things at once and roll them all at once. And if that's even remotely possible for you, fine, I'll just up the number to a thousand, or ten thousand; I'll ratchet it up until it blows you away. So you have to assume you're out of luck on that front. But you can still do it, if the database is responsible for its own integrity: it's got to keep the data up to date regardless of the refactoring you've made. Now, eventually you do want to update those apps, release them, and remove the cruft you've added to keep everything up to date. So there's a little bit of work to be done there, but it's possible. You can safely refactor a database if you want to. And why am I harping on this? Because this is the fundamental skill you need to fix your data quality problems in your organization.
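The "remove the cruft" step mentioned here, the contract phase, might look like the following `sqlite3` sketch: once every known accessor has been migrated to the new column and the deprecation window has passed, the sync trigger and the old column are removed. (Names are hypothetical; the table is rebuilt rather than using `DROP COLUMN` so the sketch works on any SQLite version.)

```python
import sqlite3

# Minimal stand-in for the transition-period schema: both columns exist
# and a trigger keeps the old column flowing into the new one.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, l_name TEXT, last_name TEXT);
    CREATE TRIGGER customer_lname_sync AFTER UPDATE OF l_name ON customer
    BEGIN
        UPDATE customer SET last_name = NEW.l_name WHERE id = NEW.id;
    END;
    INSERT INTO customer VALUES (1, 'Lovelace', 'Lovelace');
""")

# Contract phase: all accessors now use last_name, so rebuild the table
# without the deprecated column. Dropping the table also drops its triggers.
db.executescript("""
    CREATE TABLE customer_new (id INTEGER PRIMARY KEY, last_name TEXT);
    INSERT INTO customer_new SELECT id, last_name FROM customer;
    DROP TABLE customer;
    ALTER TABLE customer_new RENAME TO customer;
""")

cols = [r[1] for r in db.execute("PRAGMA table_info(customer)")]
print(cols)  # → ['id', 'last_name']
```

The point of splitting the refactoring into these two phases is exactly what's described above: the 100 systems you don't control keep working during the window, and the scaffolding only comes out after they've been updated and released.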
Because all this other data cleansing stuff that the data scientists do at the point of usage, if you're cleansing data there, that's better than doing nothing, but it's like a band-aid on a sucking chest wound. It's the wrong place to be fixing things. You need to fix it at the source, otherwise you're wasting your time, and you also need to fix whatever it is that's messing up your source to begin with. So there's a little bit of work to be done to clean up your technical debt.
Speaker 2
Nice.
Speaker 1
And you're going to want to do it, because that technical debt is just going to get worse.
Speaker 2
That's a great point. And on a specific example, it sounds like the same mindset applies when updating public web services, or even just a C# or Java interface.