Speaker 3
The funny thing about IBM is that they often seem to have the vision of what's coming in the future, but then the execution is like... So I have a good friend who worked at IBM for a long time. And this was clear back in 1990. I was just a teenager back then, and this was before the internet had really taken off in the United States. And he's like, yeah, what IBM sees happening in the future is that we're mostly going to be going to a client-server model of computing. In other words, people are going to access centralized back-end network services, and then they'll experience that on their local computer, and they'll somehow... He didn't call it the web or anything like that, but he basically was talking about the web and mobile devices and all these things that were in the future. But how big a slice of that business does IBM have now?
Speaker 2
I remember playing the Watson Jeopardy game. They had it at a conference I was at. My sister and I played it. I thought it was cool. I mean, it was a good party trick for sure. But it obviously didn't
Speaker 1
I'm going to tell you, the success that they had on Jeopardy was amazing. But what they were doing on Jeopardy was not the same problem that people are solving in business. By the way, speaking of that Jeopardy thing, I happen to be friends with Roger Craig. I don't know if you know that the two people that played the Watson machine were Roger Craig and Ken Jennings. And one day I was speaking at a conference in New York City, and there were, I don't know, 500 people there or something like that. And at the end of the conference, this man came up to me and shook my hand and said, I really enjoyed your presentation. And it was Roger Craig. And, I don't know if you noticed, he's in our profession. He's a data scientist. And I've had several dealings with him over the years. A very nice person, and obviously a very smart person. That's cool. Interesting.
Speaker 2
Kind of switching gears a bit. Scott Taylor has a question here. What do you think of the whole debate with the data mesh movement that seems to say data warehousing is dead?
Speaker 1
Well, people have been trying to kill data warehousing for a long time. I wish them a lot of... well, let me tell you when data warehousing is going to die. Data warehousing is going to die when people no longer need believable information. If what you want is just to get information, to access information, then you don't need a data warehouse. You need a data mesh. In fact, what I tell people who bring up this argument about data mesh is, you ought to write all of your systems in Excel. Because Excel is a good way to get a lot of information quickly up on the computer. You can share that information. That's wonderful. Can you believe the information? Listen, I can take an Excel spreadsheet and assign myself a salary of a million dollars a month. And it's on the computer. It's on an Excel spreadsheet. And I can share that with you. Not a problem. Is that reality? Nowhere near is that a reality. So as long as people need believable information on which to make decisions, they're going to need a data warehouse. The people at data mesh, I think it's very unfortunate that they've chosen to think they're going to replace data warehouse. If they had said, gee, we can coexist with data warehouse, we can operate in a complementary fashion, I think that would have been a really good thing to say. But let's take a look at what happened with big data. Big data came along, Cloudera, and there are other companies whose names I can't remember today, came along and said, gee, we're going to replace data warehouse. And quite frankly, I recommend Cloudera all the time, but not for what they think I recommend them for. If you want to use Cloudera as a secondary source for storing data that has a low probability of access, that's actually a wonderful extension for a data warehouse. But no, no, no, no, no, Cloudera is going to replace data warehouse. And that didn't work out.
Data mesh is going to replace data warehouse. That's not going to work out. So I think if people would come along with the attitude that we're going to operate in a complementary fashion, that's going to be much better. I don't know what it is about data warehouse that everybody wants to kill or replace data warehouse. And data warehouse is still standing today.
Speaker 2
Yeah, actually Scott has a good follow-up comment here. Scott and I talk a lot about zero-sum stuff. And it shouldn't have to be a zero-sum conflict. With anything, right? I've seen this over my career too, and I'd like to get your take on this. I know you have a very specific definition of a data warehouse. Very specific. We talk about this actually. And what I see is that it seems it's been co-opted quite often. And I'm not sure. The term is from a rounder level. What do you think about that?
Speaker 1
Well, let me tell you something. The definition of data warehouse has never changed. And people, Teradata has come along and tried to fiddle with it. And data mesh, and Lord knows Ralph Kimball's come along and tried to change the definition of data warehouse. The data warehouse definition still stands today. And by the way, it amazes me that that's been the case, because when I wrote the definition of a data warehouse, I just wrote that off the top of my head. And I mean, I had no idea that it was going to cause the commotion that it's caused. I'm not kidding. I mean, I said, okay, what's the data warehouse? And this was in the early days. I know a lot more today than I knew in the very early days of data warehouse, but the definition hasn't changed. And what can I say? Again, if people want believable data, they need it. If you don't care about believable data, if you want to just take everything off of an Excel spreadsheet, then be my guest. You don't need a data warehouse.
Speaker 3
I think this is part of the reason that the data warehouse has attracted disdain. You're kind of emphasizing the business process aspect of, like, we need to make sure that the data is correct before we provide it to other parts of the organization. Yep. And I think that's very valuable. I think when people have gatekeepers and they can't then do their own things with data in parallel, they get very frustrated. And so that was kind of what motivated that discussion of, like, well, that's probably why we should kill the data warehouse, because we're getting frustrated with changes taking six months to happen. Whereas we can have both systems, and we can have data mesh and data warehousing and say, here's the very dynamic data coming out of different application development teams within the company. But then here's kind of the golden, assessed data that the whole company can look at, that's governed. Actually this is really cool, this idea of running these two in parallel.
Speaker 2
Well, that's also kind of reality, right? It is. Yeah. So, Mike Rogoff has a good question here. What in your view is the most effective way to scale making the data believable?
Speaker 1
Well, from the beginning, we've never said build your data warehouse all at once. Go into your organization, do an assessment as to what your most critical data is, start with that, and then work your way to the next most critical data. So you need to build a data warehouse incrementally. I don't get this much anymore, but in the early days of data warehousing, people would come along and say, oh, you've got to boil all of the ocean at the same time. We never said that. I don't know where that came from, but that was just people wanting to not build a data warehouse. And so you need to be strategic about it. You need to be incremental about it. You need to be strategic and incremental and take it a step at a time. There's an old saying, how do you eat an elephant? If you try to eat an elephant all at once, you will die because you'll choke to death. You eat an elephant a bite at a time.
Speaker 2
Yeah, I think that tends to be the tendency though, right? I would say, especially for IT-led projects, to kind of bring it back to IT. You know, a lot of these seem to be sort of, I hate to use the term, make-work projects, but it's definitely something where people feel like, okay, we've got to take this initiative. We've got to, as you say, boil the ocean, and then it... I don't know. I think it just ends up in kind of a graveyard. And another.
Speaker 1
In order to build a data warehouse, transformation of data is a necessity. It is simply the price you pay for a data warehouse. So what the IT department would do is go and find the most difficult transformation that they could find and say, look, we can't transform data. You can't take this process and automate it or transform it. And they were right that some processes are so complex, you really can't do anything with them. But 98% of the processes are simple things. And so the IT department tried to use this excuse: well, you can't transform everything right off the bat, so therefore we can't do it at all. And that was a crummy little excuse. Again, when data warehouse first started, the IT people resisted data warehouse as hard as they could. It was the people in marketing, sales, and finance that helped data warehouse.
Speaker 3
And it's interesting that data warehousing has so often been under a completely different silo, a part of the organization separate from IT. And I think now that data has become cool again, we do see organizations trying to combine them more, like, hey, IT wants a slice of that because it is cool to be working with a lot of data, whereas it used to be like, yeah, that's just for reporting. We don't want to deal with that.
Speaker 1
Well, for the first five or maybe 10 years, data warehousing was never supported by the vendors. IBM went out of their way to try to not support data warehouse. Oracle wasn't any better. Microsoft wasn't any better. So people listened to their vendors, and their vendors did not like data warehouse. And so data warehouse has succeeded despite the best efforts of IBM, Oracle, SAP, Microsoft, and all of those people.
Speaker 2
I remember those days too. You know, I remember especially the big data days, it was like, SQL is going to die. Yeah, data warehousing won't go on.
Speaker 1
And yeah. And here we stand today.
Speaker 2
And then there's another question here. What are your thoughts on dbt and transformations moving towards being mostly in the data warehouse? I think to extend this conversation, but I think it's more of an ETL or ELT approach.
Speaker 1
As you probably know, I'm not a big fan of ELT. Why am I not a big fan of ELT? It's because vendors want something that doesn't require a brain and doesn't require work. They want the cheap, fast, easy way out. And so what they discovered was, gee, if we just do ELT, we can do the E and we can do the L and we'll just conveniently forget to do the T. And that has been the vendors' way to sell a lot of stuff. But it's not a data warehouse that they're selling you. Now, there are a few people out there that actually do the transformation part in ELT, and for them, God bless them. But most people do the E and the L and say, oh, we're through, and we won't do any transformation. Now, I will say this much: transformation can occur in lots of places. It doesn't have to be where it's currently located. You can do transformation as you bring the data in, as you stream it in. You can do transformation in other places. But at some point in time you are stuck doing the transformation, and transformation work is dirty work. You get your hands dirty doing transformation. And as far as I can tell there's no way out of that. I don't know, if somebody's got a good way to do transformation that's easy and fast and cheap, please let me know, because I want to know about it. But I've not found one yet.
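To make the "E and L without the T" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration (the two source systems, their field names, the code tables, and the date layouts); it is not anyone's actual pipeline. It only shows that rows loaded as-is keep each source's quirks, while a transform step conforms them to one agreed shape before they land.

```python
from datetime import date, datetime

# Hypothetical raw extracts from two source systems that encode the
# same customer differently (all names and values are invented).
CRM_ROWS = [
    {"cust": "A17", "gender": "M", "signup": "2021-03-04", "revenue_usd": 120.0},
]
BILLING_ROWS = [
    {"cust": "a17 ", "gender": "male", "signup": "04/03/2021", "revenue_cents": 4550},
]

# Assumed mapping from each source's gender encoding to one standard code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def parse_date(text: str) -> date:
    """Accept the two date layouts assumed for the sources above."""
    for layout in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, layout).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def transform(row: dict) -> dict:
    """The T step: conform one raw row to a single agreed-upon shape."""
    if "revenue_cents" in row:
        revenue = row["revenue_cents"] / 100
    else:
        revenue = row["revenue_usd"]
    return {
        "customer_id": row["cust"].strip().upper(),
        "gender": GENDER_CODES[row["gender"].strip().lower()],
        "signup_date": parse_date(row["signup"]),
        "revenue_usd": round(revenue, 2),
    }


def load_without_transform(rows):
    """E and L only: rows land exactly as the sources produced them."""
    return list(rows)


def load_with_transform(rows):
    """E, T, and L: every row is conformed before it lands."""
    return [transform(r) for r in rows]


raw = load_without_transform(CRM_ROWS + BILLING_ROWS)
conformed = load_with_transform(CRM_ROWS + BILLING_ROWS)

# In the raw load the same customer appears under two different keys;
# in the conformed load both rows share one customer_id.
print({r["cust"] for r in raw})
print({r["customer_id"] for r in conformed})
```

The "dirty work" lives in `transform`: every rule there (trimming keys, mapping codes, picking a date layout, converting cents) is a decision somebody has to make and maintain, which is why skipping the step is tempting.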
Speaker 3
Yeah, I mean, transformation fundamentally is a quality problem that comes down to a lot of attention to detail. I mean, yeah, anyone can write a quick SQL query to do a transformation, but to make it consistent across business logic and to make sure that the data is correct...
Speaker 2
That's also why we always talk about transformation as where you start getting ROI out of your data and start getting the value out of that.
Speaker 1
That's exactly right. It's that transformation process that is the shaping factor for a real data warehouse.
Speaker 3
And I was going to say, with ELT I think there are kind of two main definitions. There was the original data lake definition, which was transform on read, basically transform on query, which, to your point, Bill, is going to be completely inconsistent if you do that. And then there's the other version, which is to actually do the transformation in the data warehouse, but it's really just ETL, just with a management tool that hits the data warehouse rather than somewhere else, and sometimes some people call that ELT, but then you start getting so many acronyms.
Speaker 1
I like ETL because ETL forces you into the discipline of transformation. Having stated that, I know a couple of organizations that do ELT properly, and again, God bless them.
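The inconsistency risk of transform-on-read can be sketched in a few lines of Python. The data and the two "analyst" rules below are invented for illustration: each reader applies their own interpretation of the raw rows at query time and gets a different answer, while a single shared transformation applied once gives every reader the same number.

```python
# Hypothetical raw rows as landed, with inconsistent status values.
RAW_ORDERS = [
    {"customer": "A", "status": "complete", "amount": 100.0},
    {"customer": "B", "status": "COMPLETE", "amount": 50.0},
    {"customer": "C", "status": "refunded", "amount": 75.0},
]


def finance_revenue(rows):
    """One reader's query-time rule: exact match on 'complete'."""
    return sum(r["amount"] for r in rows if r["status"] == "complete")


def marketing_revenue(rows):
    """Another reader's query-time rule: anything not refunded."""
    return sum(r["amount"] for r in rows if r["status"].lower() != "refunded")


def conform(rows):
    """A single shared transformation, applied once before the data is stored."""
    return [{**r, "status": r["status"].lower()} for r in rows]


def revenue(rows):
    """The one agreed definition, run against conformed data."""
    return sum(r["amount"] for r in rows if r["status"] == "complete")


# Transform on read: two readers, two answers from the same raw rows.
print(finance_revenue(RAW_ORDERS), marketing_revenue(RAW_ORDERS))

# Transform once: every reader of the conformed rows gets the same figure.
print(revenue(conform(RAW_ORDERS)))
```

Whether the shared transformation runs before the load or inside the warehouse matters less here than the fact that it runs once, under one definition.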
Speaker 2
We were talking about this too. I think especially with the rise of streaming and the increasing popularity of it, you're actually going to see transformation coming back in vogue, because I don't know how else it would really work.
Speaker 3
So you're talking like external transformation before it even reaches stage because you want it to be semi real time.
Speaker 1
I hope you're right. But that goes against human nature. Human nature for a designer of a system says, gee, I'm the designer of a system. I can call this and calculate this however I want. And when you talk about transformation, you can't do that. You take away that prerogative of the developer, and developers don't like that taken away.
Speaker 2
That's a fair point. I've actually got quite a few people asking about this article that you just wrote. Let me see if we can share it real quick here. See, here, on Snowflake. A Snowflake critique. For full disclosure, Ternary Data is a partner of Snowflake, but they like us because we're candid. So, yeah, there's a lot in the chat here. There's been a lot of commentary, and when this article dropped a few days ago, a lot of friends were texting me like, hey, did you see Bill's new article? What are your thoughts on this? Well, why don't you tune in on Monday and we can talk to people about this article. Yeah.
Speaker 1
Well, surely. Let me tell you what caused the article to be written. Snowflake, as far as I can tell, is a general-purpose database management system. Can you build a data warehouse with Snowflake? Yes, you can. Can you build other kinds of systems with Snowflake? Yes, you can. That's my understanding. If I'm wrong about that, please let me know. The problem is that Snowflake advertises itself as a data warehouse on the cloud company. So here's day one: somebody reads it, aha, Snowflake, data warehouse on the cloud, sounds good to me. Day two, somebody comes along and builds a data mart or some other thing. Day three, the people become very unhappy with what they built. Day four, data warehouse gets the blame. People don't recognize that they can build something far different from a data warehouse with Snowflake. Was I irritated when I wrote that article? Yes, I was. And it probably comes across in the article. What I'm irritated with is, I'm tired of people going out there building a piece of junk and then data warehousing getting the blame for it. Consulting firms are especially guilty of this. I'm not going to name the names of consulting firms, but we all know who they are. Consulting firms go out there, read a few buzzwords, and say, I can go sell my services now for a high amount of money. They go out there and build a piece, excuse my language, a piece of crap for people. And then when the thing falls apart and fails, they say, oh, data warehouse doesn't work. Well, I don't like that. They never built a data warehouse in the first place, and yet data warehouse gets the blame. It's like your neighbor robbing the bank and then the police coming and arresting you for the bank robbery. It's like, wait a minute, I didn't do anything. I wasn't even there. Oh, but you live next door to this person, so you must be guilty.
Speaker 3
So your critique is basically the same critique that you would have for any technology vendor of the MPP systems, say a critique you would have of Teradata, Oracle, or anyone else: that the technology itself doesn't give you a data warehouse.
Speaker 1
The other technologies don't advertise themselves as a data warehouse. They advertise themselves as a full-scale database management system. Let me tell you something. If Snowflake had said, we're not a cloud data warehouse company, we are a cloud database management company, I wouldn't have a problem at all.
Speaker 2
I think actually there's some response here from Snowflake, from Clark here. He says Snowflake does not advertise as a data warehouse in the cloud. It's the data cloud.
Speaker 1
So, but I've kind of read the marketing for Snowflake, and that's kind of the impression that I got.
Speaker 2
Okay. Yeah, it's interesting. I think it kind of goes back to, you know, what we were talking about earlier, sort of the appropriation of the term data warehouse. I mean, do you see this as just one more example of that?
Speaker 1
Yep.
Speaker 2
Okay. Interesting. Yeah, interesting. What are your thoughts on this, Matt?
Speaker 3
Yeah, I mean, in general, I agree. No particular technology is going to give you a data warehouse. Data warehouses are about processes. So. Yeah.
Speaker 1
Data warehousing is an architecture. These other technologies are technology and there's a fundamental difference between an architecture and a technology. And I think as far as I can tell, there's always going to be that kind of difference.
Speaker 2
Yeah, that's interesting too. I mean, because I was over the weekend going back and reading through Building the Data Warehouse to finish up a couple spots in the book, and I think that's absolutely right. The way you described it was very, I would say, technology agnostic. It's more of a paradigm, like, these are practices. That's interesting. What are your thoughts on the lake house?
Speaker 1
Um, the lake house is kind of interesting. We started with data lakes, which I... I don't use the word hate a lot, but I really don't like data lakes. Data lakes are a staging area. And the expectations that people put into data lakes far exceed anything that a staging area could ever accomplish. So a data lake house is an extension of data lakes with the infrastructure that you need for making analytical decisions. And so I see a real distinction between data lakes and data lake house. I think data lakes are a travesty. And I think that a data lake house stands a chance at giving an organization what they need for making analytical decisions.
Speaker 2
How do you think a lake house architecture fits in with text data, for example?
Speaker 1
I think text data fits in like a hand in a glove with it. I mean, I think in the recent book that I had out on data lake house, we said there are three components to a data lake house. There's structured data. There's textual data. And there's analog/IoT data. And those are the three kinds of data that we see going into a data lake house.
Speaker 3
Yeah, I mean, the big difference, to me, is not just that you have structured data, but that you have a good layer for managing structured data. Because even in the data lake era, you could throw lots of structured data into your system. The problem is that it was so hard to track schema, things like that, without some tools around it. Managing schema changes, managing any kind of updates was a huge nightmare back then. That wasn't really happening. And then suddenly people are like, wow, these other MPP systems support, you know, schema changes really nicely, actually enable creation, updates, and... Yeah, pretty valuable. Yep. Yeah, that's interesting.
Speaker 2
Cool. I think we're coming up on time. Anything you want to leave the audience with?
Speaker 1
Well, yeah, I got an interesting email the other day. And I normally don't take emails personally, but this one, the email said, Bill, who is paying you? What vendor is paying you under the covers to make trouble for Snowflake? And I want to make it clear. There's no vendor out here that's paying me. And I'm not being influenced by any other vendor. The only motivation I had for writing the article about Snowflake was to make sure that when you fail with a data warehouse, you go check out the meaning of what a data warehouse was, because with Snowflake, you can build something, I don't know what, that isn't a data warehouse. Don't blame data warehouse. That was the motivation for that. But I'll be honest with you. I kind of resented the fact that somebody thought that I'm on the take from some other vendor, because, you know, you ask my accountant and you check my bank account and you'll find out there's no vendor out there paying Bill Inmon, and I'm not going to do this.
Speaker 2
That's really interesting. How do you think we could better educate people on the data warehouse and what it means and what it's intended for? There's a lot of confusion.
Speaker 1
I've certainly done my part, and the confusion, I think, comes from vendors wanting to co-opt it. I think vendors want to say, oh, that sounds like my technology. If we just change this and this and this, we can call whatever I do a data warehouse. Well, the problem is the things that they change are robbing a data warehouse of being a data warehouse. And so I think the confusion is caused by vendors wanting to jump on the train. And I think it's really ironical. You know, we mentioned IBM, and I don't want to dwell on IBM, but IBM, Ted Codd and other people at IBM, fought data warehouse tooth and nail. And in terms of selling IBM hardware and services, data warehouse has probably sold more than anything you'd ever imagined. And so I thought this is really funny, because IBM did not support and still doesn't support data warehouse, yet they make a lot of money off of it. I hope in my life someday I reach the point where there's something that I don't like that I get rich from and still don't like it. I hope that happens. I doubt that it will, but you never can tell. Interesting.
Speaker 2
Kind of closing out too, I mean, how many books have you written?
Speaker 1
Nonfiction books, 63. Fiction books, two.
Speaker 2
Wait, you wrote fiction books?
Speaker 1
Yes.
Speaker 2
I had no idea either. That's really cool. I'm just, like, that's a lot of books. I'm trying to do the math on that, but that's at least two a year. Yeah. What keeps you going on that?
Speaker 1
Well, let me tell you something. I come from a family of writers. My sister has written 28 books. My father wrote 10 books. One of our family members, many years ago, was Edgar Allan Poe. He was my great, great, great, great, great grandfather. I had a niece that was a screenwriter in Hollywood. I come from a family of writers.
Speaker 2
That's interesting. I had no idea. That's cool. It just must be in the blood then.
Speaker 1
You know, some people play golf. Some people walk their dog. I write. I find writing to be completely relaxing to me. And I know that for most people writing is work. For me, writing is a pleasure. And again, it's my hobby in life.
Speaker 3
I mean, I think I enjoy it too. It's more fun when you're not on deadline, not editing your own work and those kinds of things, but that's true. Yeah, necessary things to actually turn writing into a product, into something that other people can consume.
Speaker 2
Do you self-publish these days or do you still go through publishers?
Speaker 1
I've always gone through a publisher. I started off with Prentice Hall. And then I can't even remember the names of all of them. Millen, I think, did one of my books. And now I'm with Technics, Steve Hoberman and Technics. I don't know if you know Steve Hoberman, but you should. Steve Hoberman is first off a great guy, but he's got the largest collection of authors for technology of any company. And so you should be in touch with Steve Hoberman.
Speaker 3
Interesting. That's cool.
Speaker 2
Yeah, I'll have to pick your brain about the writing process. It's something Matt and I are still learning. I mean, I think we made it work. Right. We've got in the... Yeah, today, actually, we're literally about two hours from the deadline for... Yeah. Yeah. Sounds good. Awesome. Well, thanks, Bill. It's always good to chat with you. Okay. Yeah, looking forward to catching up soon. So thank you. Talk to you guys later. Yeah. Thanks. Sorry. Bye. Bye.