Speaker 2
It actually reminds me of what we talked about before with OpenTelemetry. So many vendors, you know, AppD, Dynatrace, and the rest, have been investing so much money in agents, building all these proprietary closed-source agents with the clever secret sauce, and charging tons of money for them. And now that you have OpenTelemetry, and that's table stakes and it's open source, and they can focus on the backend analytics power, you suddenly start asking: why did we pay so much money all these years just to collect telemetry, just raw data, and get it into a centralized place?
Speaker 1
Yeah, no, exactly. I think there are a lot of parallels there. It's unfortunate.
Speaker 2
It's funny. By the way, you mentioned before that this is already happening in companies, definitely the bigger-sized ones. What do you see when you talk to your customers and other members of the community about platform engineering? How does that influence our space?
Speaker 1
Yeah, that's a great question. For a while there, it really just felt like everyone was renaming their DevOps team or their backend engineering team to "platform engineering team." When I think about what really matters for platform teams, you know, I talked about that line between the infrastructure code that you have to run and your code. I think the really smart platform teams are the ones that own that boundary, right, and bring a very product-focused lens to what has historically been ops and sort of backend things. I think we're getting more permeation of those concepts throughout the industry, which is really good. A big part of getting out of reactive mode and starting to build things affirmatively is having a product mindset when you apply it to your platform; it's both a cause and an effect of getting better at that. And I'm a little jealous. I feel like my career was a little too early: I never really got to work with great product folks when building things on the backend, and I watch this enviously, because I feel like it's such a powerful skill set. It's really exciting. I feel like platform engineering is growing up a little bit this year. I do feel like there are still so many folks out there who are just selling these, what do they call them, IDPs, internal developer platforms. And I'm like, that's not the point. The point is not that you can pay someone to do it for you. The point is that you own that layer, that abstraction layer, right? It's a force multiplier and a force amplifier. And I don't know which are legitimate platform products to sell, but I do feel like every term gets co-opted to try and sell people things, when I really feel like the heart of it is just building software like product engineers do, but for that layer.
Speaker 2
I love that definition. By the way, I used to be a product manager in a past life, so I definitely relate to that. And actually, part of the work we're doing these days in the CNCF, under TAG App Delivery, the Technical Advisory Group for App Delivery, there's a special working group for platforms, for platform engineering, and we're now running research on platform as a product. So I'm calling on all our listeners out there who have platform practices inside their company: come and share how you do platforms. How do you define and collect the requirements? How do you prioritize them? How do you collect feedback? How can your users, the platform users, feed in requests for the platform? How do they get support? Do you have a portal? Do you have a CLI? Do you have documentation? How do you maintain it? All these questions, to really understand. Maybe I'll paste the link here for the folks who want to join in. Actually, I did an interview this morning, my time, Europe time, with a member who is a consultant who has done this in many companies, interviewing him about how they do it. So I think it's fascinating, and we're just starting to understand how it's done in practice. Everyone knows how they wish it would be done, and I have very strong opinions on how I think it should be done, but let's start by understanding how it's actually done in practice. And you'd be surprised how many organizations don't have a proper product owner; it's spread across different engineers, and they don't collect feedback systematically, they don't measure systematically or collect metrics on what they release. But that's what we need to achieve.
Speaker 1
That's true for a lot of teams. The other thing is that an anti-pattern I see a lot is people saying, okay, we're spinning up a platform engineering team, or renaming a team to platform engineering, whatever, and therefore our problems will be solved by this team. And it's a piece of the puzzle, right? But I actually feel like the rise of platform engineering is very much part of, I feel like we're in the sunset of the DevOps era. Which is not because of DevOps the philosophy: collaboration, breaking down silos, all that stuff, you know, DevOps is eternal. But the original idea of DevOps was that you have ops teams and dev teams and they collaborate. We don't do that anymore. Nobody spinning up a new engineering org says, all right, let's hire an ops team to run the code and devs to write the code. We have all accepted that it's a really bad idea to expect half your engineers to write the code and the other half to understand and operate it, right? Those things have to coexist in a brain. So increasingly, all you have are engineers who write code and operate and own their code in production. And there are places for specialists all over, right? Nobody's saying everybody has to understand and do everything. But a really important part of this process is finding someone, something, to own that layer of abstraction between the infrastructure, the commodity (infrastructure has been commoditized, which is really exciting), and the code that is your crown jewels, the code that makes you who you are. Somebody's got to own that interface. But in order for platform engineering teams to do that well, to do all these things we were just talking about, a lot of other stuff has to happen in the organization. Software engineers have to feel like it's their job to understand and operate their code. You have to have on-call rotations that close those loops and make them faster and tighter between the people who understand what's happening and the people who are responding. You have to dig yourself out of a real hole of operational debt, if you're like a lot of companies, because the systems are just too noisy for developers to actually get any work done and operate them. So you have to get to a point where people aren't drowning in alerts and noise and pages. And to tie this back to observability: I'm super blind, and I really think of observability as putting on your glasses before you go barreling down the freeway so that you can see what you're doing. I feel like when you front-load observability, all the other investments you need to make in your system go faster and easier, and you move with more certainty and more purpose, because you can make these course corrections and adjustments. The goal should be that when you're driving, you're not sitting there thinking about course corrections. You're just driving, right? Because you can do it almost automatically. And the goal of software is to be able to do the same thing, for it not to feel like you're constantly reacting and responding, but just building, just interacting with the customer, just creating value, right? Just doing the thing. But in order to do that, you have to have good observability throughout your system, and everyone needs to understand how to use it.
You have to have instrumentation; there are all these things. And it sounds like a lot, and it is. But whereas there are a lot of investments where you need to toil away and do a lot of shit before you can really reap the payoffs, observability is one of those where every bit that you make better is tangible. It's palpable. It pays off. You can speed up, you can tighten your feedback loops, people feel more empowered, people can move with more confidence, users get happier, you can find things before your users do. We're all so busy; we're all drowning in things that we need to do. We have stakeholders, we have laundry lists of things that we need to do. And for most people, front-loading some observability is the best way to accelerate the rest of your roadmap.
Speaker 2
And what you said is also important, in the sense that, and this is also what I advise my customers and users, you should do it in baby steps. Every step you make is very tangible. Don't try to boil the ocean on day one and do the perfect observability that you saw in the latest blog post by Charity Majors or Google or Uber or whatnot. No, take baby steps: focus on the critical paths that you have, the critical business flows that you have. Yeah, exactly. And then you can really see the value. And then incrementally add more where you see the pockets of black holes that are really impactful for your system and your business goals.
Speaker 1
Yeah, because every bit of progress you make earns you credibility, right? It pays off, right? And that buys you more leeway to do more things and bigger things. Yeah.
Speaker 2
By the way, one thing that I also found interesting, you know, in the open source observability space: I looked at the announcements by the VictoriaMetrics folks, who are now going into logging with VictoriaLogs, and they're launching their own LogsQL for querying logs, a very similar path to LogQL from Grafana and others. And it really raises the question: do we really need those languages? For me, it feels like we're diverging from the vision of a unified query language and unified observability in that respect. So I'm wondering how you see this tendency toward specialized query languages from different vendors.
Speaker 1
You know, I get it. And I also think it's not great. We've held the line against doing this, and I think that we will, because there is a place for query languages, but they're a power tool for power users. And typically, if you need them, you're working around some real limitations in your actual product. I feel like the core fact of observability 2.0 versus 1.0 is unified storage versus many sources of truth, but there are so many other things that go along with it. Being able to interact, sort of interactively triangulate, what about this, what about that, and just follow the trail of breadcrumbs is, I think, a workflow that's part of observability 2.0, where you're not just running a report. Also, the human eye, the ability to visually correlate things, is one of the most powerful tools: there are so many things that you'll never notice in a wall of text that will jump out and just hit you in the face if you're visualizing them. So I get it. You get a certain amount of data, you get a certain amount of maturity, and people start to clamor for this. But I think that when you look under the hood at the reasons people are using it, they're typically pointing to pretty big weaknesses in the product or the model.
Speaker 2
And by the way, it's very interesting, because I've been talking about this for a good few years. I think I even tried to reach out to the data analytics community to try to bring them together with DevOps. I posted something on, back then it was called insideBIGDATA, I think they've since rebranded to Inside AI or something, but it's a designated newsletter for the data analytics folks. And I pitched the idea that observability is a data analytics problem. I built the full case, showing the collection, the enrichment; if you look at classic data analytics from other domains, that's where we are. And the focus on the raw data that we have, what you call observability 1.0, which is essentially the raw data, the logs, the metrics, the traces, the profiling, whatnot: this is just the raw data, and we're interested in the insights. We're really interested in understanding what goes on in our system. That's a classic data analytics problem. If you take any data analyst from 10 or 20 years ago, they've been doing that in Power BI or whatnot, because that's the type of thing they try to extract out of data, regardless of observability and DevOps. So I think we need to start looking at our profession as data analysts in that respect.
Speaker 1
You know, they've had, and this is everything I'm talking about with observability 2.0, they've had this for 20 years on the business side. They look at the tools that we use for software and ask, why are you doing that? It's very much a case of the cobbler's children having no shoes. Can you imagine trying to run a marketing department if you had to predefine the buckets for, you know, cohorts in advance and collect those metrics, and just be like, well, I'm guessing they're going to be in these? No, you can't. It's insane. People would be like, what are you doing? You know, columnar stores, which I think are another really important thing that powers observability 2.0 tools, because they give you the ability to slice and dice on any dimension, to have high-cardinality dimensions, to zoom in and zoom out, raw events and then big picture. Vertica came out like 20 years ago, dude. They've been doing all these things on the data analytics side for years. And we really are just playing catch-up when it comes to some of this stuff. In a way, that's a little bit embarrassing.
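To make the slice-and-dice and zoom-in/zoom-out idea concrete, here's a minimal sketch of querying wide events in a columnar store. It's not from the conversation: the `events` table and its columns are hypothetical, and it assumes the `clickhouse_connect` Python client against a ClickHouse instance.

```python
# Minimal sketch: slicing and dicing wide events in a columnar store.
# Assumes a hypothetical ClickHouse table `events` with one wide row
# per request; all table and column names are illustrative.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# Zoom out: aggregate over any dimension, even a high-cardinality one
# like user_id, with no pre-declared buckets or cohorts.
top_users = client.query("""
    SELECT user_id, count() AS requests, avg(duration_ms) AS avg_ms
    FROM events
    WHERE timestamp > now() - INTERVAL 1 HOUR
    GROUP BY user_id
    ORDER BY requests DESC
    LIMIT 10
""")

# Zoom in: drop down to the raw events behind a suspicious aggregate.
slow_raw = client.query("""
    SELECT *
    FROM events
    WHERE user_id = 'user-1234' AND duration_ms > 1000
    ORDER BY timestamp DESC
    LIMIT 100
""")
```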
Speaker 2
I think much of that is because we've been spending so much time on the data collection, on the agents, on getting the telemetry in. Whereas now that that's a solved problem, thanks to OpenTelemetry and other things, we can focus on the backend analytics. That's the thing that matters, and that's where you get the differentiation as vendors. I hope vendors realize that now.
Speaker 1
Yes. And I feel like another reason we're still kind of lingering, that we've been doing that for so long, is that we have this habit of thinking of tech in terms of tech instead of in terms of the business, right? Instead of in terms of business value. And one of the things about observability 2.0 that's a little more subtle, but that I really hope takes off, is that when you're using these really wide events, you can pack all this context in, blah, blah, blah, and start merging the lanes of business data and tech data. Instead of just CPU and memory and stuff, and then, okay, app IDs and user IDs and stuff, all of the interesting, hard questions you need to ask about your systems are some combination of application systems and business use cases, right? Being able to blend that data and ask complex, intersecting questions about things like shopping carts or usage patterns, these are so powerful. One of my favorites, and this is a very long-ago Honeycomb anecdote: Intercom was one of our first customers, and they added their high-cardinality dimensions to their Honeycomb events, and they were just poking around. They had outgrown their database; their MySQL database had outgrown the largest EC2 instance size. They were gearing up to do this giant migration to sharded databases and blah, blah, blah. Then one of their engineers happened to throw some of these IDs in; she was just screwing around, looking around. And they went, oh, shit: 80% of the application execution time is being used by one app that's paying us 20 bucks a month. So we could either do all that, or we could rate-limit this guy. You can't predict in advance where these insights are going to come from. You have to be able to play with the data, and being able to play with the data and have it all be in one place is so powerful. Yeah.
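As a hedged sketch of the kind of blended business-and-tech query behind that anecdote (the schema and numbers are illustrative, not Intercom's actual data), this continues the hypothetical `events` table from above, where each wide event carries a tech field like `duration_ms` alongside business fields like `app_id` and `plan_monthly_usd`:

```python
# Sketch: which customer accounts for what share of execution time?
# Continues the hypothetical wide-events table; blending the tech
# field (duration_ms) with business fields (app_id, plan_monthly_usd)
# is what makes the question answerable at all.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

heavy_hitters = client.query("""
    SELECT
        app_id,
        any(plan_monthly_usd) AS pays_us_usd,
        sum(duration_ms) AS total_ms,
        sum(duration_ms) / (SELECT sum(duration_ms) FROM events) AS share
    FROM events
    GROUP BY app_id
    ORDER BY total_ms DESC
    LIMIT 5
""")
# If one app_id paying $20/month shows up with share close to 0.8,
# rate limiting that tenant beats a sharded-database migration.
```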
Speaker 2
And by the way, it's funny, because I had something similar that I even wrote about a few years ago. I called it self-observability. Even if you're an entrepreneur with your own startup, and you're trying to work out your pricing and your PLG, you need the metering of your application, right? To support your self-service, and to know, when you release a new feature, does it work? All the fail-fast mentality, the frequent releases: how do you do that without observability? That's essentially observability. And the good entrepreneurs sit on these dashboards day in and day out. They release, probably a few times a day, and they see the impact of each release, from the change of color of a button all the way to a new pricing model. How would you know otherwise how to do that? From the product manager to the entrepreneur to the head of a business unit or whatnot, you need observability across the board. So when I do episodes about product or about FinOps, people ask me, but your show is about observability? I tell them observability is not just your IT in production. Observability is everything that you do. You need to start with that. That's the level-zero requirement you need to start with.
Speaker 1
And this is such a fascinating and important point. I feel like there are so many ripple effects, so many other things associated with the shift from 1.0 to 2.0. One of them is that observability still has this reputation of being about how you operate your code, right? It's how you run it, how you understand what you've done, how you do MTTR and MTTD, and when it breaks, how do you get it back up, and all this stuff. And it is that, but it's not just that. Observability 2.0 is more about how you develop your code than just how you operate it, I think. It's that fast feedback loop of, like you said, is this button working? How are these colors doing? How are people engaging with the stuff that I've written? Are they doing it in ways I expected? Are they doing things that I didn't expect? Are those things better? Should I invest in those things? You need to be in constant conversation with your code. The remit of understanding is so much bigger than bugs and outages and downtime. There are so many ways in which you just need to understand the impact of what you've put out into the world, in order to make your users happy, in order to build better products, in order not to waste your time going down rabbit holes, in order to decide what to work on, in order to decide if what you're doing is having the impact that you want it to. There's so much there that is really observability 2.0. I think it's the foundation of all those fast feedback loops that you need to hook up in order to really just run circles around your competition.
Speaker 2
Sounds good. And actually, we have an interesting question from the audience, from Daniel here. I'll just read it out, because we also have the podcast listeners. If you've got zero observability today, would starting off with 2.0 resources be a good idea? Or would you still start with your book that came out in 2022, which is essentially observability 1.0 in that respect?
Speaker 1
It's so funny you ask; we just started working on a revised version of the book. Although we didn't use the 1.0 and 2.0 terms, I think the stuff we were describing in the original one was the 2.0 concepts: having the unified storage, having high cardinality and high dimensionality and so on. I think the concepts are still right. I think we've gotten better at explaining them, better at attaching language to them and helping people differentiate. But the concepts haven't really changed, because we were trying to lay out the principles of what we eventually called 2.0. The first two chapters are a little repetitive; we apologize, and we're fixing that in the next one. We're also trying to add more case studies, and material that talks about AI and so on. But the concepts still hold.
Speaker 2
So, just to go back and make sure Daniel gets his answer: if he's starting on a greenfield, day zero, with no observability whatsoever, what would be your recommendation now, with the learnings you've gathered to date? What would be a good starting point?
Speaker 1
I'm going to drop a link here in our private chat; maybe you can drop it in the other one. Start with this blog post that our friend Jeremy Morrell wrote, called A Practitioner's Guide to Wide Events. It's basically a guide to how to instrument your code. He's a Honeycomb user, or he was at his last job, but it's pretty vendor-neutral: he talks about open source ways of doing this and other vendor options and so on. It starts with gathering your data in a way that preserves context, right? Preserves these relationships. The thing about metrics is that you discard all the context at write time. So if you've got two metrics, you can never again verify whether or not they actually apply to the same event. You just can't do it. I wrote a white paper earlier this year called The Bridge from Observability 1.0 to 2.0 Is Logs. I feel like the center of gravity for our telemetry over the next couple of years has got to move away from metrics-backed tools and toward structured logging tools. And then you need to work on emitting fewer log lines and making them wider. The wider your log line, the more context, the more connective tissue you're preserving. And then you feed it into something that allows you to slice and dice, whether that's ClickHouse or Honeycomb or whatever. I would start with the instrumentation, because if you instrument your code the right way and you're using OTel, then you're not locked in, right? Honeycomb has a free tier; you can experiment with stuff there. I don't know if ClickHouse has a free tier. But it starts with gathering up your data the right way, and the other decisions you can kick down the line. So that's where I'd start.
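As a rough illustration of "fewer, wider log lines" (a minimal sketch, not from the episode; every field name and the request shape are made up), compare several narrow, disconnected lines per request with one wide structured event that keeps all the context together:

```python
# Minimal sketch of a wide structured event: one line per unit of work,
# with tech and business context preserved together instead of being
# scattered across many narrow log lines. All field names are made up.
import json
import sys
import time

def handle_request(req):
    event = {
        "timestamp": time.time(),
        "service": "checkout",
        "route": req["route"],
        # business context rides along with the tech context
        "user_id": req["user_id"],
        "plan": req["plan"],
        "cart_value_usd": req["cart_value_usd"],
    }
    start = time.monotonic()
    try:
        # ... the real work would happen here ...
        event["status"] = 200
    except Exception as exc:
        event["status"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
        sys.stdout.write(json.dumps(event) + "\n")  # one wide line out

handle_request({"route": "/checkout", "user_id": "u-42",
                "plan": "pro", "cart_value_usd": 99.0})
```

Because every field lands on the same event, questions like "do these two signals apply to the same request?" stay answerable, which is exactly what pre-aggregated metrics throw away.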
Speaker 2
Sounds good. Let's see another question from the audience, from Jason: where do you see observability going next? He just saw the great article from Pinterest on including observability in their build system. There was the AI conversation, but where are other places you see it needed? I haven't seen the Pinterest article, so apologies for that. But maybe you have?
Speaker 1
I haven't either.
Speaker 2
We should look it up. Jason, paste the article there if you can. So the question, I guess, is where else you see the need. I think we actually talked about some of that before, when you said that observability is everywhere: you see it as a product person, as an SRE, as DevOps, as a developer, as a QA engineer. I think even as a designer: you design a certain layout, and you want to check if it's the right one. So it's everywhere. But I don't know if you have anything you want to add.
Speaker 1
Yeah. So I think another part of it, and this might be a little obvious, but I'll say it anyway: the more instrumentation and observability you have around your deploys, the better. Things like feature flags and progressive deployment intersect with observability to be greater than the sum of their parts, right? If you can do feature flags and tweak them on and off, awesome. If you have observability, awesome. But if you have the ability to flip a flag for 10% of your users and then slice and dice and group by, to see what the experience is for this set of users compared to that set, or to canary to one node and then promote it to ten nodes, and break down by build ID or group by build ID and compare, the amount of granular visibility and control that gives you is huge. You know the whole "should we deploy on Fridays" thing, one of those perennial questions in tech? The way you increase confidence is by giving people visibility and control so that they can do things in these very measured ways, instead of just, all right, we're going to cross our fingers and dump a week's worth of diffs on the world all at once. That's terrifying. Of course people are scared to do that, right? Don't do that. But if you're deploying one changeset at a time, and you have ways of staging it: at Honeycomb, we have production, and then we have the dogfood cluster, and then we have kibble. We deploy first to kibble, and it gets promoted to dogfood, and it gets promoted to production, right? That gives us a lot of confidence along the way, because at every step we can look and see how it's going. It may be obvious, but I think that is still an under-invested-in area. So much of the modern trend is about empowering engineers to own their code and making it not scary, making it easy to do the right thing and hard to do the wrong thing. There are so many businesses out there now talking about finding bugs before your users do, finding things before your users do. But if you're in a 1.0 world, all you have are aggregates and random exemplars, right? That's it. You can look at the 99th percentile, maybe the 99.9th percentile, and you can get random examples. But you can't actually reliably find problems before your users do unless you have tools that are like scalpels, where you can see: okay, I deployed it to 1% of my fleet. What happened? For whom? When? When they did this? For these users, with these feature flags, on this build ID. Being able to slice and dice and zoom in, that's what allows you to find things before your users do. Precision tooling.
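As a hedged sketch of what that can look like in code (not Honeycomb's actual setup): stamp every span with the build ID and the feature-flag state at emit time, so that canary versus baseline becomes a simple group-by later. The OpenTelemetry calls below are the real Python API; the attribute names and the `flag_enabled` helper are illustrative stand-ins:

```python
# Sketch: attach deploy and flag context to every span so you can
# later break down errors and latency by build_id or flag state.
# Uses the OpenTelemetry Python API (opentelemetry-api); the attribute
# names and the flag_enabled helper are illustrative, not a real SDK.
import os
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")
BUILD_ID = os.environ.get("BUILD_ID", "dev")

def flag_enabled(flag: str, user_id: str) -> bool:
    """Stand-in for a real feature-flag client: ~10% rollout by hash."""
    return hash((flag, user_id)) % 10 == 0

def handle_checkout(user_id: str):
    with tracer.start_as_current_span("checkout") as span:
        span.set_attribute("deploy.build_id", BUILD_ID)
        new_flow = flag_enabled("new_checkout_flow", user_id)
        span.set_attribute("feature.new_checkout_flow", new_flow)
        span.set_attribute("user.id", user_id)
        # ... do the work; latency and errors recorded on this span
        # now correlate with build and flag state, so comparing the
        # 10% cohort against everyone else is just a group-by.
```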