AWS Morning Brief

Corey Quinn
May 22, 2020 • 13min

Whiteboard Confessional: Naming Is Hard, Don’t Make it Worse

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

http://nops.io/snark
http://snark.cloud/n2ws
@QuinnyPig

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”, because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.

nOps will help you reduce AWS costs 15 to 50 percent if you do what it tells you. But some people do. For example, watch their webcast, “How Uber Reduced AWS Costs 15 Percent in 30 Days”; that is six figures in 30 days. Rather than a thing you might do, this is something that they actually did. Take a look at it. It's designed for DevOps teams. nOps helps quickly discover the root causes of cost, and correlate that with infrastructure changes. Try it free for 30 days: go to nops.io/snark. That's N-O-P-S dot I-O, slash snark.

Good morning AWS, and welcome to the AWS Morning Brief: Whiteboard Confessional. Today we're going to revisit DNS. Now, now, slow down there, Hasty Pudding. Don't bother turning the podcast off. For once, I'm not talking about using it as a database… this time. As you're probably aware, DNS is what folks use to equate friendly names like twitterforpets.com, or incredibly unfriendly names like Oracle.com, to IP addresses, which is how computers tend to see the world. I'm not going to rehash what DNS does. Instead, I'm going to talk about a particular kind of DNS problem that befell a place I used to consult for. They're publicly traded now, so I'm not going to name them.

An awful lot of shops do something that's called split-horizon DNS. What that means is that if you're on a particular network, a DNS name resolves differently than it does when you're on a different network. For example, admin.twitterforpets.com will resolve to an administrative dashboard if you're on the Twitter For Pets internal network via VPN, but it won't resolve to that dashboard if you're outside the network; it might resolve nowhere, or it might resolve just back to their main website, www.twitterforpets.com. And that's fine. Most DNS providers can support this, and Route 53 is, of course, no exception. This is, incidentally, what the Route 53 Resolver that was released in 2018 is designed to do: it bridges private DNS zones to on-premises environments, so your internal zones can then resolve to private IP addresses without having to show your private IP address ranges in public zones to everyone. The reason that matters is that this keeps you from broadcasting your architecture or your network layout externally to your company. Some folks consider doing that to be a security problem because it discloses information that an attacker can then leverage to gain further toeholds into your network.
Some folks also think that that tends to be a little bit on the extreme side. I'll let you decide, because I don't care, and that's not what the story is about. The point is that split-horizon DNS is controversial, for a few reasons, but in many shops, it is considered the right thing to do because it's what they've been doing. The internal DNS names either don't resolve to anything publicly, or they resolve to a different system that’s configured to reject the request outright. But there is another path you can take; a third option that no one discusses, because it's a path that's far darker, because it is oh, so very much dumber. But first…

This episode is sponsored in part by N2WS. Do you know what you care about? Many things, but never backups. At least until right after you really, really, really needed to care about backups. That's what N2WS does for your AWS account. It allows you to cycle backups through different storage tiers; you can back things up cost-effectively, and safely. For a limited time, N2WS is offering you $100 in AWS credits for setting up their free trial, and I encourage you to give it a shot. To learn more, visit snark.cloud/n2ws. That's snark.cloud/n2ws.

What I'm about to describe is far too stupid for my made-up startup of Twitter For Pets, so we're going to have to invent a somehow even dumber company, and we're going to call it Uber For Squirrels. It's like regular Uber, except it somehow manages to lose less money. Now, there was a very strong argument among the engineering community inside of Uber For Squirrels. The position that was decided on and argued for was that split-horizon DNS is dangerous, because a misconfiguration could leak records in the wrong places and theoretically take the entire online site for Uber For Squirrels down. There are merits to those arguments and you can't dismiss them out of hand, so a bargain was struck. The external DNS zone was therefore decreed to be uberforsquirrels.com, while the internal zone was configured to be uberforsquirrels.net. The uberforsquirrels.net zone was only accessible inside of the network. From the outside, nobody could query it.

Now, this is, in isolation—before I go further—a bad plan all on its own. When you're reading quickly, uberforsquirrels.com and uberforsquirrels.net don't jump out visually to people as being meaningfully different. You're going to typo it in config files constantly without meaning to, and then you're going to have a hell of a time tracking it down, because it's not immediately obvious that you're talking to the wrong thing; you might think it's a network problem. Your tab completion out of your known_hosts file, if you have such a thing configured in your environment, is going to suffer: you're going to have to hit tab a couple of extra times to cycle through the dot-net variants and the dot-com variants. It's just a general irritant. But that's not enough to justify an episode of the show, because wait, that is still some Twitter For Pets level brokenness. Why do I need to throw Uber For Squirrels under the bus? Well, because it turns out that despite using uberforsquirrels.net everywhere as their internal domain, they didn't actually own uberforsquirrels.net. It wasn't entirely clear who did, other than that the registration was in another country, so it probably wasn't something that the CEO registered and then forgot about in his random domain list of things he acquired for companies he was going to start o...
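For reference, split-horizon DNS in Route 53 doesn't require a second domain at all: a private hosted zone can share its name with the public zone and only answer queries from inside an associated VPC. The sketch below is not from the episode; the zone name, VPC ID, and record values are placeholders, and it's only meant to show the shape of that setup with boto3.

```python
# Minimal sketch of split-horizon DNS in Route 53: a private hosted zone with
# the same name as the public zone, visible only inside the associated VPC.
# The zone name, VPC ID, and record values are illustrative placeholders.
import uuid

import boto3

route53 = boto3.client("route53")

# Create the internal view of the domain as a private hosted zone.
private_zone = route53.create_hosted_zone(
    Name="twitterforpets.com",
    CallerReference=str(uuid.uuid4()),
    HostedZoneConfig={
        "Comment": "Internal view of twitterforpets.com",
        "PrivateZone": True,
    },
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},
)

# Records added here resolve to private IPs inside the VPC and simply do not
# exist in the public zone, so nothing internal leaks to the outside world.
route53.change_resource_record_sets(
    HostedZoneId=private_zone["HostedZone"]["Id"],
    ChangeBatch={
        "Comment": "Admin dashboard, internal view only",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "admin.twitterforpets.com",
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "10.0.12.34"}],
                },
            }
        ],
    },
)
```

Because both views share one name, there's no dot-com/dot-net pair to typo in config files, and no separate internal domain to forget to register.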
May 18, 2020 • 10min

Amazon Macie Some Well Deserved Pushback

AWS Morning Brief for the week of May 18, 2020. 
May 15, 2020 • 11min

Whiteboard Confessional: You Down with UTC? Yeah, You Know Me

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

CHAOSSEARCH
@QuinnyPig

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”, because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.

nOps will help you reduce AWS costs 15 to 50% if you do what it tells you. But some people do. For example, watch their webcast, "How Uber Reduced AWS Costs 15% in 30 Days". That is six figures in 30 days. Rather than a thing you might do, this is something that they actually did. Take a look at it. It's designed for DevOps teams. nOps helps quickly discover the root causes of costs and correlate that with infrastructure changes. Try it free for 30 days. Go to nops.io/snark. That's nops.io/snark.

Today I want to talk about a funny thing: time. Time has taken on a different meaning for many of us during the current pandemic. Hours seem like days. Days seem like months. But in the context of computers, time is a steady thing. Except when it's not. Things like leap years, leap seconds, Google's famous leap smear and, of course, our ever-changing friends, time zones, combine and collude with one another to make time a very hard problem when it comes to computers. In the general case, computers think of time in terms of seconds since the start of the Unix epoch on January 1, 1970. This is incidentally—and not the point of this episode—going to cause a heck of a lot of excitement when 32-bit counters roll over in 2038. But that's a future problem, similar to Y2K, that I'm sure won't bother anyone. Time leads to suboptimal architectural choices, which is bad, and then those choices become guidance, which is in turn far, far worse.

Now, AWS has said a lot of things over the years that I despise and take massive issue with. Some petty and venial, like pronunciation, but none of them were quite so horrifying as a tweet. On May 17, 2018, the official AWS Cloud Twitter account tweeted out an article with the following caption: “Change the timezone of your Amazon RDS instance to local time.” I hit the roof immediately and began ranting about it and railing against that tweet in particular. I believe this is the first time that me yelling at AWS in public hit semi-viral status. My comment, to be precise, was: absolutely do not do this. UTC is the proper server timezone unless you want an incredibly complex problem after you scale. Fixing this specific problem has bought consultants entire houses in San Francisco. Now, I stand by that criticism and I maintain that your databases should be in UTC at all times, as should the rest of your servers. And I'll explain why, but first:

This episode is sponsored in part by N2WS. You know what you care about? Many things, but never backups.
At least, until right after you really, really, really needed to care about backups. That's what N2WS does for your AWS account. It allows you to cycle backups through different storage tiers so you can back things up cost-effectively and safely. For a limited time, N2WS is offering you a hundred dollars in AWS credits for setting up their free trial, and I encourage you to give it a shot. To learn more, visit snark.cloud/n2ws. That's snark.cloud/n2ws.

It's important that all of your systems be in the same timezone. UTC, or Universal Time Coordinated, doesn't change with the seasons. It doesn't observe daylight saving time. It's the closest thing we've got to a unified central time that everyone can agree on. Now, you're going to take issue with a lot of that, and I'm not suggesting that you should display that time to your users. You have a lot of options around how you can alter the display of time at the presentation level. You can detect the timezone that their browser is set to. You can let them select their time zone in the settings of your application. You can do what ConvertKit—one of my vendors—does, and force everything to display in US East Coast time for some godforsaken reason. But all of those options are far better than setting the server time to local time.

Over the years, I've been told that this shameful secret exists within companies during job interviews, when I asked what kinds of problems they were currently wrestling with, and it's a big deal because changing one system requires changing every system that winds up tying back to that. Google apparently had all of their servers originally set to Pacific Coast time, or headquarters time, and this caused them problems for over a decade. I can't confirm that because I haven't ever worked there, so I wouldn't know other than stories people tell while sobbing into beers. But it stands to reason, because once you've gone down this path, it is incredibly difficult to fix it.

What's not so obvious is why exactly this is so painful. And the problem comes down to change. Time zones change. Daylight saving time alters when it takes place in a given location from year to year. And time zones themselves don't hold still either, as geopolitical things tend to change. Remember that computers don't just use time to tell you what time something is right now. They look at when log entries were made: what happened in a particular time frame? What was the order of those specific events that all came from different systems? When was a change actually implemented? And you really, really don't want to have to apply complex math to logs just to reconstruct historical events in time. “Well, that one was before daylight saving time took effect that year in that particular location where the server was running, so just carry the two.” That becomes awful stuff, and no one wants to have to go through that. It also leads to scenarios where you can introduce errors with bad timezone math. Now, there are a couple of solid objections here, but one of the only ones that I saw advocated on Twitter when I started ranting about it was of the very reasonable form, “Look, most stuff that uses databases in a lot of companies is for a single location at a single company, and it's never going to need ...
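Here is a minimal sketch of the "store in UTC, convert at the presentation layer" approach described above, using only Python's standard library. Nothing in it comes from the episode; the function and variable names are purely illustrative.

```python
# Store timestamps as UTC; convert only when rendering for a human.
# All names here are illustrative; zoneinfo requires Python 3.9+.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# What gets written to the database: one unambiguous UTC instant.
created_at = datetime.now(timezone.utc)


def render_for_user(ts_utc: datetime, user_tz: str) -> str:
    """Convert a stored UTC timestamp to the viewer's timezone for display."""
    return ts_utc.astimezone(ZoneInfo(user_tz)).strftime("%Y-%m-%d %H:%M %Z")


# The same stored instant, displayed differently per viewer. Log correlation
# and "what order did these events happen in" math never touch DST rules.
print(render_for_user(created_at, "America/Los_Angeles"))
print(render_for_user(created_at, "Europe/Berlin"))
```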
May 11, 2020 • 10min

The AWS Machine That Goes PING

AWS Morning Brief for the week of May 11, 2020.
May 8, 2020 • 10min

Whiteboard Confessional: Click Here to Break Production

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

CHAOSSEARCH
@QuinnyPig

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”, because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data, not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

Today on the AWS Morning Brief: Whiteboard Confessional, I'm telling a different story than I normally do. Specifically, this is the tale of an outage from several weeks ago. The person who shared this story with me has requested to remain anonymous and further wishes me to not mention their company at all. This is, incidentally, a common occurrence. Folks don't generally want to jeopardize their relationship with AWS by disclosing a service issue they see, whereas I don't have that particular self-preservation instinct. Then again, I'm not a big AWS customer myself. I'm not contractually bound to AWS in any meaningful way, and I'm not an AWS partner, nor am I an AWS Hero. So, all that AWS really has over me in terms of leverage is the empty threat of taking away my birthday.

So, let's dive into this anonymous story. It's a good one. A company was minding its own business, and then had a severity one incident. For those who aren't familiar with that particular designation, you can think of it as meaning the company's primary service fell over in an embarrassingly public way. Customers noticed, and everyone ran around screaming a whole lot. Now, if we skip past the delightful hair-on-fire diagnosis work, the behavior that was eventually tracked down was that an SNS topic had a critical listener get unsubscribed. That SNS topic invoked said listener, which in turn drove a critical webhook call via API Gateway. This is a bad thing, obviously.
Fundamentally, customers stopped receiving webhooks that they were expecting, and this caused a nuclear meltdown given the nature of what the company does, which I can't disclose and isn't particularly relevant anyway. But, for those who are not up to date on the latest AWS terminology, service names, and parlance, what this means at a high level is that a thing happens inside of AWS, and whenever that thing happens, it's supposed to fire off an event that notifies this company's paying customers. This broke because something somewhere unsubscribed the firing off dingus from the notification system. Now that we're aware of what caused the issue at a very high level, time to dig into how it happened and what to do about it. But first:In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast. The logs for who unsubscribed it are, of course, empty, which is a problem for this company’s blameless-in-theory-but-blame-you-all-the-way-out-of-the-company-if-it-turns-out-that-it-was-you-that-clicked-this-thing-and-didn't-tell-anyone,  philosophy. CloudTrail doesn't log this event because why would it? CloudTrail’s primary purpose is to rack up bills and take the long way around before showing events in your account, not to assist with actual problem diagnosis, by all accounts. Now, fortunately, this customer did have AWS Enterprise Support. It exists for precisely this kind of problem. It granted them access to the SNS team which had considerably more insight into what the heck had happened, at which point the answer became depressingly clear, as well as clearly depressing. It turns out that the unsubscribe URL at the bottom of every SNS notification wasn't authenticated. Therefore, anyone who had access to the link could have invoked it, and that's what happened when a support person did something very reasonable: Copy and paste a log message containing that unsubscribe link into a team Slack channel. It wasn't their fault [00:06:04 unintelligible] because they didn't click it. The entity triggering this was—and I swear I'm not making this up—Slackbot. Have you ever noticed that when you paste a URL into Slack, it auto expands the link to show you a preview? It tries to do that on every URL, and you can't disable URL expansion at the -Slack workspace level. You can blacklist URLs but only if the link expansion succeeds. In this case, it doesn't have a preview, so it doesn't succeed, so there's nothing for it to blacklist. Slack’s helpful feature can't be disabled on a team-wide level, so when that unsubscribe URL shows up in a log snippet that got pasted, it silently unsubscribed the consumer from SNS and broke the entire system. Now, there are an awful lot of things that could have been different here. Isn't this the sort of thing that might be better off with SQS, you might reasonably ask? 
Well, four years ago, when this system was built, SQS itself could not, and did not support invoking Lambda functions, so SNS was the only real option. T...
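For what it's worth, SNS does ship a guardrail aimed at exactly this failure mode for HTTP/S subscriptions: when the endpoint confirms its subscription, it can ask that unsubscribes require authentication, which makes a bare GET on a leaked unsubscribe link (say, from a chat client expanding a URL preview) a no-op. The sketch below is not the company's actual code; the topic ARN and token are placeholders.

```python
# Hedged sketch of the SNS guardrail described above: confirming an HTTP/S
# subscription with AuthenticateOnUnsubscribe set, so the UnsubscribeURL at
# the bottom of every notification can no longer be invoked anonymously.
# The topic ARN and token are placeholders.
import boto3

sns = boto3.client("sns")

# The endpoint receives a SubscriptionConfirmation message containing a Token.
# Confirming with AuthenticateOnUnsubscribe='true' means later unsubscribes
# must be signed AWS API calls, not an unauthenticated GET on a leaked link.
sns.confirm_subscription(
    TopicArn="arn:aws:sns:us-east-1:123456789012:critical-webhook-events",
    Token="<token from the SubscriptionConfirmation message>",
    AuthenticateOnUnsubscribe="true",
)
```

The flag is applied at confirmation time, so subscriptions that were already confirmed without it would generally need to be recreated to pick up the protection.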
May 4, 2020 • 11min

AWS Non-Profit Organisations

AWS Morning Brief for the week of May 4, 2020.
May 1, 2020 • 13min

Whiteboard Confessional: Hacking Email Newsletter Analytics & Breaking Links

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

CHAOSSEARCH
Last Week in AWS
The DynamoDB Book
Twitter

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”, because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data, not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

On Monday, I sent out a newsletter issue to over 18,000 people where the links didn't work for the first hour and a half. Then they magically started working. Today on the AWS Morning Brief: Whiteboard Confessional, I'm not talking about a particular design pattern, but rather conducting a bit of a post-mortem of what exactly broke and why it suddenly started working again an hour and a half later. To send out the Last Week in AWS newsletter, I use a third-party service called ConvertKit that, in turn, wraps itself around SendGrid for actual email delivery. They, in turn, handle an awful lot of the annoying, difficult parts of newsletter management. As a quick example, unsubscribes. If you unsubscribe from my newsletter, which you should never do, I won't email you again. That's because they handle the subscription and unsubscription process. Now, as another example, when you sign up for the newsletter, you get an email series that tailors itself to a “choose your own platypus” adventure based upon what you select. True story. Their logic engine powers that, too.

ConvertKit is awesome for these things, but they do some things that are also kind of crappy. For example, they do a lot of link tracking that is valuable, but it's the creepy kind of link tracking that I don't care about and really don't want. Also, unfortunately, their API isn't really an API so much as it is an attempt at an API that an intern built, because they thought it was something you might enjoy. I can't create issues via the API. I have to generate the HTML and then copy and paste it in like a farm animal.
And their statistics and metrics API's won't tell me the kinds of things I actually care about, but their website will, so they have the data, it just requires an awful lot of clicking and poking. And when I say things I don't care about, let me be specific. Do you know what I don't care about? Whether you personally, dear listener, click on a particular link. I do not care; I don't want to know. That's creepy; It's invasive, and it isn't relevant to you or me in any particular way. But I do care what all of you click on in aggregate. That informs what I include in the newsletter in the future. For example, I don't care at all about IoT, but you folks sure do. So, I'm including more IoT content as a direct response to what you folks care about. Remember, I also have sponsors in the newsletters, who themselves include links, and want to get a number of people who have clicked on those things. So, it also needs to be unique. I care if a user clicks on a link once, but if they click on it two or three times, I don't want that to increment the counter, so there are a bunch of edge case issues here. Here are the questions that I need to answer that ConvertKit doesn't let me get at extraordinarily well. First, what were the five most popular links in last week's issue? I also want to care what the top 10 most popular links over the last three months were. That helps me put together the “Best of” issues I'm going to start shipping out in the near future. I also care what links got no clicks because people just don't care about them or I didn't do a good job of telling the story. It helps me improve the newsletter. With respect to sponsors, I care how each individual sponsor performs relative to other sponsors. If one sponsor link gets way fewer clicks, that's useful to me. Since I write a lot of the sponsor copy myself, did I get something wrong? On the other hand, if a sponsored link gets way more clicks than normal, what was different there? I explicitly fight back against clickbait, so outrage generators, like racial slurs injected into the link text are not permitted. So, therefore when a sponsored link outperforms what I would normally expect, it means that they're telling a story that resonates with the audience, and that is super valuable data. Now, I'll tell you what I built, and what went wrong. After this.In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast. I built a URL redirector to handle all of these problems plus one more. Namely, I want to be able to have an issue that has gone out with a link in it, but I want to be able to repoint that link after I've already hit send. Why do I care about that? Well, if it turns out that a si...
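The transcript above cuts off before the implementation, so for illustration only, here is a generic sketch of a repointable link redirector that would satisfy the requirements just described: aggregate click counts and destinations that can be changed after an issue has already gone out. This is not necessarily what was built for Last Week in AWS; the DynamoDB table name "links", the "target" attribute, and the short domain are hypothetical.

```python
# Generic sketch of a repointable link redirector (illustrative only).
# Assumes a hypothetical DynamoDB table "links" keyed on a short slug, with a
# "target" attribute and a numeric "clicks" counter, fronted by API Gateway or
# a Lambda function URL. Because the destination lives in the table, a link in
# an already-sent issue can be repointed by updating a single item.
import boto3

dynamodb = boto3.resource("dynamodb")
links = dynamodb.Table("links")


def handler(event, context):
    # e.g. https://short.example.com/r/sqs-fifo -> slug "sqs-fifo"
    slug = (event.get("pathParameters") or {}).get("slug", "")
    item = links.get_item(Key={"slug": slug}).get("Item")

    if item is None:
        # Unknown slug: fall back to the main site rather than a bare 404.
        return {"statusCode": 302, "headers": {"Location": "https://lastweekinaws.com"}}

    # Aggregate (not per-subscriber) click counting; per-reader deduplication
    # would happen elsewhere, e.g. keyed on a hashed identifier.
    links.update_item(
        Key={"slug": slug},
        UpdateExpression="ADD clicks :one",
        ExpressionAttributeValues={":one": 1},
    )

    return {"statusCode": 302, "headers": {"Location": item["target"]}}
```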
Apr 27, 2020 • 13min

Cape Town Region Is Expensive AF

AWS Morning Brief for the week of April 27, 2020.
Apr 24, 2020 • 12min

Whiteboard Confessional: Don’t Run a Database on Top of NFS

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

CHAOSSEARCH
Amazon Elastic File System
Network File System
AWS Fargate

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is “theory”, because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.

Corey: On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would other S3 data, not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

I talk a lot about databases on this show. There are a bunch of reasons for that, but they mostly all distill down to the fact that databases are, and please don't quote me on this as I'm not a DBA, where the data lives. If I blow up a web server, it can have hilarious consequences for a few minutes, but it's extremely unlikely to have the potential to do too much damage to the business. That's the nature of stateless things. They're easily replaced, and it's why the infrastructure world has focused so much on the recurring mantra of cattle, not pets.

But I digress. This episode is not about mantras. It's about databases. Today's episode of the AWS Morning Brief: Whiteboard Confessional returns to the database world with a story that's now safely far enough in the past that I can talk about it without risking a lawsuit. We were running a fairly standard three-tiered web app. For those who haven't had the pleasure because their brains are being eaten by the microservices worms, those three tiers are web servers, application servers, and database servers. It's a model that my father used to deploy, and his father before him.

But I digress. This story isn't about my family tree. It's about databases. We were trying to scale, which is itself a challenge, and scale is very much its own world. It's the cause of an awful lot of truly terrifying things. You can build an application that does a lot for you on your own laptop. But now try scaling that application to 200 million people.
Every single point of your application architecture becomes a bottleneck long before you'll get anywhere near that scale, and you're gonna have oodles of fun re-architecting it as you go. Twitter very publicly went through something remarkably similar about a decade or so ago; the fail whale was their error page when Twitter had issues, and everyone was very well acquainted with it. It spawned early memes and whatnot. Today, they've solved those problems almost entirely.

But I digress. This episode isn't about scale, and it's not about Twitter. It's about databases. So my boss walks in as we're trying to figure out how to scale a MySQL server for one reason or another, and casually suggests that we run the database on top of NFS.

[Record Scratch]

Yes, I said NFS. That's Network File System. Or, if you've never had the pleasure, the protocol that underlies AWS’s EFS offering, or Elastic File System. Fun trivia story there: I got myself into trouble, back when EFS first launched, with Wayne Duso, AWS’s GM of EFS, among other things, by saying that EFS was awful. At launch, EFS did have some rough edges, but in the intervening time, they've been fixed to the point where my only remaining significant gripe about EFS is that it's NFS. Because today, I mostly view NFS as something to be avoided for greenfield designs, but you've got to be able to support it for legacy things that are expecting it to be there. There is, by the way, a notable EFS exception for Fargate, and using NFS with Fargate for persistent storage.

But I digress. This episode isn't about Fargate. It's about databases.

Corey: In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast.

So I'm standing there, jaw agape at my boss. This wasn't one of those many mediocre managers I've had in the past that I've referenced here. He was and remains the best boss I've ever had. Empathy and great people management skills aside, he was also technically brilliant. He didn't suggest patently ridiculous things all that often, so it was sad to watch his cognitive abilities declining before our eyes. “Now, hang on,” he said, “before you think that I've completely lost it: we did something exactly like this before at my old job. It can be done safely, sanely, and offer great performance benefits.” So, I'm going to skip what happens next in this story, because I was very early in my career. I hadn't yet figured out that it's better to not actively insult your boss in a team meeting based only upon a half-baked understanding of what they've just proposed. To his credit, he took it in stride, and then explained how to pull off something that sounds on its face to be truly monstrous.

Now I've doubtless forgotten most of the technical nuance here, preferring ...
Apr 20, 2020 • 7min

AWS Billing System Go BRRRRRR

AWS Morning Brief for the week of April 20, 2020.
