The Swyx Mixtape

Swyx

swyx's personal picks pod.

Weekdays: the best audio clips from podcasts I listen to, in 10 minutes or less!
Fridays: Music picks!
Weekends: long form talks and conversations!

This is a passion project; never any ads, 100% just recs from me to people who like the stuff I like.
Share and give feedback: tag @swyx on Twitter or email audio questions to swyx @ swyx.io

Episodes

Sep 1, 2021 • 12min

Journey to TimeScaleDB [Mike Freedman]

Listen to the full interview on SEDaily: https://softwareengineeringdaily.com/2021/06/28/timescale-time-series-databases-with-mike-freedman/

We originally created Timescale, really from our own need. Around that time, 2014-2015, my co-founder, Ajay Kulkarni, and I, who go back many years, resynced and started thinking that it was a good time for both of us to think about what the next challenges are that we want to tackle. It seemed to us that there was this emerging trend. Now, people talk about digitization, or digital transformation. It feels like somewhat of an analyst term, but I think it's really responsive of what's happening, in that if you think about the large, big IT revolution, it was about changing the back office. What used to be on paper was now in computers.

What we saw was somewhat the same thing happening to basically every industry, from heavy industry, to shipping, to logistics, to manufacturing, both discrete and continuous, and home IoT. Sometimes this gets blurred under IoT, but we also think about it more broadly as operational technology, those which are not necessarily bits, but atoms. A big part of that was actually collecting data of what those systems were doing. It's about sensors and data and whatnot.

When we initially looked at this problem, we were thinking about a type of data platform we would want to build, to make it easy to collect and store and analyze that type of data. I think that's a way that we're slightly different, or why what we ultimately built as our database ended up being fairly different than a lot of other so-called time series databases. That's because many of them arose out of IT monitoring, where they were trying to collect metrics from servers, where we were originally thinking about collecting data more broadly from all these types of applications and devices around your world.

When we started building it, it was originally focusing mostly on IoT. We quickly ran into this problem that the existing databases out there and the time series databases out there were not really designed for our problems. They were often much more limited, because they were focusing on this narrow infrastructure monitoring problem, where the data maybe wasn't as important. It was only a very specific type. Let's say, they stored only floats. They didn't have to have extra metadata to enrich their data and better understand what was going on, like through joins.

After basically working on this platform for about a year, we somewhat came to the conclusion that we actually needed to build somewhat of our own time series database that was focusing on this more broad type of problem, and so that's what we did. That's what led to the development of what became Timescale.

JM: Today, what are the most common applications of a time series database?

Speaking mostly about TimescaleDB, obviously: as I was alluding to before, a lot of the other time series databases are much more narrowly focused on IT monitoring, or observability. We really see our use cases across the field. We certainly see cases of observability. In fact, we have subsequently built a separate product on top of Timescale called Promscale, which is used initially for Prometheus metrics, but more broadly, to make it easier to store observability data with TimescaleDB.

We still see a lot of IoT. We see a lot of logistics. We see financial data and crypto data. We see event sourcing. We see product and user analytics. We see people collecting data about how users are using their SaaS platforms. We see gaming analytics, where companies are collecting information about how people's virtual avatars are actually playing within the games. We see music analytics. We like to think of it this way: the old way to find the pop stars, you went down to the smoky club. Now you collect SoundCloud and Spotify streams, and you use that to identify who the next breakout artist is going to be.

All of these are examples of time series data. What's so exciting to us is that it's such a broad use case, so horizontal, because basically, it's all about collecting data at the finest granularity you can.

Tell me about the initial architecture for TimescaleDB. You’re based off of PostgreSQL. What was the reasoning around that decision?

I think, as you point out, Timescale is actually implemented as an extension on PostgreSQL. Starting maybe 10 or 15 years ago, PostgreSQL started exposing low-level hooks throughout its code base. This is not a plugin where you're running a little JavaScript code. We get function hooks into the C. PostgreSQL is written in C, and so TimescaleDB is, for the most part, written in C. We have hooks throughout the code base: at the planner, sometimes in the storage, at the execution nodes. We are able to insert ourselves and do a lot of optimizations as part of the same process.

You could ask the question of why not just implement a new database from scratch? Why build it on top of PostgreSQL? I think this really gets to the fact that we always viewed ourselves as, and we hear this from our users and community all the time, a place where they are storing critical data, and they need it to, A, work and be reliable. They also have a lot of use case requirements. It’s not this, again, narrow thing where you're collecting one metric, and all you're asking to do is figure out the min, max, average of a certain metric.

You want to do fancy analysis. You want to do joins. You want to do subqueries. You want to do correlations. You want to have views. You want the operational maturity of a database. You want transactions, backup and restore, and all of the replication and all of the above. Some people say it takes maybe 10 years, at least, to build a reliable database. We thought this was a great way to immediately gain that level of reliability. We ourselves are huge fans of PostgreSQL. It has such a great community. It also has such a large ecosystem.

The idea is that effectively, that entire ecosystem would work for us on day one. That means all of the tooling, all of the ORMs, all of your libraries would just work. We support full SQL, not SQL-ish. If you know how to use SQL, you can start using it, and if your tools speak SQL, if you're running Tableau, if you're running Power BI, if you're running Grafana, if you're running Superset, those all just start working on day one.

Now, the second part of it is, well, what does it mean to build a time series database on top of PostgreSQL, which clearly was designed more as a traditional transactional database, an OLTP engine? Sometimes we talk about how you think about this architecturally. What I mean by that is you somewhat think about what your workloads look like and what that would mean for a software architecture. Maybe I'll give you a very concrete example. Starting maybe 10 or 15 years ago, if you look at traditional databases, you started seeing the growth of what people now commonly call log-structured merge trees, LSMs.

This is a data structure that goes back to the mid-90s, but I think you first saw Google, Jeff Dean and Sanjay Ghemawat, build something called LevelDB. The whole idea of an LSM tree was, if you look at a workload that has a lot of updates, so with a lot of e-commerce applications, with a lot of social networks, you're constantly updating things. Traditional database, if you think about a disk...
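
The point above about wanting joins and full SQL, not just min/max/average over a single float metric, is easier to see with a tiny example. This is only a rough sketch and not from the interview: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL/TimescaleDB, and the `devices` and `readings` tables, their columns, and the sample values are all hypothetical. It joins raw readings to device metadata and aggregates per site per hour with plain SQL.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical metadata and readings tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Metadata about each device (the "extra metadata" you want to join on).
        CREATE TABLE devices (
            device_id INTEGER PRIMARY KEY,
            site      TEXT NOT NULL
        );
        -- The time series itself: one row per reading.
        CREATE TABLE readings (
            ts          TEXT    NOT NULL,  -- ISO 8601 timestamp
            device_id   INTEGER NOT NULL REFERENCES devices(device_id),
            temperature REAL    NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO devices VALUES (?, ?)",
        [(1, "factory"), (2, "warehouse")],
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [
            ("2021-09-01T00:00:00", 1, 20.0),
            ("2021-09-01T00:05:00", 1, 22.0),
            ("2021-09-01T00:00:00", 2, 10.0),
            ("2021-09-01T00:05:00", 2, 14.0),
            ("2021-09-01T01:00:00", 2, 18.0),
        ],
    )
    return conn


def hourly_stats_by_site(conn: sqlite3.Connection) -> list:
    """Join readings to device metadata and aggregate per site per hour."""
    return conn.execute(
        """
        SELECT d.site,
               strftime('%Y-%m-%dT%H:00', r.ts) AS hour_bucket,
               MIN(r.temperature),
               MAX(r.temperature),
               AVG(r.temperature)
        FROM readings AS r
        JOIN devices  AS d ON d.device_id = r.device_id
        GROUP BY d.site, hour_bucket
        ORDER BY d.site, hour_bucket
        """
    ).fetchall()


if __name__ == "__main__":
    for row in hourly_stats_by_site(build_demo_db()):
        print(row)
```

Because it's ordinary SQL over ordinary tables, the same query shape should carry over to any SQL-speaking tool, which is the ecosystem argument made in the interview. The hourly bucketing here uses SQLite's `strftime`; a real PostgreSQL or TimescaleDB setup would use its own time functions instead.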

Aug 31, 2021 • 10min

Journey to MongoDB [Mark Porter]

Listen to more on the StackOverflow Podcast: https://stackoverflow.blog/2021/08/06/podcast-364-mark-porter-mongodb-database/

Transcript

[00:00:00] swyx: This is Mark Porter, the CTO of MongoDB, on his personal journey from relational databases to MongoDB.

[00:00:06] Mark Porter: I am a relentless tech geek. I've loved tech my whole life. In fact, my Twitter handle is MarkLovesTech. I have used databases since I was 14, with some really ancient technologies. I started out on a 4K TRS-80 Model I computer. We had to program it in assembly language because there wasn't enough memory to use the local BASIC copy. And I very quickly got into databases, and I was talking to someone the other day and he pointed out something I'd never noticed, which is I've oscillated between using databases and building databases. So I started out at Caltech and NASA using databases for space data and chip data. And then I built databases at Oracle, versions 5, 6, 7, 8, for about 13 years. And then I used databases at NewsCorp for huge student data systems. And then I built databases at Amazon with Amazon RDS. Then I moved to GrabTaxi, which is the Uber of Southeast Asia, and used databases to deliver 15 million rides and meals a day, and then came back to MongoDB. And here I am building databases again. I frankly can't get away from this thing.

[00:01:20] Ben Popper: I love that story. I wonder, does that mean, you know, at each point you had some sort of frustration or saw some sort of opportunity for innovation? You know, you kind of would build something, then you'd be the user of it. Then you'd realize that the next sort of turn of the wheel was coming. As you moved between those jobs, were new paradigms in databases emerging?

[00:01:38] Mark Porter: Yeah. I mean, it's been really interesting. Half of my career I've been the bow, and half my career I've been the target. And I got to tell you that sometimes as a customer, you're not really happy being the target of what has been produced. Look, the reality is, relational databases have been the modus operandi since 1970, when Codd first did his paper. And then Oracle was the first company that released them in 1979. They were actually known as relational technology back then and then changed their name later to Oracle. So the mission criticality of databases has never been in doubt. What has changed is the amount of data, the way we process that data, and what's really, really important. It used to be duplication of data was important and things like that. And while that's still important, what's really important now is developer productivity, bar none. That is job one for any mission-critical software company: developer productivity and innovation.

[00:02:35] Ben Popper: Makes a lot of sense. It does seem like data has become almost this, uh, overwhelming force for some companies. Ryan, I don't know if you have experience with this, but I've been getting a lot of pitches and talking with folks on the podcast, and, you know, it's gone from "we're using data" to "we have data lakes" and "there's a data iceberg." And, you know, we're only sort of scratching the surface of what we might be able to do with this endless flow of unstructured data that we're collecting. And as you mentioned, yeah, a lot of times what they're looking to do is understand it in a way that allows them to enhance productivity or automate certain processes, which right now are very time- and labor-intensive.

Yeah. Yeah. At my previous job, I worked on an article about data pipelines and, you know, ETL processes, and yeah, there's becoming a separation, I think, between your production database and the database you use to gain insights, right? The production database has to be fast. But the insight database, it can be a little more flexible in how it produces data, right?

[00:03:34] Mark Porter: Yeah. So we think about systems of record. We think about systems of insight, and yeah, I mean, definitely different people want to do different things with the databases. And so what we do is we think about personas. Are you an analyst? Are you a developer? Are you an AI/ML engineer? Are you a PhD data scientist? We always try to come at it from the customer and what they want to accomplish.

[00:03:56] Ben Popper: Yeah, I think that's so interesting because, as you said, obviously databases have always been part of working in the world of software and computers, but increasingly there are these specialties that are very important and which are producing these really interesting results, that themselves are devoted to data, as opposed to it being something that, you know, needs to be part of the larger process. Um, so Mark, I wanted to touch on something, which is that you had a part of your career at AWS, which now, you know, has grown into quite a behemoth. Um, yeah. Just wondering if you can talk to us a little bit about what you learned there and maybe how some of that applies to the role you have at MongoDB.

[00:04:26] Mark Porter: Yeah. So I joined AWS as the general manager of AWS RDS, which at that time was probably the largest fleet of databases in the world. And that fleet grew just tremendously while I was there. It was amazing, you know, just showing that it's not just databases; it was managed databases that mattered. So RDS did not build any of its own databases. RDS vended, by the time I left, over a million, significantly more than a million, Postgres, MySQL, MariaDB, Oracle, and SQL Server databases. And so the product that we produced was managing those databases, and people love it when their database stays up, when the backups and restores work, when you can change parameters, when failover works, and all those things. However, over time, as much as I loved running those databases, I became frustrated with how they were shackles, almost, on customer innovation and customer operability. And so we developed this system called Amazon Aurora, which changed out the storage system underneath Postgres and MySQL. Obviously we couldn't do that for the commercial databases. And we made those databases so much more resilient, so much more durable, so much more available, but we kept running into the fundamental limit of a rigid architecture, of high failover times, and a single-primary architecture, which meant that the blast radius of a system going down, or of a change in an Oracle database, I mean, it takes down a whole company. And I can talk more about availability. In fact, you'll have trouble stopping me talking about availability, if you get me started.

[00:06:09] Ben Popper: Well, I mean, that's the, uh, the big thing about NoSQL: availability, right? The replicability, the speed of access. Yeah, for folks who don't know, let's lay out the value prop here. Like, what is sort of the difference between the two and why would you prefer one over the other? You know, you mentioned shackles. I love that word, but yeah. You know, what are the limitations that it allows you to avoid when you move to NoSQL, and I gue...

Aug 30, 2021 • 31min

[Weekend Drop] The True Story of Frank Abagnale

Watch on https://www.youtube.com/watch?v=iJIc16aqpO8&t=642s

Aug 28, 2021 • 7min

[Music Fridays] The Petersens

The Petersens: https://www.youtube.com/c/ThePetersens/about
Take Me Home: https://www.youtube.com/watch?v=qap9Qm-Q894
Jolene: https://www.youtube.com/watch?v=viQx4KDivPY
Landslide: https://www.youtube.com/watch?v=joUwy8lpvP0

Aug 27, 2021 • 10min

The Origin of Braintree [Bryan Johnson]

Listen to his full interview on the Lex Fridman podcast: https://www.youtube.com/watch?v=1YbcB6b4A2U
Bryan's wiki: https://en.wikipedia.org/wiki/Bryan_Johnson_(entrepreneur)

Aug 26, 2021 • 16min

The Origin of Twilio [Jeff Lawson]

Listen to the Cloud Giants podcast: https://www.listennotes.com/podcasts/cloud-giants/jeff-lawson-co-founder-and-sFi-ad_etHQ/#

Aug 25, 2021 • 10min

The Origin of Waze [Noam Bardin]

Listen to the NFX podcast: https://www.nfx.com/post/the-insider-story-of-waze/ (28 mins in)

Aug 24, 2021 • 11min

The Origin of Kubernetes and Heptio [Joe Beda]

Listen to the OSS Startup Podcast for the full episode.

References:
- Heptio's acquisition in 2018: "It’s not clear how many customers Heptio worked with but they included large, tech-forward businesses like Yahoo Japan."

Aug 22, 2021 • 31min

[Weekend Drop] An Evening With Kevin Smith

Superman Lives: Part 1, Part 2
Tim Burton: https://www.youtube.com/watch?v=fKbAEmvZyKQ

Aug 21, 2021 • 9min

[Music Fridays] Walk Off The Earth

Wiki: https://en.wikipedia.org/wiki/Walk_off_the_Earth
- 2011: Someone Like You (5 People 1 Guitar) https://www.youtube.com/watch?v=d9NF2edxy-M
- 2018: Girls Like You https://www.youtube.com/watch?v=e3PcnNiWygw
- 2020: A History of the Beatles https://www.youtube.com/watch?v=rfOx4CmQWLs
