Oxide and Friends cover image

Oxide and Friends

Latest episodes

Jul 11, 2023 • 1h 24min

Tales from Manufacturing: Shipping Rack 1

Bryan and Adam were joined by members of the Oxide operations team to discuss the logistics of actually assembling the first Oxide Rack, crating it, shipping it... and all the false starts, blind alleys, and failed tests along the way.

We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from July 10th, 2023. In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleagues Kate Hicks, Kirstin Neira, CJ Mendez, Erik Anderson, Josh Clulow, Nathanael Huffman, and Aaron Hartwig.

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
Jul 4, 2023 • 2h 3min

Shipping the first Oxide rack: Your questions answered!

On this week's show, Adam Leventhal posed questions from Hacker News (mostly) to Oxide founders Bryan Cantrill and Steve Tuck. Stick around until the end to hear about the hardest parts of building Oxide--great, surprising answers from both Bryan and Steve. They were also joined by Steve Klabnik.

Questions for Steve and Bryan:

[@6:38] Q: Congrats to the team, but after hearing about Oxide for literal years since the beginning of the company and repeatedly reading different iterations of their landing page, I still don't know what their product actually is. It's a hypervisor host? Maybe? So I can host VMs on it? And a network switch? So I can... switch stuff? (*)
A: Steve: A rack-scale computer; "A product that allows the rest of the market that runs on-premises IT access to cloud computing." Bryan: agrees.

[@8:46] Q: It's like an on-prem AWS for devs. I don't understand the use case but the hardware is cool. (*) I didn't understand the business opportunity of Oxide at all. Didn't make sense to me. However, if they're aiming at the companies parachuting out of the cloud back to data centers and on-prem, then it makes a lot of sense. It's possible that the price comparison is not with comparable computing devices, but simply with the 9 cents per gigabyte egress fee from major clouds. (*)
A: Bryan: "Elastic infrastructure is great and shouldn't be cloistered to the public cloud"; good reasons to run on-prem: compliance, security, risk management, latency, economics; "Once you get to a certain size, it really makes sense to own." Steve: As more things move onto the internet, the need for on-prem is going to grow; you should have the freedom to own.

[@13:31] Q: Somebody help me understand the business value. All the tech is cool but I don't get the business model; it seems deeply impractical. You buy your own servers instead of renting, which is what most people are doing now. They argue there's a case for this, but it seems like a shrinking market. Everything has gone cloud. Even if there are lots of people who want to leave the cloud, all their data is there. That's how they get you--it costs nothing to bring data in and a lot to transfer it out. So high cost to switch. AWS and others provide tons of other services in their clouds, which if you depend on you'll have to build out on top of Oxide. So even higher cost to switch. Even though you bought your own servers, you still have to run everything inside VMs, which introduce the sort of issues you would hope to avoid by buying your own servers! Why is this? Because they're building everything on illumos (Solaris), which for all practical purposes is dead outside Oxide and delivering questionable value here. Based on blogs/twitter/mastodon they have put a lot of effort into perfecting these weird EE side quests, but they're not making real new hardware (no new CPU, no new fabric, etc.). I am skeptical any customers will notice or care, and would not have noticed had they used off-the-shelf hardware/power setups. So you have to be this ultra-bizarre customer: somebody who wants their own servers, but doesn't mind VMs; doesn't need to migrate out of the cloud but wants this instead of whatever hardware they manage themselves now; who will buy a rack at a time; who doesn't need any custom hardware; and is willing to put up with whatever off-the-beaten-path difficulties are going to occur because of the custom stuff they've done that AFAICT is very low value for the customer. Who is this? Even the poster child for needing on-prem, the CIA, is on AWS now. I don't get it; it just seems like a bunch of geeks playing with VC money? (*)
A: Bryan: "EE side quests" rant; you can't build robust, elastic infrastructure on commodity hardware at scale; "The minimum viable product is really, really big"; example: monitoring fan power draw--tweaking reference designs doesn't cut it; example: eliminating redundant AC power supplies. Steve: "Feels like I'm dealing with my divorced parents" post.

[@32:24] Q (Chat): It would be nice to see what this thing is like before having to write a big check.
A: Steve: We are striving to have lab infrastructure available for test drives.

[@32:56] Q (Chat): I want to know about shipping insurance, logistics, who does the install, ...
A: Bryan: "Next week we'll be joined by the operations team"; we want to have an in-depth conversation about those topics.

[@34:40] Q: Seems like Oxide is aiming to be the Apple of enterprise hardware (which isn't too surprising given the background of the people involved--Sun used to be something like that, as were other fully-integrated providers, though granted that Sun didn't write Unix from scratch). Almost like coming full circle from the days where the hardware and the software were all done in an integrated fashion, before Linux turned up and started to run on your toaster. (*)
A: Bryan: We find things to emulate in both Apple and Sun, e.g., integrated hardware and software; AS/400. Steve: "It's not hardware and software together for integration's sake"; it's required to deliver what the customer wants; "You can't control that experience when you only do half the equation."

[@42:38] Q: I truly and honestly hope you succeed. I know for certain that the market for on-prem will remain large for certain sectors for the foreseeable future. However. The kind of customer who spends this type of money can be conservative. They already have to go with an unknown vendor, and rely on unknown hardware. Then they end up with a hypervisor virtually no one else in the same market segment uses. Would you say that KVM or ESXi would be an easier or harder sell here? Innovation budget can be a useful concept. And I'm afraid it's being stretched a lot. (*)
A: Bryan: We can deliver more value with our own hypervisor; we've had a lot of experience in that domain from Joyent. There are a lot of reasons that VMware et al. are not popular with their own customers; Intel vs. AMD. Steve: "We think it's super important that we're very transparent with what we're building."

[@56:05] Q: What is the interface I get when I turn this $$$ computer on? What is the zero-to-first-value when I buy this hardware? (*)
A: Steve: "You roll the rack in, you have to give it power, and you have to give it networking [...] and you are then off on starting the software experience"; a large pool of infrastructure resources for customers/devs/SREs/... in a day or less; a similar experience to public cloud providers.

[@01:02:06] Q: One of my concerns when buying a complete so...
Jun 27, 2023 • 1h 11min

Okay, Doomer: A Rebuttal to AI Doom-mongering

Bryan and Adam offer a rebuttal to the AI doomerism that has been gaining volume. And--hoo-boy--this one had some range. Heaven's Gate, ceteris paribus, WWII, derpy security robots, press-fit DIMM sockets, async Rust, etc. And optimistic as always: the hardware and systems AI doomers imagine are incredibly hard to get right; let's see AIs help us before we worry about our own obsolescence!

On this episode Bryan Cantrill and Adam Leventhal were on a rant; but we welcome others on-stage!

Some of the topics we hit on, in the order that we hit them:
How we got here: tweet from Liron Shapira
Comet Hale-Bopp
Heaven's Gate
Cross-price supply elasticity of the copper and molybdenum markets
Ceteris paribus -- Bryan's exit from economics
Chris Dixon's book releasing in March 2024 (NOT AN ENDORSEMENT)
British-to-American translation guide
"It's not just human-level extinction... it's like potential destruction of all value in the light cone" - Emmett Shear
Vingean Singularity
Oxide and Friends: Tales from the bringup lab
Oxide and Friends: More tales from the bringup lab
Bullying self-driving cars
AI Resistance Reservists: "For the Lightcone!"
Samsung security robots
Oxide and Friends: Does a GPT future need software engineers?
"I, for one, welcome our new AI overlords"

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
Jun 20, 2023 • 1h 22min

Software Verificationpalooza

Greg and Rain from the Oxide team joined Bryan and Adam to talk about powerful methods of verifying software: formal methods in the form of TLA+ and property-based testing in the form of the proptest Rust crate. If you care about making software right, don't miss it!

In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleagues Greg Colombo and Rain Paharia.

Some of the topics we hit on, in the order that we hit them:
Distributed Sagas
Steno -- Oxide's implementation of distributed sagas
Learn TLA+
Hillel Wayne talks
Hillel Wayne on Alloy 6
QuickCheck paper (2000)
Proptest docs
Rain's example code (a sketch of the sort functions it assumes appears after these notes):

use proptest::prelude::*;
use proptest::collection::vec;

proptest! {
    #[test]
    fn proptest_my_sort_pairs(input in vec(any::<u64>(), 0..128)) {
        let output = my_sort(input);
        for window in output.windows(2) {
            assert!(window[0] <= window[1]);
        }
    }

    #[test]
    fn proptest_my_sort_against_bubble_sort(input in vec(any::<u64>(), 0..128)) {
        let output = my_sort(input.clone());
        let bubble_output = bubble_sort(input);
        assert_eq!(output, bubble_output);
    }

    // These proptests implicitly check that my_sort doesn't crash.
}

buf-list crate
guppy crate
... and stay tuned for an upcoming episode revisiting async/await in Rust

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
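For readers who want to run Rain's example locally, here is a minimal, self-contained sketch of the two sort functions the proptests exercise. The names my_sort and bubble_sort come from the episode's example, but these particular implementations are our own illustration, not code from the show:

// Hypothetical implementations of the functions exercised by Rain's proptests.
// `my_sort` is the code under test; `bubble_sort` is the trusted oracle
// that the second property compares it against.

fn my_sort(mut input: Vec<u64>) -> Vec<u64> {
    // Stand-in for "the sort you actually want to verify" -- here we simply
    // delegate to the standard library's unstable sort.
    input.sort_unstable();
    input
}

fn bubble_sort(mut input: Vec<u64>) -> Vec<u64> {
    // Deliberately simple oracle: easy to convince yourself it's correct,
    // even though it runs in O(n^2).
    let n = input.len();
    for i in 0..n {
        for j in 0..n.saturating_sub(i + 1) {
            if input[j] > input[j + 1] {
                input.swap(j, j + 1);
            }
        }
    }
    input
}

With these two functions in place (and proptest as a dev-dependency), cargo test will generate many random input vectors and, if either property fails, report a minimized counterexample.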
Jun 13, 2023 • 1h 6min

Virtualizing Time

Jordan Hendricks joined Bryan and Adam to talk about her work virtualizing time--particularly challenging when migrating virtual machines from one physical machine to another!

We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from June 12th, 2023. In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleague Jordan Hendricks.

The (lightly edited) live chat from the show:
DanCrossNYC: The TSC ticks at a fixed rate nowadays, regardless of voltage scaling on the CPU.
jbk: just x86 doesn't provide a consistent way to determine what the rate is
jbk: (I guess some chips will tell you via CPUID, but I've yet to actually encounter such chips)
jbk: some hypervisors will tell you via an MSR
zorg24: Looks like the Linux kernel docs have some documentation on the x86 TSC and PIT https://www.kernel.org/doc/html/next/virt/kvm/x86/timekeeping.html
DanCrossNYC: CPUID or an MSR, but yeah, most systems sample over a fixed interval (determined by another time source) to figure it out.
jbk: no, versus some other present component that allows you to measure the frequency
DanCrossNYC: No, the PIT or HPET or something.
jbk: https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/i86pc/os/tscc_pit.c?r=236cb9a8
jbk: is how it uses the PIT
jbk: (the HPET code needs to improve its accuracy, so it's only used when the PIT isn't there at the moment)
jbk: some Intel NUCs have no PIT
jbk: so HPET is the only option
bcantrill: https://github.com/illumos/illumos-gate/commit/717646f7112314de3f464bc0b75f034f009c861e
DanCrossNYC: Two big ones: system maintenance without disturbing guest workloads, and also load balancing across a rack.
Sevan: ah, thanks. https://github.com/illumos/illumos-gate/blob/717646f7112314de3f464bc0b75f034f009c861e/usr/src/test/bhyve-tests/tests/common/common.c#L166
bcantrill: https://github.com/oxidecomputer/tsc-simulator/tree/master
DanCrossNYC: The guest may well be running NTP itself.
iangrunert: I assume you could also check that NTP is alive / has synced recently before doing a migration right?
aka_pugs: Do people use IEEE 1588/PTP in datacenters? Maybe finance wackos?
zorg24: also it might be tricky to check if NTP synced recently if it is happening in usermode
iangrunert: Might've missed this - is it just the hypervisor that has to run NTP recently or the VM as well?
saone: I believe it was just the hypervisor
DanCrossNYC: The host.
DanCrossNYC: A guest may or may not; that's up to the guest.
jbk: but IIUC, if the guest IS running NTP, then the host definitely needs it to avoid any time warps
DanCrossNYC: Yup.
DanCrossNYC: Fortunately, there's a bit of an out for the blackout window during migration: SMM mode can effectively pause a machine for an indefinite period of time.
DanCrossNYC: We don't USE SMM anywhere, but robust systems software kinda needs to handle the case where the machine goes out to lunch for a minute.
zorg24: 🙌 hooray for hardware with no SMM use
DanCrossNYC: We have done everything we can to turn it off.
ahl: https://github.com/dtolnay/case-studies/blob/master/autoref-specialization/README.md
ahl: https://github.com/oxidecomputer/propolis
earltea: it worked so well I almost thought the VM didn't migrate 😅
saone: It's easy to forget that there's a world outside the cloud, but edge deployments that have physical peripherals hooked up need to maintain those connections to peripherals; migrating those peripherals to cloud environments and managing that integration has been a big challenge for my group.
iangrunert: https://signalsandthreads.com/clock-synchronization/ Good listen about clock synchronization and PTP in the "finance weirdos" world. MiFID 2 time sync requirements require timestamping key trading event records to within 100 microseconds of UTC.
jhendricks: a bit belated, but the propolis side of these changes: https://github.com/oxidecomputer/propolis/commit/7ed480843d3b5cfd9fd07dce41772f8eac4e9171
saethlin: The calvalry??
saethlin: Are we just going to let that slide
saethlin: Is this a pronunciation situation again
zorg24: not the first time I've heard it pronounced that way 🤷
saethlin: Well maybe it's me learning this time
DanCrossNYC: Calvary
DanCrossNYC: That's the religious thing.
ahl: https://github.com/illumos/illumos-gate/blob/0c5967db436935325af441af2b27d337f4e64af5/usr/src/uts/common/os/cyclic.c#L44
zooooooooo: thought this was rust typescript at first 😳
DanCrossNYC: Dunno... I missed it. 🙂
ahl: * Starting in about 1994, chip architectures began specifying high resolution * timestamp registers. As of this writing (1999), all major chip families * (UltraSPARC, PentiumPro, MIPS, PowerPC, Alpha) have high resolution * timestamp registers, and two (UltraSPARC and MIPS) have added the capacity * to interrupt based on timestamp values. These timestamp-compare registers * present a time-based interrupt source which can be reprogrammed arbitrarily * often without introducing error. Given the low cost of implementing such a * timestamp-compare register (and the tangible benefit of eliminating * discrete timer parts), it is reasonable to expect that future chip * architectures will adopt this feature.
aka_pugs: Bryan's TSC is overflowing.
DanCrossNYC: That's Tom.
DanCrossNYC: Riding in with the cavalry.
aka_pugs: Good session.
ahl: Thanks...
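The chat's point about TSC calibration--most systems sample the counter over a fixed interval measured by another time source--can be illustrated with a minimal sketch. This is our own illustration, not code from the episode or from illumos; the function name estimate_tsc_hz is hypothetical, and the OS monotonic clock stands in for a hardware reference like the PIT or HPET:

// A minimal sketch (not from the episode): estimate the TSC frequency on
// x86_64 by sampling rdtsc across an interval timed by another clock.
// Assumes an invariant TSC, as discussed in the chat.

#[cfg(target_arch = "x86_64")]
fn estimate_tsc_hz() -> f64 {
    use std::time::{Duration, Instant};

    let start_tsc = unsafe { core::arch::x86_64::_rdtsc() };
    let start = Instant::now();

    // Sample over a fixed interval; longer intervals reduce relative error.
    std::thread::sleep(Duration::from_millis(100));

    let end_tsc = unsafe { core::arch::x86_64::_rdtsc() };
    let elapsed = start.elapsed().as_secs_f64();

    (end_tsc - start_tsc) as f64 / elapsed
}

#[cfg(target_arch = "x86_64")]
fn main() {
    println!("estimated TSC frequency: {:.0} Hz", estimate_tsc_hz());
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {
    eprintln!("this sketch only runs on x86_64");
}

Roughly speaking, a hypervisor does the same kind of calibration against a hardware reference (as in the illumos tscc_pit.c link above) and then adjusts the guest-visible TSC so a migrated VM doesn't observe time warps.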
May 30, 2023 • 1h 26min

Open Source Governance

Bryan and Adam are joined by Ashley Williams to talk about open source governance... and the recent, and various, stumblings of the Rust project leadership.
May 16, 2023 • 1h 20min

Building Together: Oxide and Samtec

Bryan and Adam are joined by Jonathan and Jignesh from Samtec to discuss working together to build the Oxide Rack. We've all seen bad vendors--what does it mean to be a great partner? Also: silicon photonics are (still!) just 18 months away!
May 9, 2023 • 1h 39min

The Network Behind the Network

Bryan and Adam are joined by Oxide colleagues Arjen, Matt, John, and Nathanael to talk about the management network--the brainstem of the Oxide Rack. Just as it ties together so many components, this episode ties together many many (many!) topics we've discussed in other episodes.

We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from May 8th, 2023. In addition to Bryan Cantrill and Adam Leventhal, we were joined by Oxide colleagues Arjen Roodselaar, Matt Keeter, John Gallagher, and Nathanael Huffman.

This built on work described in many previous episodes:
Cabling the Backplane -- Prior to going all-in on a cabled backplane with blind-mated server sleds (i.e. no plugging, unplugging, or mis-plugging of network cables), we (Bryan) espoused an "NC-SI or bust" mantra... at least in part to avoid doubling the cable count. With the cabled backplane, the reasons for NC-SI disappeared (which let the many reasons against truly shine).
The Pragmatism of Hubris -- in which we talk about our embedded operating system, Hubris (and its companion debugger, Humility). Hubris runs on the service processors that are the main endpoints on the management network. Matt's work controlling the management network switch (the VSC7448) is in the context of Hubris, as is John's work communicating with the sleds over the management network.
The Power of Proto Boards -- showed and told about the many small boards we've used in development. Several of those were purpose-built for controlling and simulating parts of the management network.
The Oxide Supply Chain -- Kate Hicks joined us to talk about the challenges of navigating the supply chain. Mentioned here in the context of "supply-chain-driven design": we designed around the parts we could procure! Tip: stay away from "automotive-quality" parts when the auto industry is soaking them all up.
Holistic Boot -- in which we talked about how (uniquely!) Oxide boots from nothing to its operating system and services. Over the management network, we can drive server recovery by piping in a RAMdisk over the network and then (slowly) through the UART to the CPU.
Get You a State Machine for Great Good -- Andrew joined us to talk about his work on a state-machine-driven text UI and its companion replay debugger. We mentioned this in the context of John replaying the long upload process in seconds rather than hours to fix a UI bug.

Major components of the management network
Matt's VSC7448 dev kit
Matt's remote tuning setup via webcam
Management network debugging
May 2, 2023 • 1h 41min

Blue Skies Over Mastodon (with Erin Kissane and Tim Bray)

Erin Kissane joins Bryan and Adam to talk about the new social network "Bluesky" through the lens of her blog post "Blue Skies Over Mastodon". Long-time friends of Oxide and social-media aficionados Tim Bray and Steve Klabnik also helped shed light on technical and social aspects of the new network.

We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from May 1st, 2023. In addition to Bryan Cantrill and Adam Leventhal, we were joined by special guest Erin Kissane and long-time acquaintances of the show Tim Bray and Steve Klabnik.

Some of the topics we hit on, in the order that we hit them:
Erin's blog post Blue Skies Over Mastodon
Mastodon blog (5/1): A new onboarding experience on Mastodon
Tim's blog post from November: Bye Twitter
"Buy the rumor, sell the news"
Hellthread
"Skeet" is to "Tweet" is to "Toot" (aka "Publish")
skyline.gay
Bluesky blog: Composable Moderation
Lobsters
Phanpy
So You've Been Publicly Shamed by Jon Ronson

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
Apr 18, 2023 • 1h 22min

Rust Trademark: Argle-bargle or Foofaraw?

The Rust Foundation caused a fracas with their proposed new trademark rules. Bryan and Adam were lucky enough to be joined by Ashley Williams, Adam Jacob, and Steve Klabnik for an insightful discussion of open source governance and communities--in particular as applied to Rust.

We've been hosting a live show weekly on Mondays at 5p for about an hour, and recording them all; here is the recording from April 17th, 2023. In addition to Bryan Cantrill and Adam Leventhal, we were joined by Ashley Williams, Adam Jacob, and Steve Klabnik.

Some of the topics we hit on, in the order that we hit them:
Succession
The Simpsons (explaining the title of this episode)
The Wire
The Wire at 20 podcast
The Register: Rust Foundation Apologizes for Trademark Policy
Jomboy (our aspiration)
Ice Weasel
Pamela Chestek
Bryan's talk from Node Summit 2017: Platform as a Reflection of Values
Linux Foundation Form 990
Rust Foundation board
Rust Foundation participation rules

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
