Episode 52 - Why Oxide rebuilt the rack from scratch
Apr 4, 2024
Bryan Cantrill, CTO of Oxide Computer, explains why the industry has been using the wrong components in data centers. The cloud's success is due to its easily consumed virtual servers, but it comes with costs. Cantrill discusses the need for elastic infrastructure on-premises and why his team had to rebuild almost everything to deliver it.
Focus on letting organizations own elastic infrastructure in their on-premises facilities, rather than only renting it.
Emphasis on co-designing hardware and software, with control over the lowest-level system software, for better performance.
Hardware design principles that simplify components and optimize system performance for more efficient computing.
Deep dives
Oxide's Approach to Elastic Infrastructure
Oxide's main focus is letting customers own elastic infrastructure rather than only rent it. The company believes that cloud computing, and elastic computing in particular, is the future of all computing, and it aims to offer an alternative to deploying on large public cloud providers such as AWS or Azure. Oxide argues that owning elastic infrastructure matters for compliance, risk management, latency requirements, and long-term cost-effectiveness.
Software Co-Design and System Control
Oxide's engineering approach involves co-designing hardware with software, with particular focus on controlling the lowest-level system software. By developing its own board designs, switches, and system software, Oxide aims to address the problems associated with traditional firmware constructs such as the BMC, IPMI, Redfish, and BIOS upgrades. The team's experience at previous companies led them to prioritize owning and engineering the system software, both to get more out of the hardware and to minimize deployment difficulties.
Innovations in Hardware Design and System Integration
Oxide's hardware design principles aim to simplify and optimize system components, such as power supplies and networking elements, to improve overall system performance. By eliminating noisy fans, reducing power supply complexity, and using blind-mated networking for uniform system management, Oxide's racks offer a quiet and efficient computing experience. The company's attention to engineering details, such as pin and connector design and machine model development, underscores its commitment to delivering streamlined and effective computing solutions.
The Shift from Hardware Virtual Machines to Container Services
Hardware virtualization provides a robust, well-defined layer to build on; offering container services is harder because the OS interface that containers depend on is a less clearly defined abstraction. Customers prefer to deploy their own tooling, such as Kubernetes for container orchestration or Terraform for provisioning, rather than use bespoke container services like ECS, and they optimize their AWS bills by relying on mainstream primitives such as EC2, EBS, and S3.
Challenges and Economics of Public Cloud Adoption
While public cloud services offer a transformative experience, concerns about economics and control over infrastructure are driving businesses toward on-premises solutions. Security, risk management, and cost-effectiveness all contribute to a growing trend of companies evaluating ownership of their infrastructure to achieve better performance, longevity, and cost optimization as hardware capabilities evolve.
Oxide Computer has been rebuilding the rack. In this podcast, CTO Bryan Cantrill tells us why.
The data center industry has been building its own infrastructure for years, with the wrong components.
Servers weren't designed to be operated in data centers, and the 1U rack unit is the wrong size, because of basic physics. Part of the success of the cloud is that it takes that integration work away and gives users an easily consumed set of virtual servers and elastic infrastructure. But that convenience has a cost, and it has pushed users into renting something they would be better off owning. That's why we have heard of the "cloud diaspora": organizations bringing their IT back from the cloud.
But what people need, Cantrill says, is elastic infrastructure for on-premises facilities. In this podcast, you can hear him explain why his team found they had to rebuild almost everything to deliver it.