
Cloud Engineering Archives - Software Engineering Daily
Episodes about building and scaling large software projects
Latest episodes

Jun 22, 2016 • 53min
Manufacturing and Microservices with Cimpress’ Jim Sokoloff and Maarten Wensveen
Mass customization is the process of making customized, personalized products that are accessible to individuals and small businesses. The process involves manufacturing, assembly lines, supply chains, and software at every step along the way. Today’s guests are Jim Sokoloff and Maarten Wensveen, who work on infrastructure and technology at Cimpress, a mass customization platform.
Cimpress has T-shirt printers, warehousing machines, supply chain management tools, and lots of other computers that come together in the computer-integrated manufacturing process. The company has been around for a few decades, and more recently they have moved to microservices for many of the reasons that have been discussed in previous episodes. If you work at a big company with some monolithic characteristics, this episode might give you some good arguments to bring to your manager about why and how to move to microservices.
The post Manufacturing and Microservices with Cimpress’ Jim Sokoloff and Maarten Wensveen appeared first on Software Engineering Daily.

Jun 21, 2016 • 55min
Serverless Code with Ryan Scott Brown
The unit of computation has evolved from on-premises servers to virtual machines in the cloud to containers running in those virtual machines. Serverless computation is another stage in the evolution of computational unit management. With a serverless architecture, a function call to the cloud spins up a transient container, calls the function on that container, and then spins down the container.
Ryan Scott Brown joins the show today to discuss the benefits and consequences of serverless computing. With containers and VMs, we still have to worry that the resources we are spinning up in the cloud will run without being utilized. Serverless computing gives us more control over these compute resources, so that we don’t have unused servers that we are paying for.
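The spin up, call, spin down lifecycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TransientContainer`, `invoke`, and `area` are hypothetical names, and no real cloud provider API is used.

```python
from dataclasses import dataclass, field


@dataclass
class TransientContainer:
    """Hypothetical stand-in for a short-lived container (not a real provider API)."""
    events: list = field(default_factory=list)

    def __enter__(self):
        self.events.append("spin up")
        return self

    def __exit__(self, exc_type, exc, tb):
        # The container is torn down even if the function raised.
        self.events.append("spin down")
        return False  # do not swallow exceptions from the function

    def run(self, fn, *args):
        self.events.append(f"call {fn.__name__}")
        return fn(*args)


def invoke(fn, *args):
    """Run one function call inside a container that exists only for that call."""
    container = TransientContainer()
    with container:
        result = container.run(fn, *args)
    return result, container.events


def area(width, height):
    # Hypothetical user function deployed as a "serverless" unit.
    return width * height


result, events = invoke(area, 3, 4)
```

Because the container lives only for the duration of the call, nothing is left running (and billed) between invocations, which is the cost argument made in the episode description.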

Jun 15, 2016 • 54min
Google’s Site Reliability Engineering with Todd Underwood
Google’s site reliability engineers are responsible for maintaining the highly available services that power the Google software that we all use on a regular basis. O’Reilly recently published the book “Site Reliability Engineering: How Google Runs Production Systems”, and the book provides a comprehensive window into how the site reliability engineering role works.
Todd Underwood is a director of site reliability engineering. On today’s episode, Todd explains how the role of an SRE relates to DevOps. We discuss the relationship between the engineers who are developing Google services and the SREs who are maintaining them. Google’s internal data center operating system “Borg” is also discussed.

May 18, 2016 • 51min
Dropbox’s Magic Pocket with James Cowling
Dropbox has been storing files on Amazon Web Services for 8 years, and Dropbox’s core business is storing files. For the past three years, Dropbox has been working on a project to migrate its file storage from Amazon Web Services to its own custom-built infrastructure. Magic Pocket is the name of Dropbox’s new infrastructure layer, and it gives Dropbox more control and improved economics.
James Cowling leads the storage team at Dropbox. In today’s episode, James takes us into the architecture of Dropbox and explains how the team moved all of the user file storage from Amazon S3 to Dropbox’s Magic Pocket infrastructure. Dropbox’s architecture is built with a focus on simplicity–and there are numerous challenges to maintaining that simplicity in the face of an extremely complex problem like this.

May 16, 2016 • 56min
Distributed Systems Tradeoffs with Camille Fournier
Distributed systems products are often marketed with terms like “real-time data” and “hassle-free scaling”, but what do those terms actually mean? Is data in a distributed system ever reliably “real time”? Do we ever have strong enough plans about our scalability strategy to say that scaling will be “hassle free”?
Camille Fournier joins us today to discuss distributed systems in practice. Like everything else in computer science, distributed systems are all about tradeoffs–and picking the right sets of tradeoffs in our distributed system will affect the entire organization that is building that system.
We also discuss the Cloud Native Computing Foundation, which is similar to the Apache Foundation, but specifically for cloud technologies. The CNCF is likely to have a strong impact on the way we build software for a long time to come.

Apr 29, 2016 • 36min
Distributed Systems and Exception Monitoring with Brian Rue
Exception monitoring services and log management services sit at two ends of a spectrum. Exception monitoring services capture and aggregate the problems that occur in your application. Log management services aggregate all of your logs, so that you can decide for yourself what constitutes a problem.
Brian Rue from Rollbar joins the show today to talk about Rollbar’s exception monitoring architecture, and the competitive landscape of these technology products. Every software engineer wants to track the problems with an application, but some developers need more information than others–and that ends up changing how these error aggregation services are architected. This is an interesting conversation on the business of SaaS products for developers, and the architecture of a distributed system designed to monitor and aggregate errors.

Apr 20, 2016 • 45min
Google’s Container Management with Brendan Burns
Kubernetes is an open source system for automating deployment, operations, and scaling of containerized applications. Google developed Kubernetes after fifteen years of running containers in production.
Brendan Burns is a founder of the Kubernetes project, and he joins us to talk about the lessons learned as Google has built containerized applications to distribute across its massive infrastructure. We talk about Docker, Borg, Kubernetes, and other distributed systems technologies.
Applications crash and engineers need to be able to quickly find the root cause of a crash. Apps have become distributed, and debugging workflows have changed. Developers need better tools to identify and troubleshoot problems with their apps.

Apr 18, 2016 • 50min
Search as a Service with Julien Lemoine
“You need to build more things yourself to be highly available, but one of the very good consequences of being bare metal is that the prices are very low compared to what you could get on the cloud provider.”
Engineers who want to add search to their application usually deploy Elasticsearch, or write their own search engine that uses TF-IDF. These solutions work well for large documents, but are less effective for large volumes of small records–which is how many modern web and mobile applications are structured.
In today’s show, Julien Lemoine discusses how his company Algolia thinks about search. Algolia is a search as a service company that gives developers an easier way to search on their websites and applications.
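For readers who have not met TF-IDF, the scoring idea mentioned above can be sketched as follows. This uses one common textbook formulation (term frequency times the log of inverse document frequency); it is not presented as how Elasticsearch or Algolia actually rank results, and the function name and sample records are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document against the query with a basic TF-IDF sum.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    terms = query.lower().split()

    # Document frequency: in how many documents does each query term appear?
    df = {t: sum(1 for toks in tokenized if t in toks) for t in terms}

    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        score = 0.0
        for t in terms:
            if df[t] == 0 or not toks:
                continue  # term appears nowhere, or document is empty
            tf = counts[t] / len(toks)
            idf = math.log(n_docs / df[t])
            score += tf * idf
        scores.append(score)
    return scores


# Hypothetical small records, similar in size to product titles.
records = ["red running shoes", "blue running jacket", "red wool scarf"]
scores = tf_idf_scores("red shoes", records)
```

With records this short, every term occurs at most once per record, so term frequency carries little signal. That is one way to see why the description says such approaches are less effective for large volumes of small records.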
Questions
What are the unsolved problems in search?
What is TF-IDF, and how is it used to rank search results?
Why did you decide to set up your own servers instead of going to a cloud provider?
Why is it useful to divide search into two different aspects?
How do you stress-test your system?
How did you respond to your recent data center connectivity issue?
How do you build a company oriented for the long-term?
Links
Algolia
Elasticsearch
tf-idf
Bare-metal servers
Multitenancy model
GAE
Chaos Monkey testing strategy
ZooKeeper
Raft
On premise
Julien on Twitter

Apr 15, 2016 • 40min
Managing a CDN with Carl Gustas
“We’re not always in control of other people’s networks.”
CDN stands for content delivery network. A content delivery network is a system of distributed servers that delivers web pages and other web content. Without CDNs, the internet would be much slower, because CDNs function as a caching layer for most web resources.
Carl Gustas is an engineer at CacheFly, a popular content delivery network. He joins us today to discuss how CDNs work, and the different methods an engineer can use to take advantage of caching in a CDN.
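The "caching layer" role of a CDN can be illustrated with a minimal sketch of a single edge node sitting in front of an origin server. All names here (`EdgeCache`, `origin`, the `/logo.png` path) are hypothetical, and the sketch deliberately omits expiry, invalidation, and replication, which a real CDN has to handle.

```python
class EdgeCache:
    """Hypothetical edge node: serves cached content, falls back to the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_requests = 0

    def get(self, path):
        if path in self._store:
            # Cache hit: answered from the edge without touching the origin.
            return self._store[path], "HIT"
        # Cache miss: the first request pays for the round trip to the origin.
        self.origin_requests += 1
        body = self._fetch(path)
        self._store[path] = body
        return body, "MISS"


def origin(path):
    # Stand-in for a slow request to the origin server.
    return f"<contents of {path}>"


edge = EdgeCache(origin)
first = edge.get("/logo.png")
second = edge.get("/logo.png")
```

The first request is a miss and goes to the origin; the second is served from the edge. This is the basic reason a first hit to a site can be slow while later hits are fast.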
Questions
What is a proxy server, and how is it relevant to a CDN?
Why is a first hit to a website painful whereas subsequent hits are not?
Does a CDN provide resilience from DDoS attacks?
Do you use containers in production?
What is the replication factor for my files on a CDN?
What are the craziest stories you have from managing a CDN for 10 years?
Links
Ingress point (CDN overview)
CDN
Delivery edge
CacheFly
Carl on Twitter

Apr 12, 2016 • 54min
Logging and NoOps with Christian Beedgen
“You write the code, but you don’t run it? That’s just preposterous.”
Software applications are constantly generating logs. These logs are necessary to understand how an application is functioning, and logs are key to debugging. As applications have gotten more complex, logging infrastructure has become complex as well. Storing and managing all of our log data is such a big task that several companies have been started to tackle this problem.
Today’s guest is Christian Beedgen, CTO at Sumo Logic. Sumo Logic is a cloud-based log management company. We discuss the elastic log-processing platform Sumo Logic has built to help software engineers with log management. It’s a great conversation about distributed systems, machine learning, and debugging applications.
Questions
How has logging changed for applications in the cloud?
What do you mean when you say “log data is big data”?
What kinds of elasticity are important in your log architecture?
What is the ingestion path for the logs?
Are there any interesting distributed systems problems at Sumo Logic?
Why do you want to replicate three times?
How do you use machine learning to improve log management?
Can you describe what NoOps is, and how it reflects Sumo Logic’s culture?
Links
Sumo Logic
tail
Elastic log processing (video)
Multitenant architecture
Inverted index
Lucene
JSON logs
NoOps Debate Grows Heated
Christian on Twitter