Cloud Engineering Archives - Software Engineering Daily

Episodes about building and scaling large software projects

Latest episodes

Mar 16, 2017 • 44min

Stripe Infrastructure with Evan Broder

If you are building a service that processes payments, your software architecture has a lot of requirements. Not only do you need to be highly available, consistent, and fast–you need to be PCI compliant. In this episode, we explore the infrastructure of Stripe with Evan Broder, who has been with the company for five years. Stripe started as a small payments company catering to developers with a monolithic code base. Some of those aspects of Stripe have changed, and others have stayed the same. In our last episode, we covered how observability works at Stripe. In this episode, we explore what is being observed–the actual infrastructure itself, and how different engineers are organized around managing the infrastructure. In tomorrow’s episode, we’ll talk to Michael Manapat about Stripe’s machine learning pipeline for detecting and preventing fraudulent transactions. Throughout these episodes, you will get a sense for how Stripe’s engineering culture works. We hope to do more experimental series like this in the future. Please give us feedback for what you think of the format by sending us email, joining the Slack group, or filling out our listener survey. All of these things are available on softwareengineeringdaily.com. The post Stripe Infrastructure with Evan Broder appeared first on Software Engineering Daily.

Mar 15, 2017 • 59min

Stripe Observability with Cory Watson

Observability allows engineers to understand what is going on inside their systems. In its most raw form, observability comes from log data. Modern systems have many layers of logs–virtualized cloud infrastructure, container orchestration, the container runtime itself, and the application logic running within the container. With all of these layers, it is not practical for a developer to have to sift through layers of logs every time a bug occurs in production, or a deployment fails integration tests. Higher level observability tools include charts, distributed tracing tools, and monitoring services. With proper observability, developers can save time during incident response. Day-to-day software development becomes safer and more comfortable. Stripe is a payments company for developers. This episode is the first in a series of episodes profiling different aspects of the company. Our guest Cory Watson leads the observability team at Stripe. In subsequent episodes, we will explore infrastructure and machine learning at Stripe. Throughout these episodes, you will get a sense for how Stripe’s engineering culture works. We hope to do more experimental series like this in the future. Please give us feedback for what you think of the format by sending us email, joining the Slack group, or filling out our listener survey. All of these things are available on softwareengineeringdaily.com. The post Stripe Observability with Cory Watson appeared first on Software Engineering Daily.

Mar 10, 2017 • 45min

Using CQRS to Make Controllers Lean with Derek Comartin

Command Query Responsibility Segregation (CQRS) is a powerful concept that has the potential to make for reliable and maintainable systems. It is also broadly misunderstood and means different things to different people. Derek Comartin learned about the idea after viewing some talks by Greg Young and has since applied the approach with great success, and it has transformed the way he views features, business requirements, and dependencies. The result is a system that is easier to maintain and faster to enhance. Among his key lessons are that slices are better than layers, mediators improve dependency management, and cohesion is better applied to business concerns than to technical ones. In this episode, Derek joins Dave Rael for a conversation about CQRS and applying it to make your code your own and to separate it from technical concerns in order to make your software development operation work better and faster. The post Using CQRS to Make Controllers Lean with Derek Comartin appeared first on Software Engineering Daily.
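
For readers new to the idea, here is a minimal sketch of the command/query split with a mediator in between. It is not taken from the episode: the names RenameProduct, GetProductName, Mediator, and build_mediator are hypothetical, and an in-memory dictionary stands in for a real data store.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Type


@dataclass(frozen=True)
class RenameProduct:
    """Command: changes state and returns nothing."""
    product_id: int
    new_name: str


@dataclass(frozen=True)
class GetProductName:
    """Query: reads state and changes nothing."""
    product_id: int


class Mediator:
    """Routes each message to the single handler registered for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[Type[Any], Callable[[Any], Any]] = {}

    def register(self, message_type: Type[Any], handler: Callable[[Any], Any]) -> None:
        self._handlers[message_type] = handler

    def send(self, message: Any) -> Any:
        return self._handlers[type(message)](message)


def build_mediator(store: Dict[int, str]) -> Mediator:
    """Wire one handler per command or query (hypothetical example wiring)."""
    mediator = Mediator()

    def handle_rename(cmd: RenameProduct) -> None:
        store[cmd.product_id] = cmd.new_name

    def handle_get_name(query: GetProductName) -> str:
        return store[query.product_id]

    mediator.register(RenameProduct, handle_rename)
    mediator.register(GetProductName, handle_get_name)
    return mediator
```

A controller built this way only needs a reference to the mediator: it sends a command or a query and never touches the handlers or the store directly, which is one way to read the "lean controllers" in the episode title.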

Mar 8, 2017 • 46min

Load Testing with Mark Gilbert

Load testing measures performance of a system undergoing a large volume of requests. Before an application is pushed to production, engineers will often load test their software to ensure it is resilient in the face of high traffic. As web applications have changed, the requirements around load testing have changed as well. External APIs, internal undocumented APIs, and proprietary databases are black boxes that you might not be able to test reliably with unit tests or integration tests. Having an end-to-end load testing system can provide a measure of insurance against unknown unknowns before users start engaging with a production version. Mark Gilbert works on performance engineering at Apica, and he joins the show to discuss how load testing software is built and when engineers should use it. Full disclosure: Apica is a sponsor of Software Engineering Daily. The post Load Testing with Mark Gilbert appeared first on Software Engineering Daily.

Mar 1, 2017 • 1h

Parse and Operations with Charity Majors

Parse was a backend as a service company built in 2011 before being acquired by Facebook in 2013. Building a backend as a service for developers requires walking a thin line between giving engineers lots of control and preventing those engineers from shooting themselves in the foot. While she was at Parse, Charity Majors learned about the operational burdens of managing a service with high uptime requirements and deeply technical edge cases that could take down a user’s entire system. Charity took the lessons in systems engineering that she learned at Parse and cofounded Honeycomb.io, a service for observability and monitoring. Honeycomb is described as a tool that is to your systems what an IDE is to your code. Parse was eventually shut down because the service did not have a place in the strategic plans of Facebook. Charity and I also discussed the lessons learned from how the Parse acquisition panned out–a useful conversation for anyone who is considering selling a company or acquiring a startup. The post Parse and Operations with Charity Majors appeared first on Software Engineering Daily.

Feb 28, 2017 • 55min

Heroku Autoscaling with Andrew Gwozdziewycz

When an application is using all of its available resources, that application needs to be scaled. Scaling an application means giving it more resources–typically servers. Autoscaling is an engineering practice where an application is automatically given more or fewer resources based on how healthy the application performance is at a given time. Applications on Heroku have access to autoscaling. Heroku users don’t need to worry about provisioning new servers manually because the platform does it for them. In this episode, we explore how Heroku built autoscaling. Andrew Gwozdziewycz [@apgwoz] is an operational experience engineer with Heroku. As he describes, autoscaling requires frequent health checks of an application. Since thousands of applications are running on Heroku, a metrics pipeline using Kafka and Cassandra supports the high volume of health check data. That data feeds into the decision process for when an application needs to scale. Full disclosure: Heroku is a sponsor of Software Engineering Daily. The post Heroku Autoscaling with Andrew Gwozdziewycz appeared first on Software Engineering Daily.
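
To make the scaling decision concrete, here is a rough sketch of how recent health check data could be turned into an instance count. It is not Heroku's algorithm, which this description does not specify: the function desired_instances, the latency target, the 1.2x and 0.5x thresholds, and the instance limits are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def desired_instances(
    current: int,
    recent_latencies_ms: Sequence[float],
    target_ms: float = 200.0,   # assumed latency target
    min_instances: int = 1,     # assumed lower bound
    max_instances: int = 10,    # assumed upper bound
) -> int:
    """Return the instance count suggested by recent health check latencies."""
    if not recent_latencies_ms:
        # No health data: keep the current size rather than guess.
        return current

    avg = mean(recent_latencies_ms)
    if avg > target_ms * 1.2:
        proposed = current + 1   # unhealthy: add one instance
    elif avg < target_ms * 0.5:
        proposed = current - 1   # plenty of headroom: remove one instance
    else:
        proposed = current       # within the acceptable band

    # Never scale outside the configured limits.
    return max(min_instances, min(max_instances, proposed))
```

The gap between the scale-up and scale-down thresholds is deliberate: without it, an application hovering near the target would add and remove instances on every check.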

Feb 27, 2017 • 54min

Data Warehousing with Mark Rittman

In the mid-90s, data warehousing might have meant “using an Oracle database.” Today, it means a wide variety of things. You could be stitching together a big data pipeline using Kafka, Hadoop, and Spark. You could be using managed tools like BigQuery from Google. How did we get from the simple days of Oracle databases to the wealth of options available today? Mark Rittman writes and podcasts about data engineering and data warehousing on his site Drill to Detail. Today, we explore the past, present, and future of data warehousing and touch on many of the trends that have been explored in recent episodes of Software Engineering Daily. Show Notes: “Google BigQuery, and Why Big Data is About to Have Its Gmail Moment.” The post Data Warehousing with Mark Rittman appeared first on Software Engineering Daily.

Feb 14, 2017 • 51min

Service Proxying with Matt Klein

Most tech companies are moving toward a highly distributed microservices architecture. In this architecture, services are decoupled from each other and communicate with a common service language, often JSON over HTTP. This provides some standardization, but these companies are finding that more standardization would come in handy. At the ridesharing company Lyft, every internal service runs a tool called Envoy. Envoy is a service proxy. Whenever a service sends or receives a request, that request goes through Envoy before meeting its destination. Matt Klein started Envoy, and he joins the show to explain why it is useful to have this layer of standardization between services. He also gives some historical context for why Envoy was so helpful to Lyft. The post Service Proxying with Matt Klein appeared first on Software Engineering Daily.

Feb 13, 2017 • 45min

Infrastructure with Datanauts’ Chris Wahl and Ethan Banks

Infrastructure is a term that can mean many different things: your physical computer, the data center of your Amazon EC2 cluster, the virtualization layer, the container layer–on and on. In today’s episode, podcasters Chris Wahl and Ethan Banks discuss the past, present, and future of infrastructure with me. Ethan and Chris host Datanauts, a podcast about infrastructure. In each episode, Datanauts goes deep on a topic such as networking, serverless, or OpenStack. As someone who hosts a similar podcast, I find it entertaining and educational to hear their points of view on a regular basis. If you like Software Engineering Daily, you might like Datanauts. And if you like Datanauts, you will love this episode of Software Engineering Daily. The post Infrastructure with Datanauts’ Chris Wahl and Ethan Banks appeared first on Software Engineering Daily.

Feb 6, 2017 • 52min

Giphy Engineering with Anthony Johnson

Giphy is a search engine for gifs, the short animated graphics that we see around the Internet. Giphy is also a creative platform where people create new gifs. Every search engine requires the construction of a search index, which is a data structure that responds to search queries efficiently. Since Giphy is a search engine for graphics, there is almost no text inherently associated with each document. Giphy uses a pipeline of different labeling techniques in order to make a gif indexable by the search engine. In my conversation with Giphy CTO Anthony Johnson, we discuss how to scale a search engine, why Giphy needs to build new techniques for image processing, how human labeling for machine learning is evolving, and the future of Giphy–both as a creative medium and an advertising platform. This was an exciting and wide-reaching interview. Show Notes: The Art of Monitoring. The post Giphy Engineering with Anthony Johnson appeared first on Software Engineering Daily.
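
As an illustration of what a search index over labeled gifs might look like, here is a small inverted index sketch. It is a textbook structure, not Giphy's implementation: the functions build_index and search and the sample gif ids and labels are hypothetical, and the labels are assumed to come from some upstream labeling pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(labels_by_gif: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lowercased label to the set of gif ids that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for gif_id, labels in labels_by_gif.items():
        for label in labels:
            index[label.lower()].add(gif_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the gif ids that match every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from a copy so the index itself is never mutated.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Because lookups go from label to gif ids, a query only touches the entries for its own terms instead of scanning every gif, which is what lets an index answer queries efficiently.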
