Stripe processes payments for thousands of businesses, and a single payment might touch 10 different networked services. When a payment fails, engineers need to diagnose what happened, and the root cause could lie in any one of those services.

Distributed tracing is used to find the causes of failures and latency across networked services. In a distributed trace, each unit of work performed on behalf of a request, such as a call to one service, is recorded as a span with a start and end time. Spans emitted by different services can be stitched into a single trace because they share a trace ID.
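To make the span and trace ID idea concrete, here is a minimal Python sketch of grouping spans into traces and picking out the slowest downstream call. It is an illustration under stated assumptions, not how Stripe, Lightstep, or Datadog implement tracing: the `Span` fields (including `parent_id`), the helper functions, and the service names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span record; real tracing systems use richer schemas.
    trace_id: str             # shared by every span belonging to one request
    span_id: str              # unique to this span
    parent_id: Optional[str]  # None for the root span of a trace
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def group_by_trace(spans):
    """Collect spans emitted by many services into traces keyed by trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def slowest_child(trace):
    """Return the longest-running span that has a parent (a downstream call), or None."""
    children = [s for s in trace if s.parent_id is not None]
    return max(children, key=lambda s: s.duration_ms) if children else None


# Usage with made-up spans from hypothetical services.
spans = [
    Span("t1", "a", None, "api-gateway", 0, 120),
    Span("t1", "b", "a", "fraud-check", 10, 95),
    Span("t1", "c", "a", "card-network", 96, 118),
    Span("t2", "d", None, "api-gateway", 5, 30),
]
traces = group_by_trace(spans)
print(len(traces))                          # 2
print(slowest_child(traces["t1"]).service)  # fraud-check
```

The root span is excluded when looking for the culprit because it encloses its children and would always appear slowest.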

Traces are one element of observability; metrics and logs are others. Each of these signals eventually makes its way into services such as Lightstep and Datadog, and the path the signals travel to get there is called the observability pipeline.

In an episode last year, Cory Watson explained how observability works at Stripe. In today’s episode, Cory describes how observability data is generated and aggregated. It’s a useful discussion for anyone at a company that is figuring out how to instrument its systems for better monitoring.

The post Stripe Observability Pipeline with Cory Watson appeared first on Software Engineering Daily.