Cloud Engineering Archives - Software Engineering Daily

Latest episodes

Jan 8, 2019 • 1h 6min

Multicloud with Ben Hindman

Most applications today are deployed either to on-premises environments or to a single cloud provider. Developers deploying on-prem struggle to set up complicated open source tools like Kafka and Hadoop. Developers deploying to a cloud provider tend to stay within that provider, because moving between clouds and integrating services across clouds adds complexity.

Ben Hindman started the Apache Mesos project while working in the Berkeley AMPLab. Mesos is a scheduler for resources in a distributed system, allowing compute and storage to be scheduled onto jobs that can use those resources. During his time at the AMPLab, Ben collaborated with Matei Zaharia, creator of Apache Spark. Ben founded Mesosphere based on his work on Apache Mesos, and since 2013 he has been building a company to bring it to market. In the meantime, several forces have reshaped the enterprise market: businesses built on virtual machines and on-prem hardware are migrating to containers, Kubernetes, and Spark; cloud providers like Google and Microsoft have risen to prominence alongside Amazon's continued growth; and enterprises are increasingly willing to adopt multiple clouds.

I spoke with Ben Hindman at KubeCon North America. Today, the company he co-founded provides tools for managing these changes in infrastructure. In our conversation, we talked about the mindset shifts required to turn a research project into a highly successful product. We also talked about newer trends in infrastructure: why enterprises will want multicloud deployments, and how serverless APIs and backends will make developers' lives much easier. The post Multicloud with Ben Hindman appeared first on Software Engineering Daily.
Jan 7, 2019 • 54min

Stateful Kubernetes with Saad Ali

In a cloud infrastructure environment, failures happen regularly: servers can fail, the network can fail, and software bugs can crash your application unexpectedly. The frequency of failures in cloud infrastructure is one reason why storage is often separated from application logic. A developer can launch multiple instances of their application, with each instance providing a "stateless" environment for serving API requests. When the application needs to save state, it can make a call out to a managed cloud infrastructure product: managed cloud databases provide a reliable place to keep application state, and managed object storage systems like Amazon S3 provide a reliable place to store files.

This pattern of relying on remote cloud services does not work as well for on-prem and hybrid cloud environments, where companies manage their own data centers and their own storage devices. As companies with on-prem infrastructure adopt Kubernetes, they need ways to manage on-prem storage through Kubernetes.

Saad Ali is a senior engineer at Google, where he works on Kubernetes, and he is part of the Kubernetes Storage Special Interest Group. Saad joins the show to talk about how Kubernetes interacts with storage and how to manage stateful workloads on Kubernetes. We discuss the basics of Kubernetes storage, including persistent volumes and the Container Storage Interface. The post Stateful Kubernetes with Saad Ali appeared first on Software Engineering Daily.
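To make the stateless pattern concrete, here is a minimal sketch of an application that keeps no local state and persists everything to a managed object store; the bucket name and key scheme are hypothetical, with S3 standing in for whatever managed service is used:

```python
# Sketch: a stateless handler that persists all state to Amazon S3.
# The bucket name and key layout are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-app-state"  # hypothetical bucket

def save_session(session_id: str, data: dict) -> None:
    # Any instance of the app can run this; no state lives on the server.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"sessions/{session_id}.json",
        Body=json.dumps(data).encode("utf-8"),
    )

def load_session(session_id: str) -> dict:
    # A different instance can read the state back later.
    obj = s3.get_object(Bucket=BUCKET, Key=f"sessions/{session_id}.json")
    return json.loads(obj["Body"].read())
```

Because every instance reads and writes through the managed store, any instance can serve any request, which is what makes the compute layer disposable.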
Jan 2, 2019 • 51min

Crossplane: Multicloud Control Plane with Bassam Tabbara

Cloud providers made it easy for developers to deploy their applications to servers in data centers. In the early days of the cloud, most of the code that a developer wrote for their application could run on any cloud provider, whether it was Amazon, Google, or Microsoft. The cloud providers were giving developers the same Linux server they would expect from an on-premises deployment. Early cloud applications such as Netflix, Airbnb, and Uber took advantage of this infrastructure to quickly scale their businesses. In the process, these companies had to figure out how to manage open source distributed systems tools such as Hadoop and Kafka. Cloud servers were easy to create, but orchestrating them together to build distributed systems was still very hard.

As the cloud providers matured, they developed higher-level systems that solved many of the painful infrastructure problems: managed databases, autoscaling queueing systems, machine learning APIs, and hundreds of other tools. Examples include Amazon Kinesis and Google BigQuery. These tools are invaluable because they allow a developer to quickly build applications on top of durable, resilient cloud infrastructure. With all of these managed services, developers are spending less time on infrastructure and more time on business logic.

But managed services also lead to a new infrastructure problem: how do you manage resources across multiple clouds? A bucket storage system like Amazon S3 has different APIs than Google Cloud Storage. Google Cloud Pub/Sub has different APIs than Amazon Kinesis. Because different clouds have different APIs, developers have trouble connecting cloud resources together, and it is difficult to migrate an entire application from one cloud provider to another.

Crossplane is an open source control plane for managing resources across multiple clouds. Crossplane's goal is to provide a single API surface for interfacing with all the parts of your application, regardless of which cloud they are on. Crossplane was started by Upbound, a company with the goal of making multicloud software development easier. Bassam Tabbara is the CEO of Upbound, and he joins the show to talk about multicloud deployments, Kubernetes federation, and his strategy for building a multicloud API. The post Crossplane: Multicloud Control Plane with Bassam Tabbara appeared first on Software Engineering Daily.
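To see why a single API surface is appealing, compare the same "upload an object" operation against two clouds; this is a hedged sketch with hypothetical bucket names, using the boto3 and google-cloud-storage SDKs:

```python
# The same operation, written twice against structurally different SDKs.
# Bucket names are hypothetical placeholders.
import boto3
from google.cloud import storage

data = b"hello, multicloud"

# Amazon S3: a flat client call with named parameters.
boto3.client("s3").put_object(
    Bucket="example-bucket-aws", Key="greeting.txt", Body=data
)

# Google Cloud Storage: an object-oriented bucket/blob model.
gcs = storage.Client()
gcs.bucket("example-bucket-gcp").blob("greeting.txt").upload_from_string(data)
```

A control plane like Crossplane aims to hide this divergence behind one declarative interface, so infrastructure definitions do not have to change per cloud.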
Dec 25, 2018 • 1h 5min

Google Early Days with John Looney Holiday Repeat

Originally posted on 16 June 2017. John Looney spent more than 10 years at Google. He started in infrastructure, and was part of the team that migrated Google File System to Colossus, the successor to GFS. Imagine migrating every piece of data on Google from one distributed file system to another.

In this episode, John sheds light on the engineering culture that has made Google so successful, with very entertaining stories about clusterops and site reliability engineering. Google's success in engineering comes from extremely high standards and a culture of intellectual honesty. At the volume of data and throughput Google handles, one-in-a-million events occur routinely, so there is no room for sloppy practices.

John now works at Intercom, where he is adjusting to a modern world in which Google-style infrastructure is available to everyone. This conversation made me feel quite grateful to be an engineer in a time when everything is so much cheaper, easier, and more performant than in the days when Google first built everything from scratch. I had a great time talking to John, and hope he comes back on the show in the future, because it felt like we were just scratching the surface of his experience. The post Google Early Days with John Looney Holiday Repeat appeared first on Software Engineering Daily.
Dec 24, 2018 • 51min

Service Proxying with Matt Klein Holiday Repeat

Originally posted on 14 February 2017. Most tech companies are moving toward a highly distributed microservices architecture. In this architecture, services are decoupled from each other and communicate in a common service language, often JSON over HTTP. This provides some standardization, but these companies are finding that more standardization would come in handy.

At the ridesharing company Lyft, every internal service runs a tool called Envoy. Envoy is a service proxy: whenever a service sends or receives a request, that request goes through Envoy before reaching its destination. Matt Klein started Envoy, and he joins the show to explain why it is useful to have this layer of standardization between services. He also gives some historical context for why Envoy was so helpful to Lyft. The post Service Proxying with Matt Klein Holiday Repeat appeared first on Software Engineering Daily.
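As a rough illustration of the sidecar idea, here is a toy pass-through proxy. This is a hedged sketch of the pattern only, not how Envoy works internally (Envoy is a C++ proxy driven by declarative configuration), and the port numbers are hypothetical:

```python
# Toy sidecar proxy: forwards GET requests to a co-located service and
# records per-request latency. Ports are hypothetical placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen
import time

UPSTREAM = "http://localhost:8080"  # the service this sidecar fronts

class Sidecar(BaseHTTPRequestHandler):
    def do_GET(self):
        start = time.time()
        with urlopen(UPSTREAM + self.path) as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
        # Observability: the proxy sees every request and response, so
        # metrics, retries, and routing can live here, not in the app.
        print(f"GET {self.path} {resp.status} {time.time() - start:.3f}s")

if __name__ == "__main__":
    HTTPServer(("localhost", 9000), Sidecar).serve_forever()
```

Because the proxy runs next to every service, this logic is written once and applied uniformly, which is the standardization the episode is about.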
Dec 19, 2018 • 52min

Linkerd Service Mesh with William Morgan

Software products are distributed across more and more servers as they grow. With the proliferation of cloud providers like AWS, these large infrastructure deployments have become much easier to create, and with the maturity of Kubernetes, these distributed applications have become more reliable. Developers and operators can use a service mesh to manage the interactions between services across such a distributed application.

A service mesh is a layer across a distributed microservices application consisting of service proxy sidecars running alongside each service in a cluster, plus a central control plane for communicating with those sidecar proxies. A service mesh has many uses: every request and response within the application gets routed through a service proxy, which enables observability, traffic control across different instances, and circuit breaking when an instance fails. The central control plane can be used to manage network policy throughout the whole system. We have done shows about each component of a service mesh system, including different types of service proxies as well as the service meshes built on top of them.

Linkerd, made by the startup Buoyant, was the first service mesh product to come to market, and it has the most production use, with customers like Expedia and Monzo. Istio is a more recent service mesh that uses the Envoy service proxy. Istio came out of Google and is also supported by IBM, setting up a classic competition between a startup and large incumbents. William Morgan is the CEO of Buoyant, and he joins the show to talk about the use cases and adoption of service mesh. He also talks about the business landscape of the service mesh category, and how to compete with giant cloud providers. The post Linkerd Service Mesh with William Morgan appeared first on Software Engineering Daily.
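Circuit breaking, one of the mesh features mentioned above, is easy to sketch in isolation. The following is a minimal, hedged illustration of the idea rather than Linkerd's actual mechanism: after enough consecutive failures, calls to an instance are rejected outright until a cooldown elapses:

```python
# Minimal circuit breaker: after max_failures consecutive errors, reject
# calls until reset_after seconds pass, sparing the failing instance.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            self.opened_at = None  # cooldown elapsed; allow a retry
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip the breaker
            raise
        self.failures = 0  # a success resets the failure count
        return result
```

In a mesh, this logic lives in the sidecar proxy, so every service gets it without any application code changes.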
Dec 3, 2018 • 56min

On-Prem Cloud with Bob Fraser

Not every company wants to move to the public cloud. Some companies have already built data centers and can continue to operate their business on their own servers; some have compliance concerns with the public cloud and run their own servers to avoid legal risk.

Operating a data center is not easy. Operating systems need to be updated, and security vulnerabilities need to be patched. Servers fail, and their workloads need to be automatically rescheduled onto other servers to avoid downtime. In contrast to classic on-prem data center management, the cloud provides many benefits: automatic updates, a seemingly infinite pool of resources, and fully programmable infrastructure as code. In the cloud, developers can provision infrastructure with an API request, and continuous delivery pipelines can be spun up at the click of a button. This tooling makes it dramatically easier for developers to move quickly and for the business to move faster. Companies that operate their own data centers want these benefits of the cloud while still controlling their own infrastructure.

Today's guest, Bob Fraser, works at HPE on OneView, a tool for managing on-prem infrastructure like a cloud. Bob describes the difficulties of managing legacy on-prem infrastructure, and the advantage of building a management layer on top of data center infrastructure to make it more programmable. We have done lots of shows recently about Kubernetes in the context of cloud computing; today's show outlines how modern on-prem infrastructure can be managed like a cloud. Full disclosure: HPE is a sponsor of Software Engineering Daily. The post On-Prem Cloud with Bob Fraser appeared first on Software Engineering Daily.
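"Provisioning infrastructure with an API request" looks roughly like the following; this is a hedged sketch using AWS for illustration rather than OneView, and the image ID is a placeholder:

```python
# Sketch: creating a server with one API call. The AMI ID is a
# hypothetical placeholder; real IDs vary by region and account.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical image
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```

The appeal of a management layer like OneView is putting a similarly programmable surface in front of hardware you own.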
Nov 29, 2018 • 52min

Cloud Costs with Ran Rothschild

Cloud computing changed the economics of running a software company. Before the cloud, a software company had to purchase physical machines, often requiring thousands of dollars paid up front. The cloud allowed developers to deploy their applications for free, operate a business cheaply, and scale without hiring a dedicated team to manage the servers.

Building in the cloud is cheap, but scaling in the cloud can get expensive. A growing company can often save money by changing which cloud instances and services it uses: reducing the number of server instances, resizing compute instances, and changing autoscaling rules. Through monitoring, dashboards, and regular analysis of where money is spent, a business can find thousands of dollars of wasted spend per month.

There are also broad strategic decisions around cost. One area to study is the use of "managed" services like Amazon DynamoDB, Google BigQuery, and AWS Lambda. These services are proprietary, can lead to lock-in, and are sometimes quite expensive, but they can save developers hours of time because they are easy to use and provide high uptime guarantees.

Ran Rothschild works at DoIT International, a company that helps businesses figure out how to save money on their cloud infrastructure. He joins the show to discuss where the most money is wasted and how startups can manage their infrastructure cost-effectively. He also tells some stories about significant overspend. Full disclosure: DoIT International is a sponsor of Software Engineering Daily. The post Cloud Costs with Ran Rothschild appeared first on Software Engineering Daily.
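The "regular analysis of where money is spent" can start with a billing API query; here is a hedged sketch using the AWS Cost Explorer API, with an illustrative date range:

```python
# Sketch: list a month's AWS spend per service via Cost Explorer,
# a first pass at spotting wasted spend. Dates are illustrative.
import boto3

ce = boto3.client("ce")
result = ce.get_cost_and_usage(
    TimePeriod={"Start": "2018-10-01", "End": "2018-11-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in result["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{service}: ${amount:,.2f}")
```

Sorting this output by amount usually makes the first optimization target obvious.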
Nov 19, 2018 • 55min

Schedulers with Adrian Cockcroft Holiday Repeat

Originally published on July 6, 2016. Scheduling is the method by which work is assigned to the resources that complete it. At the operating system level, this can mean scheduling threads and processes. At the data center level, this can mean scheduling Hadoop jobs or other workflows that require orchestrating a network of computers.

Adrian Cockcroft worked on scheduling at Sun Microsystems, eBay, and Netflix. In each of these environments, the nature of what was being scheduled was different, but the goals of the scheduling algorithms were analogous: throughput, response time, and cache affinity are relevant in different ways at each layer of the stack. Adrian is well known for helping bring Netflix onto Amazon Web Services, and I recommend watching the numerous YouTube videos of Adrian talking about that transformation. The post Schedulers with Adrian Cockcroft Holiday Repeat appeared first on Software Engineering Daily.
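As a toy example of the core problem (a sketch under simplified assumptions, not one of the schedulers discussed in the episode), a greedy scheduler can place each job on the machine with the most free capacity:

```python
# Toy greedy scheduler: place each job on the machine with the most
# remaining CPU. Real schedulers also weigh response time, cache
# affinity, data locality, and fairness across users.
import heapq

def schedule(jobs, machines):
    """jobs: list of (name, cpus); machines: dict of name -> free cpus."""
    # Max-heap keyed on free capacity (negated for Python's min-heap).
    heap = [(-free, name) for name, free in machines.items()]
    heapq.heapify(heap)
    placements = {}
    for job, cpus in jobs:
        free, machine = heapq.heappop(heap)
        free = -free
        if cpus > free:
            # The roomiest machine cannot fit it, so nothing can.
            placements[job] = None
            heapq.heappush(heap, (-free, machine))
            continue
        placements[job] = machine
        heapq.heappush(heap, (-(free - cpus), machine))
    return placements

print(schedule([("etl", 4), ("web", 2), ("ml", 8)], {"m1": 8, "m2": 6}))
# {'etl': 'm1', 'web': 'm2', 'ml': None}
```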
Nov 15, 2018 • 55min

Liquid Software with Baruch Sadogursky

The software release process is a barrier between written code and a live production environment that affects users, and it can involve a variety of practices. Code might be tested for bugs using automation and manual testing. Static analysis tools can look at the code for potential memory leaks. A software release might go out to a small percentage of the total user base before it gets deployed to the entire audience.

At some organizations, a software release can be slow and painful. The release might be bottlenecked by a manual approval step, which keeps developers from quickly deploying their own changes. If a consistent version history of the software is not maintained, a release can be hard to roll back in the event of an error. With a large, monolithic architecture, a release can be scary, because it is hard to understand how the monolithic codebase functions. These challenges in the release process lower software quality and can make building software frustrating.

The release process is just one area of software development that many organizations want to smooth out. Over the past ten years, a set of technologies and philosophies has improved the software development process: DevOps, continuous delivery, microservices, cloud providers, and serverless tools all make it easier for a company to focus on its core competency and release software faster.

Baruch Sadogursky is an author of Liquid Software, a book about continuous updates and DevOps. Liquid Software describes an idealized vision of what today's architecture could aspire to. The focus of the book is continuous updates, which allow for rapidly improving, evolving software quality. Baruch joins the show to discuss how software has changed in the last twenty years, and how the future of software development could look. Full disclosure: Baruch works at JFrog, which is a sponsor of Software Engineering Daily. The post Liquid Software with Baruch Sadogursky appeared first on Software Engineering Daily.
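One common mechanism behind releasing "to a small percentage of the total user base" is deterministic user bucketing. This is a hedged sketch of that widespread technique, not something taken from the book:

```python
# Sketch: deterministic canary bucketing. Hashing the user ID means each
# user consistently sees the same version as the rollout percentage grows.
import hashlib

def in_canary(user_id: str, rollout_percent: float) -> bool:
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first four hash bytes to a stable value in [0, 100).
    bucket = int.from_bytes(digest[:4], "big") % 10000 / 100.0
    return bucket < rollout_percent

# Ship the new release to 5% of users; raising the percentage later
# widens the same stable cohort rather than reshuffling it.
print(in_canary("user-42", 5.0))
```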
