The InfoQ Podcast

InfoQ
Feb 16, 2017 • 39min

Jonas Bonér on the Actor Model, Akka, Reactive Programming, Microservices and Distributed Systems

Jonas Bonér, CTO of Lightbend and creator of Akka, discusses using Akka when developing distributed systems. He talks about the Actor Model, and how every microservice needs to be viewed as a system to be successful.

Why listen to this podcast:
- Akka is a JVM-based framework designed for developing distributed systems leveraging the Actor Model, an approach to writing concurrent systems that treats actors as universal primitives; the most successful abstraction built on top of it has been streaming.
- Circuit breakers in Akka are a backup and retry policy; they protect you by capturing failure data and allowing you to roll back (see the sketch of the general pattern below).
- Every microservice needs to be viewed as a system: it needs multiple parts that run on different machines in order to function and be fully resilient - it is thus a microsystem.
- Two different trends have emerged when it comes to hardware and environments: one is the trend toward multi-core; the other is a movement toward virtualized environments and the cloud.
- The Saga pattern for managing long-running transactions in a distributed system fits very well with messaging-style architectures.

Notes and links can be found on: http://bit.ly/2kwB2eB

Topics covered:
Akka
The Actor Model
When Akka and the Actor Model are the perfect choice
Circuit breaker patterns in distributed systems
Two trends toward multi-core
Reactive Manifesto
Event Driven vs. Message Driven
Reactive Programming and Streams
Microliths to Microsystems
What do you have to get right before you start trying to deploy a distributed system?
Working with ML/AI at Lightbend to understand tracing through distributed systems
Saga Pattern

More on this: Quick scan our curated show notes on InfoQ http://bit.ly/2kwB2eB

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
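Akka ships its own circuit breaker (akka.pattern.CircuitBreaker); the sketch below is not Akka's API but a minimal illustration of the general closed/open/half-open pattern Bonér refers to, written in Go for consistency with the other sketches on this page. The thresholds and names are invented, and the code is deliberately not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var ErrOpen = errors.New("circuit breaker is open")

// Breaker is a minimal circuit breaker: after maxFailures consecutive
// failures it "opens" and fails fast until resetTimeout has elapsed,
// then allows one trial call ("half-open") before closing again.
// Not goroutine-safe; a real implementation needs synchronization.
type Breaker struct {
	maxFailures  int
	resetTimeout time.Duration
	failures     int
	openedAt     time.Time
}

func (b *Breaker) Call(f func() error) error {
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.resetTimeout {
		return ErrOpen // open: protect the caller and the downstream service
	}
	if err := f(); err != nil {
		b.failures++ // capture the failure
		b.openedAt = time.Now()
		return err
	}
	b.failures = 0 // success closes the breaker
	return nil
}

func main() {
	b := &Breaker{maxFailures: 3, resetTimeout: 5 * time.Second}
	for i := 0; i < 5; i++ {
		fmt.Println(i, b.Call(func() error { return errors.New("downstream unavailable") }))
	}
}
```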
Jan 27, 2017 • 40min

Peter Bourgon on Gossip, Paxos, Microservices in Go, and CRDTs at SoundCloud

Peter Bourgon discusses his work at Weaveworks, discovering and implementing CRDTs for time-stamped events at SoundCloud, microservices in Go with Go Kit, and the state of package management in Go.

Why listen to this podcast:
- We’ve hit the limits of Moore’s law, so when we want to scale we have to think about how we do communication across unreliable links between unreliable machines.
- In an AP algorithm like Gossip you still make forward progress in case of a failure. In Paxos you stop and return failures.
- CRDTs give us a rigorous set of rules for accommodating failures in communication for maps, sets, etc., that result in an eventually consistent system (see the sketch below).
- Go is optimised for readers/maintainers rather than for making the programmer’s life easier. Go is closer to C than Java in that it allows you to lay out memory very precisely, letting you, for example, optimise for cache lines in your CPU.
- Bourgon started a project called Go Kit, which is designed for building microservices in Go. It takes inspiration from Twitter’s Scala-based Finagle, which solved a lot of microservice concerns.
- Go has a number of community-maintained package managers but no good solution yet; work is ongoing to try and resolve this.

Notes and links can be found on: http://bit.ly/2kaHC9k

Topics covered:
Work at Weaveworks
Gossip vs. Paxos
CRDTs at SoundCloud
Go
Go in large teams
Go and Java package management
Microservices in Go with Go Kit
Logging and tracing in a distributed environment

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/2kaHC9k

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
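At SoundCloud the CRDTs handled sets of time-stamped events; as a smaller, self-contained illustration of the rules Bourgon describes, here is a sketch of a grow-only counter (G-Counter) in Go. The merge is commutative, associative and idempotent, which is what makes replicas converge; this is illustrative only, not the SoundCloud implementation.

```go
package main

import "fmt"

// GCounter is a grow-only counter CRDT: each node increments only its
// own slot, and merging takes the per-node maximum, so replicas can
// exchange state in any order and still converge.
type GCounter struct {
	counts map[string]int // node ID -> that node's increments
}

func NewGCounter() *GCounter { return &GCounter{counts: map[string]int{}} }

func (g *GCounter) Increment(node string) { g.counts[node]++ }

// Value sums all per-node counts to give the logical counter value.
func (g *GCounter) Value() int {
	total := 0
	for _, c := range g.counts {
		total += c
	}
	return total
}

// Merge folds another replica's state into this one by taking the
// element-wise maximum; applying it twice, or in any order, is safe.
func (g *GCounter) Merge(other *GCounter) {
	for node, c := range other.counts {
		if c > g.counts[node] {
			g.counts[node] = c
		}
	}
}

func main() {
	a, b := NewGCounter(), NewGCounter()
	a.Increment("node-a")
	b.Increment("node-b")
	b.Increment("node-b")
	a.Merge(b)             // a now reflects both replicas
	fmt.Println(a.Value()) // 3
}
```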
Jan 6, 2017 • 29min

Neha Batra - Pivotal Labs Pair Programming

In this week’s podcast Wes Reisz talks to Neha Batra, a software engineer at Pivotal Labs. Neha spoke about pair programming in her recent QCon San Francisco 2016 presentation, and has taken time to discuss techniques to get started with the practice as well as tips for implementing it on your team. Neha also touches on vulnerability-based trust and how it can help effectively build a trusting team environment.

Why listen to this podcast:
- If you successfully start with pair programming, other tenets of XP are pulled along with you.
- Ways to get creative with remote pairing to make it work.
- The daily retro.
- Overcoming hesitance from managers when trying to implement pair programming full time.
- Vulnerability-based trust building.

Notes and links can be found on: http://bit.ly/2i2a0sJ

How has Pair Programming Evolved Over the Years?
6m:17s - A lot of the fundamentals are the same, but with XP we take it to the extreme to be able to do it eight hours a day.
6m:24s - To pair for eight hours a day we adopt a lot of other processes to create a simpler way of working, giving us an easier level to default to.
6m:44s - We use phrases in the team to make sure we agree on a test, that there are no false positives, when to refactor, etc. This helps us avoid accruing code debt since we don’t do code reviews.

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/2i2a0sJ

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
Dec 30, 2016 • 33min

Oliver Gould on Architecting to Avoid and Recover from Failure

In this week’s podcast, Robert Blumen talks to Oliver Gould at QCon San Francisco 2016. Oliver is the CTO of Buoyant, where he leads open source development efforts. Prior to Buoyant he was a Staff Infrastructure Engineer at Twitter, where he was technical lead on the Observability, Traffic, Configuration and Coordination teams.

Why listen to this podcast:
- Stratification allows applications to own their logic while libraries take care of the different mechanisms, such as service discovery and load balancing.
- Cascading failures can’t be tested or protected against, so having a fast time to recovery is important.
- Having developers own their services with on-call mechanisms improves the reliability of the service; it’s best to optimise automatic restarts so problems can be addressed during normal working hours.
- Post-mortem analysis of failures is important to improve run books or checklists and to share learning between teams.
- Incremental roll-out of features with feature flags or weighted routing provides agility while testing with production load, which highlights issues that aren’t seen during limited developer testing (a sketch of weighted routing follows at the end of this entry).

Notes and links can be found on: http://bit.ly/2ivoz9w

4m:05s - Each domain has different failure and operating modes, and the layered approach to resiliency means that the layer handles this automatically.
4m:30s - Large systems may fail in unexpected ways.
4m:35s - Twitter originally had the “Fail Whale”, but this has been phased out as the system has become more stable.
4m:50s - As Twitter grew, it needed to move quicker, with more engineers and less whale time.
5m:10s - Automation and social tools were needed to improve the situation.

More on this - Quick scan our curated show notes on InfoQ: http://bit.ly/2ivoz9w

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
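Weighted routing, one of the incremental roll-out mechanisms Gould mentions, amounts to picking a backend with probability proportional to its weight. Below is a minimal sketch in Go; the backend names and the 99/1 split are invented, and this is not the implementation used at Buoyant or Twitter.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Backend pairs a service version with a relative routing weight.
type Backend struct {
	Name   string
	Weight int
}

// pick selects a backend with probability proportional to its weight,
// e.g. 99% stable / 1% canary for an incremental rollout that tests
// the new version under real production load.
func pick(backends []Backend) Backend {
	total := 0
	for _, b := range backends {
		total += b.Weight
	}
	n := rand.Intn(total)
	for _, b := range backends {
		if n < b.Weight {
			return b
		}
		n -= b.Weight
	}
	return backends[len(backends)-1] // unreachable with positive weights
}

func main() {
	backends := []Backend{{"v1-stable", 99}, {"v2-canary", 1}}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pick(backends).Name]++
	}
	fmt.Println(counts) // roughly 9900 stable / 100 canary
}
```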
Dec 23, 2016 • 25min

Chris Richardson on Domain-Driven Microservices Design

In this week’s podcast, Thomas Betts talks with Chris Richardson, a developer, architect, Java Champion and author of POJOs in Action. Before his workshop on Microservices with Spring Boot and Docker at QCon San Francisco 2016, Richardson took time to discuss his ideas on how to use DDD and CQRS concepts as a guide for implementing a robust microservices architecture.

Why listen to this podcast:
- "Microservice architecture" is a better term than "microservices". The latter suggests that a single microservice is somehow interesting.
- The concepts discussed in Domain-Driven Design are an excellent guide for how to implement a microservices architecture.
- Bounded Contexts correspond well to individual microservices.
- Event sourcing and CQRS provide patterns for how to implement loosely coupled services (see the sketch at the end of this entry).
- When converting a monolith to microservices, avoid a big-bang rewrite in favor of an iterative approach.

Notes and links can be found on: http://bit.ly/2hZ8TM1

11m:51s - Microservices must be loosely coupled, usually creating a model with one database per service.
12m:45s - There is a business requirement to maintain data consistency across services, and using an event-driven architecture is a good way to achieve that.
13m:38s - Event sourcing is a specific technique for persisting domain objects as a series of events.
14m:11s - Just as transactions don’t like to be split across microservices, queries cannot simply join across multiple data sources. CQRS provides a solution that accommodates querying via microservices and materialized views.

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/2hZ8TM1

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
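As a minimal sketch of the event-sourcing idea Richardson describes, where state is persisted as a series of events and the current state is rebuilt by replaying them, here is a toy Go example. The Account aggregate and event types are invented for illustration; the same replayed history is what a CQRS read side would consume to build materialized views.

```go
package main

import "fmt"

// Event is a domain event; in event sourcing, state changes are stored
// as an append-only sequence of these rather than as mutable rows.
type Event interface{ apply(*Account) }

type Deposited struct{ Amount int }
type Withdrawn struct{ Amount int }

func (e Deposited) apply(a *Account) { a.Balance += e.Amount }
func (e Withdrawn) apply(a *Account) { a.Balance -= e.Amount }

// Account is the aggregate; its state is derived from its history.
type Account struct {
	Balance int
	history []Event
}

// Record applies an event and appends it to the aggregate's history.
func (a *Account) Record(e Event) {
	e.apply(a)
	a.history = append(a.history, e)
}

// Replay rebuilds an aggregate from stored events - the core of
// event sourcing.
func Replay(events []Event) *Account {
	a := &Account{}
	for _, e := range events {
		e.apply(a)
	}
	return a
}

func main() {
	a := &Account{}
	a.Record(Deposited{Amount: 100})
	a.Record(Withdrawn{Amount: 30})
	restored := Replay(a.history)
	fmt.Println(restored.Balance) // 70
}
```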
Dec 16, 2016 • 36min

Keith Adams on the Architecture of Slack, using MySQL, Edge Caching, & the backend Messaging Server

In this week’s podcast, QCon chair Wesley Reisz talks to Keith Adams, chief architect at Slack. Prior to that he was an engineer at Facebook, where he worked on the search backend, and he is well-known for the HipHop VM [hhvm.com]. Adams presented How Slack Works at QCon San Francisco 2016.

Why listen to this podcast:
- Group messaging succeeds when it feels like a place for members to gather, rather than just a tool.
- Having opt-in group membership scales better than having to define a group on the fly, like a mailing list instead of individually adding people to a mail.
- Choosing availability over consistency is sometimes the right choice for particular use cases.
- Consistency can be recovered after the fact with custom conflict-resolution tools.
- Latency is important and can be addressed by having proxies or edge applications closer to the user.

Notes and links can be found on: http://bit.ly/keith-adams

3m:30s - Voice and video interactions are impacted by latency; the same is true of messaging clients.
4m:00s - The user interface can provide indications of presence, through avatars indicating availability and typing indicators.
4m:15s - Latency is important; sometimes the difference is between 100ms and 200ms, so the message channel monitors ping timeout between server and client (a sketch of a percentile check follows at the end of this entry).
4m:40s - The 99th percentile is less than 100ms ping time.
5m:15s - If the 99th percentile is more than 100ms then it may be server-based, such as needing to tune the Java GC.
5m:25s - Network conditions of the mobile clients are highly variable.
6m:20s - Mobile clients can suffer intermittent connectivity.

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/keith-adams

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
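As a toy illustration of the kind of 99th-percentile latency check described in the notes above (not Slack’s code; the samples, the nearest-rank method and the 100ms budget are assumptions for the example):

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile of a set of
// latency samples, e.g. the p99 ping time between server and client.
func percentile(samples []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	return sorted[rank-1]
}

func main() {
	samples := []time.Duration{
		40 * time.Millisecond, 55 * time.Millisecond, 62 * time.Millisecond,
		70 * time.Millisecond, 85 * time.Millisecond, 250 * time.Millisecond,
	}
	if p99 := percentile(samples, 99); p99 > 100*time.Millisecond {
		// past the budget: suspect the server side, e.g. GC pauses
		fmt.Println("p99 ping", p99, "exceeds the 100ms budget")
	}
}
```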
Dec 9, 2016 • 25min

Haley Tucker on Responding to Failures in Playback Features at Netflix

In this week’s podcast, Thomas Betts talks with Haley Tucker, a Senior Software Engineer on the Playback Features team at Netflix. While at QCon San Francisco 2016, Tucker told some production war stories about trying to deliver content to 65 million members.

Why listen to this podcast:
- Distributed systems fail regularly, often due to unexpected reasons.
- Data canaries can identify invalid metadata before it can enter and corrupt the production environment.
- ChAP, the Chaos Automation Platform, can test failure conditions alongside the success conditions.
- Fallbacks are an important component of system stability, but the fallbacks must be fast and light to not cause secondary failures (see the sketch at the end of this entry).
- Distributed systems are fundamentally social systems, and require a blameless culture to be successful.

Notes and links can be found on: http://bit.ly/2hqzQ6K

2m:05s - The Video Metadata Service aggregates several sources into a consistent API consumed by other Netflix services.
2m:43s - Several checks and validations were in place within the Video Metadata Service, but it is impossible to predict every way consumers will be using the data.
3m:30s - The access pattern used by the playback service was different than that used in the checks, and led to unexpected results in production.
3m:58s - Now, the services consuming the data are also responsible for testing and verifying the data before it rolls out to production. The Video Metadata Service can orchestrate the testing process.

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/2hqzQ6K

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
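Not Netflix’s code, but a minimal Go sketch of the “fast and light” fallback property Tucker describes: the primary call runs under a tight deadline, and the fallback is a static value that does no I/O, so it cannot itself become a secondary failure. The timeout and function names are invented.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchWithFallback tries the primary call under a tight deadline and,
// on error or timeout, returns a cheap precomputed fallback instead.
func fetchWithFallback(ctx context.Context, primary func(context.Context) (string, error), fallback string) string {
	ctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
	defer cancel()
	result, err := primary(ctx)
	if err != nil {
		return fallback // static, no further dependencies to fail
	}
	return result
}

func main() {
	slow := func(ctx context.Context) (string, error) {
		select {
		case <-time.After(500 * time.Millisecond): // simulated slow dependency
			return "personalized playback settings", nil
		case <-ctx.Done():
			return "", errors.New("deadline exceeded")
		}
	}
	fmt.Println(fetchWithFallback(context.Background(), slow, "default playback settings"))
}
```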
Dec 2, 2016 • 29min

Kolton Andrus on Lessons Learnt From Failure Testing at Amazon and Netflix and His New Venture Gremlin

In this week's podcast, QCon chair Wesley Reisz talks to Kolton Andrus. Andrus is the founder of Gremlin Inc. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services, and designed and built FIT, Netflix’s failure injection service. Prior to that, he improved the performance and reliability of the Amazon Retail website.

Why listen to this podcast:
- Gremlin, Kolton Andrus' new start-up, is focused on providing failure testing as a service. Version 1, currently in closed beta, is focused on infrastructure failures.
- Lineage-driven Fault Injection (LDFI) allowed Netflix to dramatically reduce the number of tests they needed to run in order to explore a problem space.
- You generally want to run failure tests in production, but you can't start there. Start in development and build up.
- Having failure testing at an application level, as Netflix does, allows request-level fault injection for a specific user or a specific device (see the sketch at the end of this entry).
- Being able to trace infrastructure with something like Dapper or Zipkin offers tremendous value. At Netflix, the failure injection system is integrated into the tracing system, which meant that when they caused a failure they could see all the points in the system that it touched.

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/2fT9YiM

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
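FIT itself is internal to Netflix; as a rough sketch of what request-level fault injection scoped to a single user can look like, here is minimal Go HTTP middleware. The header name and user ID are invented for the example.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// faultInjection wraps a handler and fails requests matching a
// targeting rule - here, one specific user - so a failure experiment
// can run in production without affecting everyone.
func faultInjection(next http.Handler, targetUser string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-User-ID") == targetUser {
			http.Error(w, "injected failure", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	http.Handle("/", faultInjection(ok, "test-user-42"))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```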
Nov 18, 2016 • 26min

Preslav Le on How Dropbox Moved off AWS and What They Have Been Able to Do Since

As InfoQ previously reported in March 2016, Dropbox announced that they had migrated away from Amazon Web Services (AWS). In this week's podcast Robert Blumen talks to Preslav Le. Preslav has been a software engineer at Dropbox for the past three years, contributing to various aspects of Dropbox’s infrastructure including traffic, performance and storage. He was part of the core on-call and storage on-call rotations, dealing with high-severity real-world issues, from bad code pushes to complete datacenter outages.

Why listen to this podcast:
- Dropbox migrated away from Amazon S3 to their own data centres to allow them to optimise for their specific use case.
- They are experimenting with Shingled Magnetic Recording (SMR) drives for primary storage to increase storage density. All writes go to an SSD cache and then get pushed asynchronously to the SMR disk.
- Their average block size is 1.6MB, with a maximum block size of 4MB. Knowing this allows the team to tune their storage system.
- Three languages are used for the backend infrastructure: Python is used mainly for business logic, Go is the primary language for heavy infrastructure services, and in some cases, for example where more direct control over memory is needed, Rust is also used.
- Dropbox invests very heavily in verification and automation. A verifier scans every byte on disk and checks that it matches the checksum in the index (see the sketch at the end of this entry).
- Verification is also used to check that each box has the block keys it should have.

Notes and links can be found on http://bit.ly/preslav-le

Dropbox’s motivation for moving off the cloud
2:40 - Dropbox used Amazon S3 and other services where it made sense, but they stored all the metadata in their own data centres.
3:30 - Initially this was done because Amazon had poor support for persistent storage at the time. This has since improved, but it didn’t make sense for Dropbox to move the metadata back.
4:01 - By that time the Dropbox team was ready to tackle the storage problem and built their own in-house replacement for S3, called Magic Pocket. Magic Pocket allowed Dropbox to move away from Amazon altogether.
4:30 - The move saved money, but also allowed Dropbox to optimise for their specific use case and be faster.

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/preslav-le

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
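Magic Pocket’s internals aren’t public at this level of detail, but the verifier idea Le describes - recompute each block’s checksum and compare it with the checksum recorded in the index - can be sketched in a few lines of Go. SHA-256 here is an assumption, standing in for whatever checksum the index actually stores.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyBlock recomputes a block's checksum and compares it with the
// checksum recorded in the index; a mismatch flags the block for repair.
func verifyBlock(block []byte, indexedChecksum string) bool {
	sum := sha256.Sum256(block)
	return hex.EncodeToString(sum[:]) == indexedChecksum
}

func main() {
	block := []byte("example block contents")
	sum := sha256.Sum256(block)
	indexed := hex.EncodeToString(sum[:]) // what the index would store

	fmt.Println(verifyBlock(block, indexed)) // true: block is intact
	block[0] ^= 0xFF                         // simulate on-disk corruption
	fmt.Println(verifyBlock(block, indexed)) // false: schedule repair
}
```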
Nov 11, 2016 • 26min

Randy Shoup on Stitch Fix's Technology Stack, Data Science and Microservices

In this week's podcast QCon chair Wesley Reisz talks to Randy Shoup. Shoup is the vice president of engineering at Stitch Fix. Prior to Stitch Fix, he worked for Google as a director of engineering in cloud computing, was CTO and co-founder of Shopilly, and was chief engineer at eBay.

Why listen to this podcast:
- Stitch Fix's business is a combination of art and science. Humans are much better with the machines, and the machines are much better with the humans.
- Stitch Fix has 60 engineers and 80 data scientists and algorithm developers. This ratio of data science to engineering is unique.
- With Ruby on Rails on top of Postgres, the company maintains about 30 different applications on the same stack.
- The practice of Test-Driven Development makes Continuous Delivery work, and the practice of having the same people build the code as those who operate it makes both of these things much more powerful.
- Microservices give feature velocity: the ability for individual teams to move quickly and independently of each other, with independent deployments.
- Microservices solve a scaling problem - both an organisational scaling problem and a technological scaling problem. These are not problems that you have early on in a startup.
- In the monolithic world, you may not be able to continue to vertically scale the application or the database or whatever your monolith is, so for scaling reasons alone you might consider breaking it up into what we call microservices.

Notes and links can be found on http://bit.ly/randy-shoup-podcast

Data Science and Stitch Fix
1m:57s - Stitch Fix re-imagines retail, particularly for clothing. When you sign up, you fill out a survey of the kinds of things that you like and you don't like, and we choose what we think you're going to enjoy based on the millions of customers that we have already. And we use a ton of data science in that process.
3m:00s - That goes into our algorithms, and then our algorithms make personalised recommendations based on all the things we know about our other customers... there's a human element as well: we have 3,200 human stylists all around the United States, and they choose the five items that go into the box [of clothing].
3m:29s - What we like is that this is a combination of art and science. Modern companies combine what machines are really good at, such as chugging through the 60 to 70 questions times the millions of customers, with the human element of the stylists: figuring out what things go together, what things are trending, what things are appropriate... Humans are much better with the machines, and the machines are much better with the humans. [...]

More on this: Quick scan our curated show notes on InfoQ. http://bit.ly/randy-shoup-podcast

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
