Solving Puzzles in Production with Liora Friedberg
Oct 7, 2024
Liora Friedberg, a Production Engineer at Jane Street with a background in economics and computer science, discusses how the role blends puzzle-solving with software engineering in high-stakes environments. She shares insights on training methods, including tabletop simulations and hands-on exercises, that prepare engineers for the complexities of live system support. Liora also highlights the importance of a blame-free culture, effective monitoring, and well-run postmortems for encouraging learning and improving operational support.
Production engineering at Jane Street blends real-time problem-solving with long-term strategic improvements within critical software systems for trading.
Training for Production Engineers includes tabletop simulations and hands-on exercises, emphasizing both technical knowledge and collaborative skills essential for effective issue resolution.
A culture of constructive feedback, exemplified by blame-free postmortems, helps engineers learn from mistakes and steadily improves how production issues are handled.
Deep dives
Understanding Production Engineering
Production engineering at Jane Street focuses on the production layer of software systems that are critical for trading billions of dollars daily. The role encompasses live system support, making production engineers the first line of defense when issues arise during trading hours. Unlike a typical off-hours on-call rotation, the role centers on real-time problem-solving during live operations, with immediate responses to alerts or to problems reported by users. Production engineers also work on long-term improvements that reduce the frequency and severity of issues, so the role blends short-term support with strategic project work.
Role Dynamics and Team Structures
Production engineering works closely with software engineering, and the lines between the two roles often blur. While software engineers might specialize in a few systems, production engineers maintain a broad understanding of how systems interact. This difference in scope helps the two groups collaborate effectively, since each brings a different perspective to problem-solving. How time is split between support and project work varies by team, but it usually requires balancing immediate responses with ongoing development.
Real-World Problem Solving
Production engineers often face unusual challenges, such as new types of trading activity that the system has never encountered before. These scenarios require working with various stakeholders and translating business requirements into technical solutions. Resolving such issues effectively requires understanding both the technical details of the system and the organizational context in which it operates. As a result, production engineers develop a deep understanding of the financial systems and of how their components interact, which in turn helps the trading process run smoothly.
The Culture of Support and Learning
A crucial aspect of production engineering is fostering a culture where mistakes are acknowledged and constructive feedback is prioritized. After incidents, postmortems analyze what went wrong without assigning blame, focusing instead on learning and improving processes. This culture encourages people to be open about errors and promotes collaboration in finding solutions. It ultimately produces actionable insights that drive technical and procedural improvements across the team.
Qualities of Successful Production Engineers
Effective production engineers have strong communication skills, which let them navigate different perspectives and collaborate across teams. They excel at debugging and stay calm under pressure, favoring careful analysis over rushing to a fix. Enjoying complex problems and high-stakes situations is another defining trait. Together, these qualities make production engineers well suited to managing and improving critical systems in a fast-paced trading environment.
Liora Friedberg is a Production Engineer at Jane Street with a background in economics and computer science. In this episode, Liora and Ron discuss how production engineering blends high-stakes puzzle solving with thoughtful software engineering, as the people doing support build tools to make that support less necessary. They also discuss how Jane Street uses both tabletop simulation and hands-on exercises to train Production Engineers; what skills effective Production Engineers have in common; and how to create a culture where people aren’t blamed for making costly mistakes.
You can find the transcript for this episode on our website.
Some links to topics that came up in the discussion: