On this edition of The Guardian Podcast with Ryn Melberg, David Hussman joins Ryn to discuss all things chaos, including “Chaos Engineering” and “Chaos Day”. David is well known for DevJam, the name of both his web site and his space in Minneapolis. He is a busy man with an extremely active mind, and Ryn discusses some of his most recent breakthroughs on the show.
According to David’s web site, Chaos Engineering is “the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” Chaos Engineering is squarely focused on helping teams design, build and run systems that are highly resilient. Despite their extraordinary uptime, services from AWS and Azure can, and do, fail. Data centers lose power. The goal of Chaos Engineering is to help you understand how to design and build apps and services that can withstand that kind of havoc while still delivering the best possible customer experience.
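To make that definition a little more concrete, here is a minimal sketch of what a chaos experiment can look like in code. It is a toy model, not anything David or the podcast describes: the names Replica, Service, success_rate and run_experiment, and the 99% threshold, are all assumptions made up for illustration. The shape is the part that matters: measure a steady state, inject a failure, measure again, then restore the system.

```python
import random


class Replica:
    """Hypothetical stand-in for one server behind a load balancer."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Service:
    """Toy service that tries each replica in turn until one answers."""

    def __init__(self, replicas):
        self.replicas = replicas

    def call(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # fail over to the next replica
        raise RuntimeError("all replicas are down")


def success_rate(service, n=100):
    """Send n requests and return the fraction that succeeded."""
    ok = 0
    for i in range(n):
        try:
            service.call(f"req-{i}")
            ok += 1
        except RuntimeError:
            pass
    return ok / n


def run_experiment(service, rng, threshold=0.99):
    """Hypothesis: losing one replica keeps the success rate >= threshold."""
    baseline = success_rate(service)       # steady state before the failure
    victim = rng.choice(service.replicas)  # pick a replica to "unplug"
    victim.alive = False                   # inject the failure
    try:
        during = success_rate(service)     # observe behaviour under failure
    finally:
        victim.alive = True                # always restore the system
    return {
        "victim": victim.name,
        "baseline": baseline,
        "during": during,
        "hypothesis_held": during >= threshold,
    }


if __name__ == "__main__":
    service = Service([Replica("a"), Replica("b"), Replica("c")])
    print(run_experiment(service, random.Random()))
```

In this toy model, a service with three replicas rides out the loss of any one of them, while a service with a single replica does not. The result is a measurement rather than a simple pass or fail, which is the kind of learning the experiment is meant to produce.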
David describes a man who would go into server rooms and simply start unplugging things to see what would happen and how the people who worked there would respond. “The fact that people were worried about what he was doing was a clue that they were not ready for the eventualities that followed,” David told Ryn. Many are tempted to describe Chaos Engineering as just another type of testing, but practitioners argue that it is not. Testing is binary: it simply asserts (or disproves) what we think we already know. Chaos is about creating experiments that help us learn more about a system. It helps us recalibrate our mental model of how things really work and, by closing the feedback loop, helps us design and build more resilient systems. To learn more, listen to the entire podcast posted here.