Ep6: After CrowdStrike chaos, should Microsoft kick EDR agents out of Windows kernel?
Jul 26, 2024
The podcast dives into the chaos caused by a CrowdStrike update that blue-screened millions of Windows systems, spotlighting the urgent need for better testing. It questions Microsoft's handling of EDR agents and the responsibilities tied to kernel access. A discussion on Mandiant's report reveals insights into North Korean cyber threat tactics. The hosts critique cybersecurity reporting and explore the implications of the NSO Group lawsuits on tech giants. Overall, it's a gripping look at the intersection of cybersecurity failures and corporate accountability.
The CrowdStrike incident underscored the dire consequences of updates lacking proper testing and validation, causing widespread system failures across millions of computers.
The podcast highlighted the critical balance between security measures and system stability, suggesting that effective detection must not compromise operational uptime.
There was a strong emphasis on the shared responsibility between security vendors and customers in managing cybersecurity solutions, advocating for greater operational transparency and thorough vetting of software.
Deep dives
CrowdStrike Incident Overview
A significant cybersecurity incident involving CrowdStrike affected approximately 8.5 million Windows computers globally, resulting in widespread system failures. This incident stemmed from a problematic update that passed validation checks, despite causing severe blue screen errors on systems that installed it. The failure led to critical infrastructure, airlines, and large corporations facing disruptions for a short window of time, demonstrating how a single erroneous update can have extensive repercussions. This situation highlights the need for more robust testing and validation processes before deploying critical updates to such a vast customer base.
Update Rollout and System Impact
The flawed update was deployed during off-peak hours, which likely kept the damage from being far worse than if it had been released during high-usage times. However, the speed at which the update spread raised concerns about the reliability and monitoring of the update infrastructure, and about how safeguards could fail to detect such a significant problem before rollout. The discussion suggested that the update's effects could have spiraled out of control, emphasizing the importance of effective limits to prevent widespread issues.
Kernel Mode and Software Stability
The ability of security software to operate in kernel mode presents critical challenges for system stability and for how updates are performed. The conversation highlighted a tension: security measures must balance effective threat detection against system uptime and stability, and a shift in approach might be necessary. Greater scrutiny is needed of how security vendors ensure that their updates do not destabilize operating systems, especially in light of incidents that disrupt numerous machines. Ensuring that such measures do not compromise core system functions remains a persistent challenge in cybersecurity development.
Responsibility in Cybersecurity
The podcast emphasized the shared responsibility between customers and vendors in deploying and managing cybersecurity solutions. Businesses often install security products on critical systems without fully understanding the potential risks and implications of updates. A focal point of the discussion was the need for organizations to take ownership of the security measures they apply to their infrastructure, especially by thoroughly vetting any software they introduce. The discussion also underscored the demand for better operational transparency from security vendors, since companies should be encouraged to scrutinize the software running in their environments.
Long-term Implications for Cybersecurity Practices
Recent incidents, like the CrowdStrike update failure, could serve as a catalyst for the industry to reevaluate and enhance current cybersecurity practices. The discussion pointed out that the changes such incidents force could lead to improved operational procedures and more rigorous standards for software updates and testing. A recurring theme was that incidents like these prompt discussion not only of immediate response but also of industry-wide reform. As history shows, significant failures force companies to adapt their practices, leading to safer operational environments in the long run.
Three Buddy Problem - Episode 6: As the dust settles on the CrowdStrike incident that blue-screened 8.5 million Windows computers worldwide, we dig into CrowdStrike’s preliminary incident report, the lack of transparency in the update process and the need for more robust testing and validation. We also discuss Microsoft's responsibility to avoid infinite BSOD loops, risks of deploying EDR agents on critical systems, and how an EU settlement is being blamed for EDR vendors having access to the Windows kernel.
Other topics on the show include Mandiant's attribution capabilities, North Korea’s gov-backed hacking teams launching ransomware on hospitals, KnowBe4 hiring a fake North Korean IT worker, and new developments in the NSO Group surveillance-ware lawsuit.
Hosts: Costin Raiu (Art of Noh), Juan Andres Guerrero-Saade (SentinelLabs), Ryan Naraine (SecurityWeek)