Faulty code in a Rapid Response Content update from security vendor CrowdStrike, combined with a bug in the content validator meant to confirm that such updates are safe for production distribution, knocked 8.5 million Windows systems offline in July 2024.

Operations across all industries were disrupted. The fix was manual and slow. As a result, thousands of flights were cancelled, medical procedures were delayed, and broadcast programs were interrupted. It took days for operations to fully return to normal. The incident is projected to cost enterprises billions of dollars.

CrowdStrike disclosure promise

The event raised serious questions about both vendors’ quality control and their customers’ overreliance on automation for IT updates. On the first point, CrowdStrike published an initial incident report identifying the pair of issues that drove the proverbial IT train off the tracks and caused mass system shutdowns across the globe. Along with profuse apologies from its CEO, CrowdStrike promised a full post-incident disclosure once it completes its investigation.

Microsoft offered up hundreds of engineers to support customers’ system restoration efforts. The company said it is collaborating with other cloud providers, including Amazon Web Services and Google Cloud Platform, to understand the full effect of the incident. The expectation is that a thorough understanding of what happened will help everyone better prepare for future issues.

A lot to learn

In a blog post, John Cable, Microsoft’s vice president of program management for Windows servicing and delivery, wrote that the company needs to make development changes to support greater systems resilience. Cable said Microsoft is looking to reduce kernel-level access for software applications to better protect the Windows operating system against malicious code and corrupted software.

Enterprises that were impacted need to revisit their business continuity plans. Everyone involved, from the vendors and service providers to the end customers, has a lot to learn. There is an open dialogue now that hopefully will lead to better organisational resilience in the future.