What can we learn from the CrowdStrike IT outage?

by Black Hat Middle East and Africa

The CrowdStrike global IT outage caused widespread disruption and put critical industries at risk as some of their services stalled. Now, large corporations (like Delta Air Lines) are bringing lawsuits against CrowdStrike as industries work to overcome the ongoing impact of the IT failures.

We asked Yassir Abousselham (Founder and CEO at Silicon Valley Cyber) to share his perspective on the media response and the lessons we can learn from the CrowdStrike outage.

Here’s what he told us. 

What was your impression of the response to the CrowdStrike outage – both the response we saw in the mainstream media, and the response you observed among the cybersecurity community?

“Both the cybersecurity community and the media reacted with shock at the impact that a single vendor can have across industries. This reaction is typical for large-scale events that serve as a sobering reminder of the fragility of our technology-dependent economy.” 

What can organisations (and particularly those in critical industries) do to increase their resilience against third party outages like this? 

“To improve resilience against similar outages, organisations, especially those in critical industries, should assess the potential impact of third-party software on their service availability and update their business continuity procedures accordingly. 

“Open source and commercial software with elevated system privileges, particularly those receiving updates directly from the vendor or developer, are prime targets for such assessments. 

“Organisations should adopt deployment strategies that allow them to catch issues before the update is promoted to production systems. Concurrently, other mechanisms should be considered to mitigate or transfer the risk of similar events. These mechanisms include:

  • Developing appropriate business continuity and incident response procedures for third-party-caused outages.
  • Implementing centralised asset and configuration management.
  • Testing system recovery scenarios.
  • Providing backup access to users through Virtual Desktop Infrastructure or Secure Browsers on personal devices.
  • Ensuring vendor agreements contain appropriate indemnifications for compromises and availability-impacting events.
  • Confirming cyber insurance coverage for third-party-caused outages.

“OS vendors such as Microsoft also have the responsibility to ensure that third-party code with elevated system access (e.g., Ring Zero/Kernel/System/Root) is subjected to appropriate testing. While CrowdStrike certifies new sensor releases through Microsoft's Windows Hardware Quality Labs (WHQL) program, channel updates bypass these tests.

“Considering the large number of vendors with kernel access, the more reliable approach would be for Operating System vendors, including Microsoft, to improve the resilience of their products by implementing further guardrails to prevent similar incidents from occurring.”

How concerned should we be about the possibility of an even more serious outage incident affecting critical industries? 

“Given the increasingly interconnected technology ecosystem, there is no guarantee that similar events will not occur in the future. 

“Specifically, most Endpoint Detection & Response (EDR) vendors, along with those in other product categories, have kernel access in Windows. A faulty update from any of these vendors could cause a similar incident.

“Our responsibility as technology customers is to learn from the CrowdStrike outage and deploy mitigations for similar scenarios. Additionally, we should demand better guardrails from OS vendors and transparency on testing, service availability, incident response, and product security from vendors with elevated system access and impact on critical services.”

CrowdStrike made it clear that the outage was 'not a security incident or cyber attack.' Do you think it's reasonable to define an outage like this as 'not a security incident'?

“By definition, security incidents or cyber attacks involve a threat actor and unauthorised access. As a security solutions vendor, it was important for CrowdStrike to provide accurate details about the incident to inform their customers' response.

“Although availability is one of the pillars of the information security triad (Confidentiality, Integrity, and Availability), labelling this outage as a security incident would have triggered irrelevant incident response procedures, wasted valuable incident response resources, and eroded CrowdStrike’s brand image as a leading cybersecurity vendor.”

Finally, how do you think cybersecurity events like Black Hat MEA could facilitate the development of greater resilience among cybersecurity vendors and organisations? 

“Black Hat MEA plays a crucial role in facilitating the exchange of information on organisational resilience. As the largest gathering of cybersecurity professionals in the region, the event serves as a forum where organisations affected by outages can share lessons learned. 

“Additionally, cybersecurity vendors have the opportunity to provide transparency regarding how they ensure their customers’ business continuity, alongside their threat mitigation capabilities. Information related to product security, testing, rollback practices, and incident handling procedures should be made available to support informed purchasing and risk management decisions.”

Thanks to Yassir Abousselham at Silicon Valley Cyber. Join us in Riyadh at Black Hat MEA to discover the latest in cybersecurity research and development.
