The CrowdStrike Microsoft story is all over the news, so I won’t waste time with a restatement of facts.
People around the world were at best inconvenienced, and at worst suffered true hardships, as computer systems failed en masse and halted entire operations. Ironically, the very security we rely on for protection was the source of this failure (by accident, of course), exposing a truly stunning concentration in our system dependencies.
It’s hard to look to events like this for positives, and yet as leaders we must see this for the wake-up call it is: we need far greater digital resilience.
No single point of failure should be able to paralyze some 8.5 million systems globally. To this end, we must have greater diversity in the systems we use, more redundancy and fallback capacity, and better crisis preparedness.
Security, to be fair, is a balancing act in almost every way, weighing constant threats and an urgent need for speed against the time required for proper testing, diligence, and documentation.
To my mind, the flaw most exposed by the CrowdStrike incident is how such an error could slip through every check and propagate across the world’s systems unseen, until it was too late.
Today I want to consider automation in software testing in this light: focusing on its capacity to improve our resilience. Current advances make this not only possible, I believe, but essential.
Automation Innovations in Software Testing
Testing automation has long been a fundamental part of software development.
So it should be no surprise that AI and ML adoption has also been swift by most measures, with roughly three-quarters of surveyed companies at least experimenting with AI automation in testing (LambdaTest reports 78%, Test Guild 76%).
Given the natural fit, this number is likely only to rise.
But the real question is less how many teams are using AI in some capacity than how they are using it, and whether it is actually increasing coverage and efficiency.
Generative AI solutions are not yet reliable enough to replace humans as gatekeepers for quality, and they must be implemented wisely. But they can already help by taking on numerous testing-adjacent tasks, including:
- Automatic test case generation
- Generation of smarter, more realistic test data (a classic bottleneck)
- Failure prediction, using historical models
- Prioritizing testing areas by risk, history, and activity
- Optimizing the testing workflow
- Analyzing root causes and patterns
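As a concrete illustration of the risk-based prioritization mentioned above, here is a minimal sketch. The scoring weights and test metadata are hypothetical, not drawn from any particular tool; a real system would learn these from build history.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    recent_failures: int   # failures over the last N runs (hypothetical metric)
    files_touched: int     # changed files since last run that this test covers
    runtime_secs: float

def risk_score(t: TestCase) -> float:
    # Weight failure history and code churn; prefer cheap tests on ties.
    # The weights (3.0, 1.5) are purely illustrative.
    return (3.0 * t.recent_failures + 1.5 * t.files_touched) / (1.0 + t.runtime_secs)

def prioritize(tests):
    # Run the riskiest tests first so failures surface early in the pipeline.
    return sorted(tests, key=risk_score, reverse=True)

suite = [
    TestCase("test_checkout", recent_failures=2, files_touched=3, runtime_secs=4.0),
    TestCase("test_login", recent_failures=0, files_touched=1, runtime_secs=1.0),
    TestCase("test_search", recent_failures=5, files_touched=0, runtime_secs=2.0),
]
for t in prioritize(suite):
    print(t.name, round(risk_score(t), 2))
```

The design choice here is simply to surface likely failures first; an ML-driven tool would replace the hand-tuned formula with a model trained on historical outcomes.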
This kind of AI-powered QA testing assistance is possible right now.
Gartner predicts that by 2025, AI innovations in testing will reduce the time required by as much as 70%. Savings on that scale should mean both broader coverage (more continuous testing, earlier in the cycle) and a shift in the nature of the QA role overall.
As AI testing continues to improve, it will move ever earlier in the process, with the capacity to run tirelessly, around the clock, and in parallel, bolstering the DevOps goal of true CI/CD.
Looking into the future of QA testing with AI, it’s easy to see today’s testers able to focus more on strategy, becoming more involved throughout development, and as a result improving the ultimate testing quality as well as efficiency.
Automated solutions should address bottlenecks in human testing and provide additional levels of verification, but they cannot replace human testers. Human oversight in this area will persist for some time; as the CrowdStrike event shows, the stakes are simply too high.
I’ve written about AIOps, which can translate your operations data into actionable insights with improved detection and response. The same advances can bring greater AI-driven resilience to your testing, with more clarity across your entire workflow.
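To make the idea of turning operations data into signals tangible, here is a deliberately naive sketch of anomaly detection over a latency metric. The window size, threshold, and sample data are hypothetical; production AIOps tools use far richer models.

```python
import statistics

def detect_anomalies(values, window=5, threshold=3.0):
    # Flag points that deviate from the trailing window's mean
    # by more than `threshold` standard deviations.
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.stdev(baseline) or 1e-9  # guard against a flat baseline
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical per-minute latency samples (ms), with one injected spike.
latency_ms = [100, 102, 99, 101, 100, 103, 450, 101, 99, 100]
print(detect_anomalies(latency_ms))
```

Even this simple rolling z-score flags the spike; the value of ML-based AIOps is doing the same across thousands of correlated metrics without hand-set thresholds.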
What Happened to Our Digital Resilience?
The ability to withstand, and bounce back from, adverse events, whether traffic fluctuations, cyberattacks, or hardware and software failures, is absolutely critical in 2024.
We only continue to see increases in cybercrime alongside a growing dependence on our systems, making failures ever more dangerous.
With airports, banks, hospitals, and more all paralyzed by a single flawed update, and many unable to recover quickly, we saw what digital resilience is not: tightly coupled systems utterly dependent on a single software (and/or hardware) solution.
So how did we get to this place?
AI is a tremendous tool which I believe will transform work as we know it, but it is also still costly and resource intensive. With a handful of companies acting as the prime arbiters of AI quality, we see a potentially dangerous concentration that increases dependence and decreases our resilience.
In addition to diversifying our solutions where possible to provide redundancies, we should also be testing our emergency responses to these events before they happen.
AI is already used this way in cybersecurity testing, for example, but these techniques can be employed more broadly, helping organizations prepare for randomized outages of all scales so they are not left scrambling when a real failure occurs.
[Check out our PTP Report on chaos engineering for an example of how Netflix is using AI in security testing to improve resilience.]
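In the spirit of chaos engineering, even a minimal drill can be simulated in a few lines: inject random failures into a dependency and verify that the fallback path keeps requests alive. The failure rate and the primary/fallback functions below are hypothetical stand-ins for a real service.

```python
import random

def call_with_chaos(primary_call, fallback_call, failure_rate, rng):
    # Wrap a service call with injected failures, as a chaos drill would,
    # and exercise the resilience path we want to verify.
    try:
        if rng.random() < failure_rate:
            raise ConnectionError("injected outage")  # the 'chaos' step
        return primary_call()
    except ConnectionError:
        return fallback_call()  # degraded-mode response under test

def primary():
    return "fresh result"   # hypothetical live dependency

def fallback():
    return "cached result"  # hypothetical cached/degraded response

rng = random.Random(42)  # seeded so drills are repeatable
results = [call_with_chaos(primary, fallback, 0.3, rng) for _ in range(1000)]
print(results.count("cached result"))  # roughly 30% of calls hit the fallback
```

The point of such a drill is not the simulation itself but the assertion behind it: under a realistic failure rate, every request still gets an answer.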
Conclusion
Automation has long been a part of testing, and the QA teams at many organizations are already experimenting with AI-powered testing innovations.
But if there is one thing to take away from the CrowdStrike Windows calamity, it’s that increased automation must be part of our renewed emphasis on digital resilience, especially in software development.
Mistakes happen, and that will not end with AI; in fact, AI-assisted coding is faster than ever and generates ever more code, all of which must still be checked just as carefully.
What this technology already provides, through AI-enhanced QA practices, is time savings, and those savings should increasingly be reinvested in greater coverage, improved sandboxing and testing conditions, and testing earlier in workflows.
To this end, I recommend organizations:
- Invest in AI-powered automated testing now, employing these tools to improve testing and enhance overall product quality
- Use automation to make your testing ever more continuous, earlier in your DevOps pipeline
- Let AI help you improve, by showing bottlenecks, risks, and repeated problems
- Make your security and disaster preparedness more proactive by attacking your own vulnerabilities first
These are not distant dreams but attainable objectives, and they increase our digital resilience, giving businesses a stronger foundation to remain standing in an ever-shifting landscape.
References
CrowdStrike update that caused global outage likely skipped checks, experts say, Reuters
CrowdStrike—How Microsoft Will Protect 8.5 Million Windows Machines, Forbes
12 Data and Analytics Trends to Keep on Your Radar, Gartner