24 theses on cybersecurity and AI

Getting to ninety-five isn’t so simple

AI hasn’t meaningfully changed anything in cybersecurity so far. Deep fake phishing is still rare, LLM hackers don’t work yet. The impact so far has likely been a mild acceleration, similar to the field of software engineering. LLMs are also useful to help non-native speakers write better phishing emails. The OpenAI/Microsoft disruption of state-affiliated actors in February 2024 gives good examples, such as: “Charcoal Typhoon used our services to research various companies and cybersecurity tools, debug code and generate scripts, and create content likely for use in phishing campaigns”.
In general, a lot happens in state agencies and won’t be known to the public for a long time. One of the most striking parts of the Stuxnet story is that 15 years later, we could have no clue that this ever occurred, if it wasn’t for a few mistakes from the creators. Electromagnetic radiations as a side channel had been used by governments since World War II, but the first unclassified research on the topic was published in 1985. There are many such stories. However, I think that the gap between classified and public information is probably a lot smaller for AI. This is because AI progress has been very fast, and intelligence agencies have probably started to understand its strategic importance in the past 2 years at most. So I don’t think they are much ahead of public progress in AI hacking capabilities. However, there’s probably a lot going on in stealing algorithmic secrets from AI companies, spying on other governments’ AI strategies (including assessing how thoroughly they’ve hacked the AI companies), and perhaps preparation for future sabotaging.
Everyone is asking how AI will affect the offense / defense balance in cybersecurity, but no one has the answer. The system is too complex to reason about, not to mention that exactly how AI progress goes will matter a lot. However, it’s correct to identify cybersecurity as a tricky and high-stakes domain as we get to human-level AI and beyond.
AI capabilities will introduce new attack vectors. We can imagine quick software update front-running, more adaptive malware, at-scale spear phishing, and probably others that no one is anticipating yet.
Due to bad adversarial robustness in deep learning, AI deployment is also introducing new attack vectors against the models themselves, such as prompt injection in LLMs (which I don’t think anyone would have imagined as a possible attack in, say, 2018 or even 2020). Prompt injection is often demonstrated today in email settings (someone could make your email agent send an email on your behalf!), but if adversarial robustness isn’t solved, opening a Github issue that jailbreaks the SWE agents could in the future be enough to compromise a repository! (Call that SWE agent hijacking).
That being said, a typical employee is also quite adversarially weak, though in a different way. Basic, poorly written phishing emails still work.
On the same note, installing security updates in humans is very slow and costly. Since humans don’t get upgraded, their devices will have to be: AI agents running in the background, making sure you’re not falling for hacking and scams, will be increasingly necessary. However, I think this will come too late. Vulnerable people are getting more vulnerable, and will be for some time.
Cybersecurity is heavily bottlenecked on labor on both the offensive and defensive side. In other words, there are vulnerabilities everywhere, and very few people looking. This means that you don’t need superintelligence to have a massive impact on cybersecurity: human-level AI will already be a big deal. This also means that while new attack vectors are interesting and worth thinking about, automation of known things will have a massive impact on its own.
It is currently very easy for a state actor to get the secrets of any AI company. We are far from where we should be there (more: the RAND report on securing model weights). This makes hardware export controls look even more valuable as a policy instrument: if your adversary (China) can’t train a frontier model even with your algorithmic secrets, you’re in a better position.
In the current era of compute scaling, frontier open-weight models might disappear in the near future (for national security or economic reasons). This makes it worth distinguishing between state-affiliated actors, and other actors. The former will have access to hacked model weights, while the latter will have to use APIs and struggle to avoid detection. This optimistically means that if monitoring got good enough, the non-state actors (currently responsible for a lot of damage) could get left behind. However, current incentives might not be enough to get companies to invest in the necessary level of monitoring (in fact, shutting down API access to bad actors is contrary to their first-order incentives).
We will enter a phase where AI companies will agree to, or be required to, run their frontier models on useful external tasks before release (currently, only evaluations are done). The first applications will likely be in cybersecurity, due to the adversarial nature of the field. For instance, running models on finding vulnerabilities in critical systems and codebases.
Serving different models to different actors will be increasingly important. It would be nice to have good methods for releasing a regular model, and the same model which is good in cyber-offensive tasks. We want the automated pentesters to exist, we just don’t want their latest version in everybody’s hands.
AI deployment will need to follow principles of least privilege, and perhaps also least capability (don’t use the model which is very good at hacking unless you need to do hacking in this application, etc). Just like in regular cybersecurity, this will come at a speed premium, and insecurity will therefore persist through the security / speed tradeoff (until maybe AI deployment is itself done by AIs, which could bring the cost of security low enough).
“Click and type” agents (that interact with a computer roughly like humans do) are coming soon (because the economic incentives for them are massive, and they don’t seem to require major breakthroughs), and will unlock new cybersecurity capabilities compared to previous scaffolding. Text-only is actually quite limiting for hacking agents (partly because hacking requires low-level I/O control, and partly because some major hacking tools are GUIs − more).
When we get agents that work, we won’t be far from hacking agents that work.
“AI finding new zero-days” isn’t that scary: AI being very good at vulnerability detection in source code would be good overall, as long as defenders are able to spend the most compute on that task, and before adversaries. This is discussed in more detail in section 6 of the eyeballvul preprint. Legacy systems will be an issue, though. And while we can get a good outcome, it won’t happen on its own.
“AI being a skilled hacker” is more scary: it’s easier to fix source code at scale than to fix the cybersecurity posture of organizations at scale.
In the current era of compute scaling, we can expect AI deployment to keep being quite centralized. When the economy completely runs on AI, this will introduce new single points of failure. Part 2 of What Failure Looks Like tells the story of a cascading series of AI systems getting out of distribution. When all the AI in the world is run by a handful of providers, hacking them could have immediate catastrophic consequences (this is different from What Failure Looks Like, where the AI systems being run are still the same, but observe a state of the world that puts them out of distribution).
Thinking that open-weight AI is less secure than APIs does not oppose everything we’ve learned in cybersecurity over the past decades. We have indeed empirically learned, over and over, that if obscurity is your only layer of security, you can be confident you’re doing something wrong, and that openness is generally a very good policy (especially in the context of cryptography). However, the disanalogies between open-weight models and open-source software and cryptography are sufficiently strong, such that reasoning by analogy won’t get you to a correct conclusion on its own.
Fuzzing is only partial automation, so in practice humans are still the ones finding vulnerabilities (if only by writing fuzzing harnesses and investigating the crashes). AI can be the full automation. We will get to a regime where humans are no longer the ones finding vulnerabilities, and that could be in just a few years. (I suspect that not everyone in cybersecurity is convinced of this).
When we get to digital minds, or long-running AI agents that we believe to probably be moral patients, their cybersecurity will be much higher-stakes than ours. Being hacked would be like getting abducted, and possibly way worse.
Despite “cybersecurity” being a frequent keyword in many recent communications on future AI developments, expertise at the intersection of cybersecurity and AI is extremely scarce. This is in part due to a cultural difference between the two fields, or as a rough summary, cybersecurity people not believing in AI (they generally do believe in past progress, but not in more than incremental future progress).
While largely technological, how well things will go regarding AI and cybersecurity can also be significantly affected by regulation (as always, in net-good or net-bad ways). Banks made money from credit card fraud before the 1974 Fair Credit Billing Act switched the cost of fraud from consumers to themselves, Google made money from advertisements of fraudulent online pharmacies, before being fined $500 million for this in 2011… Remember that by default, AI companies will financially benefit from fraudulent usage of their API, and there are other incentives to align.
Cybersecurity’s approach to emerging risks is to first let them become a significant problem, before doing something about it. In the context of very rapid progress toward human-level AI and beyond, this approach seems particularly inadequate.