Target audience: cybersecurity people, and AI / AI safety people.

I’ve been exposed to the cybersecurity culture and to the AI safety culture in different contexts. Both fields are concerned about mitigating risks from some technology (computers, or AI), yet they have quite different approaches in practice. Why?

Roughly speaking, the field of cybersecurity emerged from people realizing they could break into things by making computers behave in unintended ways. The field of AI safety was born from the early-21st-century rationalists: people trying to develop an accurate model of the world with predictive power. They looked at trends such as Moore’s law, total computing power in the world, and world GDP, and concluded that human-level AI was likely to be developed this century, and that this would be one of the most transformative events in humanity’s trajectory.

Different immune systems, different blind spots. Every successful field develops an immune system to protect itself from the constant pressure of grifters and, more generally, people who are wrong. Cybersecurity’s immune system is well summed up by the phrase “POC || GTFO”. This stands for “Proof of Concept or Get the F*** Out”, meaning that if you can’t write a working exploit for the problem you’re describing, we don’t want to hear you talk about it. You will not give a talk at a cybersecurity conference about your extrapolation of trend lines; you will mostly give a talk about how you’ve broken into something. This is a defense mechanism that keeps the cybersecurity industry in touch with reality, leveraging the fast feedback loops of computers. Modern science is based on a similar immune system: your theories need to make concrete predictions that can be tested through experiments.

But when the topic of interest is the future, there is no such feedback loop to build upon. In practice, for short-term predictions, rationalists like to make bets with each other or trade on prediction markets. When forecasting the effects of human-level AI, though, the only epistemic defense mechanisms left are a set of foundational texts basically outlining how to think in a careful and honest way, some associated norms of discussion, and the karma system on the rationalist forums. In other words, a social process focused on the quality of arguments.

As a result of these different immune systems, the cybersecurity culture is much more experimental and grounded in reality; many ideas in the AI safety literature are just way too speculative. On the other hand, cybersecurity’s immune system has grown into an allergy to thinking about future risks¹. This is an issue because it results in a fundamentally reactive attitude toward the future, and therefore in suboptimal outcomes. Lacking a culture of forecasting is a significant blind spot.

My general impression is that cybersecurity people are not taking the incredibly fast recent AI progress into account as their baseline assumption for the future; instead, their default assumption is that AI progress stops today. This might be an over-reaction to the current levels of AI hype, but I would like to tell them that it doesn’t matter whether current AI capabilities are over-hyped. Taking a step back, AGI will be the most consequential technology whenever it ends up being developed, and the past few years have shown that it could happen relatively soon (to be fair, this isn’t obvious without some background knowledge, which I won’t cover here). Making AI go well will require a lot of cybersecurity expertise, and that will be difficult if cybersecurity people don’t believe in AGI until they see it. I’ve written about some specific needs, such as confidential computing, trusted ML supply chains, and defense mechanisms against deep fakes, in my post Cybersecurity in AI: where progress is needed.

AI safety (and the AI industry more generally) could also learn from cybersecurity. AI agents will end up being deployed in many different contexts, and thinking purely in terms of alignment won’t cut it. AI agents should follow the principle of least privilege, and probably eventually a similar principle of least capability (restricting specific capabilities with negative externalities, such as offensive hacking, to e.g. actors who go through some authorization journey − more); a rough sketch of what this could look like is below. Because we can’t trust any single defense mechanism under adversarial conditions, defense in depth should be the default attitude for AI deployment, with alignment as only one of the layers. The AI control research agenda seems particularly underinvested in, especially given that the large tech companies currently bankrolling AI progress (Google, Microsoft, Meta) should be well placed to investigate and implement it.
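To make the least-privilege and least-capability ideas a bit more concrete, here is a minimal sketch of what enforcement outside the model could look like. Everything in it is a hypothetical illustration (the Tool and Policy classes, the call_tool gate, the example tool names), not a real agent framework; it just shows an explicit per-task allow-list plus an extra authorization grant for sensitive capabilities.

```python
# Minimal sketch of least privilege / least capability for an AI agent.
# All names here (Tool, Policy, call_tool, the example tools) are hypothetical.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    sensitive: bool = False  # capabilities with negative externalities

@dataclass
class Policy:
    # Least privilege: an explicit allow-list of tools for this task.
    allowed_tools: set[str]
    # Least capability: sensitive tools need a separate authorization grant.
    authorized_sensitive: set[str] = field(default_factory=set)

def call_tool(tool: Tool, argument: str, policy: Policy) -> str:
    """Single enforcement point; one layer in a defense-in-depth stack."""
    if tool.name not in policy.allowed_tools:
        raise PermissionError(f"tool '{tool.name}' not granted for this task")
    if tool.sensitive and tool.name not in policy.authorized_sensitive:
        raise PermissionError(f"sensitive tool '{tool.name}' requires authorization")
    return tool.run(argument)

# Example: a search tool is granted, an offensive-hacking tool is not.
search = Tool("web_search", run=lambda q: f"results for {q!r}")
exploit = Tool("generate_exploit", run=lambda t: "...", sensitive=True)

policy = Policy(allowed_tools={"web_search"})
print(call_tool(search, "post-quantum cryptography", policy))  # allowed
# call_tool(exploit, "some target", policy)                    # raises PermissionError
```

The point of putting the check in the runtime rather than in the model is exactly the defense-in-depth one: even a well-aligned agent never gets tools it wasn’t granted, and a misbehaving one fails at the permission check instead of relying on alignment as the last line of defense.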


The above post represents my current thinking, but I would love to hear different takes on it! Let me know your thoughts: the laziest way is to do so anonymously here, or there are other contact options here.


  1. With the exception of cryptography, where people have been working on post-quantum algorithms for two decades.