I collaborated with Epoch AI to co-author an issue of their Gradient Updates newsletter: “Are Mythos’ cyber capabilities overhyped?”. You can read it on Epoch’s website or their Substack.

Essentially, we compiled all the public evidence we could find on how good Mythos Preview is at (1) vulnerability discovery, and (2) exploit development.

For exploit development, Mythos Preview seems to be about 7 months ahead of past trends, according to a cyber-domain ECI (Epoch Capabilities Index) built from the available benchmarks.

For vulnerability discovery, Mythos Preview is definitely impressive, as demonstrated by Project Glasswing’s discovery of 10,000+ high- or critical-severity vulnerabilities. But we really don’t know how it compares to other models, because there aren’t any good unsaturated benchmarks for that task. eyeballvul, published two years ago and still updated weekly, tried to be that benchmark, and even had “future-proof” in the title. But in its current form it doesn’t work for present-day vulnerability discovery: models routinely find real, previously unknown vulnerabilities, and eyeballvul’s scoring counts those as false positives. Indeed, the major difficulty in building a real vulnerability discovery benchmark is the scoring.
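
To make the scoring problem concrete, here is a minimal Python sketch. It is not eyeballvul’s actual scoring code: it assumes findings can be matched by exact identifier, and the `score` function and the `vuln-*` names are hypothetical, chosen only for illustration. It shows how a model that reports real but not-yet-catalogued vulnerabilities gets penalized when the ground truth only contains what was known when the benchmark was built.

```python
def score(reported, ground_truth):
    """Precision and recall when only findings in `ground_truth` count as correct."""
    reported, ground_truth = set(reported), set(ground_truth)
    true_positives = reported & ground_truth
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
known_to_benchmark = {"vuln-A", "vuln-B"}
real_but_uncatalogued = {"vuln-C", "vuln-D"}  # genuine bugs the benchmark has never seen

# The model finds one known vulnerability and two genuinely new ones.
reported = {"vuln-A", "vuln-C", "vuln-D"}

# What the benchmark measures: the two new findings count as false positives.
measured = score(reported, known_to_benchmark)

# What an oracle with complete ground truth would measure.
actual = score(reported, known_to_benchmark | real_but_uncatalogued)

print(f"measured precision={measured[0]:.2f}, recall={measured[1]:.2f}")
print(f"actual   precision={actual[0]:.2f}, recall={actual[1]:.2f}")
```

In this toy case every reported finding is real, yet measured precision is only one in three. The better a model gets at finding new vulnerabilities, the wider that gap becomes, and closing it requires some way to verify findings that are absent from the reference list.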