Anthropic Glasswing and the Gating of Superhuman Bug-Finding

May 30, 2026 · 14 min read

On April 7, 2026, Anthropic put a fence around a capability that used to require a specialist offensive security team. The model is called Claude Mythos Preview, the program is Project Glasswing, and the fence is the whole point: invitation-only, no self-serve sign-up, and an explicit statement that Anthropic does not plan to make it generally available.1 Roughly fifty organizations sit inside. Everyone else, every indie library, every mid-size SaaS, every open-source project maintained by two volunteers on weekends, sits outside.

That structure raises a question the press releases mostly skip. If a model can autonomously find and weaponize zero-days at a price that makes specialist teams look expensive, and only fifty organizations get the defensive version, what happens to the software everyone outside the fence ships? The discovery capability does not respect the fence. The defensive access does.

I want to be precise about what is confirmed here, because the topic attracts hype from both directions. The capability is real, and I will treat it as a premise rather than a spectacle. The case for gating it is also real, and I am going to steelman it before I argue with it. But the gate, as built, is a delay dressed up as a solution, the partner list tracks market power more than need, and the money is pointed at the wrong bottleneck.

What Mythos Actually Does

Start with the capability, because the rest of the argument depends on it being genuine rather than a demo.

Anthropic’s own numbers show a large jump over its prior model, Claude Opus 4.6. On CyberGym, a cyber-capability benchmark, Mythos Preview scores 83.1% against 66.6%. On SWE-bench Verified it posts 93.9% against 80.8%, and on SWE-bench Pro 77.8% against 53.4%.1 On a Firefox exploit-development task, the red-team writeup reports Mythos succeeded 181 times, with register control on 29 more attempts, where Opus 4.6 managed 2 successes across several hundred tries.2 In raw successes that is roughly a ninety-fold gap, close to two orders of magnitude, on the specific skill that matters for offense: not noticing that a bug exists, but turning it into a working exploit.

The vulnerabilities are not toys. Anthropic says Mythos found bugs in every major operating system and every major web browser, including a 27-year-old flaw in OpenBSD and a 16-year-old vulnerability in FFmpeg.2 The economics are what should make a security lead sit up. Per the red-team report, the OpenBSD exploit cost about $20,000 across 1,000 runs, under $50 per successful run; the FFmpeg work ran about $10,000; and n-day exploit development came in under $1,000 to $2,000 per complete exploit.2 Anthropic also states these capabilities “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy,” not from training a dedicated security model.2 The offense came as a side effect of general progress, which is exactly the property that makes it hard to contain.

Two independent signals keep this from being a vendor monologue. The UK AI Safety Institute ran its own evaluation and confirmed an expert-level jump: 73% on expert-level capture-the-flag challenges, and the first model AISI had seen complete a 32-step network attack simulation, succeeding in 3 of 10 attempts.3 Cloudflare, a launch partner with direct access, described the output in concrete terms: “The reasoning it shows along the way looks like the work of a senior researcher rather than the output of an automated scanner.”4 Cloudflare also reported finding 2,000 bugs, 400 of them high or critical.5

Now the honest deflation, and it matters because the gate’s whole justification rests on attack capability. AISI was explicit that its evaluation environments lacked active defenders, so it “cannot say for sure whether Mythos Preview would be able to attack well-defended systems.”3 The expert-level scores measure capability in a lab. They do not measure performance against a security team that is watching, rate-limiting, and patching. Bruce Schneier, no fan of the rollout, draws the related technical line: finding a bug in order to fix it is easier for an AI than finding plus exploiting, which currently favors defenders.6 The capability is real. Its reach against hardened, monitored targets is not yet measured.

The Case for Gating, Made Honestly

The lazy version of this article calls gating a land grab and moves on. That is wrong, and skipping the strongest counterargument would make the rest of the piece dishonest. So here is the case for the gate, made as well as I can make it.

A model that suggests an exploit when you ask it is one kind of risk. A model that, unprompted, chains four separate bugs into a working browser sandbox escape is a different kind of thing. The gap between those two is a real capability threshold, not a marketing distinction. Zvi Mowshowitz, writing in support of the rollout, locates the jump precisely in autonomous exploit-chaining and argues “this was necessary” because the alternatives were worse.7 Releasing find-and-exploit to every API subscriber would hand the same tool to actors who answer to no responsible-disclosure norm and have no reason to wait 90 days before selling what they find.

Anthropic’s stated reason is narrow and, read carefully, not obviously wrong: “no company, including Anthropic, has developed safeguards strong enough to prevent such models from being misused.”8 This is not the company’s formal Responsible Scaling Policy forcing its hand. It is a claim that the cybersecurity safeguards do not yet exist, and that broader deployment waits on new safeguards shipping with an upcoming Opus model.8 The coordinated-disclosure tradition exists for exactly this reason. The 90-day window, the CERT embargo, the CNA process, all of it is built on the premise that capability and disclosure should be paced so defenders get a head start. Anthropic’s published disclosure policy mirrors that tradition: a 90-day default, a 7-day target for critical or actively exploited bugs, full technical detail released 45 days after a patch, and a stated commitment to throttle volume to a pace maintainers can absorb.9
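To make that pacing concrete, here is a minimal sketch of the date arithmetic, assuming only the day counts quoted above (90, 7, and 45). The function and constant names are hypothetical, and nothing here claims to reflect how Anthropic actually tracks its deadlines.

```python
from datetime import date, timedelta
from typing import Optional

# Day counts as quoted in this article; treat them as assumptions, not policy text.
DEFAULT_WINDOW_DAYS = 90   # default disclosure window
CRITICAL_TARGET_DAYS = 7   # target for critical or actively exploited bugs
DETAIL_DELAY_DAYS = 45     # full technical detail follows a patch by this much


def disclosure_deadline(reported: date, critical: bool = False) -> date:
    """Date by which a report would go public under the quoted windows."""
    days = CRITICAL_TARGET_DAYS if critical else DEFAULT_WINDOW_DAYS
    return reported + timedelta(days=days)


def full_detail_date(patched: Optional[date]) -> Optional[date]:
    """Date full technical detail would follow a patch, or None if unpatched."""
    if patched is None:
        return None
    return patched + timedelta(days=DETAIL_DELAY_DAYS)


if __name__ == "__main__":
    reported = date(2026, 4, 7)
    print("default deadline:", disclosure_deadline(reported))
    print("critical target: ", disclosure_deadline(reported, critical=True))
    print("full detail:     ", full_detail_date(date(2026, 4, 21)))
```

The useful property for a maintainer is that every date is derived from just two inputs, the report date and the patch date, so a backlog can be sorted by which clock runs out first.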

So the steelman holds at the level of principle. A controlled rollout of a categorically new offensive capability is defensible. The problem is not that gating is illegitimate. The problem is what the gate is sold as, who it lets in, and where the money goes.

The honest critique is narrow: the gate leaks, the gate has no timeline, and the gate benefits the most commercially powerful actors first.

The Gate Is a Delay, Not a Fix

A gate that holds forever is a moat. A gate that holds for a while is a head start. Glasswing is the second kind, and the materials present it as the first.

Look at the partner list. Anthropic names 12 launch partners, one of which is Anthropic itself: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, plus more than 40 additional critical-infrastructure organizations, for a total of roughly fifty.1 Set aside the Linux Foundation and what you have is a roster of some of the largest technology incumbents anywhere, each with an existing security team and an existing commercial relationship with Anthropic. There is no published application process, no stated eligibility criteria, and no appeals mechanism. A regional utility, a mid-size hospital, or an under-resourced state university has no public path in. Selection optimizes for organizations that already have the most defensive capacity, which is the opposite of pointing a scarce defensive tool at the greatest need.

The gate also leaks, by design and by accident. Schneier calls consortium controls “inherently leaky” and estimates similar models will be available outside the consortium within about twelve months.6 History backs him: tightly held offensive capability has a poor record of staying held, a pattern the precedents at the end of this piece spell out.

The single-secondary-source reporting, which I flag as exactly that, suggests the leak horizon may be shorter than twelve months. Cybernews reported an April 22 incident in which external parties accessed Mythos by guessing the endpoint URL using naming conventions from a prior data leak.10 Axios reported the NSA using Mythos through executive channels despite a Pentagon supply-chain-risk designation.11 The Next Web reported the White House blocking an expansion to roughly 120 organizations over security and compute-scarcity concerns.12 Vidoc Security reported Qihoo 360 independently finding around 1,000 vulnerabilities with comparable techniques.13 Each of those is a single report, and none should be weighted like an Anthropic benchmark. Taken together, attributed as reporting, they sketch a consistent shape: “gated” means gated for everyone except those with sovereign power or a lucky guess.

There is also a quieter problem with the framing. Anthropic runs a commercial product, Claude Security, on the same capability tier as Glasswing. The altruistic partner-program narrative dominates the coverage while the same engine is sold to paying enterprise customers. “Gated for safety” and “gated for revenue” are different claims, and Anthropic’s materials never separate them.

The Real Bottleneck Is Remediation

The benchmark numbers obscure the part that matters most. Glasswing solved a problem nobody was actually stuck on. The hard part was never finding bugs. The hard part is fixing them.

Anthropic’s own red-team report says that over 99% of discovered vulnerabilities remained unpatched at the time of the announcement.2 Sit with that. The capability that everyone is debating produced a mountain of findings, and almost none of them turned into patches. The May 22 update tells the same story from the other end: across roughly 50 partners, Mythos surfaced more than 10,000 high or critical-severity vulnerabilities, and an open-source scan of more than 1,000 projects found 23,019 total issues with 6,202 estimated high or critical.8 The disclosure progress against that backlog: 530 high or critical bugs reported to maintainers, 75 patched, 65 public advisories, with a roughly two-week average patch timeline.8 The coordinated-disclosure dashboard shows 1,596 issues disclosed across 281 projects, 97 patched upstream, and 88 CVE or GHSA records.8
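The size of that gap is easier to see as percentages. A small sketch using only the figures quoted above; the variable and function names are illustrative.

```python
# Figures as quoted in this article from the May 22 update.
reported_high_critical = 530   # high or critical bugs reported to maintainers
patched_of_reported = 75       # of those, patched
disclosed_issues = 1_596       # issues on the coordinated-disclosure dashboard
patched_upstream = 97          # of those, patched upstream


def patched_share(patched: int, total: int) -> float:
    """Percentage of `total` that has a patch, rounded to one decimal place."""
    return round(100 * patched / total, 1)


if __name__ == "__main__":
    print("reported, patched: ", patched_share(patched_of_reported, reported_high_critical), "%")
    print("disclosed, patched:", patched_share(patched_upstream, disclosed_issues), "%")
```

Either way you slice it, the patched share of what has already reached maintainers is in the low double digits at best, and that is before counting the findings not yet disclosed.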

Now follow the money. Anthropic committed about $100 million in Mythos usage credits across Glasswing partners and about $4 million in direct open-source donations, split as $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation and $1.5 million to the Apache Software Foundation.1 That is roughly a 25-to-1 ratio of discovery spending to remediation spending. The May update’s own framing says the binding constraint going forward is verification, disclosure, and patch deployment.8 Funding aims at finding. The constraint is fixing. Dollars follow the headline, not the bottleneck.

The true-positive rate, cited as a win, is a liability once you do the arithmetic. External security firms assessed 1,752 open-source findings and confirmed 90.6%, or 1,587, as valid, with 1,094 of them, 62.4% of everything assessed, verified high or critical.8 A 90.6% confirmation rate means 9.4% of reports are invalid, and 9.4% of thousands of findings is hundreds of invalid reports landing in volunteer maintainers’ inboxes. We have already seen what that does. The cURL project shut down its bug-bounty program under a flood of AI-generated submissions, and Anthropic itself acknowledged that maintainers asked it to slow the disclosure rate.14,15 Picus Security flagged the same dynamic before the May numbers landed: discovery is outpacing remediation, and the maintainer-capacity crisis is the real story.14
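Here is that arithmetic as a rough sketch. It assumes the invalid share measured on the assessed sample carries over to the larger scan totals quoted earlier, which is an extrapolation of mine, not a figure Anthropic reports.

```python
# Assessed sample, as quoted in this article.
assessed = 1_752
confirmed_valid = 1_587

invalid = assessed - confirmed_valid   # reports that did not hold up
invalid_share = invalid / assessed     # fraction of reports that were invalid


def expected_invalid(findings: int, share: float = invalid_share) -> int:
    """Invalid reports to expect if `share` held across `findings` (an assumption)."""
    return round(findings * share)


if __name__ == "__main__":
    print("invalid in sample:", invalid)
    print("invalid share (%):", round(100 * invalid_share, 1))
    # Scan totals quoted earlier: 23,019 issues, 6,202 estimated high or critical.
    print("of 23,019 issues: ", expected_invalid(23_019))
    print("of 6,202 high/critical:", expected_invalid(6_202))
```

Each of those invalid reports still costs a maintainer the time to read it, reproduce it, and reject it, which is the part the headline confirmation rate hides.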

There is a sharper way to say why disclosed-but-unpatched is dangerous rather than helpful. The Hacker News reporting traces the collapse in time-from-disclosure-to-weaponized-exploit from 771 days in 2018 to single-digit hours in 2024: defenders move at calendar speed, attackers at machine speed.16 Every disclosed-but-unfixed bug is a target with a publicly documented entry point and no available fix. Pushing findings into the disclosure pipeline faster than maintainers can close them is not a defender win. For the long tail of software, it is a net loss.

The Long Tail Inherits the Worst of Everything

Everything above converges on the people outside the fence, and they get the worst of all three problems at once.

They are inside the discovery reach. Treat any C or C++ codebase you ship as already scanned by tools at Mythos-class capability, whether by a partner, by a leaker, or by an independent actor replicating the technique. They are outside the defensive reach. The Claude for Open Source program offers a complimentary Claude Max 20x subscription for six months, capped at 10,000 recipients with a June 30, 2026 deadline, and it contains no provision for Mythos Preview or Glasswing model access.17 Maintainers get standard-tier Claude, not the tool their adversaries may be holding. And they have the least remediation capacity, because the long tail is where the two-person volunteer projects live.

This recreates an old inequality at a higher tier. Bug bounties already sorted the world into organizations that could afford a vulnerability-research budget and everyone else. Glasswing does the same sorting at the level of superhuman bug-finding. A company with the procurement bandwidth to become a partner gets the defensive tool. The solo maintainer of a package with tens of millions of weekly downloads does not, even though that package sits under more production systems than most partners’ flagship products.

The “democratize later” promise does not close the gap, because it has no shape. There is no published timeline, no milestones, no eligibility criteria, no external accountability for when access widens. The Cyber Verification Program that would govern broader access is described as upcoming, with no application page as of late May 2026. “Eventually free” is not a security posture you can plan a year around. And if expanding to 120 organizations reportedly degrades service, then a future public release commoditizes the API while the actual capability stays capacity-gated. The bottleneck moves from “who is invited” to “who can get compute,” which is not democratization.

Treat any memory-unsafe codebase you ship as already scanned by tools at Mythos-class capability. The discovery side of the fence has no walls.

What Non-Partners Should Actually Do

If you are outside the fence, the strategic mistake is to chase more scanning. You do not have a finding problem. The whole industry just proved that finding is cheap. You have a processing problem, and that is where the work goes.

Build patch-deployment capacity before you add detection. The bottleneck is turning a finding into a shipped fix. Fluid Attacks put it bluntly: the problem is usually not discovery but discipline, because teams often know exactly what is broken and are not fixing it.18 Spend the next quarter on the path from “we know” to “it is patched,” not on another scanner.

Stand up disclosure intake before the flood arrives. Publish a security.txt so reporters have a clear channel, and staff the triage behind it. A minimal file at /.well-known/security.txt is enough to start:

Contact: mailto:security@yourproject.org
Expires: 2027-01-01T00:00:00.000Z
Preferred-Languages: en
Policy: https://yourproject.org/security-policy
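A file like this goes stale quietly, so it is worth checking in CI. The following is a minimal sketch, not a full RFC 9116 validator: it only confirms that the two fields the RFC makes mandatory (Contact and Expires) are present and that Expires is still in the future. The function names are hypothetical.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = {"contact", "expires"}  # both are mandatory in RFC 9116


def parse_security_txt(text: str) -> dict[str, list[str]]:
    """Map lowercased field names to their values, skipping comments and blanks."""
    fields: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def check_security_txt(text: str, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    fields = parse_security_txt(text)
    for name in sorted(REQUIRED_FIELDS - fields.keys()):
        problems.append(f"missing required field: {name}")
    for value in fields.get("expires", []):
        # Older Python versions do not parse a trailing "Z", so normalize it.
        expires = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
        if expires <= now:
            problems.append(f"expired on {value}")
    return problems


if __name__ == "__main__":
    with open(".well-known/security.txt", encoding="utf-8") as fh:
        for problem in check_security_txt(fh.read(), datetime.now(timezone.utc)):
            print(problem)
```

Pass `now` as a timezone-aware datetime. Failing the build when the list is non-empty turns an expired contact channel into a visible error instead of a silent one.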

Run the free, public tooling that scales to a small team. Enroll eligible open-source projects in OSS-Fuzz, which has been LLM-enhanced since August 2023, and wire continuous known-CVE monitoring into CI. The point is not to find novel bugs; it is to never ship a known-bad dependency.

# Scan your dependency manifests for known-vulnerable versions
osv-scanner scan source --recursive .

# Scan the current directory tree for known CVEs (grype also accepts a container image reference)
grype dir:.

# Generate an SBOM so you can re-check it as new CVEs land
syft dir:. -o cyclonedx-json > sbom.json
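Re-checking that SBOM can be scripted. This is a rough sketch that reads the package URLs out of the CycloneDX file generated above and sends them to the public OSV batch-query endpoint. It assumes each component carries a `purl` that includes its version, which is what syft normally emits; the helper names are hypothetical, and a very large SBOM may need to be split into several requests.

```python
import json
import urllib.request

OSV_BATCH_URL = "https://api.osv.dev/v1/querybatch"


def purls_from_sbom(sbom: dict) -> list[str]:
    """Collect unique package URLs from a CycloneDX JSON document."""
    components = sbom.get("components", [])
    return sorted({c["purl"] for c in components if c.get("purl")})


def build_osv_batch(purls: list[str]) -> dict:
    """Build the request body: one query per package URL."""
    return {"queries": [{"package": {"purl": p}} for p in purls]}


def query_osv(payload: dict) -> dict:
    """POST the batch to OSV and return the decoded JSON response (network call)."""
    req = urllib.request.Request(
        OSV_BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    with open("sbom.json", encoding="utf-8") as fh:
        purls = purls_from_sbom(json.load(fh))
    results = query_osv(build_osv_batch(purls)).get("results", [])
    # OSV returns results in the same order as the queries.
    for purl, result in zip(purls, results):
        ids = [v["id"] for v in result.get("vulns", [])]
        if ids:
            print(purl, ", ".join(ids))
```

Run on a schedule against the last released SBOM, this catches the case the scanners at build time cannot: a dependency that was clean when you shipped and became known-bad afterward.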

Re-audit every dependency you once filed as “too hard to exploit.” That risk assumption is dead. When exploit development drops under $2,000, “expensive to weaponize” stops being a control. The Noma CISO quoted in Security Magazine’s roundup makes the same point: reassess the legacy exceptions.19 Funding an upstream or forking a critical unmaintained dependency is now an infrastructure decision, not a nice-to-have.

Treat memory-safe rewrites of critical attack surface as infrastructure. The OpenBSD and FFmpeg findings were old memory-corruption bugs in C. Where you can move a hot, exposed parser to a memory-safe language, that is a structural fix against an entire class of what Mythos is good at finding.

If you maintain a critical open-source project, apply where the money flows. Anthropic’s $2.5 million routes through OpenSSF and Alpha-Omega via the Linux Foundation. Apply there directly, and watch the Glasswing page for eligibility updates rather than waiting for an invitation that may not come.

Adopt an assume-breach posture for non-human identities. Entro Security’s analysis points at API keys, service accounts, and AI agents as a primary attack surface, with visibility and fast revocation as the controls that matter.20 If a finding lands and a credential is exposed, the question is how fast you can rotate it, not whether you were ever exposed.

Precedent, and Why the Category Will Not Stay Gated

None of this is the first time a powerful capability sat behind access control. The lesson from the precedents is consistent: gating buys time, not safety.

Offensive tooling has always leaked or commoditized. The NSA’s EternalBlue escaped through the Shadow Brokers leak in 2017. Pegasus was a tightly controlled commercial product that proliferated to dozens of governments. Automated vulnerability scanners were once specialist tools and are now a checkbox in any CI pipeline. The historical pattern is that an offensive capability held by a few becomes available to many, and the only variable is how long the window lasts.

Market competition points the same way. OpenAI is reportedly running a comparable program called Daybreak with a more open access posture, per The New Stack, with several vendors appearing in both rosters.21 If a second frontier lab offers the same class of capability on easier terms, “gated” stops being a property of the category and becomes a property of one vendor’s current policy. Markets do not leave that kind of asymmetry standing for long.

That reframes what the gate is for. It is a responsible-disclosure regime for today, paced to give defenders a head start while safeguards catch up. Read that way, it is defensible. Read as a permanent solution to the access-asymmetry problem, it fails, because the capability does not stay scarce and the discovery side of the fence was never walled in the first place.

The Conversation’s analysis lands the structural point cleanly: Mythos did not invent a new threat class.22 The bugs are the same categories that have always existed, memory corruption and logic flaws and crypto mistakes, that persisted because finding them was expensive. The real change is economic. Finding is now cheap. That means the defense gap, the understaffed projects and the unpatched known-bad dependencies and the backlog of accepted “too hard” risks, was already a structural failure. Mythos exposed it. It did not cause it.

Conclusion

The most useful way to read Glasswing is not as a product launch but as a stress test of a system that was already under-built. Its capability is real and independently confirmed, and a paced rollout is defensible on principle. But the gate as built solves the wrong problem with the wrong money for the wrong beneficiaries, and the people who most need defense are the ones structurally locked out of it.

Open questions worth watching: whether Anthropic publishes eligibility criteria and a timeline for the Cyber Verification Program, whether an independent audit of the full vulnerability corpus ever happens, and whether the reported competing programs force the access floor down faster than any consortium would choose. The fence is real today. The discovery it controls was never fenced at all.

Footnotes

  1. Anthropic. “Project Glasswing.” anthropic.com/glasswing.

  2. Anthropic Red Team. “Mythos Preview.” red.anthropic.com.

  3. UK AI Safety Institute. “Our Evaluation of Claude Mythos Preview’s Cyber Capabilities.” aisi.gov.uk.

  4. Cloudflare. “Cyber Frontier Models.” blog.cloudflare.com.

  5. Anthropic. “Glasswing: Initial Update” (Cloudflare, Firefox, and Palo Alto figures). anthropic.com/research/glasswing-initial-update.

  6. Schneier, B. “On Anthropic’s Mythos Preview and Project Glasswing.” schneier.com.

  7. Mowshowitz, Z. “Claude Mythos #2: Cybersecurity and Glasswing.” thezvi.substack.com.

  8. Anthropic. “Glasswing: Initial Update.” anthropic.com/research/glasswing-initial-update.

  9. Anthropic. “Coordinated Vulnerability Disclosure.” anthropic.com/coordinated-vulnerability-disclosure.

  10. Cybernews (single secondary source; reported, not primary-verified). “Anthropic Mythos AI Unauthorized Access.” cybernews.com.

  11. Axios (single secondary source; reported, not primary-verified). “NSA, Anthropic Mythos, Pentagon.” axios.com.

  12. The Next Web (single secondary source; reported, not primary-verified). “White House Opposes Anthropic Mythos Expansion.” thenextweb.com.

  13. Vidoc Security (single secondary source; reported, not primary-verified). “Hype: AI Vulnerability Discovery at the National Level.” blog.vidocsecurity.com.

  14. Picus Security. “Anthropic’s Project Glasswing Paradox.” picussecurity.com.

  15. CyberScoop. “Anthropic Mythos, Software Flaws, and Glasswing.” cyberscoop.com.

  16. The Hacker News. “Project Glasswing Proved AI Can Find Vulnerabilities.” thehackernews.com.

  17. Anthropic. “Claude for Open Source Terms.” anthropic.com/claude-for-oss-terms.

  18. Fluid Attacks. “Claude Mythos, Project Glasswing, and the AppSec Future.” fluidattacks.com.

  19. Security Magazine. “What Are Security Experts Saying About Claude Mythos and Project Glasswing.” securitymagazine.com.

  20. Entro Security. “Anthropic’s Claude Mythos and the AI Cybersecurity Reckoning.” entro.security.

  21. The New Stack. “OpenAI Daybreak and Anthropic Glasswing.” thenewstack.io.

  22. The Conversation. “Mythos AI Is a Cybersecurity Threat, but It Doesn’t Rewrite the Rules of the Game.” theconversation.com.

Researched, drafted, and fact-checked by an AI agent pipeline, then reviewed, edited, and approved by Evan Musick before publishing.