The friend who asked for read-only mode
Forty doors out of seventeen thousand: how mcpgate draws the line between writes an AI agent can do alone and writes that need a human first. Eight risk categories, one override level, a queue above the catalog, and the gate the AI cannot self-grant. With the MCP spec, OWASP, and a live action census behind the design.
A developer friend asked, off-hand, whether mcpgate had a read-only mode. The honest answer at the time was no — we had not thought about the surface that way. The feature exists because of that conversation. One global switch that flips every write tool off and leaves the reads. Useful for an audit window, for a new connection that has not been vetted yet, for the days when the agent is supposed to summarize and not act.
The follow-up question, one week later, was the more interesting one.
Read-only solves the easy version of the problem: the agent should not change anything right now. The harder version is the everyday case. The agent is allowed to write — that is the whole point of connecting it to your tools. But inside the set of writes there are a handful that should never run without a human noticing first: delete_project, permanent_delete, batch_delete, api_delete. The catastrophic ones. The ones that look like ordinary tool calls from the model's point of view and like a bad day from yours.
That is the gap read-only mode does not close. So we kept going.
[Figure: the Destructive Actions page. Blocked api_delete calls were stopped by the gate; the rows are surfaced at the top with one-click enable. Below them, the category controls and the catalog of forty currently gated actions, with confluence.delete_page ticked, awaiting Save.]
Why hand-tagging does not scale
The first temptation is to tag actions by hand. A red W badge on the destructive ones, a check-the-box review per service. That works for the first dozen services. It does not work for the catalog mcpgate ships.
A census we ran across the live action catalog returned 17,813 actions across all connected services and their long-tail discovery surface. Of those, only the unambiguous catastrophic class is worth gating at the same severity by default:
- Permanent / irreversible: 16 actions (the purge, hard_delete, "cannot be undone" shape)
- Container destroy: 14 actions (deleting a whole project, repository, drive, database, calendar)
- Bulk / mass delete: 5 actions (one call removes many items)
- Generic API passthrough: 5 unbounded raw-DELETE endpoints that can reach any path on the upstream service
That is 40 doors out of 17,813. The rest of the destructive bucket — 2,683 scoped content deletes (one row, one message, one slide), 103 trash-or-archive actions that are reversible by design, 10 comment-and-metadata deletes, 6 membership removals — does not warrant the same level of friction. Treating "delete one comment" the same as "delete the organization" is the kind of false equivalence that gets a governance layer turned off.
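The census arithmetic can be checked in a few lines. This is only a tally of the figures quoted above; the dictionary keys are shorthand labels chosen for this sketch, not identifiers from the product.

```python
# Counts copied from the census figures quoted in the text.
TOTAL_ACTIONS = 17_813

gated_by_default = {
    "permanent_irreversible": 16,
    "container_destroy": 14,
    "bulk_mass_delete": 5,
    "generic_api_passthrough": 5,
}

rest_of_destructive = {
    "scoped_content_delete": 2_683,
    "recoverable_trash_archive": 103,
    "comment_metadata_delete": 10,
    "member_access_removal": 6,
}

gated_total = sum(gated_by_default.values())
other_destructive_total = sum(rest_of_destructive.values())
gated_share = gated_total / TOTAL_ACTIONS

print(f"gated by default:       {gated_total}")
print(f"other destructive:      {other_destructive_total}")
print(f"share of catalog gated: {gated_share:.2%}")
```

The gated set is roughly a fifth of one percent of the catalog, which is the scale argument in numbers.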
And the live catalog grows. New services ship, vendors expose new actions, the long-tail discovery layer surfaces more on demand. Any hand-curated red-flag list rots the moment it stops being maintained. So the first decision was: the mechanism has to scale with the catalog, not behind it.
Eight categories, one rule each
What scales is a rule. Each shipped category carries a regex that matches against the action's name, method, endpoint, and description. The rules run once at gateway load, sort every action into exactly one category (first-match wins, more catastrophic categories first), and stamp the category onto the action's metadata.
The taxonomy:
| Category | Match shape | Default |
|---|---|---|
| Permanent / irreversible | purge, hard_delete, cannot be undone, skip the trash, expunge, wipe_data | Gated |
| Container destroy | delete_(org\|project\|repo\|drive\|database\|space\|account\|board\|calendar\|wiki) | Gated |
| Bulk / mass delete | batch_delete, bulk_(delete\|mutate), clear_(all\|calendar) | Gated |
| Generic API passthrough | api_delete, raw_delete, passthrough_* | Gated |
| Scoped content delete | any other destructive without a more specific match | Per-call confirmation |
| Comment / metadata delete | delete_comment, delete_reaction, delete_label | Per-call confirmation |
| Member / access removal | remove_member, revoke_token, delete_invitation | Per-call confirmation |
| Recoverable (trash / archive) | trash, archive, soft_delete, unpublish | Never escalated |
Two design choices in that table are worth naming.
The first is that Recoverable has no switch. Move-to-trash is reversible by design; treating it as catastrophic would just train admins to ignore the gate. The category exists so that purge_trash matches Permanent first, not Recoverable, despite containing the word "trash" — that is what "more catastrophic categories first" buys.
The second is that per-call confirmation is not the same as gated. A scoped delete (one row, one message) gets a normal per-call human-in-the-loop confirmation the way any write does: the model asks, the user accepts, the call goes through. A gated action gets stopped at the gateway until an admin explicitly enables it, no matter what the model or the user says. The point of the gate is that this decision lives one layer up from the agent loop.
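A first-match-wins rule pass of the kind described above might look roughly like the sketch below. The patterns are reconstructed from the "match shape" column and are assumptions, not the shipped regexes; the trailing `\b` on the container rule, the catch-all pattern, and the `classify` function name are all illustrative.

```python
import re

# Evaluation order matters: first match wins, most catastrophic first.
# Every pattern here is an illustrative guess based on the table above.
RULES = [
    ("permanent", r"purge|hard_delete|cannot be undone|skip the trash|expunge|wipe_data", "gated"),
    ("container_destroy", r"delete_(org|project|repo|drive|database|space|account|board|calendar|wiki)\b", "gated"),
    ("bulk_delete", r"batch_delete|bulk_(delete|mutate)|clear_(all|calendar)", "gated"),
    ("api_passthrough", r"api_delete|raw_delete|passthrough_", "gated"),
    ("comment_metadata", r"delete_(comment|reaction|label)", "per_call_confirmation"),
    ("member_access", r"remove_member|revoke_token|delete_invitation", "per_call_confirmation"),
    ("recoverable", r"trash|archive|soft_delete|unpublish", "never_escalated"),
    # Assumed catch-all for "any other destructive" action; evaluated last.
    ("scoped_delete", r"delete|remove", "per_call_confirmation"),
]
COMPILED = [(name, re.compile(pattern, re.IGNORECASE), default)
            for name, pattern, default in RULES]


def classify(name, method="", endpoint="", description=""):
    """Return (category, default) for the first matching rule, or None."""
    haystack = " ".join((name, method, endpoint, description))
    for category, pattern, default in COMPILED:
        if pattern.search(haystack):
            return category, default
    return None  # no destructive shape matched


for action in ("purge_trash", "move_to_trash", "delete_project", "delete_comment", "list_pages"):
    print(action, "->", classify(action))
```

Because the Permanent rule is evaluated before Recoverable, purge_trash lands in the gated class even though its name contains "trash", while move_to_trash falls through to Recoverable.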
One override level, deliberately
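The effective state of every action resolves through a precedence chain: one layer above the shipped default (category policy), one layer above that (per-action override), then the kill switch (read-only mode). In order of who wins: read-only mode, then per-action override, then category policy, then shipped default.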
One override level above category policy. Not two. Not "custom rule groups". Not "users can override their own subset". That is a conscious No.
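The precedence chain is small enough to sketch as one function. This is a minimal illustration that assumes the order read-only mode, per-action override, category policy, shipped default; the function name, argument names, and state strings are hypothetical, not the gateway's real interface.

```python
def resolve_state(action, category, *, read_only_mode, action_overrides,
                  category_policy, shipped_defaults):
    """Return (effective_state, source) for one write action."""
    if read_only_mode:                      # kill switch wins over everything
        return "blocked", "read-only mode"
    if action in action_overrides:          # the single override level
        return action_overrides[action], "per-action override"
    if category in category_policy:         # admin's category switch
        return category_policy[category], "category policy"
    return shipped_defaults[category], "shipped default"


shipped = {"container_destroy": "gated", "scoped_delete": "per_call_confirmation"}

print(resolve_state("delete_project", "container_destroy",
                    read_only_mode=False,
                    action_overrides={"delete_project": "enabled"},
                    category_policy={},
                    shipped_defaults=shipped))
```

Returning the source alongside the state is what lets an admin page show why an action is in the state it is in.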
The moment the policy model becomes "user X may run delete_org at midnight on weekdays except on calendar Y", you have wandered into RBAC plus ABAC and your admin spends their afternoons in a policy IDE. Enterprise governance frameworks like WorkOS's PDP guidance are written for that world. They are right for that world. The trade-off is a tool that needs a dedicated owner.
mcpgate is shaped for the team that does not have one: a fixed taxonomy plus a single override level plus an emergency brake covers the common operational need with a weekly habit, not a daily one. Where that ceiling becomes the limitation, the natural-looking next step would be a per-user audience tier and a just-in-time grant flow, not a custom-rule engine. We sketched both and deliberately parked them (see below).
What MCP says about destructive — and what it does not
It is worth checking what the MCP specification itself does at this layer, because the protocol has a vocabulary for it.
The MCP draft spec (2026-07-28 release candidate, locked May 21) — and the current stable 2025-11-25 before it — defines four optional annotations a tool can carry: readOnlyHint, destructiveHint, idempotentHint, openWorldHint. They are exactly what they sound like — a server declaring "this tool reads only", "this tool may delete things", "this call is safe to repeat", "this call reaches an open-ended external world". The defaults are conservative: a tool without annotations is treated as writeable, destructive, non-idempotent, open-world, until proven otherwise.
Both spec versions say, in identical words, what those annotations are not:
"For trust & safety and security, clients MUST consider tool annotations to be untrusted unless they come from trusted servers."
Which is the right call. Annotations are a vocabulary, not an enforcement layer. A server that wants to mislabel a destructive call has nothing stopping it. The MCP project's own blog post on annotations as risk vocabulary says the same thing more directly — annotations are useful as input to a client's own risk decision, never as the decision itself.
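One way a client could act on that caveat is sketched below: fill in the conservative defaults the spec describes, and fall back to them entirely when the server is not trusted. The hint names and default values follow the description above; the `effective_hints` function and the `server_trusted` flag are assumptions made for illustration, not part of the MCP spec or of mcpgate.

```python
# Conservative defaults for a tool that declares no annotations.
SPEC_DEFAULTS = {
    "readOnlyHint": False,     # assume it may write
    "destructiveHint": True,   # assume it may destroy
    "idempotentHint": False,   # assume repeating the call is not safe
    "openWorldHint": True,     # assume it reaches external systems
}


def effective_hints(annotations, server_trusted):
    """Hypothetical client policy: ignore hints from untrusted servers."""
    if not server_trusted or not annotations:
        return dict(SPEC_DEFAULTS)
    declared = {k: v for k, v in annotations.items() if k in SPEC_DEFAULTS}
    return {**SPEC_DEFAULTS, **declared}


claimed = {"destructiveHint": False}
print(effective_hints(claimed, server_trusted=False)["destructiveHint"])
print(effective_hints(claimed, server_trusted=True)["destructiveHint"])
```

Even in the trusted branch the hints are only an input to a risk decision; nothing in this sketch enforces anything.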
Empirically, we have something even sharper. Of the 17,813 actions in the current live catalog — across every connected vendor and every MCP-proxied vendor server — zero carry a destructiveHint annotation. Vendors do not tag, the discovery layer does not infer, and the model client at the other end has no annotation to lean on either way. So the gateway has to do the classification itself.
(A small spec-watch note for readers tracking the protocol: the 2026-07-28 release candidate keeps the trust caveat unchanged and graduates long-running work into a separate Tasks extension, out of the core tool definition. Execution metadata moved out of the hints object entirely. The spec itself is tightening the trust-vs-hint separation that any governance layer eventually has to make.)
The gate the AI cannot grant itself
There is a category of confirmation flow that looks like governance and is not. It works like this: the agent calls a destructive tool, the gateway responds with "this needs confirmation", the agent calls the tool again with confirmed=true, the tool runs. The human in the loop is theoretical.
This is the anti-pattern OWASP LLM06 (Excessive Agency) calls out by name. If the confirmation flows through a channel the agent itself can forge, the confirmation is decorative. A prompt injection in any document the agent ingests can write the agent's next reply. Anthropic's agentic-misalignment research shows leading models, including Claude, choosing harmful actions when their goals are challenged: not because they are evil, but because the cost-benefit lands there in some narrow scenarios. A confirmation the model can fill in itself adds zero margin against either failure mode.
So the gate sits one layer above the agent loop. It runs at the gateway, not at the tool handler. When a high-risk action arrives, the gateway returns ADMIN_APPROVAL_REQUIRED; the call does not reach the upstream service; the audit log records the attempt. To open the gate, an admin clicks a switch in the admin panel — same gateway, different surface, different authentication. The agent's session does not have access to that surface. The gate cannot be self-granted.
[Figure: the api_delete row with an unchecked box and a tooltip that names the contract: off by default · the AI can't call it · the AI can't self-enable it. The admin ticks. The admin saves. The admin opens the gate.]
Inside the gateway codebase this looks like a layered policy-decision plus policy-enforcement model: the category-policy module is the decision point, the executor's check is the enforcement point. That layering is the PEP / PDP pattern from agent-security best practice; we did not invent it, the agent-security community has been telling people to do it for two years. What is novel is doing it for an MCP-shaped 18,000-action surface, with the decisions visible on one page.
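The decision-point and enforcement-point split, and the fact that the agent's own parameters never open the gate, can be sketched as follows. Only the ADMIN_APPROVAL_REQUIRED code comes from the text above; the `Gateway` class, its method names, and the role strings are hypothetical stand-ins, and real session authentication is reduced to a string for brevity.

```python
from dataclasses import dataclass, field

ADMIN_APPROVAL_REQUIRED = "ADMIN_APPROVAL_REQUIRED"


@dataclass
class Gateway:
    gated: set                                   # actions classified as gated
    enabled: set = field(default_factory=set)    # actions an admin has opened
    audit_log: list = field(default_factory=list)
    upstream_calls: list = field(default_factory=list)

    def decide(self, action):
        """Decision point: looks only at policy state, never at call params."""
        if action in self.gated and action not in self.enabled:
            return "deny"
        return "allow"

    def execute(self, role, action, params):
        """Enforcement point: every attempt is audited, denied calls stop here."""
        decision = self.decide(action)
        self.audit_log.append({"role": role, "action": action, "decision": decision})
        if decision == "deny":
            return {"error": ADMIN_APPROVAL_REQUIRED}
        self.upstream_calls.append((action, params))  # stand-in for the real call
        return {"ok": True}

    def admin_enable(self, role, action):
        """Separate surface: only an admin session may open the gate."""
        if role != "admin":
            raise PermissionError("agent sessions cannot enable gated actions")
        self.enabled.add(action)


gw = Gateway(gated={"api_delete"})
print(gw.execute("agent", "api_delete", {"confirmed": True}))  # still denied
gw.admin_enable("admin", "api_delete")
print(gw.execute("agent", "api_delete", {"path": "/items/1"}))
```

The `confirmed` flag in the first call is passed through untouched and has no effect, which is the point: there is no parameter the agent can set that changes the decision.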
The page is a queue, not a catalog
The way the Compliance → Destructive Actions surface is laid out is itself a design choice that took a few iterations.
The first version was a catalog: every gated action listed, every category as a section, search across the lot, toggles next to each row. Admins read it, said "nice", and never came back. Catalogs are reference material. They do not pull anyone in.
The version that admins actually open every week is a queue at the top of the page: Recently blocked — high-risk calls the admin-approval gate stopped in the last 14 days, grouped per action, one-click enable. Workflow follows the audit log rather than the policy table. When something blocks, the row appears; when an admin reviews and enables, the gate opens and the audit-log evidence is sitting right there. The category controls are still on the page — below the queue, with live matched-action counts and a 14-day "would have been gated" dry-run next to each switch (the GitHub Rulesets "Evaluate" pattern — see the blast radius before you flip the switch) — but the queue is what the admin sees first.
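Driving the queue from the audit log rather than from the policy table amounts to a windowed group-by. A minimal sketch, assuming audit entries are dictionaries with hypothetical `action`, `decision`, and `at` fields:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def recently_blocked(audit_log, now, window_days=14):
    """Group blocked calls inside the window per action, busiest first."""
    cutoff = now - timedelta(days=window_days)
    counts = Counter(
        entry["action"]
        for entry in audit_log
        if entry["decision"] == "blocked" and entry["at"] >= cutoff
    )
    return counts.most_common()


now = datetime(2026, 1, 15, tzinfo=timezone.utc)
log = [
    {"action": "api_delete", "decision": "blocked", "at": now - timedelta(days=1)},
    {"action": "api_delete", "decision": "blocked", "at": now - timedelta(days=3)},
    {"action": "delete_project", "decision": "blocked", "at": now - timedelta(days=2)},
    {"action": "delete_project", "decision": "blocked", "at": now - timedelta(days=20)},
    {"action": "delete_row", "decision": "allowed", "at": now - timedelta(days=1)},
]
for action, attempts in recently_blocked(log, now):
    print(action, attempts)
```

Each returned row corresponds to one line in the queue; the one-click enable would hang off the action name.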
Provenance is on every row. Effective state, then the source chain that produced it: read-only mode wins over per-action override wins over category policy wins over shipped default. The admin should never have to ask "why is this action in the state it is in". The chain is right there.
Same gate, vendor MCP included
The category rules run at gateway load, before any client sees the tools list. That is what makes the same gate apply to vendor-published MCP servers — Notion MCP, Miro, Amplitude — that mcpgate proxies under one audit log. The actions come in from the upstream MCP catalog, get sorted into the same categories by the same rules, follow the same policies, and surface on the same admin page. A vendor cannot lie its way past the destructive gate by misdeclaring destructiveHint; the gateway is classifying from the name and the description and the regex, not from the vendor's annotation, exactly because the MCP spec's trust caveat said it would have to.
The footer note on the admin page says it more concretely: rows from proxy-connected services follow the category policies and the admin-enable, but they cannot be reclassified by hand — they come from upstream, they go back to upstream, the gateway is the consistency layer. That is the whole point of bundling vendor MCP servers behind one gateway: one audit log, one policy plane, one place an admin has to look.
What else shipped — and what we deliberately did not build
The category layer was the core of the work. The follow-ups split into two columns: things that landed in the same rollout because they earned their keep, and things we sketched and then parked because the simple model covered more than we expected.
Also shipped:
- Admin-exempt scope toggle. A page-level switch on the Compliance → Destructive Actions surface controls whether the gate applies to everyone or to non-admin users only. The secure default is everyone. The operator-only mode is the one-person-team case: the admin is the team, the admin already knows the consequences, the gate stays on for everybody else.
- Blocked-attempts notice on the Overview. A dismissible watermark surfaces on the Overview page when blocked calls accumulate, so an admin who lives on Overview sees the queue without remembering to visit it.
- Proxy-service governance. Proxied vendor MCP servers (Notion MCP, Miro, Amplitude) are classified by the same category rules and flow through the same admin-approval gate, rendered as read-only “live” rows on the page.
- Recoverable-default fix. Actions that move-to-trash by default but support an opt-in purge=true parameter are no longer over-gated by the container-destroy rule. They land in Recoverable, and the one real purge-form action in the catalog was reclassified explicitly.
- Visible-services filter. The page lists only services that appear in tools/list, not the full installed catalog: an admin sees what their agents see, not what was once imported.
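The admin-exempt scope toggle from the first item in the list above reduces to a two-valued setting. A rough sketch, with the setting values `everyone` and `non_admins_only` invented here for illustration:

```python
def gate_applies(user_is_admin, scope="everyone"):
    """Decide whether the admin-approval gate applies to this caller."""
    if scope == "everyone":          # the secure default
        return True
    if scope == "non_admins_only":   # operator-only mode for one-person teams
        return not user_is_admin
    raise ValueError(f"unknown scope: {scope!r}")


print(gate_applies(user_is_admin=True))
print(gate_applies(user_is_admin=True, scope="non_admins_only"))
```

Rejecting unknown values keeps a typo in configuration from silently disabling the gate.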
Deliberately not built, with reasons:
- Audience tiers as RBAC. The shape we sketched (per-category enum of Off / Operator-only / Internal) collapsed cleanly to the one toggle that actually earned its keep: applies to everyone vs applies to non-admins only. The full enum was a structural answer to a question operators never asked. Once you start designing for “engineering may run bulk-delete but contractors may not”, you are designing RBAC — that is a deliberate No for the gateway. Different problem, different tool, different operating model. The full mapping — what we have at the NIST RBAC0 level, what we deliberately don’t — lives on the access model reference page.
- Just-in-time grants. Permanent enable plus manual revoke is the operator workflow we observed in production. Adding short-lived bindings, action-binding, expiry semantics — none of it had a real-world ask. The MCP elicitation round-trip that JIT would build on is shipped server-side and dormant, waiting for clients to render the input-required shape (none does today). If a client surface lands and the workflow appears, the gateway side is ready.
- Param-dependent gating framework. A catalog-wide survey for actions whose destructiveness flips on a parameter found exactly one: confluence.delete_page with purge=true. We reclassified that one action, did not build the framework, and will revisit only if the pattern proliferates.
The one item that stays plausible as a small follow-up is delivering the blocked-attempts queue to where the admin already lives — Slack DM, email, eventually Teams. It reads from the same audit trail the in-product queue does; it is a shipping decision, not a redesign. If customers ask, we will build it.
The shape of what shipped
What started as a question about read-only mode landed as a layered governance feature. Read-only is still the kill switch at the top of the chain. Below it sits a precedence model — per-action overrides above category policies above shipped defaults — that the operator can reason about without a policy DSL. Forty actions are gated out of the box. Eighteen thousand are not. The boundary between them is enforced by rules that scale with the catalog, audited on every call, and surfaced on a single page that an admin can run as a weekly habit.
The thing the friend's question forced us to confront is that read-only is a great answer to the wrong question. The real question is which writes need a human, and how you express that without spending the rest of the year curating a list. The answer turned out to be eight categories, one override level, and a queue at the top of the page.
If you operate an MCP-shaped agent surface and you have not yet drawn the line between "writes the agent can do on its own" and "writes that need someone watching" — that is the line. The mechanism is small. The discipline of refusing to make it bigger is most of the work.
The broader pattern, the one this small feature points at: as agents move from individual hobby use into team and organisation use, more and more of the conversation moves to compliance. Not red tape. Boundaries. The shape of what an agent can do without anyone watching is the shape of how broadly an organisation can let the agent in at all. Read-only mode existed because one friend wanted his audit window. Category-based destructive governance exists because the next two hundred people are going to want to know who can run delete_project, when, and where the trail of evidence lives. That conversation is going to dominate the next phase of this market.