When a tool call takes minutes
Some MCP tool calls genuinely take minutes — a heavy SQL query, a large export, a design-file node dump. Held synchronously they bump into client and proxy timeouts; the work may complete but the answer is lost. mcpgate now speaks the upcoming MCP protocol revision and its Tasks extension: a long call returns a ticket the agent polls instead of holding the connection open. Today's clients are completely unaffected.
Most tool calls finish in a couple of seconds. The agent asks for the last seven Sentry errors, the gateway returns the seven Sentry errors, the conversation continues. That is the easy case, and it is the great majority of what an AI client actually does in a day.
Then there is the long tail. A heavy BI query that scans a year of warehouse rows. An export of a large Drive folder. A node dump of a wall-sized Figma file. These are not bugs — they are honest work that takes honest time. The agent issues a request, the gateway issues the upstream request, and someone has to hold the connection open until the answer comes back.
That is where the synchronous shape of the MCP protocol bumps into a ceiling. A few seconds is fine. Tens of seconds, sometimes. A minute, often not — client timeouts, proxy timeouts, idle-connection drops. The work may finish on the upstream service. The answer never reaches the agent.
The protocol is fixing it. The upcoming MCP revision adds a Tasks extension that turns a long-running call from "hold the line" into "take a ticket". mcpgate negotiates and serves that upcoming revision today — on request, ahead of the spec's finalization, with no impact on clients still on the current stable.
The coat-check ticket
The analogy is older than the internet. You walk into a restaurant in a heavy coat. You drop the coat at the coat-check. You get a small numbered card. You go and eat. When you are done, you trade the card for the coat.
What the coat-check buys you: you do not stand there holding the coat for two hours.
The MCP Tasks extension is the same shape. The agent issues a tool call. The server (here: mcpgate) looks at it and notices it will take a while. Instead of holding the connection open, the server returns a task handle — the ticket. The agent goes about its other business. When it wants the result, it polls the task with the handle. Running. Running. Done — here is the answer. The work happened in the background, the connection was never the bottleneck.
One subtlety that matters: the choice between an inline result and a task handle is the server's, not the agent's. The agent does not flag a call as long. It simply issues the call. The server decides per call: if it comes back fast, the answer is inline; if it runs past a configured budget, the server promotes the call to a task and returns the handle. The agent's job is to know how to read either shape — which the protocol formalises.
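What "reading either shape" could look like on the agent side is sketched below. This is a minimal illustration, not the wire format: the `result`, `task_id` and `status` field names, the status values, and the `get_task` callable are all assumptions made for the example, and the Tasks extension defines its own names.

```python
import time
from typing import Any, Callable, Dict


def resolve_tool_call(
    response: Dict[str, Any],
    get_task: Callable[[str], Dict[str, Any]],
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Return the final answer whether the server replied inline or with a task handle.

    All field names here are hypothetical placeholders for illustration.
    """
    # Inline shape: the answer is already on the wire.
    if "result" in response:
        return response["result"]

    # Task shape: keep the ticket and poll with it until the work is finished.
    task_id = response["task_id"]
    while True:
        task = get_task(task_id)
        if task["status"] == "done":
            return task["result"]
        if task["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"task {task_id} ended as {task['status']}")
        sleep(poll_interval)
```

The caller never declares a call as long. It passes whatever came back to one function and gets the answer either way.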
The deeper change: stateless-core
The 2026-07-28 revision is nicknamed the “stateless-core” revision in the spec’s own write-up, not by accident. The biggest change is not Tasks. The biggest change is that the protocol can no longer lean on the connection as a state container.
Pause on what the old shape was actually doing. An agent opens a connection. It issues a tool call. The gateway holds the connection open while the upstream work happens; the agent waits; the gateway hands the answer back over the same connection. The TCP socket, the HTTP keep-alive, the SSE stream — the connection itself was carrying state across the request: which call is this answer for, who initiated it, what client capabilities apply.
That worked when the work fit inside a single connection’s life. It does not scale on ordinary HTTP infrastructure — CDNs, load balancers, idle-connection killers — and it does not survive a horizontally-scaled server, where the next request may land on a different process than the one holding the connection’s state. The new revision says: stop. The protocol may not assume connection-bound state. Every request carries the context the server needs; every response is self-contained.
Once that constraint lands, long-running work cannot just “stay on the connection” until done. It needs an explicit, server-side state object the client can reference across as many separate requests as it likes. That object is a task. The handle — the coat-check ticket from the section above — is the only thread that ties “I started this work” to “give me its result”.
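A rough sketch of such a server-side state object follows. The `TaskStore` and `TaskRecord` names, the status strings, and the in-memory dictionary are assumptions for illustration only; a horizontally scaled deployment would presumably back this with a store shared across processes.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TaskRecord:
    """State of one long-running call, held by the server rather than the connection."""
    status: str = "running"
    result: Optional[Any] = None


class TaskStore:
    """Hypothetical task registry keyed by handle (in-memory for the sketch)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, TaskRecord] = {}

    def create(self) -> str:
        # The handle is the only thing the client needs to keep.
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = TaskRecord()
        return task_id

    def complete(self, task_id: str, result: Any) -> None:
        record = self._tasks[task_id]
        record.status = "done"
        record.result = result

    def get(self, task_id: str) -> TaskRecord:
        # Every poll is an independent lookup; no connection state is involved.
        return self._tasks[task_id]
```

Because each lookup depends only on the handle, any request on any connection can ask about the task, which is the property the stateless core requires.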
Tasks is not a feature added on top of the existing model. It is the consequence of a model that no longer permits the implicit version. The coat-check analogy holds because the protocol now demands a coat-check; the held-coat shape, synchronous and connection-bound, is what the revision was designed to retire.
For mcpgate this was a small lift, not a redesign. The gateway has been stateless since the start: no per-connection session, identity resolved per request, every response self-contained. The connection-bound state the new revision rules out is state mcpgate had never relied on. The work was implementing the explicit task object the spec now requires; the architecture around it was already where the revision asks the protocol to land.
This is not the only place the alignment shows up. The gateway’s hooks layer (policy hooks, enrichment hooks) already has the interceptor shape that any MCP-shaped tool gateway eventually needs. The elicitation round-trip the latest revision formalises is shipped server-side and dormant, waiting for clients to render the input-required shape. The pattern repeats: where the protocol is moving, the gateway tends to already be moving in the same direction, not by anticipating releases, but because the shapes that hold up on the wire tend to be the shapes the protocol eventually formalises.
Read together, that is not three coincidences but one architectural direction. The gateway is not retrofitting itself onto the spec; the spec is moving toward shapes the gateway has been quietly relying on. We read that as a useful signal that the design is on the right path — not a guarantee, but enough to keep building in the same direction.
What else the 2026-07-28 revision brings
Tasks is the change this post focuses on, but it ships alongside several others. The shortest survey:
- Stateless core — the protocol no longer leans on the connection as a state container; every request carries its own context, every response is self-contained. Scales on commodity HTTP infrastructure. (Covered in detail above.)
- Tasks extension — long-running work moves into its own extension with explicit handles, polling and cancellation. (The topic of this post; SEP-1686.)
- Tighter OAuth / OIDC alignment — explicit alignment with RFC 8707 audience validation and the OAuth / OIDC discovery shape, so client identity travels through the gateway with the boundaries the IdP set, not the boundaries the gateway invented.
- MCP Apps extension — servers can ship richer server-rendered UI surfaces, not only text and structured content. Opens the door to interactive tool surfaces inside the MCP client.
- Multi-round-trip requests — the protocol-level mechanism behind elicitation: a request can return input_required with an asked-for shape, the client answers, the request resumes. Formalises a round-trip that was earlier handled as ad-hoc agent-renders.
- Formal deprecation policy — the spec can now retire fields and shapes through a documented deprecation path instead of breaking changes between revisions. Forward-stability for everyone who already shipped against an earlier version.
The full release-candidate write-up is the canonical source. mcpgate currently negotiates the stateless-core handshake and serves Tasks (the focus of this post); the other items land on the gateway as the spec finalises and as clients begin to use them.
The five protocol versions mcpgate now negotiates
MCP versions itself by date. When an agent connects, it sends the list of protocol revisions it understands. The server picks the newest one both sides can speak. mcpgate now negotiates five:
| Version | Status | When mcpgate serves it |
|---|---|---|
| 2024-11-05 | Stable, older | Always, on request |
| 2025-03-26 | Stable | Always, on request |
| 2025-06-18 | Stable | Always, on request |
| 2025-11-25 | Current stable | Default for clients that don’t request a draft |
| 2026-07-28 | Upcoming draft (RC locked May 21, finalising late July 2026) | Only when a client explicitly asks for it |
That last row is the new thing. The 2026-07-28 revision — nicknamed the “stateless-core” revision in the spec’s own write-up — is the largest revision of the MCP protocol since launch. It tightens authorization to align more closely with RFC 8707 audience validation, graduates long-running work into the Tasks extension we are using above, adds MCP Apps for server-rendered UIs, and introduces a formal deprecation policy so the protocol can evolve without breaking what already shipped. The full overview is in the MCP project’s own release-candidate write-up.
It is still a draft. Final publication is targeted for late July 2026. mcpgate serves it now — but only to a client that explicitly lists 2026-07-28 in its protocol-version preference.
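The selection rule described here can be sketched in a few lines. This assumes the client's preference arrives as a list of version strings, as described above; the `negotiate` function and the two constant names are hypothetical and are not mcpgate's actual code.

```python
from typing import List, Optional

STABLE_VERSIONS = {"2024-11-05", "2025-03-26", "2025-06-18", "2025-11-25"}
DRAFT_VERSIONS = {"2026-07-28"}


def negotiate(client_versions: List[str]) -> Optional[str]:
    """Pick the newest revision both sides speak, or None if there is no overlap.

    The draft can only be chosen when the client itself listed it, because
    the choice is restricted to the intersection of both sides' versions.
    """
    supported = STABLE_VERSIONS | DRAFT_VERSIONS
    common = supported.intersection(client_versions)
    # Date-stamped versions in YYYY-MM-DD form sort chronologically as plain strings.
    return max(common) if common else None
```

A client listing only 2025-06-18 and 2025-11-25 gets 2025-11-25; adding 2026-07-28 to that list is the only way to be served the draft.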
The promotion budget is the question, not the mechanism
The mechanism is the easy part. The interesting design choice is: where do you draw the line between “inline” and “task”? If you set the budget too short, every normal call gets promoted to a task and the agent has to poll for results it could have read straight off the wire. If you set it too long, you keep right on holding long connections open through proxy timeouts.
We picked the boundary from production telemetry, not from intuition. The duration distribution is not a smooth slope — it is two clusters with a quiet middle. The shape was already implicit in the examples used in the lead (a heavy BI query, a large export, a wall-sized Figma file dump); the audit-log distribution confirms it.
The 95th percentile sits near three seconds. The 99th percentile sits near two minutes. That is a roughly 40× jump between them: not a slope, a cliff. Only about 2% of calls fall between three and eight seconds. That is the quiet middle.
Eight seconds sits inside the quiet middle: past where any normal call has any business being, well below where the long-runner cluster’s central mass is. At that budget, roughly 97% of calls stay inline and only the genuine long-runners promote to a task. The number is not magic — it is the natural valley in the shape of the data.
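The effect of a candidate budget on the inline share can be checked with a small helper like the one below. It is a sketch under stated assumptions: `inline_share` is a hypothetical name, and the sample durations are invented for illustration and are not mcpgate telemetry.

```python
from typing import List


def inline_share(durations_s: List[float], budget_s: float) -> float:
    """Fraction of calls that finish within the budget and so would stay inline."""
    if not durations_s:
        raise ValueError("need at least one duration")
    return sum(d <= budget_s for d in durations_s) / len(durations_s)


# Invented durations in seconds, for illustration only: a fast cluster,
# one call in the quiet middle, and a slow cluster.
sample = [0.4, 0.9, 1.2, 2.1, 2.8, 5.0, 95.0, 130.0]

for budget in (1.0, 8.0, 60.0):
    print(budget, inline_share(sample, budget))
```

In a two-cluster distribution, moving the budget around inside the valley barely changes the share, which is what makes a boundary placed there stable.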
The budget is operator-tunable per instance, or can be disabled entirely if a deployment wants every call inline for whatever reason.
Ahead of the spec, safely
Implementing a draft in production carries an obvious risk: pushing it onto clients that did not ask for it. The piece that closes that risk is strict version negotiation.
A client that lists only 2025-11-25 or older never gets pushed onto the draft. The capability path is opt-in, not silent-upgrade. An older client connecting today behaves exactly as it did before this work shipped. A client that lists the draft can choose to use it; the gateway responds with the draft semantics, including the Tasks extension on supported calls. That separation is what makes “ship the draft early” a tolerable bet rather than a reckless one. An agent built against the upcoming protocol now has at least one server it can integration-test against; an agent built against today’s stable has nothing to worry about.
What this changes for agent builders
Three things, depending on where you are in your build:
If you are on today’s stable protocol: nothing changes. Your client requests 2025-11-25 or one of the older versions, mcpgate serves that, your calls are inline as they were, no Tasks involved. You can ignore this work entirely until your client speaks the new version.
If you are building against the upcoming spec: mcpgate is now one of the small set of servers you can develop and test against. The gateway already surfaces the upcoming version on discovery, protocol negotiation is strict, and the Tasks extension is wired end-to-end with test coverage of the polling round-trip. The integration is real, not a roadmap line.
If you operate a heavy-query gateway: the operational consequence is the one that matters most. Tool calls that the synchronous shape could not carry past a client timeout now have a path through. When a client that speaks the upcoming protocol calls a long-running tool, the gateway no longer needs to hold the connection. The slow-query class stops being a class that quietly fails — without the agent having to know in advance which calls might run long.
The protocol-level work that makes this possible is the spec team’s, not ours. The piece we did is shipping it early enough that an agent built against the upcoming protocol has somewhere to talk to before the spec is final — and shipping it strictly enough that nobody on today’s clients is dragged into a draft they did not ask for.