
Best-of-breed vs integrated: when six security tools become one

Best-of-breed security (Okta, Splunk, Netskope, Presidio) assumes a 20-person security-ops team. Here is what we built instead — and where best-of-breed remains the right call.

We tried buying the security stack first. The procurement timeline alone was longer than the window in which the tools we needed to integrate against stayed stable. So we built one.

Most security architecture writing says the same thing: best-of-breed. Best-of-breed is the term procurement uses for "we buy the category leader for each function, then integrate them", and it is the default assumption in most enterprise security planning. Pick the deepest tool in each category (Okta for identity, Splunk for SIEM, Netskope for DLP, Microsoft Presidio for PII) and run them together. The model is correct for Fortune 500 security organisations with a 20-person ops team.

What follows is not a "we built it better" pitch. The math behind best-of-breed assumed an operational bandwidth that has shifted under everyone in the last 18 months. This post is about what changed, what an integrated stack actually collapses, and the four cases where best-of-breed remains the right call.

Why we built it instead of buying it

The motivation was not to become a software shop. It was to give our team context across the tools we already used. Pull a Jira ticket into Claude, cross-reference against the linked merge request, draft an update for Confluence, ship the change. The friction was the integration layer: every AI client wanted its own auth, every tool wanted its own MCP server, and the resulting graph of credentials was unauditable in practice.

By the time we had evaluated the available enterprise components — an OIDC bridge, a SIEM for AI tool calls, a DLP layer for outbound payload volume, a PII sanitiser, a policy hook surface — the procurement and integration timeline alone was longer than the window in which the underlying tools (MCP itself, the major AI clients, the gateway market) were stable enough to integrate against.

So we built it. Not as a software project, but as the substrate that lets the rest of the company use AI agents against real internal tools without a six-vendor sprawl underneath. The build became a product because other teams kept asking for it.

What "best-of-breed" actually assumes

Best-of-breed is not a bias toward complexity. It is a defensible architecture choice given certain assumptions:

  • You have the team to integrate the tools. A 20-person security-ops function can own the Okta–Splunk–Netskope–Presidio integration matrix. A two-person platform team cannot.
  • The tools are stable enough to integrate against. Splunk's data model has not changed structurally in a decade. You can build against it once.
  • The cadence of change in your own product is slower than the cadence of integration maintenance. If your codebase ships fortnightly, six integration points can absorb that. If it ships daily, the integrations become the bottleneck.
  • The compliance regime explicitly requires named tools. Some procurement filters demand "SIEM" as a tool category that matches a Gartner Magic Quadrant entry — there is no substitute argument that survives that check.

All four assumptions are real and all four can be true. When they are, best-of-breed is the right call. We will not pretend otherwise.

What shifted under those assumptions

Three shifts, all from the last 18 months:

AI-augmented building broke the linear relationship between team size and release cadence. By May 2026, the gateway is 119k lines of Python source with 127k lines of tests (more test code than source) and roughly 8.4 million lines of YAML configuration. It has shipped 967 releases in the last 75 days. The cadence is possible because the build loop is AI-augmented end-to-end: change → multi-perspective LLM review pass → automated test run → release. Each release passes the same automated review chain. The cadence number on its own says nothing about quality; the gate each release passes through does. We will come back to this.

The MCP-tool category is younger than most integration matrices. The Model Context Protocol shipped in late 2024. By the time most enterprise tools had MCP support, the protocol itself had moved past its first revision. A best-of-breed integration matrix assumes the components are stable enough to integrate against once. In the MCP category, that is not yet true. Integration that was correct in March is wrong in May.

The threat surface is also younger. Prompt injection, tool-call privilege escalation, slow-drip exfiltration via AI-mediated tool access — these are not categories Splunk or Netskope were architected against. The vendors are adding AI-specific modules quickly, but the modules are not the deep coverage their core products offer. Best-of-breed for AI security in 2026 still has gaps that an integrated tool can address without crossing six product boundaries.

What an integrated stack actually collapses

When we say "integrated" we do not mean a single tool that does everything badly. We mean six conceptually distinct layers that have been collapsed into one product surface, one runbook, one audit trail, one upgrade cadence.

[Figure: Best-of-breed: 6 vendors, 6 contracts, 6 runbooks (Okta: identity; OPA: policy; Presidio: PII; Splunk: audit/SIEM; Netskope: DLP; no established pure-play yet for tool aggregation/MCP). Integrated: 1 product, 1 runbook, 1 audit trail (mcpgate: identity, policy, PII, audit, DLP, aggregation). Same six layers, different operational shape.]

The layers, with the pure-play vendor each would otherwise be:

| Layer | What it does | Pure-play (if separate) |
|---|---|---|
| Identity & Access | OIDC bridging across multiple providers, per-service guest allowlists, magic-link fallback | Okta, Auth0, WorkOS |
| Authorisation / Policy | Pre- and post-dispatch hooks, role-scoped service surface, hot-reloadable YAML rules | Open Policy Agent, Cedar, Styra |
| PII Sanitisation | Pseudonymise sensitive fields before the LLM sees them, rehydrate before the upstream call. Encrypted mapping with 24-hour TTL | Microsoft Presidio (OSS), Tonic.ai, Skyflow |
| Audit & Compliance Logging | Append-only log of every action with actor, service, byte count, and outcome. No payload contents, only what is necessary to reconstruct activity | Splunk, Elastic Security, Sumo Logic |
| DLP / Behavioural Detection | Per-user and per-service throughput thresholds with Slack alerts on volume anomalies. The exfiltration question made answerable on the audit trail itself | Netskope, Zscaler, Microsoft Purview |
| Tool Aggregation / Endpoint | One MCP endpoint, per-user OAuth across 30+ services, cross-tool workflows in a single prompt | No established pure-play yet; closest analogues are iPaaS tools (Zapier, Workato) for non-AI workflows, or stitching individual MCP servers per service |
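The PII row is easiest to see as a round trip. What follows is a minimal sketch. It assumes a regex detector for email addresses only and an in-memory mapping with the 24-hour TTL from the table. The class and method names are hypothetical, not mcpgate's API. The real layer also encrypts the mapping, which this sketch omits, and a production detector (Presidio-style) covers far more entity types.

```python
import hashlib
import hmac
import re
import time

# Assumption: only email addresses are detected; real detectors cover many entity types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
TTL_SECONDS = 24 * 60 * 60  # mapping entries expire after 24 hours


class Pseudonymiser:
    """Hypothetical sketch: swap sensitive values for tokens, then restore them.

    The mapping is kept in plain memory here; the layer described in the
    table encrypts it, which this sketch omits.
    """

    def __init__(self, key, clock=time.time):
        self._key = key
        self._clock = clock
        self._mapping = {}  # token -> (original value, expiry timestamp)

    def _token(self, value):
        # Keyed hash: the same value yields the same token under the same key.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"<PII_{digest[:12]}>"

    def pseudonymise(self, text):
        """Replace sensitive values before the text reaches the LLM."""
        def swap(match):
            value = match.group(0)
            token = self._token(value)
            self._mapping[token] = (value, self._clock() + TTL_SECONDS)
            return token

        return EMAIL_RE.sub(swap, text)

    def rehydrate(self, text):
        """Restore original values before the upstream tool call."""
        now = self._clock()
        # Drop expired entries so a stale token is never resolved.
        self._mapping = {t: e for t, e in self._mapping.items() if e[1] > now}
        for token, (value, _expiry) in self._mapping.items():
            text = text.replace(token, value)
        return text
```

Keyed tokens keep repeated mentions of the same value consistent for the model without exposing the value itself. Expiring entries before rehydration means a token that outlives its TTL simply stays a token.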

The detailed control mapping for each layer lives on /security, with the Statement of Applicability against ISO/IEC 27001:2022 Annex A. We are aligned with that standard, not certified — the public claim is calibrated against what is auditable today.
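The audit and DLP rows fit together because detection runs on the audit trail itself. What follows is a minimal sketch with these assumptions:

- The log is a JSON-lines file opened only in append mode.
- Records carry metadata but no payload.
- Detection uses per-user and per-service byte thresholds over a one-hour window.

The field names, thresholds, and window are hypothetical. The Slack alert is represented as a returned message rather than a webhook call.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    ts: float       # Unix timestamp of the action
    actor: str      # authenticated user who made the call
    service: str    # upstream tool the call went to
    bytes_out: int  # byte count only: payload contents are never stored
    outcome: str    # e.g. "ok", "denied", "error"


def append_record(path, record):
    """Write one JSON line; the log file is only ever opened in append mode."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def read_records(path):
    with open(path, encoding="utf-8") as fh:
        return [AuditRecord(**json.loads(line)) for line in fh if line.strip()]


def volume_alerts(records, now, window_s=3600,
                  max_user_bytes=50_000_000, max_service_bytes=500_000_000):
    """Return alert messages for users or services over their byte threshold.

    Thresholds and window are illustrative assumptions; a real deployment
    would send these messages to Slack instead of returning them.
    """
    per_user, per_service = defaultdict(int), defaultdict(int)
    for r in records:
        if 0 <= now - r.ts <= window_s:
            per_user[r.actor] += r.bytes_out
            per_service[r.service] += r.bytes_out
    alerts = [f"user {u}: {n} bytes in {window_s}s"
              for u, n in sorted(per_user.items()) if n > max_user_bytes]
    alerts += [f"service {s}: {n} bytes in {window_s}s"
               for s, n in sorted(per_service.items()) if n > max_service_bytes]
    return alerts
```

Because detection only reads byte counts, it needs nothing the audit log does not already store. That is why no payload has to be retained to answer the exfiltration question.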

Transparency as a security property (and what BSL means here)

One property of the integrated approach that the best-of-breed model structurally cannot offer: everything is readable. The container you self-host carries the Python source the gateway actually runs. The configuration YAML is the configuration, not a compiled artefact derived from it. The audit log records what actually flowed (actor, service, byte count, outcome), not a summary an internal system decided to surface.

To be precise about the licence: mcpgate is source-available under BSL 1.1, not open source. The licence is not OSI-approved: it carries a usage limitation and converts to a fully permissive licence on its Change Date. This matters because the OSS-vs-source-available debate is real, and we will not claim a posture we do not have. CockroachDB and Sentry have used the same licence, HashiCorp moved Terraform to it in 2023, and a growing number of infrastructure tools follow the same pattern.

What this changes structurally for the security argument has nothing to do with OSI status. It has to do with whether you can read what runs. You can, in either case:

  • Six vendor black boxes vs one audit surface. A best-of-breed stack means trusting six vendors that their compiled binaries do what their datasheet says — that Okta does not log credentials it claims to encrypt, that Splunk does not exfiltrate data through a maintenance channel, that Netskope's classifier is what the documentation describes. None of these are likely to be malicious; all of them are unprovable from outside the vendor. An integrated source-readable stack collapses that into one thing your security team can review against the source.
  • Insider-threat exposure shifts. If a single rogue commit at a vendor were enough to compromise a layer, six vendors is six exposures with six commit histories no customer can see. A self-hosted binary you built from a published commit log is one exposure — and one you can audit before deploy and re-audit on every upgrade.
  • The backdoor question becomes verifiable. "Does this gateway phone home?" is a question nobody can answer credibly for a closed vendor; all you have are the vendor's assertions. For mcpgate, it is a grep against the source plus a network trace on the container. The container egresses what the source says it egresses, and nothing else, by inspection.
  • Vendor survivability is not a pure trust dependency. If we stop maintaining mcpgate tomorrow, the source-available codebase still runs, still does what it did, and a team within the BSL usage limit can patch it themselves; a team above the limit can negotiate a commercial relationship or wait for the change-date conversion. Best-of-breed enterprise security depends entirely on the vendor remaining a going concern — and acquired vendors regularly deprecate the product line you bought.
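The "grep against the source" step from the backdoor bullet can be made sturdier than a text search. What follows is a minimal sketch that assumes the gateway source is an ordinary tree of .py files. It parses each file with Python's ast module and reports imports of modules that can open network connections. The module list is an assumption and is not exhaustive. Dynamic imports (importlib, __import__) also slip past it. It therefore complements the network trace on the running container rather than replacing it.

```python
import ast
from pathlib import Path

# Assumed, non-exhaustive list of top-level modules that can open outbound connections.
NETWORK_MODULES = {"socket", "ssl", "http", "urllib", "requests", "httpx",
                   "aiohttp", "smtplib", "ftplib"}


def network_imports(source, filename="<string>"):
    """Return sorted (line, module) pairs for imports of network-capable modules."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Match on the top-level package, so "urllib.request" counts as "urllib".
            if name.split(".")[0] in NETWORK_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_tree(root):
    """Map each .py file under root to its network-capable imports."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = network_imports(path.read_text(encoding="utf-8"), str(path))
        if hits:
            report[str(path)] = hits
    return report
```

Every hit in the report is a place to check against the egress observed in the trace. A hit with no matching documented purpose is the thing to question.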

None of this makes the integrated stack better than deeply specialised vendor tools at their core function. It does mean the trust model has a different shape. For a team whose security review is built around "what is in this container, exactly" rather than "what does this vendor promise", source-readability is the gating property. Best-of-breed in this category does not have it, regardless of whether the source is BSL, MIT, or anything else.

The review pipeline question

"967 releases in 75 days" reads two ways depending on what you assume sits behind it. If you read it as move-fast-break-things, it is alarming. If you read it as a release-per-change with the same automated review chain on each one, it is structurally different from a fortnightly release that batched ten changes through one human reviewer.

What sits behind each release:

  • Multi-perspective LLM review pass on every change set — agents instructed from distinct vantage points (security, regressions, API stability, data handling) on the same diff. The agreement layer surfaces what a single reviewer routinely misses.
  • An integration test suite run on every change — 127k lines of test code against 119k lines of source, written test-first from the start. Test-to-source ratio alone is a poor quality metric; we report it as evidence of the cadence the test loop sustains, not as a quality claim by itself.
  • Dependency security via Renovate, pip-audit, Trivy on container scans, npm audit on the marketing site — all running on a weekly schedule and on every change set, with auto-merge for low-risk upgrades.
  • An evidence map tying every released change to the relevant Annex A control on the SoA.
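The per-change gate above can be sketched as a small function. The sketch assumes each perspective's reviewer is a callable that takes (perspective, diff) and returns findings. In practice that would be an LLM prompted from that vantage point. It also assumes the test suite is a callable returning pass/fail. The names and the blocking rule are hypothetical; the sketch shows the gate's shape, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Vantage points named in the review bullet; each gets its own reviewer pass.
PERSPECTIVES = ("security", "regressions", "api_stability", "data_handling")


@dataclass(frozen=True)
class Finding:
    perspective: str
    message: str
    blocking: bool


# Hypothetical reviewer interface: (perspective, diff) -> findings.
Reviewer = Callable[[str, str], List[Finding]]


def gate_release(diff: str, review: Reviewer,
                 run_tests: Callable[[], bool]) -> Tuple[bool, List[Finding]]:
    """Approve a change set only if no reviewer blocks it and the tests pass."""
    # Every perspective reviews the same diff independently.
    findings = [f for p in PERSPECTIVES for f in review(p, diff)]
    if any(f.blocking for f in findings):
        return False, findings  # skip the test run for a diff already rejected
    return run_tests(), findings
```

The ordering (review, then tests, then release) follows the build loop described earlier. Returning all findings, not just the blocking ones, keeps the non-blocking observations available to whatever consumes the gate's output.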

What this does and does not claim. It does not claim that automated review is strictly better than careful human review. It does claim that automated review at this cadence catches more in practice than the realistic alternative: one human reviewer trying to keep up with more than a dozen releases per day. That math does not work. Either the reviewer becomes a rubber stamp (no real coverage) or the cadence drops to fortnightly (no real velocity). The automated chain is what lets coverage and velocity coexist.
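The arithmetic behind "more than a dozen releases per day" runs as follows. It assumes, for illustration, that releases are spread evenly over calendar days and that a single reviewer works an 8-hour day:

```latex
\frac{967 \text{ releases}}{75 \text{ days}} \approx 12.9 \text{ releases/day},
\qquad
\frac{8 \times 60 \text{ min}}{12.9 \text{ releases}} \approx 37 \text{ min per release}
```

That leaves roughly 37 minutes per review, every day, before the reviewer does anything else.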

The honest open question is the one a careful reader will raise: if multi-perspective LLM review really catches what human review catches, why do major AI labs still employ thousands of human reviewers? We do not have visibility into how those labs distribute their internal review workload across model-policy, safety, and per-PR code review. The principled distinction we draw for our own work is layer-based. Per-PR code correctness, regression detection, dependency vulnerability, schema breaks — all automatable, and we automate them. Architectural decisions, threat-model changes, novel security surfaces — these still get human attention, documented as decision records in our ISMS. The split is deliberate; the automation covers the bandwidth-bottlenecked layer.

Where best-of-breed remains the right call

To be useful, this post has to say where its own argument fails. There are four cases where the integrated argument breaks down:

  • You already have a 20-person security-ops team that owns Splunk, Okta, Netskope. Adding a seventh vendor surface for "AI" is incremental. You will get more out of the depth those teams have built than out of replacing six tools with one.
  • Your compliance regime names tools as categories. Some sector regulators specify "FIPS 140-2 validated module" or "Common Criteria EAL4+" or a similar named substrate. An integrated tool that is "aligned with" the relevant standards will not pass the check that needs the certified box.
  • Your AI surface is itself a small fraction of total exposure. If AI tool access is 5% of how data leaves the company, the integrated tool covers a 5% slice. The other 95% still needs Netskope or equivalent for the email, browser, endpoint surface.
  • You are at a scale where any individual tool failure has material business impact. Best-of-breed gives you a smaller blast radius per tool. An integrated tool has the entire gateway as its blast radius.

And the limits we will not pretend away even where integrated is the right call for your team:

  • We are not Splunk-deep in SIEM. The audit log is sufficient for the questions a procurement auditor asks about an AI gateway specifically. It is not a general-purpose enterprise SIEM with a query language and an ecosystem of integrations.
  • We are not Netskope-deep in DLP. Throughput-anomaly detection on the audit trail catches volume-based exfiltration. It does not match the depth of a dedicated DLP product across email, browser, cloud storage, and endpoint.
  • We do not yet federate guest identity to a partner organisation's IdP. Guests with their own corporate IdP can use magic link today; cross-tenant OIDC federation is in design but not shipped.
  • Sub-service authorisation is hook-level, not declarative. "Allow Jira project PBE but not project ENG" is expressible in a policy hook today. A UI for that is on the roadmap, not in the product.
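For the sub-service bullet, here is a minimal sketch of what a project-scoped pre-dispatch hook might look like. It assumes the hook receives the caller's role, the target service, the tool name, and the tool arguments. It also assumes rules arrive as a dict, the shape a YAML rule file would load into. The rule keys, the `project` argument, and the hook signature are all hypothetical, not mcpgate's hook interface.

```python
# Hypothetical rules, shaped as a YAML policy file would load, e.g.:
#   engineer:
#     jira:
#       allowed_projects: [PBE]
RULES = {"engineer": {"jira": {"allowed_projects": ["PBE"]}}}


class PolicyDenied(Exception):
    """Raised by a hook to stop the call before it reaches the upstream service."""


def pre_dispatch(role, service, tool, args, rules=RULES):
    """Return args unchanged if allowed; raise PolicyDenied otherwise."""
    service_rules = rules.get(role, {}).get(service)
    if service_rules is None:
        raise PolicyDenied(f"role {role!r} has no access to {service!r}")
    allowed = service_rules.get("allowed_projects")
    # Default deny: once a project list exists, calls without a listed project fail.
    if allowed is not None and args.get("project") not in allowed:
        raise PolicyDenied(f"{tool}: project {args.get('project')!r} not allowed for {role!r}")
    return args
```

Under these rules a Jira call scoped to PBE passes through unchanged. The same call scoped to ENG, or any Jira call from a role with no Jira entry, is refused before dispatch.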

What "fits" looks like in practice

Where the integrated stack is the right shape:

  • Mid-market companies with no dedicated security-ops function. The platform engineer who already runs everything else also runs this. One product surface to learn, one upgrade cadence to track, one audit trail to query when the auditor asks.
  • Self-hosted requirement. Best-of-breed in the security category is overwhelmingly SaaS today. Self-hosted Okta is enterprise-tier; self-hosted Netskope does not exist. An integrated stack you can run on your own infrastructure shifts what "self-hosted security" actually means.
  • AI-augmented build cadence. If your own cadence is daily or sub-daily, the integration maintenance of six tools becomes the long pole. Reducing it to one is not a luxury at that point; it is the only stack that keeps pace.
  • BSL or source-available licence appetite. Some teams will only run software they can read and modify. The integrated stack is source-available end-to-end. Best-of-breed enterprise security is overwhelmingly closed.

Frequently asked questions

Can an integrated stack replace Splunk for SIEM?

For AI-gateway activity, yes. For general enterprise SIEM covering endpoints, network, email, identity, cloud — no. The audit log we ship is scoped to gateway activity and is fully sufficient for that scope, including for the SOC 2 CC7.2 control on AI tool surfaces. It is not architected as a general-purpose SIEM and would be a poor choice for that job.

What is the operational cost of a six-vendor best-of-breed stack?

The licence cost varies by deal size and is usually the smallest line item. The persistent cost is the integration matrix and the institutional knowledge required to operate it. IBM's annual Cost of a Data Breach reports consistently show that organisations with consolidated security tooling identify and contain breaches faster than organisations with fragmented tooling. That finding is consistent with the runbook and audit-trail unification we describe above.

How many security tools does a mid-market company actually need?

The honest answer depends on what they already do. A B2B SaaS company outside a regulated industry usually runs identity (one tool), endpoint protection (one tool), some form of audit logging, and an email security layer. Adding AI-gateway-specific coverage to that should not require six more vendors; the integrated stack is meant to be one addition to that baseline, not six.

Is mcpgate ISO 27001 certified?

No. The platform is implemented and documented against ISO/IEC 27001:2022 Annex A controls (what we publicly describe as "aligned"), but no accredited audit has issued a certificate. The Statement of Applicability, risk treatment plan, and evidence map live in our internal ISMS, and the public-facing summary is on /security. We will pursue certification when a customer contract requires it. We have explicitly chosen not to publish a roadmap with dates. Dates that slip can become misleading advertising claims under §5 UWG (the German Act against Unfair Competition), and that risk is asymmetric.

What review process actually runs on every release?

Every release runs four checks:

  • Multi-perspective LLM review on each change set, from security, regression, API stability, and data handling vantage points.
  • The full automated integration test suite.
  • Renovate, pip-audit, and Trivy on dependencies.
  • The evidence map tying changes to Annex A controls.

Human review sits at the architectural level, not on the per-change diff: every design that touches security has a decision record in the ISMS. The choice was deliberate. Diff-level review proved automatable; architectural decisions did not.

How does this compare to other MCP gateways like Obot, Docker MCP, or MintMCP?

Different question, different answer. The post above argues integrated-stack vs best-of-breed-vendor-stack — a stack-level comparison. The comparison against other MCP gateways is product-level: which gateway product fits a specific operational profile. The comparison pages handle that — each one breaks out where the named competitor is the better choice (typically: more mature, larger community, fully OSS) and where mcpgate is (PII pseudonymisation depth, two-layer policy hooks, source-available with compliance-aligned ISMS documentation).

Further reading