Context Map

CI/CD for what your AI knows about your company — a curated org-context layer your AI clients consult automatically, with the same build/validate/deploy/rollback discipline you already use for code.

Sidebar → General → Context Map — at /admin/context-map on your gateway. The map is a small Markdown corpus describing your company's wiring: which systems and repos exist, who owns them, where the detail lives, and how things connect. The gateway syncs it, serves it to connected AI clients, notices what it couldn't answer, and lets agents correct it — with a leak-scan + validation gate on every write.

Two source modes

Pick one per gateway instance, switch any time:

  • Repo mode — point the gateway at a git repository (GitLab, GitHub or any HTTPS git host). The repo is the source of truth; every change is a commit with full history, diffable and revertable. A configurable poll keeps the served copy fresh; an optional webhook lets a push trigger an immediate reindex. Recommended when more than one person curates the map.
  • Inline mode — an editable textbox in the admin page. Stored in Redis, zero git setup. Useful for small instances, single-operator deployments, or for hand-editing during a test. Inline content can be exported and migrated to a git repo later without losing history.

Either mode can also accept content programmatically over the content-push REST endpoint — useful when another system (e.g. a service catalogue, a wiki sync) wants to POST entries directly.

Map structure

A shallow tree, intentionally:

  • L0 digest — the always-in-context index, auto-built by the gateway from the page set. One line per entity at small scale; collapses to one heading per domain (or per GitLab group, for federated repos) as the catalogue grows. Self-bounded to the configurable size budget (default ~2 KB). You don't curate this file — it's a derivative.
  • pages/*.md (L1 domain pages). Navigational: where things live, how they connect, who owns them, where the detail is. Small: point to sources, don't inline them.
  • pages/repos/*.md (federated repo pages), one per source repo. Read-only on the map side; the source of truth stays in the repo itself (see Federation below).
  • L2 — the real sources (repos, Confluence, code). Not in the map; reached through the gateway's existing connectors when more detail is needed.

Each page declares what it is about in frontmatter:

---
title: The Shop
relevant_to: [pipedrive, opencms6, sales]
---

The relevant_to tags are the join key — the gateway matches a tool call or query against these to surface the right pages. Tagging is many-to-many: a page can be relevant to several services or domains.
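As a rough illustration of the join described above, the sketch below parses the frontmatter block and matches a set of tags against each page's relevant_to list. The function names (parse_frontmatter, pages_for) and the case-insensitive matching are assumptions made for this example, not the gateway's actual implementation; the parser handles only the two keys shown above.

```python
def parse_frontmatter(page_text):
    """Return (metadata, body) for a page that starts with a '---' block.

    Handles only a scalar title and an inline relevant_to list; a real
    implementation would more likely use a YAML parser.
    """
    lines = page_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, page_text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return {}, page_text  # no closing '---': treat the whole text as body


def pages_for(tags, pages):
    """Return ids of pages whose relevant_to shares at least one tag (many-to-many)."""
    wanted = {t.lower() for t in tags}
    return [
        page_id
        for page_id, meta in pages.items()
        if wanted & {t.lower() for t in meta.get("relevant_to", [])}
    ]
```

Under these assumptions, a tool call tagged pipedrive would surface the page above, and a page tagged with several services would surface for each of them.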

Federation — context lives with its owner

The map is not one central wiki that one person curates. Every repo can ship its own context file alongside its code; the gateway composes those distributed files read-only into a navigable corpus and re-syncs them on every push. Edits flow through the source repo's normal PR review and git history — not through the map.

Two authority tiers per source

  • Canonical — a repo that ships a context_map.md at its root. Purpose-built navigational page, full authority, used verbatim.
  • Derived starter — a repo without a context_map.md but with a CLAUDE.md, AGENTS.md or llms.txt at the root. The gateway condenses it into a starter page at lower authority, flagged as derived. The map is therefore useful from day one; teams upgrade to a real context_map.md when they want to.

A plain README.md is deliberately not a membership signal — READMEs are project docs, the map is the navigational layer. A repo opts in by shipping one of the recognised files.
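The membership rule can be sketched as a small classifier over the file names at a repo root. The function name classify_repo and the precedence among CLAUDE.md, AGENTS.md and llms.txt are assumptions for illustration; the text above only states that context_map.md wins and that README.md never counts.

```python
CANONICAL_FILE = "context_map.md"
# Order of preference among derived sources is an assumption of this sketch.
DERIVED_FILES = ("CLAUDE.md", "AGENTS.md", "llms.txt")


def classify_repo(root_files):
    """Return (tier, source_file), or (None, None) if the repo has not opted in."""
    names = set(root_files)
    if CANONICAL_FILE in names:
        return "canonical", CANONICAL_FILE
    for candidate in DERIVED_FILES:
        if candidate in names:
            return "derived", candidate
    return None, None  # README.md alone is deliberately not a signal
```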

Federated pages appear at pages/repos/<repo>.md and are read-only on the map side. Editing happens in the source repo (full PR review, full git history).

Two ways to declare membership (merged internally)

  • Group auto-discovery (recommended). Configure one or more GitLab group paths; the gateway scans those groups and federates every repo that ships one of the recognised files. Membership = "the file is there" — add the file to a repo to opt in, remove it to opt out. No hand-maintained repo list. Each repo root is listed in a single call, so discovery stays cheap as the org grows.
  • Explicit list (federation.yaml or admin UI). A federated_sources list for the edge cases — e.g. pulling a single repo from outside the configured groups. The admin UI writes through to the same file, so either editing path is equivalent.

With discovery groups configured, the explicit list is usually empty — you don't need it for repos inside the scanned groups.

Index grouping at scale

Once federation pulls in many repos, the always-loaded L0 index collapses to one heading per group (e.g. per GitLab group) instead of one per repo, so the digest stays inside its byte budget. Inside a heading, the assistant follows the link to the relevant repo page on demand.
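A minimal sketch of that collapse, assuming the digest is rendered as one line per entity until the byte budget is exceeded and as one heading per group afterwards. The tuple layout, the line format and the function name build_digest are hypothetical; the sketch also does not handle the case where even the grouped form exceeds the budget.

```python
def build_digest(entities, budget_bytes=2048):
    """entities: list of (group, name, summary) tuples."""
    per_entity = "\n".join(f"- {name}: {summary}" for _, name, summary in entities)
    if len(per_entity.encode("utf-8")) <= budget_bytes:
        return per_entity  # small scale: one line per entity

    # Over budget: collapse to one heading per group.
    groups = {}
    for group, name, _ in entities:
        groups.setdefault(group, []).append(name)
    return "\n".join(
        f"# {group}: {len(names)} page(s)" for group, names in sorted(groups.items())
    )
```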

The point of federation: the map starts from existing context engineering (the CLAUDE.md a team has been keeping for months) and grows into a curated layer at the pace the team chooses — there's no "blank map" day where everyone is told to write something new.

Tools the AI client sees

Tool | What it does
company_context_read | Search the map (BM25 over the index) or fetch a single page by id. Returns the matching page snippets plus the L0 digest. The tool is hidden from external / viewer roles by default.
company_context_write | Create or replace a page, or delete a page. In repo mode the gateway commits and pushes through its own identity; in inline mode the inline store is updated. Every write passes the same leak-scan + size/format validation + attribution gate. The L0 index is not written through this tool; it auto-rebuilds from the page set.
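For readers unfamiliar with BM25, the sketch below shows the standard scoring formula applied to a small page set. It is a textbook illustration, not the gateway's index: the tokenizer, the k1 and b defaults, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, pages, k1=1.5, b=0.75):
    """pages: {page_id: text}. Return ids of matching pages, best first."""
    docs = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    if not docs:
        return []
    lengths = {pid: sum(tf.values()) for pid, tf in docs.items()}
    avg_len = sum(lengths.values()) / len(docs)
    scores = {}
    for term in set(tokenize(query)):
        df = sum(1 for tf in docs.values() if term in tf)
        if df == 0:
            continue  # term appears nowhere in the corpus
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        for pid, tf in docs.items():
            f = tf[term]
            if f:
                norm = f + k1 * (1 - b + b * lengths[pid] / avg_len)
                scores[pid] = scores.get(pid, 0.0) + idf * f * (k1 + 1) / norm
    return sorted(scores, key=scores.get, reverse=True)
```

A query that matches no page returns an empty list, which is the situation the gap feed records.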

A session-start hint tells the model to base answers about systems, repositories, ownership, connections or project status on a company_context_read call first — explicitly not from memory, earlier notes, or a local working copy, since those go stale while the map is server-synced on every push. The hint ships as a baseline platform hint to every install that has the Context Map enabled, and includes the live entity-index so the assistant recognises a question as a map question from the first turn.

How freshness rides every tool response

The gateway holds a tiny in-memory version key for the map. Each session carries a cursor of the last version it saw. Before any tool call returns, the gateway compares the two:

  • Unchanged (the common case) — pass through, no extra payload, no extra round-trip.
  • Changed — the gateway computes the tag-scoped relevant delta and attaches it to the tool's response, once per session per change. The cursor advances. No banner-blindness, no spam.

The beacon rides every tool response, including passthrough services (Jira, Slack, Notion, GitLab, …). A working session notices a mid-session map change on its very next tool call — no client interrupt, no re-send. The check itself is a small version compare; the snippet fetch overlaps the tool call, so there is no meaningful added latency.
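The cursor comparison can be sketched as follows. The class names, the change-log layout and the context_map_delta response key are hypothetical; the sketch only mirrors the behaviour described above: pass through when the version is unchanged, attach the tag-scoped delta once when it changed, then advance the cursor.

```python
from dataclasses import dataclass, field


@dataclass
class MapState:
    version: int = 0
    # version -> (tags touched by the change, short snippet)
    changes: dict = field(default_factory=dict)

    def publish(self, tags, snippet):
        self.version += 1
        self.changes[self.version] = (set(tags), snippet)


@dataclass
class Session:
    tags: set
    cursor: int = 0  # last map version this session has seen


def attach_beacon(response, session, state):
    """Wrap any tool response; add a delta only if the map changed."""
    if session.cursor == state.version:
        return response  # common case: nothing to add
    delta = [
        snippet
        for version, (tags, snippet) in sorted(state.changes.items())
        if version > session.cursor and tags & session.tags
    ]
    session.cursor = state.version  # delivered once per session per change
    if not delta:
        return response
    return {**response, "context_map_delta": delta}
```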

The gap feed — what the map couldn't answer

When company_context_read returns nothing for a query, the gateway logs the question to a bounded, de-duplicated, PII-scrubbed gap feed. Repeated misses for the same shape aggregate into one entry with a frequency, so the most-asked unanswered questions rise to the top. The feed is:

  • Visible to the operator — pullable from the admin page; the raw material for proposing new pages.
  • Read-only by default — nothing is written to the map automatically. An external enrichment step can read the feed and propose pages, but the map itself only changes via company_context_write or a commit.
  • Off-switchable per instance.
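One way to picture a bounded, de-duplicated, scrubbed feed is the sketch below. The normalisation into a query "shape", the email-only scrub, the eviction policy and the class name GapFeed are all assumptions made for illustration; the gateway's actual PII scrubbing is certainly broader than one regex.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class GapFeed:
    def __init__(self, max_entries=500):
        self.max_entries = max_entries
        self._counts = {}

    @staticmethod
    def shape(query):
        """Scrub emails, lowercase, and drop punctuation so similar misses aggregate."""
        scrubbed = EMAIL.sub("<email>", query.lower())
        return " ".join(re.findall(r"<email>|[a-z0-9]+", scrubbed))

    def record_miss(self, query):
        key = self.shape(query)
        if not key:
            return
        if key not in self._counts and len(self._counts) >= self.max_entries:
            # Bounded: evict the least-frequent entry to make room.
            victim = min(self._counts, key=self._counts.get)
            del self._counts[victim]
        self._counts[key] = self._counts.get(key, 0) + 1

    def top(self, n=10):
        """Most-asked unanswered questions first."""
        return sorted(self._counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```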

This is the half of the CI/CD analogy that does not exist in a wiki: observability — the gateway notices what its consumers asked for and didn't get.

The write gate — one path, every feed

Three write feeds, one shared gate:

  • Inline edit — admin textbox in the admin page.
  • Content-push REST: POST /api/context-map/push with a per-tenant X-Context-Map-Token header. Rate-limited and idempotent; reindex only.
  • company_context_write — AI agent writes through the MCP tool.
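A content-push call might be assembled like this. The endpoint path and header name come from the list above; the JSON body shape (id and content fields), the Content-Type, and the function name are assumptions, so check the gateway's API reference for the real payload.

```python
import json
import urllib.request


def build_push_request(gateway_url, token, page_id, markdown):
    """Build (but do not send) a content-push request."""
    # Hypothetical payload shape: the field names are not confirmed by the docs.
    payload = json.dumps({"id": page_id, "content": markdown}).encode("utf-8")
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/api/context-map/push",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Context-Map-Token": token,  # per-tenant token, sent as a header
        },
    )
```

Sending it would be a matter of passing the request to urllib.request.urlopen and treating a non-2xx response as a rejection by the gate.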

Every write, regardless of source, runs the same three checks:

  1. Leak-scan — rejects content that looks like secrets, API keys, tokens, or env values.
  2. Size and format validation — per layer (L0 / L1) and per page.
  3. Attribution — the writer's identity is captured at the gateway and attached to the change.
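The three checks can be pictured as a single function that every feed calls. This is a sketch under stated assumptions: the secret patterns are a tiny illustrative set, the 16 KB page limit and the frontmatter requirement are hypothetical, and GateResult and write_gate are names invented for the example.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; a real scanner would carry a far larger rule set.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key id
    re.compile(r"(?i)\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*\S{8,}"),
]
MAX_PAGE_BYTES = 16 * 1024  # hypothetical per-page limit


@dataclass
class GateResult:
    accepted: bool
    reason: str
    author: str


def write_gate(content, author, max_bytes=MAX_PAGE_BYTES):
    # 1. Leak-scan: reject anything that looks like a credential.
    if any(p.search(content) for p in SECRET_PATTERNS):
        return GateResult(False, "leak-scan: secret-like content", author)
    # 2. Size and format validation.
    if len(content.encode("utf-8")) > max_bytes:
        return GateResult(False, "validation: page too large", author)
    if not content.startswith("---\n"):
        return GateResult(False, "validation: missing frontmatter", author)
    # 3. Attribution: an anonymous write is refused.
    if not author:
        return GateResult(False, "attribution: unknown writer", author)
    return GateResult(True, "ok", author)
```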

In repo mode, an accepted company_context_write is committed and pushed to the configured repo under the gateway's own identity. The remote URL in .git/config stays tokenless; the token is supplied only transiently for each fetch or push. The served copy reindexes immediately, without waiting for the next poll.

If a write would land outside the repo checkout via a symlink, or use a host outside an exact-match allowlist for git hosting, the gateway refuses and records the refusal in the audit log.
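Both refusals can be sketched with the standard library. The function names are hypothetical, and requiring the https scheme is an assumption drawn from the "any HTTPS git host" wording earlier on this page.

```python
import os
from urllib.parse import urlparse


def is_inside_checkout(checkout_dir, relative_path):
    """True only if the resolved target stays under the checkout root.

    realpath follows symlinks, so a link pointing outside the checkout
    resolves to an outside path and is refused.
    """
    root = os.path.realpath(checkout_dir)
    target = os.path.realpath(os.path.join(root, relative_path))
    return os.path.commonpath([root, target]) == root


def is_allowed_remote(remote_url, allowed_hosts):
    """Exact-match host allowlist: no suffix or substring matching."""
    parsed = urlparse(remote_url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

Exact matching matters here: a host such as gitlab.example.com.evil.test would pass a naive prefix check but fails set membership.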

Refresh — pull and push

  • Pull: the gateway polls the configured remote at an interval you set (the default is friendly to small instances). The poll is cross-container-locked and self-gating: one container does the work and the others reuse the index.
  • Push (optional): the admin page exposes a webhook URL plus a per-tenant secret. Configure your git host's webhook to call it on push to the map's branch; the gateway reindexes immediately. The secret travels in the X-Gitlab-Token header (not in the URL), and the endpoint is rate-limited and idempotent.

Pull and push coexist. If both are configured, the next poll re-syncs from the remote, so a webhook missed by a transient outage self-heals at the next poll cycle.
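On the receiving side, a webhook handler along these lines would check the header secret in constant time and skip a commit it has already indexed. The class name, the status codes, and the use of the push payload's after field as the idempotency key are assumptions of this sketch; rate limiting is omitted.

```python
import hmac


class WebhookHandler:
    def __init__(self, secret, reindex):
        self.secret = secret
        self.reindex = reindex          # callable taking the pushed commit SHA
        self.last_indexed_sha = None

    def handle(self, headers, payload):
        supplied = headers.get("X-Gitlab-Token", "")
        # compare_digest avoids leaking the secret through timing differences.
        if not hmac.compare_digest(supplied.encode("utf-8"), self.secret.encode("utf-8")):
            return 401
        sha = payload.get("after")
        if sha == self.last_indexed_sha:
            return 200  # idempotent: a redelivered push is a no-op
        self.reindex(sha)
        self.last_indexed_sha = sha
        return 202
```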

Who can see the map

Context Map is operator-only by default. The company_context_read and company_context_write tools, and the freshness beacon, are hidden from the external and viewer roles. A guest or external collaborator never sees that a map exists; their tool responses carry no map context.

For roles that do have access, the freshness beacon attaches only the slice relevant to the caller's role. The admin viewer is XSS-hardened: links are scheme-allowlisted (http / https / mailto) and attribute values are quote-escaped.
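The link hardening can be illustrated with a small renderer. This is a generic sketch of scheme allowlisting and quote escaping, not the admin viewer's code; render_link is a hypothetical name, and dropping relative links (which have no scheme) is a simplification of this example.

```python
import html
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "mailto"}


def render_link(href, label):
    """Render an anchor only for allowlisted schemes; otherwise plain text."""
    href = href.strip()
    if urlparse(href).scheme.lower() not in ALLOWED_SCHEMES:
        return html.escape(label)  # e.g. javascript: links degrade to text
    # quote=True escapes both quote characters so the attribute cannot be closed early.
    return '<a href="{}">{}</a>'.format(html.escape(href, quote=True), html.escape(label))
```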

Safety — allowlist, not scrub

The map should never contain secrets, API keys, customer PII, or anything you would not want pasted into a Slack channel. The defence is structural:

  • Allowlist sources. Pages enter from explicitly cleared inputs — a curated git repo, the admin textbox, the content-push REST endpoint with a per-tenant secret, or company_context_write via the MCP gate. Never from an "everything-dump" of all tool traffic.
  • Leak-scan on every write. The same gate runs on git pulls, inline edits, content-push, and company_context_write. A leak is the one thing git can't undo, because a revert doesn't un-expose it, so the scan is a hard, pre-merge check.
  • Plain text in your git. Markdown the operator can read end-to-end. If a leak slipped past, it's visible — not buried in an opaque store. The repo itself is the audit surface.
  • Audited. Every write is recorded in the audit log with the writer's identity and the refusal reason if rejected. Every refused redirect, every symlink-escape attempt, every host-allowlist failure surfaces in the log.

What it is not

  • Not a wiki replacement. The map points at the sources of truth (the repo, the Confluence page, the runbook). It does not duplicate them. A good map is shallow: 2–3 levels, then a link.
  • Not a knowledge graph. Plain Markdown, not an exposed graph database. AI clients consume Markdown well, and a graph DB is a worse audit surface than a diffable text repo.
  • Not a place for personal memory. Personal memory lives in the AI client (Claude, ChatGPT). The map is the shared, organisation-wide layer — the part no client can solve for everyone.
  • Not self-writing. Curated pages are human-written. Federated repo pages without their own context_map.md are auto-derived from existing context files (CLAUDE.md / AGENTS.md / llms.txt) as lower-authority "derived starters"; everything else lands only through the write gate. The gap feed proposes candidates; nothing turns into a curated page without a human or a gated company_context_write.

Related

  • Compliance — the umbrella surface; Context Map writes are recorded in the audit log alongside everything else.
  • Audit Log — forensic substrate for every company_context_write, refused write, and symlink-escape attempt.
  • Access Model — the roles that decide who sees the map.
  • Notifications — the dispatcher; future candidates include map-write alerts and gap-feed-threshold alerts.
  • Hooks — the freshness beacon is a built-in gateway-wide pre-hook; user hooks can read the map in their own logic.