
Cross-repo federation

OpenCodeHub federates across repos at the graph layer, not at the client layer. The repo is a typed graph node, group membership is a graph relation, and every per-repo MCP tool understands the same canonical repo_uri shape.

ADR 0012 promoted the repo from a runtime-only registry handle (the absolute working-tree path stored in ~/.codehub/registry.json) to a typed Repo node in the graph. The promotion fixed three concrete gaps the earlier shape could not close:

  1. Cross-repo edges had no typed source/target. group_cross_repo_links emits records like {source_repo_uri, target_repo_uri, source_doc_path, target_doc_path, relation}. Without a graph-side Repo entity those records were free-floating tuples; with it, they are first-class edges that join into ownership and centrality queries.
  2. AMBIGUOUS_REPO choices[] had no graph backing. The structured envelope payload now sources {repo_uri, default_branch, group} straight from the graph, not the runtime registry.
  3. group_* tools needed a typed primitive for fan-out. The Repo node lets group tools compose with the rest of the graph instead of going through a separate registry-only code path.

repo_uri is a Sourcegraph-style URI. It takes one of two shapes:

Shape   Example                  When
Hosted  github.com/org/repo      Repos with a known remote.
Local   local:<sha256-of-path>   Unpublished or local-only repos.
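A minimal sketch of how the local shape could be derived and recognised. The function names are hypothetical, and the hashing details (SHA-256 over the UTF-8 bytes of the absolute path, lowercase hex) are an assumption; the source only states that the suffix is a SHA-256 of the path.

```python
import hashlib
from pathlib import Path


def local_repo_uri(working_tree: str) -> str:
    """Build a local-shape repo_uri from a working-tree path.

    Assumption: the digest covers the UTF-8 bytes of the absolute path
    and is rendered as lowercase hex. The real normalisation may differ.
    """
    absolute = str(Path(working_tree).absolute())
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()
    return f"local:{digest}"


def is_local(repo_uri: str) -> bool:
    """True for the local shape, False for hosted shapes such as github.com/org/repo."""
    return repo_uri.startswith("local:")
```

Because the URI is a pure function of the path, the same working tree always maps to the same repo_uri under this assumption.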

Every per-repo tool accepts both repo (registry name — the human-readable handle) and repo_uri (the typed graph attribute). When both are supplied, repo_uri wins. Every group tool emits repo_uri in the same canonical form, so a caller can chain a group query into a per-repo retry without translation.
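The precedence rule and the group-to-per-repo chaining can be sketched as follows. Both helpers are hypothetical client-side illustrations, not the server's code; the call shape mirrors the {tool, args} form used elsewhere on this page, and the hit is assumed to be a dict carrying a repo_uri key.

```python
from typing import Any, Dict, Optional, Tuple


def select_repo_arg(
    repo: Optional[str] = None, repo_uri: Optional[str] = None
) -> Optional[Tuple[str, str]]:
    """Return (kind, value) for the identifier that takes effect, or None.

    repo_uri wins whenever it is supplied, regardless of repo.
    """
    if repo_uri is not None:
        return ("repo_uri", repo_uri)
    if repo is not None:
        return ("repo", repo)
    return None


def per_repo_call(tool: str, hit: Dict[str, Any], **args: Any) -> Dict[str, Any]:
    """Turn a group-tool hit into a per-repo call.

    The hit's repo_uri is passed through unchanged, since group tools
    emit it in the same canonical form per-repo tools accept.
    """
    return {"tool": tool, "args": {"repo_uri": hit["repo_uri"], **args}}
```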

When two or more repos are registered and a per-repo tool is called without either repo or repo_uri, the server returns AMBIGUOUS_REPO in the structured-error envelope:

{
  "structuredContent": {
    "error": {
      "error_code": "AMBIGUOUS_REPO",
      "jsonrpc_code": -32602,
      "choices": [
        { "repo_uri": "github.com/org/api-svc", "default_branch": "main", "group": "platform" },
        { "repo_uri": "github.com/org/billing-svc", "default_branch": "main", "group": "platform" }
      ],
      "total_matches": 2,
      "hint": "Retry with repo_uri=<one of above>"
    }
  }
}

The choices[] array is capped at 10. When total_matches > choices.length, the caller knows the list was truncated.
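A client might read the envelope and detect truncation along these lines. This is a sketch: the function name is hypothetical, and the field names are taken from the example envelope above.

```python
from typing import Any, Dict, Optional


def read_ambiguous_repo(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Extract the candidate repo_uris from an AMBIGUOUS_REPO envelope.

    Returns None when the response is not an AMBIGUOUS_REPO error.
    """
    error = response.get("structuredContent", {}).get("error")
    if not error or error.get("error_code") != "AMBIGUOUS_REPO":
        return None
    choices = error.get("choices", [])
    total = error.get("total_matches", len(choices))
    return {
        "repo_uris": [choice["repo_uri"] for choice in choices],
        # More matches than listed choices means the list was capped.
        "truncated": total > len(choices),
    }
```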

The retry shape:

{ "tool": "context", "args": { "repo_uri": "github.com/org/api-svc", "symbol": "..." } }

This is what makes the federation surface composable from a deterministic agent loop: the loop never has to guess.
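One way an agent loop could use the envelope is sketched below. It assumes a hypothetical call_tool(tool, args) transport function that returns the response as a dict, and a caller-supplied choose policy; neither is part of OpenCodeHub.

```python
from typing import Any, Callable, Dict, List

ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]
Chooser = Callable[[List[Dict[str, Any]]], Dict[str, Any]]


def call_with_disambiguation(
    call_tool: ToolCall, tool: str, args: Dict[str, Any], choose: Chooser
) -> Dict[str, Any]:
    """Call a per-repo tool and retry once if the server answers AMBIGUOUS_REPO.

    The retry copies the original args and adds the chosen repo_uri
    verbatim, so no translation step is needed.
    """
    response = call_tool(tool, args)
    error = response.get("structuredContent", {}).get("error") or {}
    if error.get("error_code") != "AMBIGUOUS_REPO":
        return response
    chosen = choose(error.get("choices", []))
    return call_tool(tool, {**args, "repo_uri": chosen["repo_uri"]})
```

The choose policy is where the loop stays deterministic: for example, pick the entry whose group matches the task at hand, or the first entry after sorting by repo_uri.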

Six MCP tools form the group cluster. All of them key off a named group; codehub group create registers it.

List the groups configured on this machine. Cheap. Always safe to call before fanning out.

BM25 + RRF over every member repo, fused into a single ranked list. Each hit carries its source repo_uri so a follow-up context / impact call has the disambiguator handed to it.
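The fusion step can be illustrated with a small reciprocal-rank-fusion sketch. It is not the project's implementation: the function name and input shape are hypothetical, k = 60 is the commonly used RRF constant rather than a documented setting, and the tie-break on (repo_uri, id) is an assumption added to keep the output order stable.

```python
from typing import Dict, List, Tuple


def rrf_fuse(per_repo_rankings: Dict[str, List[str]], k: int = 60) -> List[dict]:
    """Fuse per-repo ranked id lists into one ranked list.

    Each hit scores 1 / (k + rank), with rank starting at 1. Hits are
    keyed by (repo_uri, id), so identical ids in different repos stay
    distinct and every fused hit keeps its source repo_uri.
    """
    scores: Dict[Tuple[str, str], float] = {}
    for repo_uri, ranked_ids in per_repo_rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            key = (repo_uri, doc_id)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties fall back to (repo_uri, id) order.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"repo_uri": repo_uri, "id": doc_id, "score": score}
        for (repo_uri, doc_id), score in ordered
    ]
```

Since member repos contribute disjoint hits, this fusion effectively interleaves the per-repo lists by rank, and the result does not depend on the order in which repos are visited.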

Per-repo staleness across the group. Returns {repo_uri, indexed_at, graph_hash, staleness_lag_commits} per member. The agent uses this to decide whether the cross-repo answer can be trusted before spending tokens on it.
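A trust gate over those rows might look like the sketch below. The function name and the max_lag_commits threshold are hypothetical; only the row fields come from the description above.

```python
from typing import Any, Dict, List


def stale_members(status_rows: List[Dict[str, Any]], max_lag_commits: int = 0) -> List[str]:
    """Return the repo_uris whose index lags HEAD by more than the threshold.

    An empty result means every member is fresh enough to trust a
    cross-repo answer under this policy.
    """
    return sorted(
        row["repo_uri"]
        for row in status_rows
        if row["staleness_lag_commits"] > max_lag_commits
    )
```

An agent could re-run codehub analyze on the returned members, or caveat its answer, before issuing the cross-repo query.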

The HTTP contract matrix. Walks Route (producers) and Fetch (consumers) edges across repo boundaries. Pairs every producer route with its known consumers, including the fetches:unresolved:<id> pseudo-targets that the heuristic resolver emits when a consumer’s URL pattern does not match any local route.

The audit-trail tool. Returns every typed cross-repo edge in the group, with both endpoints fully qualified by repo_uri and source_doc_path / target_doc_path filled in for documentation references.

Rebuild the group’s contract registry and cross-link table after a member has been re-indexed. Idempotent.

Groups are not a separate ingestion pipeline. Each member repo is indexed independently with its own codehub analyze. The group registry is a thin layer on top — it tells the federation tools how to fan out.

That has two consequences worth calling out:

  • Re-indexing one member does not invalidate the rest. codehub analyze on repoA does not touch repoB’s .codehub/. The next group_query simply sees a fresher hash for repoA.
  • Group queries respect per-repo determinism. Fan-out is reciprocal-rank-fusion of independently deterministic per-repo results, so the group answer is deterministic by construction (at fixed group membership).