What is OpenCodeHub?
AI coding agents have a structural blind spot. They can read a file, but they can’t see the graph the file lives in. That blind spot produces three failure modes every agent-driven workflow eventually hits:
- Missed dependencies. The agent renames a function and leaves callers untouched, because grep found a fraction of the call sites.
- Broken call chains. The agent changes a return shape, a handler two hops downstream crashes at runtime, and neither the agent nor its tests flag it. The relationship was never in context.
- Blind edits. The agent rewrites a critical-path function without knowing it sits on the hot path of multiple production flows, because nothing computed that ahead of time.
Grep is textual. Language servers are per-file. Embeddings are lossy. None of them answer the questions an agent needs answered before it writes a diff: what breaks if I change this, what depends on this, and where does this data flow.
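To make the first failure mode concrete, here is a minimal sketch of a textual search missing a call site that a resolved lookup finds. Everything in it is an assumption for illustration: the three in-memory files are hypothetical, and Python's standard ast module stands in for a real parser. It is not how OpenCodeHub resolves symbols.

```python
import ast

# Hypothetical three-file project, held in memory for illustration only.
SOURCES = {
    "auth.py": "def validate_user(token):\n    return bool(token)\n",
    "api.py": (
        "from auth import validate_user as check\n"
        "\n"
        "def handler(request):\n"
        "    return check(request)\n"
    ),
    "jobs.py": (
        "from auth import validate_user\n"
        "\n"
        "def nightly(token):\n"
        "    return validate_user(token)\n"
    ),
}


def textual_call_sites(name):
    """Files containing the literal text `name(`, skipping `def` lines."""
    return sorted(
        path
        for path, src in SOURCES.items()
        if any(
            f"{name}(" in line and not line.lstrip().startswith("def ")
            for line in src.splitlines()
        )
    )


def resolved_call_sites(module, name):
    """Files that call `module.name`, following `import ... as alias`."""
    hits = []
    for path, src in SOURCES.items():
        tree = ast.parse(src)
        # Collect every local name bound to the target symbol in this file.
        aliases = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module == module:
                for imported in node.names:
                    if imported.name == name:
                        aliases.add(imported.asname or imported.name)
        # A call through any of those names counts as a call site.
        for node in ast.walk(tree):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in aliases
            ):
                hits.append(path)
                break
    return sorted(hits)
```

In this toy project, textual_call_sites("validate_user") returns only jobs.py, while resolved_call_sites("auth", "validate_user") returns both api.py and jobs.py. The aliased call in api.py is exactly the kind of caller a rename driven by text search leaves untouched.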
The graph-first approach
OpenCodeHub parses your repository with tree-sitter (15 GA languages,
plus SCIP indexers for TypeScript, Python, Go, Rust, and Java),
resolves imports and inheritance, and materialises a typed symbol
graph. That graph is stored in LadybugDB, a graph-native database,
with DuckDB carrying the temporal sibling (cochanges and the
symbol-summary cache). Both tiers are always present — there is no
backend toggle, and a failure to load the @ladybugdb/core binding
aborts the operation rather than falling back. BM25 lexical search and
filter-aware HNSW vector search sit on the same store. A local MCP
server exposes the graph to any agent that speaks Model Context
Protocol.
Clustering, execution-flow tracing, and blast-radius analysis all happen once at index time. Agents get complete relational context in one tool call, not ten round-trips.
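The idea behind blast-radius analysis can be sketched as a reverse traversal over call edges: build a callee-to-callers index once, then answer "what breaks if I change this" with a single breadth-first walk. The edge list and function names below are hypothetical, and this is an illustration under those assumptions, not OpenCodeHub's implementation or its MCP tool output.

```python
from collections import defaultdict, deque

# Hypothetical call edges (caller, callee); not real OpenCodeHub output.
CALLS = [
    ("login_route", "authenticate"),
    ("authenticate", "validate_user"),
    ("refresh_route", "validate_user"),
    ("health_route", "ping"),
]


def build_reverse_index(edges):
    """Map each callee to the set of symbols that call it directly."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return callers


def blast_radius(callers, symbol):
    """Map every transitive caller of `symbol` to its hop distance."""
    distance = {}
    queue = deque([(symbol, 0)])
    while queue:
        current, hops = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in distance:  # first visit is the shortest path
                distance[caller] = hops + 1
                queue.append((caller, hops + 1))
    return distance


# Built once, analogous to doing the work at index time.
REVERSE = build_reverse_index(CALLS)
```

With this index, blast_radius(REVERSE, "validate_user") maps authenticate and refresh_route to one hop and login_route to two hops, and health_route never appears. Because the index is built ahead of time, each query is a single lookup-and-walk rather than a fresh search of the source tree.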
What you get in v1
- Graph-native storage. LadybugDB is the graph tier and a dedicated DuckDB sibling serves the temporal store. Both files (graph.lbug + temporal.duckdb) are written on every index, with no backend knob and no fallback layout (ADR 0016).
- Cross-repo federation. Group several indexed repos with codehub group and query them through the group_* MCP tools. The repo is a first-class graph node and repo_uri carries through every cross-repo response, including the AMBIGUOUS_REPO envelope.
- Deterministic code-pack. pack_codebase (MCP) and codehub code-pack produce a reproducible 9-item BOM signed by the release workflow.
- WASM-only parsing. web-tree-sitter is the only parse runtime on Node 20, 22, and 24, with all 15 grammar .wasm blobs vendored in the @opencodehub/ingestion tarball. npm install -g @opencodehub/cli@latest does zero native builds and zero GitHub fetches (ADR 0015).
When to reach for OpenCodeHub
- Non-trivial refactors. Rename a function, change a return shape, or move a module and let the agent see every caller before it edits.
- Cross-file changes. Any diff that touches more than one file and crosses a module boundary.
- Blast-radius questions. “What processes depend on validateUser? What is the risk tier of this change?”
- Onboarding to a new repo. Ask the graph for the top clusters, HTTP routes, or authentication flow before the first edit.
Next, install the CLI and run your first query.