Configuration

OpenCodeHub honours a small, stable set of environment variables. Each variable is read from process.env at the entry point that owns it (CLI, MCP server, ingestion phase, embedder backend); none of them mutate global state.

The graph tier is always LadybugDB (graph.lbug) and the temporal tier is always DuckDB (temporal.duckdb). There is no backend selector — the CODEHUB_STORE env var was removed in ADR 0016 along with the probe and the DuckDB-as-graph fallback. If the LadybugDB binding cannot load, open() throws GraphDbBindingError.

| Variable | Purpose |
| --- | --- |
| CODEHUB_HOME | Override ~/.codehub/ (where the registry, embedder weights, and global state live). |
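
The override rule for CODEHUB_HOME can be sketched as a pure function. This is a minimal illustration, not OpenCodeHub's actual code: the helper name resolveCodehubHome and the choice to treat an empty value as unset are assumptions.

```typescript
// Hypothetical helper for illustration; not part of OpenCodeHub's API.
type HomeEnv = Record<string, string | undefined>;

// Returns the directory that holds the registry, embedder weights, and global state.
// Assumption: an empty or whitespace-only CODEHUB_HOME counts as unset.
function resolveCodehubHome(env: HomeEnv, userHome: string): string {
  const override = env["CODEHUB_HOME"]?.trim();
  if (override) {
    return override;
  }
  // Default location: ~/.codehub (trailing slashes on the home path are dropped first).
  return `${userHome.replace(/\/+$/, "")}/.codehub`;
}
```

In a Node process the call would look like resolveCodehubHome(process.env, os.homedir()); passing the environment in as an argument keeps the function free of global state and easy to test.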

ADR 0013 (docs/adr/0013-m7-default-flip-and-abstraction.md) records the IGraphStore / ITemporalStore interface segregation; ADR 0016 (docs/adr/0016-duckdb-graph-rip.md) records the rip-out of the DuckDB graph backend, the env var, and the resolver.

web-tree-sitter (WASM) is the only parse runtime on Node 20, 22, and 24. There is no env var or CLI flag to switch parsers — the native tree-sitter N-API addon was removed in 0.4.0. The CLI emits a one-shot stderr advisory if a stale legacy env var is set, then ignores it; consult the CHANGELOG and ADR 0015 for the variable name and migration notes. ADR 0013 records the prior WASM-default + native-opt-in posture and is superseded by ADR 0015 (docs/adr/0015-wasm-only-parser-at-the-npm-distributed-boundary.md).

The embedder backend is chosen by a fixed cascade: SageMaker → HTTP → ONNX. The first variable group that resolves wins; the others are ignored.

| Variable | Purpose |
| --- | --- |
| CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT | SigV4-authenticated SageMaker endpoint name. When set, the SageMaker backend wins. |
| CODEHUB_EMBEDDING_SAGEMAKER_REGION | Override the AWS region for the SageMaker call. |
| CODEHUB_EMBEDDING_URL | Base URL for an OpenAI-compatible HTTP endpoint (Infinity, vLLM, TEI, Ollama, LM Studio, OpenAI). /embeddings is appended. |
| CODEHUB_EMBEDDING_MODEL | Model id passed through to the HTTP endpoint verbatim. |
| CODEHUB_EMBEDDING_DIMS | Dimensionality of the embedding model. Default 320. |
| CODEHUB_EMBEDDING_API_KEY | Bearer token sent as Authorization: Bearer ... |

When none of the above are set, the local ONNX backend (F2LLM-v2-80M, 320-dim, deterministic, offline-safe) is used.
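
The SageMaker → HTTP → ONNX cascade can be sketched as a single resolver. This is an illustration under stated assumptions, not the real implementation: the function and type names are hypothetical, the HTTP group is assumed to resolve whenever CODEHUB_EMBEDDING_URL is set, and CODEHUB_EMBEDDING_DIMS is assumed to apply to both remote backends.

```typescript
// Hypothetical types and names for illustration only.
type EmbedEnv = Record<string, string | undefined>;

type EmbedderBackend =
  | { kind: "sagemaker"; endpoint: string; region: string | undefined; dims: number }
  | { kind: "http"; url: string; model: string | undefined; apiKey: string | undefined; dims: number }
  | { kind: "onnx"; model: string; dims: number };

const DEFAULT_DIMS = 320;

function resolveEmbedderBackend(env: EmbedEnv): EmbedderBackend {
  // Fall back to 320 when CODEHUB_EMBEDDING_DIMS is missing or not a number.
  const parsed = Number.parseInt(env["CODEHUB_EMBEDDING_DIMS"] ?? "", 10);
  const dims = Number.isNaN(parsed) ? DEFAULT_DIMS : parsed;

  // 1. SageMaker wins whenever an endpoint name is set.
  const endpoint = env["CODEHUB_EMBEDDING_SAGEMAKER_ENDPOINT"];
  if (endpoint) {
    return { kind: "sagemaker", endpoint, region: env["CODEHUB_EMBEDDING_SAGEMAKER_REGION"], dims };
  }

  // 2. HTTP comes next (assumption: the base URL alone is enough to select it).
  const baseUrl = env["CODEHUB_EMBEDDING_URL"];
  if (baseUrl) {
    return {
      kind: "http",
      url: `${baseUrl.replace(/\/+$/, "")}/embeddings`, // /embeddings is appended
      model: env["CODEHUB_EMBEDDING_MODEL"],
      apiKey: env["CODEHUB_EMBEDDING_API_KEY"],
      dims,
    };
  }

  // 3. Nothing set: local ONNX backend.
  return { kind: "onnx", model: "F2LLM-v2-80M", dims: DEFAULT_DIMS };
}
```

Returning a discriminated union means callers can switch on kind and never see settings from a losing group, which mirrors the "others are ignored" rule.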

| Variable | Purpose |
| --- | --- |
| CODEHUB_DISABLE_SCIP | Set to 1 to make the scip-index ingestion phase a no-op. Heuristic edges still flow. |
| CODEHUB_ALLOW_BUILD_SCRIPTS | Set to 1 to allow SCIP indexers that require a build (Rust, Java) to run. Off by default for clean-room safety. |
| CODEHUB_BEDROCK_SUMMARIES | Set to 1 to opt the LLM summarize phase in. Equivalent to --summaries. Off by default: codehub analyze runs fast, local, deterministic phases only. |
| CODEHUB_BEDROCK_DISABLED | Set to 1 to force-disable the LLM summarize phase. Equivalent to --no-summaries. Wins over CODEHUB_BEDROCK_SUMMARIES=1 and --summaries. |
| NO_COLOR | Standard convention; disables colored console output. |
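
The precedence between the two summarize switches can be sketched as follows. The function name and the flags object are hypothetical; only the variable names, the flag names, and the "disable wins" rule come from the table above.

```typescript
// Hypothetical helper for illustration; flags mirrors --summaries / --no-summaries.
type PhaseEnv = Record<string, string | undefined>;

interface SummaryFlags {
  summaries?: boolean;   // --summaries
  noSummaries?: boolean; // --no-summaries
}

function summariesEnabled(env: PhaseEnv, flags: SummaryFlags = {}): boolean {
  // Force-disable wins over every opt-in, whether it came from env or CLI.
  if (env["CODEHUB_BEDROCK_DISABLED"] === "1" || flags.noSummaries === true) {
    return false;
  }
  // Otherwise the phase runs only when explicitly opted in; the default is off.
  return env["CODEHUB_BEDROCK_SUMMARIES"] === "1" || flags.summaries === true;
}
```

Checking the disable switch first keeps the rule to one early return, so no combination of opt-ins can re-enable the phase.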

codehub analyze writes everything under <repo-root>/.codehub/. The layout is fixed: a LadybugDB graph file alongside a DuckDB temporal file.

| Path | Purpose |
| --- | --- |
| graph.lbug | LadybugDB graph store: nodes, edges, embeddings, BM25 + HNSW indexes. |
| temporal.duckdb | Sibling DuckDB file: temporal store (cochanges, symbol-summary cache). |
| meta.json | Index metadata: graph hash, node counts, CLI version, embedder model id. |
| scan.sarif | SARIF output from codehub scan. |
| sbom.cyclonedx.json / sbom.spdx.json | SBOMs when codehub analyze --sbom has run. |

The directory is safe to delete and rebuild at any time: run codehub clean, then codehub analyze.
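
For tooling that needs to locate these files, the fixed layout can be captured in a small helper. This is a sketch only: the indexLayout function and the IndexLayout field names are hypothetical, while the file names are the ones listed above.

```typescript
// Hypothetical helper for illustration; field names are not an OpenCodeHub API.
interface IndexLayout {
  root: string;          // <repo-root>/.codehub
  graph: string;         // LadybugDB graph store
  temporal: string;      // DuckDB temporal store
  meta: string;          // index metadata
  sarif: string;         // present after `codehub scan`
  sbomCycloneDx: string; // present after `codehub analyze --sbom`
  sbomSpdx: string;      // present after `codehub analyze --sbom`
}

function indexLayout(repoRoot: string): IndexLayout {
  // Drop trailing slashes so the joined paths never contain "//".
  const root = `${repoRoot.replace(/\/+$/, "")}/.codehub`;
  return {
    root,
    graph: `${root}/graph.lbug`,
    temporal: `${root}/temporal.duckdb`,
    meta: `${root}/meta.json`,
    sarif: `${root}/scan.sarif`,
    sbomCycloneDx: `${root}/sbom.cyclonedx.json`,
    sbomSpdx: `${root}/sbom.spdx.json`,
  };
}
```

The helper only builds paths; it does not check that the optional files (scan.sarif and the SBOMs) exist.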

The registry maps each registered repo to its index path. It is consulted by:

  • Every per-repo MCP tool that accepts an optional repo argument.
  • codehub list, codehub status, codehub clean --all.
  • codehub group create when resolving repo names.

Setting CODEHUB_HOME relocates the registry's parent directory (~/.codehub/ by default).

Each editor writer has a fixed target path and merges a codehub entry non-destructively:

| Editor | Path | Format |
| --- | --- | --- |
| claude-code | <project>/.mcp.json | JSON |
| cursor | ~/.cursor/mcp.json | JSON |
| codex | ~/.codex/config.toml | TOML |
| windsurf | ~/.codeium/windsurf/mcp_config.json | JSON |
| opencode | <project>/opencode.json | JSON |

--undo removes only the codehub entry each writer added; other entries are preserved.
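
A non-destructive merge and its undo can be sketched for the JSON targets as below. This is an illustration, not the writers' actual code: it assumes servers live under a top-level mcpServers object (the shape Claude Code's .mcp.json uses; other editors may differ), and the function names and the entry payload are hypothetical.

```typescript
// Hypothetical types and helpers for illustration only.
interface McpConfig {
  mcpServers?: Record<string, unknown>;
  [key: string]: unknown; // unrelated settings are carried through untouched
}

// Adds or replaces only the "codehub" entry; every other key is preserved.
function addCodehubEntry(config: McpConfig, entry: unknown): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), codehub: entry },
  };
}

// Undo: removes only the "codehub" entry and leaves sibling servers in place.
function removeCodehubEntry(config: McpConfig): McpConfig {
  const servers: Record<string, unknown> = { ...(config.mcpServers ?? {}) };
  delete servers["codehub"];
  return { ...config, mcpServers: servers };
}
```

Both functions return new objects rather than mutating their input, so a writer can compare the before and after values, or skip the write when nothing changed. The TOML target for codex would need a TOML parser and is outside this sketch.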