codehub-code-pack

Standalone skill. Surfaces the pack_codebase MCP tool to produce a deterministic, 9-item Bill of Materials (BOM) at <repo>/.codehub/packs/<packHash>/ that is byte-identical given the same (commit, tokenizer, budget, chonkie_version, duckdb_version, grammar_commits). The pack is the durable artifact agents hand to a long-context LLM, archive for later replay, or diff between commits to prove invariants did not change.
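
One way to check the byte-identical claim, or to diff packs between two commits, is to hash every file in two pack directories and compare the results. The sketch below is illustrative only: `digest_pack` and `diff_packs` are hypothetical helper names, and SHA-256 is used purely as a comparison digest, not as a statement about how packHash itself is derived.

```python
import hashlib
from pathlib import Path


def digest_pack(pack_dir: str) -> dict[str, str]:
    """Map each file's path (relative to pack_dir) to the SHA-256 of its bytes."""
    root = Path(pack_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_packs(left_dir: str, right_dir: str) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    left, right = digest_pack(left_dir), digest_pack(right_dir)
    return sorted(
        name for name in left.keys() | right.keys()
        if left.get(name) != right.get(name)
    )
```

An empty list from `diff_packs` means the two directories hold the same files with the same bytes, which is what two runs with identical pinned inputs should produce.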

name: codehub-code-pack
argument-hint: "[<repo-or-group>] [--budget <N>] [--tokenizer <id>]"
allowed-tools: pack_codebase, list_repos, project_profile, list_findings
model: sonnet
  1. mcp__opencodehub__list_repos — confirm the repo is indexed. If two or more repos are indexed and no repo argument was supplied, retry with a structuredContent.error.choices[].repo_uri value from the AMBIGUOUS_REPO envelope.
  2. mcp__opencodehub__project_profile — confirm graph freshness; surface any _meta.codehub/staleness envelope so the user knows the pack reflects the last codehub analyze, not HEAD.
  3. mcp__opencodehub__list_findings — optional, only when findings are requested in the pack.
  4. mcp__opencodehub__pack_codebase with the default engine: "pack". The tool resolves the output to <repoRoot>/.codehub/packs/<packHash>/ and writes the 9 BOM items there, manifest.json included.
  5. Report back the packHash, the determinismClass, and the absolute output directory; name the cause when the class is best_effort or degraded.
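
The retry in step 1 can be sketched as a small helper that pulls the candidate repo URIs out of an ambiguity envelope. The path structuredContent.error.choices[].repo_uri comes from the steps above; the `code` field used to recognize the envelope and the function name `repo_choices` are assumptions made for illustration, so adjust them to whatever the real envelope carries.

```python
from typing import Any


def repo_choices(tool_result: dict[str, Any]) -> list[str]:
    """Collect structuredContent.error.choices[].repo_uri values, if present."""
    error = (tool_result.get("structuredContent") or {}).get("error") or {}
    # Assumption: the envelope identifies itself through an error "code" field.
    if error.get("code") != "AMBIGUOUS_REPO":
        return []
    # Skip malformed choices instead of raising on a missing key.
    return [c["repo_uri"] for c in error.get("choices", []) if "repo_uri" in c]
```

If the returned list is non-empty, the caller would retry the tool call with one of those values as the repo argument, ideally after asking the user which one they meant.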

Run the single-repo flow per member of the named group, then emit a table of (repoUri, packHash, determinismClass, outDir). packHash is per-repo, not per-group — a group pack is the union of the member BOMs.
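
A minimal sketch of the group report, assuming each member's result has already been collected into a tuple. `PackResult` and `render_group_table` are hypothetical names, and sorting rows by repoUri is an assumption made here to keep the output stable; the source does not prescribe a row order.

```python
from typing import NamedTuple


class PackResult(NamedTuple):
    repoUri: str
    packHash: str
    determinismClass: str
    outDir: str


def render_group_table(results: list[PackResult]) -> str:
    """Render one row per group member; packHash stays per-repo."""
    header = "| " + " | ".join(PackResult._fields) + " |"
    rule = "|" + "|".join(" --- " for _ in PackResult._fields) + "|"
    # Tuples compare field by field, so this orders rows by repoUri first.
    body = ["| " + " | ".join(row) + " |" for row in sorted(results)]
    return "\n".join([header, rule, *body])
```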

| # | File | Determinism contract |
| --- | --- | --- |
| 1 | manifest.json | RFC 8785 canonical JSON; pack-hash field omitted from the preimage; CRLF normalized to LF before hashing |
| 2 | skeleton.jsonl | PageRank score DESC, then id ASC |
| 3 | file-tree.jsonl | path ASC |
| 4 | deps.jsonl | (ecosystem, name, version, id) lexicographic ASC |
| 5 | ast-chunks.jsonl | LF-normalized; degrades to line-split with determinismClass: degraded |
| 6 | xrefs.jsonl | community rows first, then call rows |
| 7 | embeddings.parquet | Optional; absent entirely when no embeddings exist |
| 8 | findings.jsonl | severity rank, then ruleId ASC |
| 9 | licenses.md + readme.md | alpha-sorted dependency list + manifest-derived header |

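
Several of the ordering contracts above reduce to a sort key. The sketch below shows three of them under stated assumptions: the row field names (`score`, `id`, `ecosystem`, `name`, `version`, `kind`) are hypothetical stand-ins for whatever the real JSONL rows contain, and the order of rows within the community and call groups is simply left as given, since the contract does not specify it here.

```python
from typing import Any

Row = dict[str, Any]


def sort_skeleton(rows: list[Row]) -> list[Row]:
    # PageRank score descending, then id ascending as the tie-break.
    return sorted(rows, key=lambda r: (-r["score"], r["id"]))


def sort_deps(rows: list[Row]) -> list[Row]:
    # Tuple comparison gives lexicographic order over the four fields.
    return sorted(rows, key=lambda r: (r["ecosystem"], r["name"], r["version"], r["id"]))


def sort_xrefs(rows: list[Row]) -> list[Row]:
    # False sorts before True, so community rows come first; sorted() is
    # stable, so rows keep their relative input order within each group.
    return sorted(rows, key=lambda r: r["kind"] != "community")
```

Using a total order for every file is what lets two runs over the same graph emit rows in the same sequence, regardless of the order in which the rows were produced.
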

| Class | Meaning |
| --- | --- |
| strict | Same inputs → same packHash; the full reproducibility contract holds. |
| best_effort | The tokenizer is an Anthropic API tokenizer that may rotate behind the model name; other inputs stay pinned. |
| degraded | A primitive fallback was used (e.g. line-split chunker). Re-runs match locally but not across machines. |
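
When reporting a result, the class can be mapped to a short cause so that anything other than strict is explained to the user. This is a sketch only: `report_line` and the `CAUSES` table are hypothetical, and the cause strings paraphrase the meanings listed above instead of quoting any tool output.

```python
CAUSES = {
    "strict": None,  # nothing to explain: the full contract holds
    "best_effort": "the tokenizer is an API tokenizer that may rotate behind the model name",
    "degraded": "a primitive fallback was used (for example, the line-split chunker)",
}


def report_line(pack_hash: str, determinism_class: str, out_dir: str) -> str:
    """Format the final report, naming the cause for non-strict classes."""
    if determinism_class not in CAUSES:
        raise ValueError(f"unknown determinismClass: {determinism_class!r}")
    line = f"packHash={pack_hash} determinismClass={determinism_class} outDir={out_dir}"
    cause = CAUSES[determinism_class]
    return f"{line} cause: {cause}" if cause else line
```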