# Adding a language provider

OpenCodeHub ships 15 GA tree-sitter language providers today:
TypeScript, TSX, JavaScript, Python, Go, Rust, Java, C#, C, C++,
Ruby, Kotlin, Swift, PHP, and Dart. Most of them are further
upgraded with SCIP indexers for compiler-grade cross-module edges
(scip-typescript, scip-python, scip-go, rust-analyzer, scip-java,
scip-dotnet, scip-clang, scip-kotlin, scip-ruby).

Adding a new language takes four steps. The registry is compile-time
exhaustive, so the TypeScript build fails if you forget step three.
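
One common way to get this kind of compile-time exhaustiveness in TypeScript is a `Record` keyed by the language union. The sketch below illustrates the general technique only and is not a copy of `registry.ts`; the three `LanguageId` members and the `ProviderStub` type are hypothetical stand-ins.

```typescript
// Hypothetical stand-in for the LanguageId union in @opencodehub/core-types.
type LanguageId = "typescript" | "python" | "go";

// Hypothetical minimal provider shape, for illustration only.
interface ProviderStub {
  id: LanguageId;
  extensions: readonly string[];
}

// Record<LanguageId, ...> requires exactly one entry per union member.
// Removing an entry, or adding a member to LanguageId without a matching
// entry here, is a type error, so the build fails.
const registry: Record<LanguageId, ProviderStub> = {
  typescript: { id: "typescript", extensions: [".ts"] },
  python: { id: "python", extensions: [".py"] },
  go: { id: "go", extensions: [".go"] },
};

// Lookup is total: every LanguageId is guaranteed to have a provider.
function providerFor(id: LanguageId): ProviderStub {
  return registry[id];
}
```

Under this pattern, widening the union is what forces the registry update: the type checker reports the missing key before any test runs.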

## Step 1 — Pin the tree-sitter grammar

Add the grammar as a pinned dependency in `packages/ingestion/package.json`.
Use an exact version; do not use `^` or `latest`. Grammars can change AST
shapes between versions, and a floating range can silently break
extraction.

```json title="packages/ingestion/package.json"
{
  "dependencies": {
    "tree-sitter-<lang>": "1.2.3"
  }
}
```

Then run `pnpm install` and verify that the grammar loads by running the
parse bootstrap tests locally.
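
If you want to double-check the pinning rule before opening a PR, a small helper like the one below can flag floating versions. This is a hypothetical sketch, not a script that ships with the repository; `isPinned` and `findFloating` are illustrative names, and the regular expression only accepts exact `major.minor.patch` versions with optional pre-release or build suffixes.

```typescript
// Matches an exact version such as "1.2.3" or "1.2.3-beta.1".
// Ranges ("^1.2.3", "~1.2.3", ">=1.0.0", "1.x") and tags ("latest") do not match.
const EXACT_SEMVER =
  /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

function isPinned(version: string): boolean {
  return EXACT_SEMVER.test(version);
}

// Returns the names of dependencies whose version is not an exact pin.
function findFloating(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, version]) => !isPinned(version))
    .map(([name]) => name);
}
```

Passing the parsed `dependencies` object from `package.json` to `findFloating` yields the entries that still need an exact version.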

## Step 2 — Implement the provider

Create `packages/ingestion/src/providers/<lang>.ts` exporting a
`LanguageProvider` object. The interface lives at
`packages/ingestion/src/providers/types.ts`. Required fields and
methods:

| Member                | Purpose                                                                 |
|-----------------------|-------------------------------------------------------------------------|
| `id`                  | The `LanguageId` string (must already exist in `@opencodehub/core-types`) |
| `extensions`          | File extensions this provider claims                                    |
| `importSemantics`     | `named` / `namespace` / `package-wildcard` (see below)                 |
| `mroStrategy`         | `c3` / `first-wins` / `single-inheritance` / `none` (see below)         |
| `typeConfig`          | `{ structural, nominal, generics }` booleans                            |
| `heritageEdge`        | `"EXTENDS"` / `"IMPLEMENTS"` / `null`                                   |
| `extractDefinitions`  | Emit one record per defined symbol                                      |
| `extractCalls`        | Emit one record per call site                                           |
| `extractImports`      | Parse `import` / `use` / `require` statements                           |
| `extractHeritage`     | Emit inheritance / trait-impl / interface-implements edges              |
| `isExported`          | Predicate: is this definition publicly exported?                        |
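
To make the table concrete, here is a minimal skeleton for a made-up language called `toy`. The `ProviderSketch` interface and its record types are simplified assumptions based on the table above; the real `LanguageProvider` interface in `types.ts` may use different parameter and return types, and the real `id` must be a `LanguageId`.

```typescript
// Hypothetical, simplified shapes. The real definitions live in
// packages/ingestion/src/providers/types.ts and may differ.
type ImportSemantics = "named" | "namespace" | "package-wildcard";
type MroStrategy = "c3" | "first-wins" | "single-inheritance" | "none";

interface ExtractInput {
  filePath: string;
  captures: readonly unknown[]; // stand-in for the ParseCapture array
  sourceText: string;
}

interface Definition {
  name: string;
  kind: string;
}

interface ProviderSketch {
  id: string; // the real field is a LanguageId
  extensions: readonly string[];
  importSemantics: ImportSemantics;
  mroStrategy: MroStrategy;
  typeConfig: { structural: boolean; nominal: boolean; generics: boolean };
  heritageEdge: "EXTENDS" | "IMPLEMENTS" | null;
  extractDefinitions(input: ExtractInput): Definition[];
  extractCalls(input: ExtractInput): unknown[];
  extractImports(input: ExtractInput): unknown[];
  extractHeritage(input: ExtractInput): unknown[];
  isExported(def: Definition): boolean;
}

// Extractors return empty arrays as placeholders; a real provider
// would walk the captures and emit one record per match.
const toyProvider: ProviderSketch = {
  id: "toy",
  extensions: [".toy"],
  importSemantics: "named",
  mroStrategy: "single-inheritance",
  typeConfig: { structural: false, nominal: true, generics: false },
  heritageEdge: "EXTENDS",
  extractDefinitions: () => [],
  extractCalls: () => [],
  extractImports: () => [],
  extractHeritage: () => [],
  // Assumed convention for the toy language: a leading underscore means private.
  isExported: (def) => !def.name.startsWith("_"),
};
```

Starting from stubs like these lets the type checker confirm that every required member is present before any extraction logic is written.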

Optional hooks improve coverage:

| Member                    | Purpose                                                           |
|---------------------------|-------------------------------------------------------------------|
| `detectOutboundHttp`      | Detect `fetch("/api")`, `requests.get(url)`, `axios.post(url, ...)` |
| `extractPropertyAccesses` | Emit `ACCESSES` edges for `receiver.property` reads/writes        |
| `preprocessImportPath`    | Strip `.js` suffix for TS, resolve `__init__.py`, etc.            |
| `inferImplicitReceiver`   | Name for `this` / `self` inside a method body                     |
| `complexityDefinitionKinds` / `halsteadOperatorKinds` | Enable cyclomatic + Halstead metrics |
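
As an illustration of an optional hook, the sketch below shows one way `preprocessImportPath` could strip the `.js` suffix mentioned in the table. The single-string signature is an assumption; the real hook may receive additional arguments such as the importing file's path.

```typescript
// Hypothetical signature: takes the raw specifier from an import statement
// and returns the path that should be resolved.
function preprocessImportPath(specifier: string): string {
  // TypeScript sources often import "./foo.js" while the file on disk is
  // "./foo.ts", so drop the suffix for relative specifiers only.
  const isRelative =
    specifier.startsWith("./") || specifier.startsWith("../");
  if (isRelative && specifier.endsWith(".js")) {
    return specifier.slice(0, -".js".length);
  }
  // Bare package names (including ones ending in ".js") pass through unchanged.
  return specifier;
}
```

Restricting the rewrite to relative specifiers avoids mangling package names such as `chart.js`.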

### Picking `importSemantics`

- **`named`**: the statement names specific symbols (TypeScript, JavaScript).

A fixture test parses a sample source with `parseFixture` and asserts on what the provider extracts:

```ts
const result = await parseFixture(pool, "<lang>", "sample.<ext>", src);
const defs = <lang>Provider.extractDefinitions({
  filePath: "sample.<ext>",
  captures: result.captures,
  sourceText: src,
});
// assert on defs...
```

Cover at minimum: a top-level function, a class with one method, an
import statement, a call to an imported symbol, and an exported vs.
non-exported symbol. If your language has generics / traits /
interfaces, add a fixture per heritage shape.

The `parseFixture` helper returns a pool-borrowed `ParseCapture` array
that matches exactly what the ingestion pipeline passes in at runtime,
so the assertions you write here mirror production behaviour.

## CI expectations

Once the four steps are in place:

- `mise run lint` — Biome check passes.
- `mise run typecheck` — registry exhaustiveness passes.
- `mise run test` — your fixture tests pass under `pnpm -r test`.
- `mise run banned-strings` — you did not accidentally copy names from
  another project.

If your language has an available SCIP indexer, a follow-up PR can add
it to `packages/scip-ingest/src/runners/` and `.github/workflows/gym.yml`
to upgrade heuristic edges to compiler-grade. That is not required for
shipping the heuristic provider.

## Related files

- `packages/ingestion/src/providers/types.ts` — the `LanguageProvider`
  interface.
- `packages/ingestion/src/providers/registry.ts` — the exhaustive map.
- `packages/ingestion/src/providers/test-helpers.ts` — `parseFixture`.
- `@opencodehub/core-types` — the `LanguageId` union.

## See also

* [Language matrix](/opencodehub/reference/languages/)
* [Architecture overview](/opencodehub/architecture/overview/)
* [Testing](/opencodehub/contributing/testing/)
