# ADR-015: OpenAI Responses → Bedrock mantle proxy

**Status**: Accepted (implementation updated 2026-06-30 — see note below)
**Date**: 2026-06-11
**Deciders**: AI Engineering NAMER

:::note[Update (2026-06-30): now served by agentgateway, not Portkey]
The mantle lane is **now ported into the rendered agentgateway config** (it was previously "not yet ported"). The decision below — *serve the OpenAI Responses lane through the gateway, mantle is a native OpenAI-compatible upstream, host is pinned* — still holds. **What changed is the mechanism**, since the data plane moved from Portkey to agentgateway ([ADR-017](/ai-gateway/adrs/017-agentgateway-data-plane-spike/)):

- The lane is an agentgateway **custom provider** with `formats: [{type: responses}]` and a pinned **`hostOverride`** to the mantle endpoint in **host:port** form (e.g. `bedrock-mantle.us-east-1.api.aws:443`) — superseding the Portkey `openai` provider + `custom_host` described in the body below.
- It is gated by a new Terraform variable, **`mantle_host`** (empty by default = lane disabled: no `/openai/v1` route is rendered and no mantle secret is provisioned). Setting `mantle_host` to the pinned endpoint enables the lane.
- The route matches **`pathPrefix: /openai/v1`** and is rendered **before** the `/v1` catch-all (agentgateway matches routes in order, most-specific prefix first).
- The Bedrock API key is held server-side, injected as the **`MANTLE_BEDROCK_API_KEY`** secret (Secrets Manager path `ai-gateway/mantle-bedrock-api-key`). The host is **pinned** — callers cannot override it — and egress to OpenAI SaaS is denied at the VPC as defense-in-depth.
- The mantle route runs the **`budget_enforcement` webhook** but **not** the inline Bedrock guardrail (the guardrail is Converse/Chat-shaped; the mantle lane is Responses-shaped).

Verified against the live agentgateway v1.3.1 binary. The Portkey `custom_host` mechanism in the body below is retained as historical context.
:::

## Context

The OpenAI Codex client is **Responses-API-only** (`wire_api = "chat"` was removed upstream and is rejected at config parse). The flagship GPT-5.5 / GPT-5.4 models on Amazon Bedrock are **also Responses-only**, served at the OpenAI-compatible mantle endpoint:

```
https://bedrock-mantle.<region>.api.aws/openai/v1/responses
```

Portkey OSS's `bedrock` provider maps OpenAI Chat Completions -> Bedrock **Converse** and has **no** `createModelResponse` (Responses) implementation. An early design read this as "the gateway cannot serve Codex/GPT-5.5 to Bedrock," and proposed either (a) forking Portkey to build a Responses->Converse translator or (b) routing Codex *around* the gateway directly to mantle. Both were wrong.

## Decision

Serve the OpenAI Responses lane **through the gateway, with stock Portkey and no fork**, by treating mantle as what it is -- a native OpenAI-compatible upstream. Use Portkey's `openai` provider (which **does** implement `createModelResponse`) with a `custom_host` pointed at the mantle base:

```json
{ "provider": "openai",
  "custom_host": "https://bedrock-mantle.<region>.api.aws/openai/v1",
  "api_key": "<BEDROCK_API_KEY>" }
```

The gateway rewrites only the upstream host, preserves the `/responses` path verbatim, and re-issues `Authorization: Bearer <Bedrock API key>` (which mantle's OpenAI-compatible path accepts). `before_request_hooks`, logging, and cost attribution all run -- so the flagship lane stays governed and inside the customer AWS boundary.

**Path is per model family** (verified live): GPT-5.5/5.4 use the mantle `/openai/v1` base; gpt-oss-120b/-20b use the `/v1` base (Chat Completions, served by the existing `bedrock` provider -> Converse). The gateway sets `custom_host` per family.

## Verification

Confirmed live (Portkey OSS main @ `669825c`, local spike, 2026-06-11):

- A Codex-shaped Responses request through the gateway returns a valid completion from mantle for both `openai.gpt-oss-20b` (`/v1`) and `openai.gpt-5.5` (`/openai/v1`).
- SSE streaming relays faithfully; the terminal `response.completed` event carries `usage`.
- The `custom_host` is honored and the `/responses` path preserved (`src/handlers/services/providerContext.ts` `getFullURL`; `src/providers/openai/api.ts` `getEndpoint`).

## Consequences

**Positive**: No fork, no Converse-translation maintenance burden; the flagship lane keeps gateway hooks, logging, and isolation. Retires the bypass and fork options.

**Negative / caveats**:

- The gateway holds a Bedrock API key as a static bearer (Portkey OSS has no refreshing-bearer loop for the `openai` provider) -- use a long-lived key or an external rotator.
- OSS does not parse `usage` on **streamed** responses; per-stream cost attribution needs a small `afterRequestHook` / stream tee (additive, not a fork).
- Because the lane uses the `openai` provider, the `custom_host` **must be pinned** and caller-supplied `custom_host` overrides rejected, or the gateway can route prompts to `api.openai.com` (the egress hole was demonstrated live). Egress to OpenAI SaaS is denied at the VPC as defense-in-depth.

## Supersedes / amends

Amends **ADR-006**: `/v1/responses` is served by the gateway for the `openai` provider, and the OpenAI-on-Bedrock lane uses that provider with a mantle `custom_host` -- Responses traffic flows **through** the gateway, not around it. The earlier "bypass the gateway" and "fork a `bedrock-responses` provider" framings are retracted.

## Sources

- AWS -- Get started with OpenAI GPT-5.5/GPT-5.4 and Codex on Amazon Bedrock; Bedrock mantle inference docs.
- Portkey OSS `src/handlers/services/providerContext.ts`, `src/providers/openai/api.ts`, `src/handlers/modelResponsesHandler.ts` (verified at commit `669825c`).
- Local spike `~/workplace/portkey-mantle-spike` (2026-06-11).