Architecture Decision Records

What Are ADRs

Architecture Decision Records (ADRs) capture significant technical decisions along with their context, alternatives considered, and consequences. They serve as a historical record so that future contributors understand why a particular approach was chosen, not just what was built.

ADRs are stored in the `adr/` directory at the repository root.

Decision Log

ADR	Title	Status	Summary
001	Portkey OSS as LLM Gateway Proxy	Superseded by ADR-017	Selected Portkey OSS over LiteLLM due to LiteLLM’s 14 CVEs (including critical RCE and active SSRF exploitation), systemic memory leaks, and 800 MB+ image size. The data plane is now agentgateway (ADR-017); this remains as the historical proxy-selection record.
002	python:3.13-slim Over Chainguard	Accepted	Chose `python:3.13-slim` with multi-stage hardening over Chainguard because the free Chainguard tier locks to `latest` (Python 3.14), and the 3.13 tag requires a paid subscription.
003	Single NAT Gateway + VPC Endpoints	Accepted	Deployed one NAT Gateway instead of two, combined with VPC endpoints for ECR, CloudWatch, Secrets Manager, and S3. Saves approximately $32/month, with an acceptable HA trade-off for outbound internet traffic.
004	3-Phase Container Security Pipeline	Accepted	Structured the security pipeline into three phases: pre-build (hadolint + checkov), post-build (trivy + syft), and post-scan (cosign). Skipped grype (trivy covers it) and osv-scanner (`uv audit` provides native OSV scanning).
005	ALB JWT Validation Over API Gateway	Proposed	Uses the ALB-native `validate_token` action (launched Nov 2025) instead of an API Gateway HTTP API for JWT authentication. Saves $260 to $2,400/month depending on request volume, with zero additional latency.
006	Dual-Format API	Accepted	Both OpenAI Chat Completions (`/v1/chat/completions`) and Anthropic Messages (`/v1/messages`) are served natively on a single port, with no custom middleware or translation layer. Still true under agentgateway (ADR-017), which selects the route type from the path suffix.
007	AWS Provider Upgrade to >= 6.22	Accepted	Upgraded the Terraform AWS provider from `~> 5.0` to `~> 6.22` to enable the ALB JWT validation resource (`jwt_validation` block in `aws_lb_listener`). Zero-risk upgrade since infrastructure was deployed fresh on v6.
008	Multi-Tenant Client Isolation	Accepted	Per-team Cognito app clients via a `clients` Terraform module. Each team gets isolated credentials, scopes, and usage tracking.
009	Provider Routing Strategy	Accepted	Gateway-native provider-level fallback and load-balance strategies. Under ADR-017 this is realized as agentgateway `ai.groups` priority-group failover in the rendered config (no `x-portkey-config` header).
010	Cost Attribution Pipeline	Accepted	Lambda subscribes to gateway CloudWatch logs, extracts token usage, computes estimated cost, and publishes custom CloudWatch metrics per team.
011	Bedrock Guardrails Integration	Accepted	Terraform module for Amazon Bedrock Guardrails with configurable content filtering, PII blocking, topic denial, and word filtering policies.
012	Response Cache Strategy	Superseded by ADR-017	ElastiCache Redis cluster for exact-match response caching. Removed under ADR-017 in favor of provider-native prompt caching (Bedrock `cachePoint` markers); there is no response cache.
013	Identity Center SAML/OIDC Federation	Proposed	SAML 2.0 and OIDC federation with the Cognito User Pool, plus a Pre-Token-Generation V2 Lambda for IdP group-to-claim mapping.
014	Two-Plane Architecture Split	Accepted	ALB stays on the inference path; admin APIs (teams, budgets, routing, pricing, usage) move behind API Gateway with a Cognito authorizer. Eliminates duplicated JWT validation across handlers.
015	OpenAI Responses → Bedrock mantle proxy	Accepted	Codex + GPT-5.5/5.4 (Responses-only) route through the gateway via the `openai` provider with a `custom_host` pointed at the Bedrock mantle endpoint — a proxy, not a fork or a bypass. Amends ADR-006; retracts the earlier “bypass the gateway / fork a `bedrock-responses` provider” framing.
016	Control-Plane API Foundation (`gwcore`)	Accepted	A shared `src/gwcore/` package gives all control-plane Lambdas one auth path (two verification modes, one `Principal`), a consistent response/error/cursor-pagination contract, in-process + ETag caching, an append-only audit trail (Firehose → Iceberg), and uniform EMF + structured-log observability. Closes the divergent-scope auth bug surfaced under ADR-014; handlers migrate incrementally.
017	agentgateway as the data plane (replaces Portkey OSS)	Accepted	Replaces the Portkey OSS build with agentgateway (Rust, distroless, pinned by image digest). Routing moves into the rendered config (`ai.groups` failover), content safety runs inline via Bedrock Guardrails (`ApplyGuardrail`, detect-only by default), the budget webhook speaks agentgateway’s `{action}` contract, and the response cache is dropped for provider-native prompt caching. Supersedes ADR-001 and ADR-012.

Creating a New ADR

Naming Convention

ADR files follow the pattern:

adr/NNN-short-descriptive-title.md

NNN is a zero-padded, three-digit sequential number (e.g., 008), and the rest of the name is a short, hyphen-separated title.
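As a rough illustration of the naming convention, the sketch below derives the next filename from a list of existing ones. The helper names (`slugify`, `next_adr_filename`), the lowercase-slug rule, and the example filenames are assumptions made for illustration; they are not tooling that ships with this repository.

```python
import re

# Matches names like "017-some-title.md": three digits, then a hyphenated slug.
ADR_NAME = re.compile(r"^(\d{3})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_adr_filename(existing: list[str], title: str) -> str:
    """Return the next sequential ADR filename for the given title."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_NAME.match(name))]
    next_number = max(numbers, default=0) + 1  # an empty directory starts at 001
    return f"{next_number:03d}-{slugify(title)}.md"


# Hypothetical existing files and a hypothetical new title.
existing_files = ["016-gwcore.md", "017-agentgateway-data-plane.md"]
print(next_adr_filename(existing_files, "Rate Limit Strategy"))
# prints: 018-rate-limit-strategy.md
```

In practice the `existing` list would come from listing the `adr/` directory; files that do not match the pattern are ignored rather than treated as errors.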

Template

Use this template for new ADRs:

# ADR-NNN: Title

**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-XXX
**Date**: YYYY-MM-DD
**Deciders**: AI Engineering NAMER

## Context

What is the issue that we're seeing that is motivating this decision or change?

## Decision

What is the change that we're proposing and/or doing?

## Alternatives Considered

| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| ... | ... | ... | ... |

## Rationale

Why was this option chosen over the alternatives?

## Consequences

**Positive**: What becomes easier or possible as a result?

**Negative**: What becomes harder or is introduced as a trade-off?
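The status line in the template is easy to leave unfilled, so a small check can help. The sketch below reads the `**Status**` line and accepts only the four values the template lists. It assumes superseding references use three-digit numbers (as in "Superseded by ADR-017"); `read_status` is a hypothetical helper, not part of any existing tooling.

```python
import re

# The "**Status**: ..." line from the template, captured without trailing spaces.
STATUS_LINE = re.compile(r"^\*\*Status\*\*:[ \t]*(.+?)[ \t]*$", re.MULTILINE)

# Allowed values; the three-digit ADR reference is an assumption.
VALID_STATUS = re.compile(r"^(Proposed|Accepted|Deprecated|Superseded by ADR-\d{3})$")


def read_status(adr_text: str) -> str:
    """Return the ADR status, raising ValueError if it is missing or not allowed."""
    match = STATUS_LINE.search(adr_text)
    if match is None:
        raise ValueError("no **Status** line found")
    status = match.group(1)
    if not VALID_STATUS.match(status):
        raise ValueError(f"unexpected status: {status!r}")
    return status


sample = "# ADR-012: Response Cache Strategy\n\n**Status**: Superseded by ADR-017\n"
print(read_status(sample))
# prints: Superseded by ADR-017
```

An untouched template line (`Proposed | Accepted | Deprecated | Superseded by ADR-XXX`) fails this check, which is the intended behavior: it flags an ADR whose status was never chosen.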

Process

1. Copy the template above into a new file: `adr/NNN-your-title.md`.
2. Set the status to Proposed.
3. Fill in context, decision, alternatives, rationale, and consequences.
4. Open a PR. Discussion happens in the PR review.
5. Once approved and merged, update the status to Accepted.
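The status changes in this process (starting at Proposed, ending at Accepted) can be sketched in a few lines. This is a minimal illustration, assuming the header layout from the template; `start_adr` and `mark_accepted` are hypothetical helpers, and the ADR number, title, and date are made up.

```python
import re
from datetime import date

# Header portion of the template, with the status preset to Proposed.
HEADER = """# ADR-{number:03d}: {title}

**Status**: Proposed
**Date**: {day}
"""


def start_adr(number: int, title: str, day: date) -> str:
    """Begin a new ADR header with the status set to Proposed."""
    return HEADER.format(number=number, title=title, day=day.isoformat())


def mark_accepted(adr_text: str) -> str:
    """Flip a Proposed ADR to Accepted; refuse any other starting status."""
    updated, count = re.subn(
        r"^\*\*Status\*\*: Proposed$",
        "**Status**: Accepted",
        adr_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("ADR is not in Proposed status")
    return updated


draft = start_adr(18, "Rate Limit Strategy", date(2026, 1, 15))  # hypothetical ADR
print(mark_accepted(draft))
```

Refusing to accept an ADR that is not currently Proposed keeps the sketch from silently overwriting a Deprecated or Superseded status.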