Architecture Decision Records
What Are ADRs
Architecture Decision Records (ADRs) capture significant technical decisions along with their context, alternatives considered, and consequences. They serve as a historical record so that future contributors understand why a particular approach was chosen, not just what was built.
ADRs are stored in the adr/ directory at the repository root.
Decision Log
| ADR | Title | Status | Summary |
|---|---|---|---|
| 001 | Portkey OSS as LLM Gateway Proxy | Superseded by ADR-017 | Selected Portkey OSS over LiteLLM due to LiteLLM’s 14 CVEs (including critical RCE and active SSRF exploitation), systemic memory leaks, and 800 MB+ image size. The data plane is now agentgateway (ADR-017); this remains as the historical proxy-selection record. |
| 002 | python:3.13-slim Over Chainguard | Accepted | Chose python:3.13-slim with multi-stage hardening over Chainguard because the free Chainguard tier locks to latest (Python 3.14), and the 3.13 tag requires a paid subscription. |
| 003 | Single NAT Gateway + VPC Endpoints | Accepted | Deployed one NAT Gateway instead of two, combined with VPC endpoints for ECR, CloudWatch, Secrets Manager, and S3. Saves approximately $32/month with acceptable HA trade-off for outbound internet traffic. |
| 004 | 3-Phase Container Security Pipeline | Accepted | Structured the security pipeline into three phases: pre-build (hadolint + checkov), post-build (trivy + syft), and post-scan (cosign). Skipped grype (trivy covers it) and osv-scanner (uv audit provides native OSV scanning). |
| 005 | ALB JWT Validation Over API Gateway | Proposed | Uses ALB-native validate_token action (launched Nov 2025) instead of API Gateway HTTP API for JWT authentication. Saves $260-2,400/month depending on request volume with zero additional latency. |
| 006 | Dual-Format API | Accepted | Both OpenAI Chat Completions (/v1/chat/completions) and Anthropic Messages (/v1/messages) are served natively on a single port, with no custom middleware or translation layer. Still true under agentgateway (ADR-017), which selects the route type from the path suffix. |
| 007 | AWS Provider Upgrade to >= 6.22 | Accepted | Upgraded the Terraform AWS provider from ~> 5.0 to ~> 6.22 to enable the ALB JWT validation resource (jwt_validation block in aws_lb_listener). Zero-risk upgrade since infrastructure was deployed fresh on v6. |
| 008 | Multi-Tenant Client Isolation | Accepted | Per-team Cognito app clients via a clients Terraform module. Each team gets isolated credentials, scopes, and usage tracking. |
| 009 | Provider Routing Strategy | Accepted | Gateway-native provider-level fallback and load-balance strategies. Under ADR-017 this is realized as agentgateway ai.groups priority-group failover in the rendered config (no x-portkey-config header). |
| 010 | Cost Attribution Pipeline | Accepted | Lambda subscribes to gateway CloudWatch logs, extracts token usage, computes estimated cost, and publishes custom CloudWatch metrics per team. |
| 011 | Bedrock Guardrails Integration | Accepted | Terraform module for Amazon Bedrock Guardrails with configurable content filtering, PII blocking, topic denial, and word filtering policies. |
| 012 | Response Cache Strategy | Superseded by ADR-017 | ElastiCache Redis cluster for exact-match response caching. Removed under ADR-017 in favor of provider-native prompt caching (Bedrock cachePoint markers); there is no response cache. |
| 013 | Identity Center SAML/OIDC Federation | Proposed | SAML 2.0 and OIDC federation with the Cognito User Pool, plus a Pre-Token-Generation V2 Lambda for IdP group-to-claim mapping. |
| 014 | Two-Plane Architecture Split | Accepted | ALB stays on the inference path; admin APIs (teams, budgets, routing, pricing, usage) move behind API Gateway with a Cognito authorizer. Eliminates duplicated JWT validation across handlers. |
| 015 | OpenAI Responses → Bedrock mantle proxy | Accepted | Codex + GPT-5.5/5.4 (Responses-only) route through the gateway via the openai provider with a custom_host pointed at the Bedrock mantle endpoint — a proxy, not a fork or a bypass. Amends ADR-006; retracts the earlier “bypass the gateway / fork a bedrock-responses provider” framing. |
| 016 | Control-Plane API Foundation (gwcore) | Accepted | A shared src/gwcore/ package gives all control-plane Lambdas one auth path (two verification modes, one Principal), a consistent response/error/cursor-pagination contract, in-process + ETag caching, an append-only audit trail (Firehose → Iceberg), and uniform EMF + structured-log observability. Closes the divergent-scope auth bug surfaced under ADR-014; handlers migrate incrementally. |
| 017 | agentgateway as the data plane (replaces Portkey OSS) | Accepted | Replaces the Portkey OSS build with agentgateway (Rust, distroless, pinned by image digest). Routing moves into the rendered config (ai.groups failover), content safety runs inline via Bedrock Guardrails (ApplyGuardrail, detect-only by default), the budget webhook speaks agentgateway’s {action} contract, and the response cache is dropped for provider-native prompt caching. Supersedes ADR-001 and ADR-012. |
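Supersession is recorded only in the Status column, so a small consistency check can catch a status that points at an ADR missing from the log. The following is a minimal sketch, not part of the repository: `DECISION_LOG` is an abridged copy of three rows from the table above (summaries shortened to `...`), and the function names `parse_statuses` and `dangling_supersessions` are hypothetical.

```python
import re

# Abridged copy of three rows from the log above; summaries shortened to "...".
DECISION_LOG = """\
| ADR | Title | Status | Summary |
|---|---|---|---|
| 001 | Portkey OSS as LLM Gateway Proxy | Superseded by ADR-017 | ... |
| 012 | Response Cache Strategy | Superseded by ADR-017 | ... |
| 017 | agentgateway as the data plane (replaces Portkey OSS) | Accepted | ... |
"""

# Matches the status form used in the log, e.g. "Superseded by ADR-017".
SUPERSEDED_BY = re.compile(r"^Superseded by ADR-(\d+)$")


def parse_statuses(table: str) -> dict[str, str]:
    """Map each ADR number to its Status cell, skipping header and rule rows."""
    statuses = {}
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows are the ones whose first cell is a number such as "001".
        if len(cells) >= 3 and cells[0].isdigit():
            statuses[cells[0]] = cells[2]
    return statuses


def dangling_supersessions(statuses: dict[str, str]) -> list[str]:
    """Return ADR numbers whose 'Superseded by ADR-XXX' target is not in the log."""
    dangling = []
    for number, status in sorted(statuses.items()):
        match = SUPERSEDED_BY.match(status)
        if match and match.group(1) not in statuses:
            dangling.append(number)
    return dangling


print(dangling_supersessions(parse_statuses(DECISION_LOG)))
```

An empty list means every superseded ADR points at an entry that exists in the log. The sketch assumes table cells never contain a literal `|` character.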
Creating a New ADR
Naming Convention
ADR files follow the pattern:
adr/NNN-short-descriptive-title.md
Where NNN is a zero-padded sequential number (e.g., 008).
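The naming pattern above can be checked mechanically, and the next sequential number derived from the files already present. This is a sketch under stated assumptions: the three-digit width follows from NNN and the 008 example, while the lowercase, hyphen-separated slug is an assumption (the source only shows `short-descriptive-title`). The names `ADR_NAME` and `next_adr_number`, and the sample file names, are hypothetical.

```python
import re

# Three digits, a hyphen, then a lowercase hyphen-separated slug ending in ".md".
# The slug rule is an assumption; only the NNN prefix is stated by the convention.
ADR_NAME = re.compile(r"^(\d{3})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def next_adr_number(existing: list[str]) -> str:
    """Return the next zero-padded ADR number given the file names in adr/."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := ADR_NAME.match(name))
    ]
    # An empty directory starts the sequence at 001.
    return f"{max(numbers, default=0) + 1:03d}"


# Hypothetical directory listing; non-matching names are ignored.
print(next_adr_number(["001-example-one.md", "017-example-two.md", "README.md"]))
```

In practice the list would come from something like `[p.name for p in Path("adr").iterdir()]`. Files that do not match the pattern are skipped rather than reported, which keeps the sketch short but would hide a misnamed ADR.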
Template
Use this template for new ADRs:
# ADR-NNN: Title
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-XXX
**Date**: YYYY-MM-DD
**Deciders**: AI Engineering NAMER
## Context
What is the issue that we're seeing that is motivating this decision or change?
## Decision
What is the change that we're proposing and/or doing?
## Alternatives Considered
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| ... | ... | ... | ... |
## Rationale
Why was this option chosen over the alternatives?
## Consequences
**Positive**: What becomes easier or possible as a result?
**Negative**: What becomes harder or is introduced as a trade-off?
Process
- Copy the template above into a new file: adr/NNN-your-title.md.
- Set the status to Proposed.
- Fill in context, decision, alternatives, rationale, and consequences.
- Open a PR. Discussion happens in the PR review.
- Once approved and merged, update the status to Accepted.
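The first two process steps (copy the template, set the status to Proposed) could be scripted. The sketch below is illustrative only and assumes nothing about existing repository tooling: `scaffold_adr` and `TEMPLATE` are hypothetical names, the date is filled with today's date, and the section bodies are left as `...` for the author to complete.

```python
from datetime import date
from pathlib import Path

# Template sections mirror the ADR template; the status is preset to Proposed.
TEMPLATE = """\
# ADR-{number}: {title}

**Status**: Proposed
**Date**: {today}
**Deciders**: AI Engineering NAMER

## Context
...

## Decision
...

## Alternatives Considered
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| ... | ... | ... | ... |

## Rationale
...

## Consequences
**Positive**: ...
**Negative**: ...
"""


def scaffold_adr(adr_dir: Path, number: str, title: str, slug: str) -> Path:
    """Write adr_dir/NNN-slug.md from the template and return its path."""
    path = adr_dir / f"{number}-{slug}.md"
    if path.exists():
        # Never overwrite an existing decision record.
        raise FileExistsError(path)
    adr_dir.mkdir(parents=True, exist_ok=True)
    body = TEMPLATE.format(number=number, title=title, today=date.today().isoformat())
    path.write_text(body, encoding="utf-8")
    return path
```

A call such as `scaffold_adr(Path("adr"), "018", "Example Decision", "example-decision")` would create `adr/018-example-decision.md`. The remaining steps (filling in the sections, opening the PR, and changing the status to Accepted) stay manual.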