Architecture Decision Records

Architecture Decision Records (ADRs) capture significant technical decisions along with their context, alternatives considered, and consequences. They serve as a historical record so that future contributors understand why a particular approach was chosen, not just what was built.

ADRs are stored in the adr/ directory at the repository root.

| ADR | Title | Status | Summary |
|-----|-------|--------|---------|
| 001 | Portkey OSS as LLM Gateway Proxy | Superseded by ADR-017 | Selected Portkey OSS over LiteLLM due to LiteLLM’s 14 CVEs (including critical RCE and active SSRF exploitation), systemic memory leaks, and 800 MB+ image size. The data plane is now agentgateway (ADR-017); this remains as the historical proxy-selection record. |
| 002 | python:3.13-slim Over Chainguard | Accepted | Chose python:3.13-slim with multi-stage hardening over Chainguard because the free Chainguard tier locks to latest (Python 3.14), and the 3.13 tag requires a paid subscription. |
| 003 | Single NAT Gateway + VPC Endpoints | Accepted | Deployed one NAT Gateway instead of two, combined with VPC endpoints for ECR, CloudWatch, Secrets Manager, and S3. Saves approximately $32/month with acceptable HA trade-off for outbound internet traffic. |
| 004 | 3-Phase Container Security Pipeline | Accepted | Structured the security pipeline into three phases: pre-build (hadolint + checkov), post-build (trivy + syft), and post-scan (cosign). Skipped grype (trivy covers it) and osv-scanner (uv audit provides native OSV scanning). |
| 005 | ALB JWT Validation Over API Gateway | Proposed | Uses ALB-native validate_token action (launched Nov 2025) instead of API Gateway HTTP API for JWT authentication. Saves $260-2,400/month depending on request volume with zero additional latency. |
| 006 | Dual-Format API | Accepted | Both OpenAI Chat Completions (/v1/chat/completions) and Anthropic Messages (/v1/messages) are served natively on a single port, with no custom middleware or translation layer. Still true under agentgateway (ADR-017), which selects the route type from the path suffix. |
| 007 | AWS Provider Upgrade to >= 6.22 | Accepted | Upgraded the Terraform AWS provider from ~> 5.0 to ~> 6.22 to enable the ALB JWT validation resource (jwt_validation block in aws_lb_listener). Zero-risk upgrade since infrastructure was deployed fresh on v6. |
| 008 | Multi-Tenant Client Isolation | Accepted | Per-team Cognito app clients via a clients Terraform module. Each team gets isolated credentials, scopes, and usage tracking. |
| 009 | Provider Routing Strategy | Accepted | Gateway-native provider-level fallback and load-balance strategies. Under ADR-017 this is realized as agentgateway ai.groups priority-group failover in the rendered config (no x-portkey-config header). |
| 010 | Cost Attribution Pipeline | Accepted | Lambda subscribes to gateway CloudWatch logs, extracts token usage, computes estimated cost, and publishes custom CloudWatch metrics per team. |
| 011 | Bedrock Guardrails Integration | Accepted | Terraform module for Amazon Bedrock Guardrails with configurable content filtering, PII blocking, topic denial, and word filtering policies. |
| 012 | Response Cache Strategy | Superseded by ADR-017 | ElastiCache Redis cluster for exact-match response caching. Removed under ADR-017 in favor of provider-native prompt caching (Bedrock cachePoint markers); there is no response cache. |
| 013 | Identity Center SAML/OIDC Federation | Proposed | SAML 2.0 and OIDC federation with the Cognito User Pool, plus a Pre-Token-Generation V2 Lambda for IdP group-to-claim mapping. |
| 014 | Two-Plane Architecture Split | Accepted | ALB stays on the inference path; admin APIs (teams, budgets, routing, pricing, usage) move behind API Gateway with a Cognito authorizer. Eliminates duplicated JWT validation across handlers. |
| 015 | OpenAI Responses → Bedrock mantle proxy | Accepted | Codex + GPT-5.5/5.4 (Responses-only) route through the gateway via the openai provider with a custom_host pointed at the Bedrock mantle endpoint: a proxy, not a fork or a bypass. Amends ADR-006; retracts the earlier “bypass the gateway / fork a bedrock-responses provider” framing. |
| 016 | Control-Plane API Foundation (gwcore) | Accepted | A shared src/gwcore/ package gives all control-plane Lambdas one auth path (two verification modes, one Principal), a consistent response/error/cursor-pagination contract, in-process + ETag caching, an append-only audit trail (Firehose → Iceberg), and uniform EMF + structured-log observability. Closes the divergent-scope auth bug surfaced under ADR-014; handlers migrate incrementally. |
| 017 | agentgateway as the data plane (replaces Portkey OSS) | Accepted | Replaces the Portkey OSS build with agentgateway (Rust, distroless, pinned by image digest). Routing moves into the rendered config (ai.groups failover), content safety runs inline via Bedrock Guardrails (ApplyGuardrail, detect-only by default), the budget webhook speaks agentgateway’s {action} contract, and the response cache is dropped for provider-native prompt caching. Supersedes ADR-001 and ADR-012. |

ADR files follow the pattern:

adr/NNN-short-descriptive-title.md

Where NNN is a zero-padded sequential number (e.g., 008).
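
The naming pattern above can be computed rather than picked by hand. The following is a minimal sketch, not part of the repository's tooling: `slugify` and `next_adr_path` are hypothetical helper names, and the lowercase, hyphen-separated slug is an assumed convention (the pattern only requires a short descriptive title).

```python
import re
from pathlib import Path

# Matches files such as "008-multi-tenant-client-isolation.md" and captures the number.
ADR_NAME = re.compile(r"^(\d{3})-.+\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens (assumed convention)."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_adr_path(adr_dir: Path, title: str) -> Path:
    """Return adr_dir/NNN-slug.md, where NNN is one more than the highest existing number."""
    numbers = [
        int(match.group(1))
        for path in adr_dir.glob("*.md")
        if (match := ADR_NAME.match(path.name))
    ]
    next_number = max(numbers, default=0) + 1
    # :03d produces the zero-padded three-digit prefix.
    return adr_dir / f"{next_number:03d}-{slugify(title)}.md"
```

Calling `next_adr_path(Path("adr"), "Example Decision Title")` in a directory whose highest-numbered file starts with `017-` would yield `adr/018-example-decision-title.md`. Files that do not match the `NNN-` prefix are ignored.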

Use this template for new ADRs:

# ADR-NNN: Title
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-XXX
**Date**: YYYY-MM-DD
**Deciders**: AI Engineering NAMER
## Context
What is the issue that we're seeing that is motivating this decision or change?
## Decision
What is the change that we're proposing and/or doing?
## Alternatives Considered
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| ... | ... | ... | ... |
## Rationale
Why was this option chosen over the alternatives?
## Consequences
**Positive**: What becomes easier or possible as a result?
**Negative**: What becomes harder or is introduced as a trade-off?

To propose a new ADR:

1. Copy the template above into a new file: adr/NNN-your-title.md.
2. Set the status to Proposed.
3. Fill in context, decision, alternatives, rationale, and consequences.
4. Open a PR. Discussion happens in the PR review.
5. Once approved and merged, update the status to Accepted.
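
Steps 1 and 2 above could be scripted. The sketch below is illustrative only: `scaffold_adr` and its parameters are hypothetical, and the template text is copied from this page with the status preset to Proposed. Filling in the sections, opening the PR, and the later status change remain manual.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Section layout copied from the template on this page; braces mark per-ADR fields.
TEMPLATE = """\
# ADR-{number:03d}: {title}
**Status**: Proposed
**Date**: {today}
**Deciders**: {deciders}
## Context
What is the issue that we're seeing that is motivating this decision or change?
## Decision
What is the change that we're proposing and/or doing?
## Alternatives Considered
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| ... | ... | ... | ... |
## Rationale
Why was this option chosen over the alternatives?
## Consequences
**Positive**: What becomes easier or possible as a result?
**Negative**: What becomes harder or is introduced as a trade-off?
"""


def scaffold_adr(adr_dir: Path, number: int, slug: str, title: str,
                 deciders: str, today: Optional[date] = None) -> Path:
    """Write a new ADR file from the template with status Proposed and return its path."""
    path = adr_dir / f"{number:03d}-{slug}.md"
    body = TEMPLATE.format(
        number=number,
        title=title,
        today=(today or date.today()).isoformat(),  # YYYY-MM-DD
        deciders=deciders,
    )
    # Mode "x" raises FileExistsError instead of overwriting an existing ADR.
    with path.open("x", encoding="utf-8") as handle:
        handle.write(body)
    return path
```

Opening the file in exclusive-create mode is a deliberate choice here: an ADR number that is already taken fails loudly rather than replacing a recorded decision.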