ADR-009: Provider Routing Strategy

Status: Accepted
Date: 2026-03-20
Deciders: AI Engineering NAMER

Context

The AI Gateway currently sends each request to a single LLM provider with no automatic failover. If the primary provider experiences an outage, rate-limits the request (HTTP 429), or returns a server error (5xx), the client receives the error directly and must handle retries itself.

This creates fragility in production workloads:

Bedrock throttling: During high-demand periods, Bedrock returns 429s that propagate to callers.
Provider outages: A single-provider architecture means any provider downtime is a full outage for models served by that provider.
No client-side retry standardization: Each client team implements its own retry/fallback logic, leading to inconsistent behavior and duplicated effort.

Decision

Use Portkey’s native routing engine to implement provider-level fallback and load-balance strategies. Routing configs are JSON objects, supplied either per request in the x-portkey-config header (base64-encoded) or as gateway-wide defaults through environment variables in the gateway container.
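As a rough illustration of the header mechanism, the sketch below builds a fallback config and base64-encodes it for the x-portkey-config header. The strategy/targets layout is an assumption modeled on Portkey's documented config shape, and the helper names encode_routing_config and decode_routing_config are hypothetical; the exact fields in our pre-built configs may differ.

```python
import base64
import json

# Assumed config shape (strategy + ordered targets); not copied from the
# actual pre-built config files.
FALLBACK_CONFIG = {
    "strategy": {"mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504]},
    "targets": [
        {"provider": "bedrock", "retry": {"attempts": 2}},
        {"provider": "anthropic"},
    ],
}


def encode_routing_config(config: dict) -> str:
    """Serialize a config to compact JSON, then base64-encode it (hypothetical helper)."""
    raw = json.dumps(config, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_routing_config(value: str) -> dict:
    """Inverse of encode_routing_config, handy for debugging a header value."""
    return json.loads(base64.b64decode(value))


# Header a client would attach to override routing for one request.
headers = {"x-portkey-config": encode_routing_config(FALLBACK_CONFIG)}
```

Compact JSON keeps the header short; the decode helper makes it easy to check what a given header value actually contains.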

Routing modes

Single (default, current behavior): One provider per request, determined by x-portkey-provider header.
Fallback: Ordered list of providers. On qualifying errors (429, 5xx), the gateway tries the next provider automatically. Each target can carry its own retry settings.
Load balance: Weighted distribution across providers. Useful for quota spreading or cost optimization.
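The fallback semantics described above can be sketched in a few lines of plain Python. This is a simplified model for reasoning about behavior, not Portkey's implementation: route_with_fallback, the target dictionaries, and the "retries" field (extra tries after the first) are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def is_qualifying_error(status: int) -> bool:
    """429 and any 5xx trigger a retry or a move to the next target."""
    return status == 429 or 500 <= status <= 599


def route_with_fallback(targets: List[Dict]) -> Optional[Tuple[str, int]]:
    """Try targets in order; return (name, status) of the first non-qualifying response.

    Each target is {"name": str, "send": Callable[[], int], "retries": int}.
    If every target keeps failing, the last (name, status) seen is returned.
    """
    last: Optional[Tuple[str, int]] = None
    for target in targets:
        send: Callable[[], int] = target["send"]
        # One initial try plus the configured number of retries for this target.
        for _ in range(1 + target.get("retries", 0)):
            status = send()
            last = (target["name"], status)
            if not is_qualifying_error(status):
                return last
    return last


# Simulated providers: Bedrock is throttled, the direct API succeeds.
result = route_with_fallback([
    {"name": "bedrock", "send": lambda: 429, "retries": 1},
    {"name": "anthropic", "send": lambda: 200},
])
```

Note that a non-qualifying error such as a 400 is returned to the caller from the first target; only 429 and 5xx responses move the request down the list.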

Pre-built configs

Config	Primary	Fallback	Use case
`fallback-anthropic.json`	Bedrock	Anthropic direct API	Anthropic models with Bedrock-first routing
`fallback-openai.json`	OpenAI	Azure OpenAI	OpenAI models with Azure fallback
`loadbalance-multi.json`	Bedrock (60%)	Anthropic (40%)	Distribute Anthropic traffic across providers
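For orientation, a config along the lines of `loadbalance-multi.json` might look like the dictionary below, with a small simulation of how a 60/40 weighting distributes traffic. The field names are an assumption based on Portkey's documented loadbalance layout, and pick_target is a hypothetical helper; the real file contents are not reproduced here.

```python
import random

# Assumed shape of a weighted load-balance config (weights from the table above).
LOADBALANCE_CONFIG = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"provider": "bedrock", "weight": 0.6},
        {"provider": "anthropic", "weight": 0.4},
    ],
}


def pick_target(config: dict, rng: random.Random) -> str:
    """Choose one provider at random, in proportion to its weight."""
    targets = config["targets"]
    weights = [t["weight"] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]["provider"]


# Simulate 10,000 requests with a fixed seed to inspect the resulting split.
rng = random.Random(0)
counts = {"bedrock": 0, "anthropic": 0}
for _ in range(10_000):
    counts[pick_target(LOADBALANCE_CONFIG, rng)] += 1
```

Weighted selection is per request, so the split only approaches 60/40 over many requests; short bursts can deviate noticeably.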

Infrastructure integration

When enable_provider_fallback is set to true, Terraform injects the fallback configs as base64-encoded environment variables in the ECS task definition. These act as defaults: clients can still override them per request via the x-portkey-config header.
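The precedence between the injected default and a per-request override can be modeled as below. This is only a sketch of the intended behavior: the environment variable name PORTKEY_DEFAULT_CONFIG and the function resolve_routing_config are hypothetical, and header keys are assumed to be lowercased already.

```python
import base64
import json
from typing import Mapping, Optional

# Hypothetical variable name; the name Terraform actually injects is not given here.
DEFAULT_CONFIG_ENV = "PORTKEY_DEFAULT_CONFIG"


def resolve_routing_config(
    headers: Mapping[str, str], env: Mapping[str, str]
) -> Optional[dict]:
    """Return the effective routing config for a request.

    The per-request header wins over the environment default. None means
    no config applies, i.e. single-provider routing via x-portkey-provider.
    """
    encoded = headers.get("x-portkey-config") or env.get(DEFAULT_CONFIG_ENV)
    if not encoded:
        return None
    return json.loads(base64.b64decode(encoded))
```

In practice `env` would be `os.environ` inside the gateway container; passing it in explicitly keeps the function easy to test.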

Options Considered

Option 1: Custom reverse proxy (rejected)

Build a custom routing layer (e.g., Envoy, nginx, or a Python service) in front of the Portkey gateway.

Pro: Full control over routing logic.
Con: Significant development and operational overhead. Another service to deploy, monitor, and maintain. Duplicates functionality that Portkey already provides natively.

Option 2: API Gateway routing rules (rejected)

Use AWS API Gateway with Lambda authorizers to implement provider fallback at the edge.

Pro: AWS-native, integrates with existing infrastructure.
Con: Adds latency (Lambda cold starts). Limited retry logic. Cannot inspect LLM-specific response codes easily. Does not support weighted load balancing across providers.

Option 3: Portkey native routing (accepted)

Use Portkey’s built-in strategy configuration with fallback and loadbalance modes.

Pro: Zero additional infrastructure. Zero code changes. Battle-tested routing logic. Per-target retry configuration. Transparent to clients (they can still use the standard OpenAI-compatible API).
Con: Vendor-specific config format (JSON schema tied to Portkey). If we migrate away from Portkey, these configs would need rewriting.

Consequences

Positive

Improved resilience: Automatic failover from Bedrock to direct API (or vice versa) on provider errors.
Zero code changes: All routing is config-driven. No application code modifications needed.
Client simplicity: Clients no longer need their own retry/fallback logic for provider-level failures.
Incremental rollout: enable_provider_fallback defaults to false, so existing deployments are unaffected until opted in.

Negative

Portkey lock-in: Routing configs use Portkey’s proprietary JSON format. Migration to another gateway would require rewriting these configs.
Config complexity: Teams need to understand the routing config schema to create custom strategies.
Cost implications: Fallback to direct API providers (Anthropic, OpenAI) may be billed at different rates than Bedrock, so teams should account for per-provider pricing differences when enabling fallback.

Neutral

Observability: Portkey logs which provider served each request, so fallback events are visible in existing OTel traces.
API key management: All provider API keys are already provisioned in Secrets Manager. No new secrets are needed.