Skip to content

ADR-009: Provider Routing Strategy

Status: Accepted Date: 2026-03-20 Deciders: AI Engineering NAMER

The AI Gateway currently sends each request to a single LLM provider with no automatic failover. If the primary provider experiences an outage, rate-limits the request (HTTP 429), or returns a server error (5xx), the client receives the error directly and must handle retries itself.

This creates fragility in production workloads:

  • Bedrock throttling: During high-demand periods, Bedrock returns 429s that propagate to callers.
  • Provider outages: A single-provider architecture means any provider downtime is a full outage for models served by that provider.
  • No client-side retry standardization: Each client team implements their own retry/fallback logic, leading to inconsistent behavior and duplicated effort.

Use Portkey’s native routing engine to implement provider-level fallback and load-balance strategies. Routing configs are JSON objects passed via the x-portkey-config header (base64-encoded) or injected as default environment variables in the gateway container.

  1. Single (default, current behavior): One provider per request, determined by x-portkey-provider header.
  2. Fallback: Ordered list of providers. On qualifying errors (429, 5xx), the gateway tries the next provider automatically. Each target can have per-provider retry settings.
  3. Load balance: Weighted distribution across providers. Useful for quota spreading or cost optimization.
ConfigPrimaryFallbackUse case
fallback-anthropic.jsonBedrockAnthropic direct APIAnthropic models with Bedrock-first routing
fallback-openai.jsonOpenAIAzure OpenAIOpenAI models with Azure fallback
loadbalance-multi.jsonBedrock (60%)Anthropic (40%)Distribute Anthropic traffic across providers

When enable_provider_fallback is set to true, Terraform injects the fallback configs as base64-encoded environment variables in the ECS task definition. Clients can also override per-request via the x-portkey-config header.

Build a custom routing layer (e.g., Envoy, nginx, or a Python service) in front of the Portkey gateway.

  • Pro: Full control over routing logic.
  • Con: Significant development and operational overhead. Another service to deploy, monitor, and maintain. Duplicates functionality that Portkey already provides natively.

Option 2: API Gateway routing rules (rejected)

Section titled “Option 2: API Gateway routing rules (rejected)”

Use AWS API Gateway with Lambda authorizers to implement provider fallback at the edge.

  • Pro: AWS-native, integrates with existing infrastructure.
  • Con: Adds latency (Lambda cold starts). Limited retry logic. Cannot inspect LLM-specific response codes easily. Does not support weighted load balancing across providers.

Option 3: Portkey native routing (accepted)

Section titled “Option 3: Portkey native routing (accepted)”

Use Portkey’s built-in strategy configuration with fallback and loadbalance modes.

  • Pro: Zero additional infrastructure. Zero code changes. Battle-tested routing logic. Per-target retry configuration. Transparent to clients (they can still use the standard OpenAI-compatible API).
  • Con: Vendor-specific config format (JSON schema tied to Portkey). If we migrate away from Portkey, these configs would need rewriting.
  • Improved resilience: Automatic failover from Bedrock to direct API (or vice versa) on provider errors.
  • Zero code changes: All routing is config-driven. No application code modifications needed.
  • Client simplicity: Clients no longer need their own retry/fallback logic for provider-level failures.
  • Incremental rollout: enable_provider_fallback defaults to false, so existing deployments are unaffected until opted in.
  • Portkey lock-in: Routing configs use Portkey’s proprietary JSON format. Migration to another gateway would require rewriting these configs.
  • Config complexity: Teams need to understand the routing config schema to create custom strategies.
  • Cost implications: Fallback to direct API providers (Anthropic, OpenAI) may incur different pricing than Bedrock. Teams should be aware of cost differences between providers.
  • Observability: Portkey logs which provider served each request, so fallback events are visible in existing OTel traces.
  • API key management: All provider API keys are already provisioned in Secrets Manager. No new secrets are needed.