
Solutions for common issues when using the AI Gateway.


401 Unauthorized: The ALB rejected the request because the JWT is invalid or missing.

Possible causes:

  • Expired token — Cognito JWTs have a 1-hour TTL. Re-run the token script:
    TOKEN=$(./scripts/get-gateway-token.sh)
  • Missing apiKeyHelper (Claude Code) — Ensure the helper is set and the script is executable:
    claude config set --global apiKeyHelper ~/workplace/ai-gateway/scripts/get-gateway-token.sh
    chmod +x ~/workplace/ai-gateway/scripts/get-gateway-token.sh
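  • Invalid JWT — Verify the token is well-formed by decoding its payload. JWT segments are base64url-encoded without padding, so convert the alphabet and restore the padding before decoding:
    echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | python3 -m json.tool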
  • TTL not set as env var (Claude Code) — CLAUDE_CODE_API_KEY_HELPER_TTL_MS must be a real environment variable, not set in the settings.json env block (bug #7660).
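You can tell the expired-token and malformed-token cases apart locally, before contacting anyone. The following is a bash sketch that assumes python3 is on your PATH. `jwt_status` is a hypothetical helper, not one of the repository's scripts. It does not verify the signature, so the ALB can still reject a token it reports as ok.

```bash
# Hypothetical helper: classify a JWT as malformed (exit 2), expired (exit 1),
# or not yet expired (exit 0). Inspects the payload only; no signature check.
jwt_status() {
  python3 - "$1" <<'EOF'
import base64, json, sys, time

parts = sys.argv[1].split(".")
if len(parts) != 3:
    print("malformed: expected 3 dot-separated segments")
    sys.exit(2)
try:
    seg = parts[1]
    # JWT segments are base64url without padding; restore it before decoding
    claims = json.loads(base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4)))
    exp = int(claims["exp"])
except (KeyError, TypeError, ValueError):
    print("malformed: payload is not base64url JSON with an exp claim")
    sys.exit(2)
left = exp - int(time.time())
if left <= 0:
    print("expired: fetch a new token")
    sys.exit(1)
print(f"ok: expires in {left // 60} minutes")
EOF
}
```

Usage: `jwt_status "$TOKEN"`.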

403 Forbidden: The request was authenticated but rejected by authorization or WAF rules.

Possible causes:

  • Wrong scope — The Cognito app client may not have the required https://gateway.internal/invoke scope. Contact the gateway admin to verify client configuration.
  • WAF block — AWS WAF may have blocked the request. Check the x-amzn-waf-action response header.
  • IP rate limit exceeded — WAF enforces a 2,000 requests/5-min per-IP limit. Wait and retry, or contact the admin for a higher threshold.
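To tell a WAF block apart from an authorization failure, look at the status line and any WAF headers together. The following is a bash sketch. `waf_headers` and `probe_waf` are hypothetical helpers, and the request body reuses the minimal Anthropic test request from this page. The header only appears when WAF adds it, so an empty result does not by itself rule WAF out.

```bash
# Hypothetical helper: reduce a raw header dump (stdin) to the HTTP status
# line(s) and any x-amzn-waf-* headers. Matching is case-insensitive.
waf_headers() {
  tr -d '\r' | grep -iE '^(HTTP/[0-9.]+ |x-amzn-waf-)'
}

# Send a minimal request and print only the interesting response headers.
# Assumes TOKEN and GATEWAY_URL are set as elsewhere on this page.
probe_waf() {
  curl -s -o /dev/null -D - \
    -H "Authorization: Bearer $TOKEN" \
    -H "x-portkey-provider: anthropic" \
    -H "Content-Type: application/json" \
    -d '{"model":"claude-sonnet-4-20250514","max_tokens":1,"messages":[{"role":"user","content":"hi"}]}' \
    "${GATEWAY_URL}/v1/chat/completions" | waf_headers
}
```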

429 Too Many Requests: The gateway or upstream provider is rate-limiting your requests.

Possible causes:

  • Budget exceeded — Your team’s token budget has been exhausted for the current period. To identify which budget applies, decode your token and note its team/client claims:

    # Decode your token to see team/client claims
    echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | python3 -m json.tool

    Contact the gateway admin with those claims to check the remaining budget or request an increase.

  • Provider rate limit — The upstream LLM provider (OpenAI, Anthropic, etc.) is rate-limiting requests. This is typically transient.

    • Wait 10-30 seconds and retry.
    • If persistent, the gateway may need higher provider-side rate limits.
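  • WAF rate limit — AWS WAF enforces a per-IP request limit (2,000 requests per 5 minutes). If you are sending high-volume automated requests, you may hit this limit. WAF's default block response is 403, so whether this limit surfaces as 429 or 403 depends on the response configured for the rule.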

    • Spread requests over time or contact the admin for a higher threshold.

Retry strategy: Use exponential backoff. If a 429 response includes a Retry-After header, wait at least that long before retrying. The value is usually a number of seconds but may also be an HTTP date.
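The following bash sketch shows one way to implement this retry strategy. It assumes TOKEN and GATEWAY_URL are set as elsewhere on this page. `retry_delay`, `post_with_retry`, and the optional `PROVIDER` variable are hypothetical. The 5-attempt limit and 60-second cap are arbitrary choices. Only numeric Retry-After values are honored; HTTP-date values fall back to plain backoff.

```bash
# Hypothetical helper: seconds to wait before retry number $1 (0-based).
# A numeric Retry-After value ($2) wins; otherwise back off exponentially.
retry_delay() {
  local attempt=$1 retry_after=${2:-}
  if [[ $retry_after =~ ^[0-9]+$ ]]; then
    echo "$retry_after"
    return 0
  fi
  # 1, 2, 4, 8, 16, 32 seconds, then capped at 60
  local delay=60
  (( attempt < 6 )) && delay=$(( 1 << attempt ))
  echo "$delay"
}

# POST a JSON body ($1) to the gateway, retrying only on 429, up to 5 attempts.
# Prints the response body; returns 0 only for a 2xx status.
post_with_retry() {
  local body=$1 attempt status="" retry_after hdrs out
  hdrs=$(mktemp) out=$(mktemp)
  for attempt in 0 1 2 3 4; do
    status=$(curl -s -D "$hdrs" -o "$out" -w '%{http_code}' \
      -H "Authorization: Bearer $TOKEN" \
      -H "x-portkey-provider: ${PROVIDER:-anthropic}" \
      -H "Content-Type: application/json" \
      -d "$body" "${GATEWAY_URL}/v1/chat/completions" || true)
    [[ $status == 429 ]] || break
    (( attempt == 4 )) && break   # no point sleeping after the last attempt
    # Header names are case-insensitive; drop spaces and the trailing CR
    retry_after=$(grep -i '^retry-after:' "$hdrs" | head -n 1 | cut -d: -f2 | tr -d ' \r')
    sleep "$(retry_delay "$attempt" "$retry_after")"
  done
  cat "$out"
  rm -f "$hdrs" "$out"
  [[ $status == 2* ]]
}
```

For example: `post_with_retry '{"model":"claude-sonnet-4-20250514","max_tokens":16,"messages":[{"role":"user","content":"hi"}]}'`. If many clients back off at the same moment, adding a little random jitter to each delay keeps them from retrying in lockstep.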


502 Bad Gateway: The gateway could not reach the upstream LLM provider.

Possible causes:

  • Provider outage — The upstream provider (OpenAI, Anthropic, Google, etc.) may be experiencing downtime. Check the provider’s status page.
  • Invalid provider API key — The gateway’s stored API key for the provider may be expired or revoked. Contact the gateway admin.
  • Network issue — The ECS task may not have outbound internet access. Check NAT Gateway and VPC endpoint configuration.

What to try:

  1. Test a different provider to isolate the issue:
    # Try anthropic instead of openai, or vice versa
    curl -H "Authorization: Bearer $TOKEN" \
    -H "x-portkey-provider: anthropic" \
    -H "Content-Type: application/json" \
    -d '{"model":"claude-sonnet-4-20250514","max_tokens":1,"messages":[{"role":"user","content":"hi"}]}' \
    ${GATEWAY_URL}/v1/chat/completions
  2. Run the health check with provider testing:
    TOKEN="$TOKEN" ./scripts/check-health.sh --url "$GATEWAY_URL" --token "$TOKEN" --providers
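A side-by-side probe speeds up the comparison in step 1. The following is a bash sketch that assumes TOKEN and GATEWAY_URL are set. `probe_provider` and `probe_all` are hypothetical helpers. The model names (gpt-4.1, claude-sonnet-4-20250514) are simply the ones used elsewhere on this page, so substitute models your gateway actually routes.

```bash
# Hypothetical helper: send a 1-token request for one provider and print only
# the HTTP status. "000" means no HTTP response was received at all.
probe_provider() {
  local provider=$1 model=$2
  curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
    -H "Authorization: Bearer $TOKEN" \
    -H "x-portkey-provider: $provider" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"$model\",\"max_tokens\":1,\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}" \
    "${GATEWAY_URL}/v1/chat/completions" || true
}

# Compare providers in one go
probe_all() {
  echo "openai: $(probe_provider openai gpt-4.1)"
  echo "anthropic: $(probe_provider anthropic claude-sonnet-4-20250514)"
}
```

A 502 from only one provider points at that provider (an outage or its stored API key). A 502 from every provider points at the gateway's outbound network path.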

503 Service Unavailable: The gateway itself is overloaded or unhealthy.

Possible causes:

  • All ECS tasks unhealthy — The ALB returns 503 when no healthy targets are available. Check ECS service events in the AWS console.
  • Gateway overloaded — Too many concurrent requests for the current capacity. The auto-scaling policy should add more tasks, but there is a ramp-up delay.
  • Deployment in progress — A rolling deployment may temporarily reduce capacity.

What to try:

  1. Wait 30-60 seconds and retry. Auto-scaling should recover.
  2. Run the basic health check:
    ./scripts/check-health.sh --url "$GATEWAY_URL"
  3. If persistent, contact the gateway admin to check ECS service health and scaling configuration.
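Instead of retrying by hand, you can poll until auto-scaling recovers. The following is a bash sketch. `wait_for_healthy` is a hypothetical helper that polls the gateway root, the same path the reachability check on this page uses. Its defaults (6 attempts, 10 seconds apart) simply cover the 30-60 second window above.

```bash
# Hypothetical helper: poll URL/ until it returns 200 or attempts run out.
# Usage: wait_for_healthy URL [ATTEMPTS] [SLEEP_SECONDS]
wait_for_healthy() {
  local url=$1 attempts=${2:-6} pause=${3:-10} i code
  for (( i = 1; i <= attempts; i++ )); do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url/" || true)
    echo "attempt $i: HTTP $code"
    [[ $code == 200 ]] && return 0
    (( i < attempts )) && sleep "$pause"
  done
  return 1
}
```

Usage: `wait_for_healthy "$GATEWAY_URL" || echo "still unhealthy: contact the gateway admin"`.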

Cognito JWTs expire after 1 hour. Here is how to refresh for each agent type.

Claude Code handles token refresh automatically. The apiKeyHelper script is re-invoked:

  • Proactively, based on CLAUDE_CODE_API_KEY_HELPER_TTL_MS (recommended: 3000000 = 50 minutes)
  • Reactively, on any 401 response
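The recommended TTL is 50 minutes expressed in milliseconds, which leaves a 10-minute margin before the 1-hour token expiry. Exporting it from your shell profile (bash syntax shown) makes it a real environment variable rather than a settings.json entry.

```bash
# 50 min * 60 s * 1000 ms = 3000000 ms; refresh well before the 60-minute expiry
export CLAUDE_CODE_API_KEY_HELPER_TTL_MS=$(( 50 * 60 * 1000 ))
```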

No manual intervention is needed. If token refresh is failing, check that the helper script is executable and that your M2M credentials are still valid:

chmod +x ~/workplace/ai-gateway/scripts/get-gateway-token.sh
./scripts/get-gateway-token.sh # Should print a JWT

For agents that read tokens from environment variables (OpenCode, Goose, Codex CLI, LangChain), tokens do not auto-refresh. Options:

  1. Re-export the variable when the token expires:

    export OPENAI_API_KEY=$(./scripts/get-gateway-token.sh)
  2. Use the caching wrapper to auto-refresh (recommended). See Authentication — Shell Wrapper Pattern.

  3. Restart the agent after exporting a fresh token, since some agents read the env var only at startup.
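The caching idea can be sketched in bash: keep the token on disk and fetch a new one only when it is close to expiry. This is a hypothetical illustration, not the Shell Wrapper Pattern from the Authentication page. `cached_gateway_token`, `token_seconds_left`, `TOKEN_SCRIPT`, `TOKEN_CACHE`, the cache path, and the 5-minute refresh margin are all assumptions.

```bash
# Hypothetical settings: token script location and cache file path
TOKEN_SCRIPT=${TOKEN_SCRIPT:-$HOME/workplace/ai-gateway/scripts/get-gateway-token.sh}
TOKEN_CACHE=${TOKEN_CACHE:-$HOME/.cache/ai-gateway/token}

# Print seconds until the JWT in $1 expires (very negative if undecodable)
token_seconds_left() {
  python3 - "$1" <<'EOF'
import base64, json, sys, time
try:
    seg = sys.argv[1].split(".")[1]
    exp = int(json.loads(base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4)))["exp"])
except (IndexError, KeyError, TypeError, ValueError):
    exp = 0
print(exp - int(time.time()))
EOF
}

# Print a token, reusing the cached one unless it expires within 5 minutes
cached_gateway_token() {
  local token=""
  [[ -r $TOKEN_CACHE ]] && token=$(<"$TOKEN_CACHE")
  if [[ -z $token ]] || (( $(token_seconds_left "$token") < 300 )); then
    token=$("$TOKEN_SCRIPT") || return 1
    mkdir -p "$(dirname "$TOKEN_CACHE")"
    # New cache files are created owner-only; the file holds a bearer credential
    ( umask 077; printf '%s\n' "$token" > "$TOKEN_CACHE" )
  fi
  printf '%s\n' "$token"
}
```

Usage: `export OPENAI_API_KEY=$(cached_gateway_token)`. Agents that read the variable only at startup still need a restart after the value changes.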

Use the health check script to see when your token expires:

TOKEN=$(./scripts/get-gateway-token.sh)
TOKEN="$TOKEN" ./scripts/check-health.sh --url "$GATEWAY_URL" --token "$TOKEN"

Or decode manually:

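# JWT segments are base64url without padding: map -_ to +/ and re-pad before decoding
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | python3 -c "
import json, sys, datetime
data = json.load(sys.stdin)
exp = data.get('exp', 0)
remaining = exp - int(datetime.datetime.now().timestamp())
status = f'Expires in {remaining // 60} minutes' if remaining > 0 else 'Expired'
print(f'{status} ({datetime.datetime.fromtimestamp(exp)})')
"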

The gateway URL is unreachable.

Possible causes:

  • Wrong gateway URL — Verify the URL is correct:
    curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/"
    Expect 200. An output of 000 means no HTTP response came back at all: the URL is wrong, the host is unreachable, or the gateway is down.
  • VPN required — If the ALB is in a private network, confirm your VPN is connected.
  • Unhealthy ALB target — The ALB health check path is / on port 8787. If all targets are unhealthy, the ALB returns 503. Check ECS service events in the AWS console.
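curl's exit code separates these causes faster than a bare timeout does. The following is a bash sketch. `diagnose_gateway` is a hypothetical helper. Exit codes 6 (could not resolve host), 7 (failed to connect), and 28 (operation timed out) are standard curl exit codes.

```bash
# Hypothetical helper: map curl's exit code to the likely cause.
diagnose_gateway() {
  local url=$1 code rc=0
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url/") || rc=$?
  case $rc in
    0)  echo "HTTP $code (expect 200; 503 means no healthy ALB targets)" ;;
    6)  echo "DNS lookup failed: check the gateway URL" ;;
    7)  echo "connection refused or unreachable: check the URL, port, and VPN" ;;
    28) echo "timed out: likely a private ALB without VPN, or a network block" ;;
    *)  echo "curl failed with exit code $rc" ;;
  esac
}
```

Usage: `diagnose_gateway "$GATEWAY_URL"`.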

{"error": "provider is not set"}

Every request must include the x-portkey-provider header. Verify per agent:

For Claude Code, check that ANTHROPIC_CUSTOM_HEADERS is set (newline-separated, one "Name: Value" header per line):

echo "$ANTHROPIC_CUSTOM_HEADERS"
# Should output: x-portkey-provider: anthropic
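To send more than one header, put each "Name: Value" pair on its own line. The following bash sketch uses $'...' quoting so that \n becomes a real newline. `x-example-header` is a hypothetical placeholder, not a header the gateway requires.

```bash
# One header per line; the second line is a placeholder for any extra header
export ANTHROPIC_CUSTOM_HEADERS=$'x-portkey-provider: anthropic\nx-example-header: example-value'

# Confirm the provider line is present (header names are case-insensitive)
if grep -qi '^x-portkey-provider: *anthropic$' <<< "$ANTHROPIC_CUSTOM_HEADERS"; then
  echo "provider header OK"
else
  echo "provider header missing"
fi
```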

{"error": "invalid_grant"}

The Cognito token request failed.

Possible causes:

  • Wrong token endpoint — Verify GATEWAY_TOKEN_ENDPOINT is the full URL:
    https://<domain>.auth.<region>.amazoncognito.com/oauth2/token
  • Wrong credentials — Confirm GATEWAY_CLIENT_ID and GATEWAY_CLIENT_SECRET are correct.
  • Wrong grant type — The Cognito app client must be configured for client_credentials grant type. Contact the gateway admin if this is not set.
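To see Cognito's raw error response directly, you can make the client_credentials request yourself. The following is a bash sketch. `check_token_endpoint` and `request_token` are hypothetical helpers. Requesting the https://gateway.internal/invoke scope is an assumption about how the token script is configured. The URL check only accepts the Cognito prefix-domain form shown above, not custom domains.

```bash
# Hypothetical check: is $1 a full Cognito token URL of the form
# https://<domain>.auth.<region>.amazoncognito.com/oauth2/token ?
check_token_endpoint() {
  [[ $1 =~ ^https://[A-Za-z0-9-]+\.auth\.[a-z0-9-]+\.amazoncognito\.com/oauth2/token$ ]]
}

# Standard OAuth 2.0 client_credentials request with HTTP Basic client auth.
# Assumes GATEWAY_TOKEN_ENDPOINT, GATEWAY_CLIENT_ID, GATEWAY_CLIENT_SECRET are set.
request_token() {
  check_token_endpoint "$GATEWAY_TOKEN_ENDPOINT" ||
    { echo "GATEWAY_TOKEN_ENDPOINT is not a full token URL" >&2; return 1; }
  curl -s -X POST "$GATEWAY_TOKEN_ENDPOINT" \
    -u "$GATEWAY_CLIENT_ID:$GATEWAY_CLIENT_SECRET" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "grant_type=client_credentials" \
    --data-urlencode "scope=https://gateway.internal/invoke"
}
```

Usage: `request_token | python3 -m json.tool`. A success contains an access_token field; a failure contains an error field naming the problem.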

When connected to a non-first-party host, Claude Code disables MCP tool search by default. Symptoms: tool search returns no results, or tools from MCP servers are not discovered.

Fix:

export ENABLE_TOOL_SEARCH=true

Add this to the same shell profile block as your other gateway environment variables.


Requests are going to api.openai.com instead of the gateway.

Codex CLI does not allow overriding the built-in openai provider, so do not reuse that name. Define a custom provider in ~/.codex/config.toml instead:

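[model_providers.gateway]
name = "AI Gateway"
# TOML does not expand variables: replace ${GATEWAY_URL} with the literal gateway URL
base_url = "${GATEWAY_URL}/v1"
# Environment variable holding the bearer token (the gateway JWT)
env_key = "OPENAI_API_KEY"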

Launch with: codex --provider gateway --model gpt-4.1
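TOML has no variable substitution, so one way to get the real gateway URL into the file is to let the shell write the table. The following is a bash sketch that assumes the config lives at ~/.codex/config.toml. `write_codex_provider` and the `CODEX_CONFIG` override are hypothetical. `env_key` tells Codex which environment variable holds the bearer token; here that is OPENAI_API_KEY, as exported for env-var agents on this page.

```bash
# Hypothetical helper: append a "gateway" provider table to Codex's config
# with GATEWAY_URL already expanded by the shell.
write_codex_provider() {
  local cfg=${CODEX_CONFIG:-$HOME/.codex/config.toml}
  [[ -n ${GATEWAY_URL:-} ]] || { echo "set GATEWAY_URL first" >&2; return 1; }
  mkdir -p "$(dirname "$cfg")"
  # A duplicate TOML table is a parse error, so skip if it already exists
  if [[ -f $cfg ]] && grep -q '^\[model_providers\.gateway\]' "$cfg"; then
    echo "gateway provider already present in $cfg"
    return 0
  fi
  cat >> "$cfg" <<EOF

[model_providers.gateway]
name = "AI Gateway"
base_url = "${GATEWAY_URL}/v1"
env_key = "OPENAI_API_KEY"
EOF
}
```

Run `write_codex_provider` once with GATEWAY_URL set, and keep OPENAI_API_KEY exported with a fresh token.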


When environment variables and config files conflict, the following precedence applies:

Agent        | Precedence (highest first)
------------ | ---------------------------------------------
Claude Code  | Env vars > claude config settings > defaults
OpenCode     | opencode.json in project root > global config
Goose        | Env vars > ~/.config/goose/config.yaml
Continue.dev | ~/.continue/config.yaml (only source)
LangChain    | Constructor args > env vars
Codex CLI    | CLI flags > config.toml > env vars