Authentication
The AI Gateway uses Cognito Machine-to-Machine (M2M) authentication with the client_credentials OAuth2 grant. The ALB validates JWTs natively — no API Gateway or custom authorizer required.
How It Works
Section titled “How It Works”Cognito M2M authentication is designed for service-to-service communication where no human login is involved. Each client (agent, script, or service) is issued a client ID and client secret, which it exchanges for a short-lived JWT access token.
The ALB performs JWT validation at the edge: it checks the token signature against Cognito’s JWKS endpoint, verifies iss, exp, nbf, iat, and the required scope claim. Invalid or expired tokens receive a 401 Unauthorized response directly from the ALB — the request never reaches the gateway container.
Auth Flow
Section titled “Auth Flow”sequenceDiagram
participant Client as AI Agent / Script
participant Cognito as Cognito User Pool
participant ALB as Application Load Balancer
participant ECS as Portkey Gateway (ECS)
Client->>Cognito: POST /oauth2/token<br/>grant_type=client_credentials<br/>client_id + client_secret
Cognito-->>Client: 200 OK<br/>access_token (JWT, 1h TTL)
Client->>ALB: POST /v1/messages<br/>Authorization: Bearer JWT<br/>x-portkey-provider: anthropic
ALB->>ALB: Validate JWT signature (JWKS)<br/>Check iss, exp, scope
alt Token valid
ALB->>ECS: Forward request
ECS-->>ALB: LLM response
ALB-->>Client: 200 OK (response)
else Token invalid or expired
ALB-->>Client: 401 Unauthorized
end
Step-by-Step
Section titled “Step-by-Step”- Token request — The client sends a
POSTto the Cognito/oauth2/tokenendpoint withgrant_type=client_credentials, providing the client ID and secret. - Token issuance — Cognito validates the credentials and returns a signed JWT access token with a 1-hour TTL and the
https://gateway.internal/invokescope. - Request with Bearer token — The client includes the JWT in the
Authorization: Bearer <token>header on every request to the gateway. - ALB JWT validation — The ALB validates the token signature against Cognito’s JWKS endpoint, checks standard claims (
iss,exp,nbf,iat), and verifies the required scope. Invalid tokens are rejected with a401before reaching the backend. - Forward to ECS — Valid requests pass through to the Portkey gateway container on ECS Fargate.
Getting a Token
Section titled “Getting a Token”Use the provided script to obtain a token from the command line.
Required Environment Variables
Section titled “Required Environment Variables”| Variable | Description |
|---|---|
GATEWAY_CLIENT_ID | Cognito app-client ID (issued per team or service account) |
GATEWAY_CLIENT_SECRET | Cognito app-client secret |
GATEWAY_TOKEN_ENDPOINT | Full Cognito token URL, e.g. https://<domain>.auth.<region>.amazoncognito.com/oauth2/token |
Set these in your shell profile (~/.zshrc or ~/.bashrc):
export GATEWAY_CLIENT_ID="<your-client-id>"export GATEWAY_CLIENT_SECRET="<your-client-secret>"export GATEWAY_TOKEN_ENDPOINT="<cognito-token-endpoint>"Using the Script
Section titled “Using the Script”# Fetch a token (raw JWT printed to stdout)TOKEN=$(./scripts/get-gateway-token.sh)
# Verify the token payloadecho "$TOKEN" | cut -d. -f2 | base64 -d 2>/dev/null | python3 -m json.toolThe script exits non-zero on failure and writes diagnostics to stderr:
| Exit Code | Meaning |
|---|---|
0 | Success |
1 | Missing environment variable |
2 | Token request failed (curl error or non-200 HTTP) |
3 | JSON parsing failed or access_token missing in response |
Token Caching
Section titled “Token Caching”Cognito JWTs have a default TTL of 3600 seconds (1 hour). To avoid redundant token requests, cache the token and refresh before expiry.
Claude Code
Section titled “Claude Code”Claude Code handles caching automatically via apiKeyHelper:
apiKeyHelperis called once at startup to fetch the token.CLAUDE_CODE_API_KEY_HELPER_TTL_MS=3000000(50 minutes) tells Claude Code to re-invoke the helper before the token expires.- On a
401response, Claude Code immediately re-invokes the helper regardless of TTL.
No additional caching is needed.
Shell Wrapper Pattern (Other Agents)
Section titled “Shell Wrapper Pattern (Other Agents)”For agents that read API keys from environment variables (Goose, OpenCode, Codex CLI), use a caching wrapper:
#!/usr/bin/env bash## Caches the gateway token in a file, refreshing when older than 50 minutes.
set -euo pipefail
CACHE_FILE="${HOME}/.cache/ai-gateway/token"MAX_AGE=3000 # seconds (50 minutes)
mkdir -p "$(dirname "$CACHE_FILE")"
# Refresh if cache is missing or staleif [[ ! -f "$CACHE_FILE" ]] || \ [[ $(( $(date +%s) - $(stat -c %Y "$CACHE_FILE" 2>/dev/null || echo 0) )) -gt $MAX_AGE ]]; then ~/workplace/ai-gateway/scripts/get-gateway-token.sh > "$CACHE_FILE" chmod 600 "$CACHE_FILE"fi
cat "$CACHE_FILE"Then in your shell profile:
export OPENAI_API_KEY=$(~/.local/bin/gateway-token-cached.sh)export GATEWAY_API_KEY=$(~/.local/bin/gateway-token-cached.sh)Next Steps
Section titled “Next Steps”- Agent Setup — Configure your AI agent to use the gateway
- API Reference — Endpoints, headers, and request formats