
The AI Gateway exposes two endpoints that mirror the native APIs of OpenAI and Anthropic. All requests require a valid JWT and a provider routing header.


| Endpoint | Format | Description |
| --- | --- | --- |
| `POST /v1/chat/completions` | OpenAI Chat Completions | Standard OpenAI-compatible chat completions |
| `POST /v1/messages` | Anthropic Messages | Standard Anthropic-compatible messages |

Every request must include:

| Header | Value | Description |
| --- | --- | --- |
| `Authorization` | `Bearer <jwt>` | Cognito M2M JWT access token |
| `x-portkey-provider` | `anthropic`, `openai`, `google`, or `azure-openai` | Tells the gateway which upstream provider to route to |

| `x-portkey-provider` value | Upstream provider | Typical models |
| --- | --- | --- |
| `anthropic` | Anthropic | `claude-sonnet-4-20250514`, `claude-opus-4-20250514` |
| `openai` | OpenAI | `gpt-4.1`, `gpt-4.1-mini`, `o3` |
| `google` | Google | `gemini-2.5-pro`, `gemini-2.5-flash` |
| `azure-openai` | Azure OpenAI | Deployment-specific model names |
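The sketch below assembles a gateway request with both required headers, using only the Python standard library. `build_gateway_request` is a hypothetical helper, not part of the gateway. The endpoint and provider sets are copied from the tables above, and the URL and token values are placeholders.

```python
import json
import urllib.request

# Endpoint paths and provider values copied from the tables above.
ENDPOINTS = {"/v1/chat/completions", "/v1/messages"}
PROVIDERS = {"anthropic", "openai", "google", "azure-openai"}


def build_gateway_request(gateway_url, token, provider, path, payload, extra_headers=None):
    """Build (but do not send) a POST request that carries the required headers.

    Hypothetical helper: it validates provider and path locally so that a typo
    fails before any network call is made.
    """
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if path not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {path!r}")
    headers = {
        "Authorization": f"Bearer {token}",  # Cognito M2M access token
        "Content-Type": "application/json",
        "x-portkey-provider": provider,      # upstream routing header
    }
    headers.update(extra_headers or {})      # e.g. {"anthropic-version": "2023-06-01"}
    return urllib.request.Request(
        gateway_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_gateway_request(
    "https://gateway.example.com",  # placeholder for ${GATEWAY_URL}
    "example-jwt",                  # placeholder for ${TOKEN}
    "openai",
    "/v1/chat/completions",
    {"model": "gpt-4.1", "messages": [{"role": "user", "content": "Hello, world!"}], "max_tokens": 256},
)
```

Pass `req` to `urllib.request.urlopen` to send it once real values are filled in.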

```shell
curl -X POST "${GATEWAY_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, world!"}
    ],
    "max_tokens": 256
  }'
```

Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28
  }
}
```
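To consume this response in code, read `choices[0].message.content` and the `usage` block. The sketch below does that. `parse_chat_completion` is a hypothetical helper, and the field names are the ones in the sample response above.

```python
import json


def parse_chat_completion(raw):
    """Return (text, finish_reason, usage) from an OpenAI-format response body."""
    body = json.loads(raw)
    # With the default n=1 there is exactly one choice.
    choice = body["choices"][0]
    # A finish_reason of "length" means the reply was cut off by max_tokens.
    return choice["message"]["content"], choice["finish_reason"], body.get("usage", {})


# Trimmed copy of the sample response above.
sample = b"""{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 19, "completion_tokens": 9, "total_tokens": 28}
}"""

text, finish_reason, usage = parse_chat_completion(sample)
print(text)                   # Hello! How can I help you today?
print(usage["total_tokens"])  # 28
```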
```shell
curl -X POST "${GATEWAY_URL}/v1/messages" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'
```

Response:

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 9
  }
}
```
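The Messages response has a different shape. `content` is a list of typed blocks, the reason the reply ended is `stop_reason`, and usage has no total. The sketch below handles those differences. `parse_message` is a hypothetical helper built on the field names in the sample above.

```python
import json


def parse_message(raw):
    """Return (text, stop_reason, input_tokens, output_tokens) from a Messages response body."""
    body = json.loads(raw)
    # content is a list of typed blocks; concatenate only the "text" blocks.
    text = "".join(b["text"] for b in body["content"] if b["type"] == "text")
    usage = body["usage"]
    # Unlike the OpenAI format, there is no total_tokens field to read.
    return text, body["stop_reason"], usage["input_tokens"], usage["output_tokens"]


sample = b"""{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 10, "output_tokens": 9}
}"""

text, stop_reason, tokens_in, tokens_out = parse_message(sample)
print(stop_reason, tokens_in + tokens_out)  # end_turn 19
```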

Routing Anthropic Models via OpenAI Format


You can route requests to Anthropic models using the OpenAI Chat Completions format. The gateway translates the request on the fly:

```shell
curl -X POST "${GATEWAY_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "Hello from the OpenAI format!"}
    ],
    "max_tokens": 256
  }'
```

This is how OpenAI-compatible tools such as Continue.dev and LangChain reach Anthropic models through the gateway. They keep the Chat Completions request format and set `x-portkey-provider: anthropic`.
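The gateway's own translation code is not shown on this page. The sketch below illustrates what any OpenAI-to-Anthropic translation has to handle:

- The Anthropic Messages API takes the system prompt as a top-level `system` field, not as a message with role `system`.
- It requires `max_tokens`.

`openai_to_anthropic` is hypothetical and text-only. It ignores tools, images, and every other optional field.

```python
def openai_to_anthropic(request):
    """Map an OpenAI Chat Completions body to an Anthropic Messages body (text only)."""
    if "max_tokens" not in request:
        # Anthropic's Messages API requires max_tokens; OpenAI's does not.
        raise ValueError("max_tokens is required for Anthropic models")
    system_parts, messages = [], []
    for msg in request["messages"]:
        if msg["role"] == "system":
            system_parts.append(msg["content"])  # hoisted to the top-level field
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    body = {"model": request["model"], "max_tokens": request["max_tokens"], "messages": messages}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


anthropic_body = openai_to_anthropic({
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello from the OpenAI format!"},
    ],
    "max_tokens": 256,
})
```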


The gateway enforces rate limiting at two layers: per-IP limits in AWS WAF, and per-team limits backed by DynamoDB.

| Rule | Configuration |
| --- | --- |
| Per-IP rate limit | 2,000 requests per 5-minute window per IP address |
| AWS Managed Rules | AWS Common Rule Set, IP reputation list |
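A client that sends many calls from one IP address can pace itself below the per-IP limit with a local sliding-window throttle. The sketch below assumes the limit applies over a trailing 5-minute window. `SlidingWindowThrottle` is hypothetical and not thread-safe. It only counts requests made by this process, not by other clients that share the same IP.

```python
import collections
import time


class SlidingWindowThrottle:
    """Allow at most `limit` requests in any trailing `window` seconds (this process only)."""

    def __init__(self, limit=2000, window=300.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = collections.deque()  # timestamps of recent requests, oldest first

    def delay_needed(self):
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the trailing window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Call right after a request is sent."""
        self.sent.append(self.clock())
```

Before each call, run `time.sleep(throttle.delay_needed())`, then send the request, then call `throttle.record()`.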

When WAF rate-limits a request, the gateway returns HTTP 403 with an x-amzn-waf-action response header.
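A WAF block arrives as a plain 403, so a client may want to tell it apart from other 403 responses before deciding to back off. The sketch below checks for the `x-amzn-waf-action` header described above. `is_waf_block` is a hypothetical helper. It compares header names case-insensitively, as HTTP requires.

```python
def is_waf_block(status, header_names):
    """True if a response looks like an AWS WAF block: HTTP 403 plus x-amzn-waf-action."""
    if status != 403:
        return False
    return "x-amzn-waf-action" in {name.lower() for name in header_names}
```

With `urllib`, call it inside an `except urllib.error.HTTPError as err:` handler as `is_waf_block(err.code, err.headers.keys())`.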

Per-team rate limits are enforced via DynamoDB atomic counters (C.1). Each team’s tier defines two limits:

| Tier | Requests per minute (RPM) | Daily tokens |
| --- | --- | --- |
| sandbox | 20 | 100,000 |
| standard | 100 | 1,000,000 |
| premium | 500 | 10,000,000 |
| enterprise | unlimited | unlimited |
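A client can mirror the tier table to pace itself before it reaches the server-side counters. In the sketch below, `TIERS` and `remaining` are hypothetical. The usage counts have to come from the caller's own bookkeeping, or from the admin `/usage/{team}` endpoint. The DynamoDB counters stay authoritative.

```python
# Tier limits from the table above; None means unlimited.
TIERS = {
    "sandbox": {"rpm": 20, "daily_tokens": 100_000},
    "standard": {"rpm": 100, "daily_tokens": 1_000_000},
    "premium": {"rpm": 500, "daily_tokens": 10_000_000},
    "enterprise": {"rpm": None, "daily_tokens": None},
}


def remaining(tier, requests_this_minute, tokens_today):
    """Return (requests_left, tokens_left) for a tier; None means unlimited."""
    limits = TIERS[tier]

    def left(limit, used):
        return None if limit is None else max(limit - used, 0)

    return left(limits["rpm"], requests_this_minute), left(limits["daily_tokens"], tokens_today)


print(remaining("standard", 40, 250_000))  # (60, 750000)
```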

When a team exceeds its limit, the response includes:

```json
{
  "allowed": false,
  "reason": "RPM limit exceeded (101/100 requests per minute)",
  "retry_after_seconds": 42
}
```
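A client can honor `retry_after_seconds` instead of retrying immediately. This page does not state which HTTP status accompanies this body, so the sketch below keys off the body alone. `call_with_retry` and its `send` callable are hypothetical. `send` must return the raw body even for non-2xx responses.

```python
import json
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() and retry while the response body reports "allowed": false."""
    for attempt in range(1, max_attempts + 1):
        raw = send()
        try:
            body = json.loads(raw)
        except ValueError:
            return raw  # not JSON: return it unchanged
        limited = isinstance(body, dict) and body.get("allowed") is False
        if not limited:
            return body
        if attempt == max_attempts:
            raise RuntimeError(body.get("reason", "team rate limit exceeded"))
        # Wait as long as the gateway asks; fall back to 1 second if the hint is missing.
        sleep(body.get("retry_after_seconds", 1))
    raise ValueError("max_attempts must be at least 1")
```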

The admin API runs on a separate API Gateway with Cognito authorization (see ADR-014). Enable it by setting `enable_admin_api = true`.

All admin endpoints require a JWT with the `https://gateway.internal/admin` scope. Obtain one from the Cognito token endpoint with the OAuth 2.0 client credentials grant:

```shell
curl -X POST "${COGNITO_TOKEN_ENDPOINT}" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}&scope=https://gateway.internal/admin"
```
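Below is the same client-credentials exchange in Python. It adds a small cache so that a long-running job reuses the token until shortly before it expires. `build_token_request`, `TokenCache`, `fetch_live`, and the 60-second `margin` are hypothetical. The cache relies on the standard OAuth 2.0 `access_token` and `expires_in` response fields.

```python
import json
import time
import urllib.parse
import urllib.request


def build_token_request(token_endpoint, client_id, client_secret,
                        scope="https://gateway.internal/admin"):
    """Form-encoded client_credentials request, equivalent to the curl call above."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_live(req):
    """Send the token request and parse the JSON response (needs a real endpoint)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


class TokenCache:
    """Reuse an access token until `margin` seconds before it expires."""

    def __init__(self, fetch, margin=60.0, clock=time.monotonic):
        self.fetch = fetch  # zero-argument callable returning the parsed token response
        self.margin = margin
        self.clock = clock
        self.token = None
        self.expires_at = 0.0

    def get(self):
        if self.token is None or self.clock() >= self.expires_at - self.margin:
            resp = self.fetch()
            self.token = resp["access_token"]
            self.expires_at = self.clock() + resp["expires_in"]
        return self.token
```

To use it with a live endpoint, create `TokenCache(lambda: fetch_live(build_token_request(endpoint, client_id, client_secret)))` once and call `.get()` before each admin request.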

| Method | Path | Description |
| --- | --- | --- |
| GET | `/usage/{team}` | Current period usage, budget utilization, per-model breakdown |
| GET | `/usage/{team}/history` | Monthly usage history |

| Method | Path | Description |
| --- | --- | --- |
| GET | `/pricing` | List all pricing entries (DynamoDB overrides + static defaults) |
| GET | `/pricing/{provider}/{model}` | Get pricing for a specific model |
| PUT | `/pricing/{provider}/{model}` | Create or update a pricing override |
| DELETE | `/pricing/{provider}/{model}` | Remove override, revert to static default |
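The pricing endpoints describe a two-level lookup. A DynamoDB override, when present, takes precedence over the static default, and deleting the override falls back to the default. The sketch below models only that precedence rule. Plain dicts stand in for both stores. `resolve_pricing` and `delete_override` are hypothetical, and entries are treated as opaque because the pricing schema is not documented here.

```python
def resolve_pricing(provider, model, overrides, defaults):
    """Return (entry, source): an override wins over the static default."""
    key = (provider, model)
    if key in overrides:
        return overrides[key], "override"
    if key in defaults:
        return defaults[key], "default"
    raise KeyError(f"no pricing for {provider}/{model}")


def delete_override(provider, model, overrides):
    """Mimic DELETE /pricing/{provider}/{model}: drop the override if one exists."""
    overrides.pop((provider, model), None)
```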

| Method | Path | Description |
| --- | --- | --- |
| GET | `/teams` | List registered teams |
| POST | `/teams` | Register a new team |
| GET | `/teams/{id}` | Get team details |
| PUT | `/teams/{id}` | Update team configuration |
| DELETE | `/teams/{id}` | Deregister a team |

| Method | Path | Description |
| --- | --- | --- |
| GET | `/budgets` | List all budgets |
| POST | `/budgets` | Create a budget |
| GET | `/budgets/{id}` | Get budget and current usage |
| PUT | `/budgets/{id}` | Update a budget |
| DELETE | `/budgets/{id}` | Delete a budget |

| Method | Path | Description |
| --- | --- | --- |
| GET | `/routing` | List routing configurations |
| POST | `/routing` | Create a routing rule |
| GET | `/routing/{id}` | Get routing rule details |
| PUT | `/routing/{id}` | Update a routing rule |
| DELETE | `/routing/{id}` | Delete a routing rule |
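Path parameters such as `{team}`, `{provider}`, `{model}`, and `{id}` are safest when percent-encoded before substitution. That way a value containing `/`, `?`, or spaces cannot change the path. The sketch below builds admin requests that way. `admin_request` and the admin base URL passed to it are hypothetical. POST and PUT bodies pass through unchanged because their schemas are not documented here.

```python
import json
import urllib.parse
import urllib.request

METHODS = {"GET", "POST", "PUT", "DELETE"}


def admin_request(base_url, token, method, template, params=None, body=None):
    """Build an admin API request from a path template such as '/usage/{team}/history'."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    # Percent-encode every path parameter, including '/', so each stays one segment.
    safe = {k: urllib.parse.quote(str(v), safe="") for k, v in (params or {}).items()}
    url = base_url.rstrip("/") + template.format(**safe)
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}"}  # JWT carrying the admin scope
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```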

The ALB health check endpoint is:

| Path | Port | Expected response |
| --- | --- | --- |
| `/` | 8787 | HTTP 200 |

You can verify the gateway is reachable:

```shell
curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/"
# Expected: 200
```
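Deployment scripts may need to poll the same path until the gateway comes up. The sketch below does that. `wait_until_healthy`, its 10 attempts, and its 3-second delay are hypothetical choices. It assumes `${GATEWAY_URL}/` reaches the health check, as the curl command above does.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status of GET url, or None if the gateway is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a status
        return err.code
    except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
        return None


def wait_until_healthy(url, attempts=10, delay=3.0, check=probe, sleep=time.sleep):
    """Poll until the health check returns 200; False if every attempt fails."""
    for attempt in range(attempts):
        if check(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```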