API Reference

The AI Gateway exposes two endpoints that mirror the native APIs of OpenAI and Anthropic. All requests require a valid JWT and a provider routing header.

Endpoints

Endpoint	Format	Description
`POST /v1/chat/completions`	OpenAI Chat Completions	Standard OpenAI-compatible chat completions
`POST /v1/messages`	Anthropic Messages	Standard Anthropic-compatible messages

Required Headers

Every request must include:

Header	Value	Description
`Authorization`	`Bearer <jwt>`	Cognito M2M JWT access token
`x-portkey-provider`	`anthropic`, `openai`, `google`, or `azure-openai`	Tells the gateway which upstream provider to route to

Provider Values

Value	Upstream Provider	Typical Models
`anthropic`	Anthropic	`claude-sonnet-4-20250514`, `claude-opus-4-20250514`
`openai`	OpenAI	`gpt-4.1`, `gpt-4.1-mini`, `o3`
`google`	Google	`gemini-2.5-pro`, `gemini-2.5-flash`
`azure-openai`	Azure OpenAI	Deployment-specific model names
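
As a minimal sketch of how a client might assemble the required headers, the helper below validates the provider value against the table above before building the header set. The function name `build_gateway_headers` is hypothetical and not part of the gateway.

```python
# Provider values copied from the table above.
ALLOWED_PROVIDERS = {"anthropic", "openai", "google", "azure-openai"}


def build_gateway_headers(token: str, provider: str) -> dict[str, str]:
    """Return the headers that every gateway request must carry.

    `token` is the Cognito M2M JWT access token; `provider` is the
    x-portkey-provider routing value.
    """
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return {
        "Authorization": f"Bearer {token}",
        "x-portkey-provider": provider,
        "Content-Type": "application/json",
    }
```

Rejecting an unknown provider on the client side avoids a round trip for a request the gateway could not route anyway.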

Request Examples

OpenAI Chat Completions Format

curl -X POST "${GATEWAY_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, world!"}
    ],
    "max_tokens": 256
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28
  }
}
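
A client typically needs only the assistant text and the token counts from this response. The sketch below shows one way to pull them out of the JSON body; `extract_chat_reply` is a hypothetical helper name, and it assumes the response has the shape shown above with at least one choice.

```python
import json


def extract_chat_reply(response_body: str) -> tuple[str, int]:
    """Return (assistant text, total tokens) from a chat completion body."""
    data = json.loads(response_body)
    first_choice = data["choices"][0]  # assumes at least one choice is present
    text = first_choice["message"]["content"]
    total_tokens = data["usage"]["total_tokens"]
    return text, total_tokens
```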

Anthropic Messages Format

curl -X POST "${GATEWAY_URL}/v1/messages" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'

Response:

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 9
  }
}
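
Unlike the OpenAI format, the Anthropic response carries its text in a list of content blocks. The sketch below joins the text blocks into one string; `extract_message_text` is a hypothetical helper name, and it assumes the response shape shown above.

```python
import json


def extract_message_text(response_body: str) -> str:
    """Concatenate the text of all `text` content blocks in a Messages response."""
    data = json.loads(response_body)
    return "".join(
        block["text"]
        for block in data["content"]
        if block.get("type") == "text"  # skip any non-text block types
    )
```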

Routing Anthropic Models via OpenAI Format

You can route requests to Anthropic models using the OpenAI Chat Completions format. The gateway translates the request on the fly:

curl -X POST "${GATEWAY_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "Hello from the OpenAI format!"}
    ],
    "max_tokens": 256
  }'

This is how tools such as Continue.dev and LangChain reach Anthropic models through the OpenAI-compatible endpoint.
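
To make the translation step concrete, the sketch below shows roughly what converting an OpenAI-format body into an Anthropic Messages body involves. It is an illustration only, not the gateway's actual code: the function name and the `default_max_tokens` fallback are assumptions. It relies on two properties of the Anthropic Messages API: the system prompt is a top-level `system` field rather than a message, and `max_tokens` is required.

```python
def openai_to_anthropic(body: dict, default_max_tokens: int = 1024) -> dict:
    """Convert a simple OpenAI chat completions body to Anthropic Messages form.

    Handles plain string message content only; tool calls, images, and
    streaming options are out of scope for this sketch.
    """
    # Anthropic takes the system prompt as a top-level field, not a message.
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    messages = [m for m in body["messages"] if m["role"] != "system"]

    translated = {
        "model": body["model"],
        # max_tokens is required by Anthropic; the fallback value is an assumption.
        "max_tokens": body.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        translated["system"] = "\n".join(system_parts)
    return translated
```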

Rate Limiting

The gateway enforces rate limiting at two layers:

WAF Layer (IP-based)

Rule	Limit
Per-IP rate limit	2,000 requests per 5-minute window per IP address
AWS Managed Rules	AWS Common Rule Set, IP reputation list

When the WAF rate-limits a request, the gateway returns HTTP 403 with an `x-amzn-waf-action` response header.
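
A client can use that combination to tell a WAF rejection apart from other 403 responses. The sketch below is one way to do so, together with the average request rate implied by the per-IP limit; the names are hypothetical, and the check relies only on the status code and header described above.

```python
WAF_LIMIT = 2000              # requests allowed per window, from the table above
WAF_WINDOW_SECONDS = 5 * 60   # 5-minute window

# Average rate a single IP can sustain without hitting the limit (about 6.67/s).
MAX_SUSTAINED_RPS = WAF_LIMIT / WAF_WINDOW_SECONDS


def is_waf_block(status: int, headers: dict[str, str]) -> bool:
    """Return True when a response looks like a WAF rejection.

    Header names are compared case-insensitively.
    """
    header_names = {name.lower() for name in headers}
    return status == 403 and "x-amzn-waf-action" in header_names
```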

Team Layer (RPM + Daily Tokens)

Per-team rate limits are enforced via DynamoDB atomic counters (C.1). Each team’s tier defines two limits:

Tier	RPM	Daily Tokens
sandbox	20	100,000
standard	100	1,000,000
premium	500	10,000,000
enterprise	unlimited	unlimited

When a team exceeds either limit, the response body includes:

{
  "allowed": false,
  "reason": "RPM limit exceeded (101/100 requests per minute)",
  "retry_after_seconds": 42
}
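
The RPM check can be sketched in memory as follows. This is an illustration under stated assumptions, not the gateway's implementation: it assumes fixed one-minute windows aligned to the clock, that the counter already includes the current request (as it would after an atomic increment), and that `retry_after_seconds` is the time until the window resets. The names `TIER_LIMITS` and `check_rpm` are hypothetical.

```python
# Limits copied from the tier table above; None means unlimited.
TIER_LIMITS = {
    "sandbox": {"rpm": 20, "daily_tokens": 100_000},
    "standard": {"rpm": 100, "daily_tokens": 1_000_000},
    "premium": {"rpm": 500, "daily_tokens": 10_000_000},
    "enterprise": {"rpm": None, "daily_tokens": None},
}


def check_rpm(tier: str, requests_this_minute: int, now_epoch: float) -> dict:
    """Decide whether a request is allowed under the tier's RPM limit.

    `requests_this_minute` is the counter value after counting this request.
    """
    limit = TIER_LIMITS[tier]["rpm"]
    if limit is None or requests_this_minute <= limit:
        return {"allowed": True}
    # Assumption: the window resets at the next whole minute.
    retry_after = 60 - int(now_epoch) % 60
    return {
        "allowed": False,
        "reason": (
            f"RPM limit exceeded ({requests_this_minute}/{limit} "
            "requests per minute)"
        ),
        "retry_after_seconds": retry_after,
    }
```

Under these assumptions, the 101st request from a standard-tier team arriving 18 seconds into the minute produces the same body as the example above.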

Admin API Endpoints

The admin API runs on a separate API Gateway with Cognito authorization (see ADR-014). Enable it with `enable_admin_api = true`.

All admin endpoints require a JWT with the admin scope. Obtain one via:

curl -X POST "${COGNITO_TOKEN_ENDPOINT}" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}&scope=https://gateway.internal/admin"
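
The same client-credentials request can be built in Python. The sketch below constructs the request without sending it; `build_token_request` is a hypothetical helper name. Unlike the raw curl string, `urlencode` escapes special characters in the client secret.

```python
import urllib.parse
import urllib.request

ADMIN_SCOPE = "https://gateway.internal/admin"  # scope from the curl example above


def build_token_request(
    token_endpoint: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) the client-credentials token request."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": ADMIN_SCOPE,
        }
    ).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. In a standard OAuth 2.0 client-credentials response, the token is in the `access_token` field of the JSON body.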

Usage API

Method	Path	Description
`GET`	`/usage/{team}`	Current period usage, budget utilization, per-model breakdown
`GET`	`/usage/{team}/history`	Monthly usage history
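
As a sketch of calling these endpoints, the helper below builds a usage request for a team. The base URL parameter and the function name are hypothetical, and it assumes the admin JWT is sent as a Bearer token in the `Authorization` header, as on the main gateway.

```python
import urllib.parse
import urllib.request


def build_usage_request(
    admin_base_url: str, team: str, admin_token: str, history: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a GET request for /usage/{team}[/history]."""
    # Percent-encode the team so unusual characters cannot alter the path.
    path = f"/usage/{urllib.parse.quote(team, safe='')}"
    if history:
        path += "/history"
    return urllib.request.Request(
        admin_base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {admin_token}"},  # assumed header form
        method="GET",
    )
```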

Pricing Admin

Method	Path	Description
`GET`	`/pricing`	List all pricing entries (DynamoDB overrides + static defaults)
`GET`	`/pricing/{provider}/{model}`	Get pricing for a specific model
`PUT`	`/pricing/{provider}/{model}`	Create or update a pricing override
`DELETE`	`/pricing/{provider}/{model}`	Remove override, revert to static default

Teams

Method	Path	Description
`GET`	`/teams`	List registered teams
`POST`	`/teams`	Register a new team
`GET`	`/teams/{id}`	Get team details
`PUT`	`/teams/{id}`	Update team configuration
`DELETE`	`/teams/{id}`	Deregister a team

Budgets

Method	Path	Description
`GET`	`/budgets`	List all budgets
`POST`	`/budgets`	Create a budget
`GET`	`/budgets/{id}`	Get budget and current usage
`PUT`	`/budgets/{id}`	Update a budget
`DELETE`	`/budgets/{id}`	Delete a budget

Routing Config

Method	Path	Description
`GET`	`/routing`	List routing configurations
`POST`	`/routing`	Create a routing rule
`GET`	`/routing/{id}`	Get routing rule details
`PUT`	`/routing/{id}`	Update a routing rule
`DELETE`	`/routing/{id}`	Delete a routing rule

Health Check

The ALB health check endpoint is:

Path	Port	Expected Response
`/`	8787	HTTP 200

You can verify the gateway is reachable:

curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/"
# Expected: 200
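
The same check can be scripted, for example in a deployment smoke test. The sketch below treats any connection failure or non-200 status as unhealthy; `gateway_is_healthy` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def gateway_is_healthy(gateway_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <gateway_url>/ answers with HTTP 200."""
    url = gateway_url.rstrip("/") + "/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False
```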