# Monitoring and Observability
The AI Gateway exports telemetry through three channels: structured logs to CloudWatch Logs, distributed traces to X-Ray, and custom metrics via CloudWatch Embedded Metric Format (EMF). An ADOT sidecar container in each ECS task handles the export pipeline.
## Telemetry Pipeline

```mermaid
flowchart LR
    subgraph task["ECS Task"]
        GW["Portkey Gateway\nPort 8787"]
        ADOT["ADOT Sidecar\nOTel Collector"]
    end
    subgraph cw["CloudWatch"]
        LOG1["/ecs/ai-gateway/gateway\nContainer Logs"]
        LOG2["/ecs/ai-gateway/otel\nCollector Logs"]
        OTEL_LOG["/ecs/ai-gateway/otel-logs\nOTLP Logs"]
        EMF["AIGateway Namespace\nEMF Metrics"]
        DASH["CloudWatch Dashboard\n4 Widgets"]
    end
    XR["AWS X-Ray\nDistributed Traces"]
    GW -->|"stdout/stderr\n(awslogs driver)"| LOG1
    ADOT -->|"stdout/stderr\n(awslogs driver)"| LOG2
    GW -->|"OTLP gRPC\nlocalhost:4317"| ADOT
    ADOT -->|"awsxray exporter"| XR
    ADOT -->|"awsemf exporter"| EMF
    ADOT -->|"awscloudwatchlogs\nexporter"| OTEL_LOG
    LOG1 --> DASH
    EMF --> DASH
```
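In the task definition, this wiring looks roughly like the trimmed fragment below. It is a sketch only: the container names, the standard `OTEL_EXPORTER_OTLP_ENDPOINT` variable, and the omitted `awslogs` options (region, stream prefix) are assumptions, not the deployed definition.

```json
{
  "containerDefinitions": [
    {
      "name": "gateway",
      "portMappings": [{ "containerPort": 8787 }],
      "environment": [
        { "name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": "http://localhost:4317" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": { "awslogs-group": "/ecs/ai-gateway/gateway" }
      }
    },
    {
      "name": "adot-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": { "awslogs-group": "/ecs/ai-gateway/otel" }
      }
    }
  ]
}
```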
## CloudWatch Log Groups

| Log Group | Source | Retention | Encryption |
|---|---|---|---|
| /ecs/ai-gateway/gateway | Portkey gateway container (pino JSON via Fastify) | 365 days | KMS (alias/ai-gateway-logs) |
| /ecs/ai-gateway/otel | ADOT sidecar container operational logs | 365 days | KMS (alias/ai-gateway-logs) |
| /ecs/ai-gateway/otel-logs | OTLP logs exported by the collector pipeline | 365 days | KMS (alias/ai-gateway-logs) |
| /ecs/ai-gateway/metrics | EMF-formatted metrics from the collector | 365 days | KMS (alias/ai-gateway-logs) |
| aws-waf-logs-ai-gateway-{env} | WAF request logs (only when WAF is enabled) | 365 days | KMS (alias/ai-gateway-logs) |
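Each of these groups can be tailed directly with AWS CLI v2's `aws logs tail`. A small hypothetical helper (not part of the repo; actually running it requires AWS credentials and the CLI installed):

```shell
# Tail one of the AI Gateway log groups live (AWS CLI v2).
# Argument is the group suffix: gateway | otel | otel-logs | metrics
gwlogs() {
  local group=${1:-gateway}
  aws logs tail "/ecs/ai-gateway/${group}" --follow --format short
}
# Usage: gwlogs otel
```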
## OpenTelemetry Collector Configuration

The ADOT sidecar runs the AWS Distro for OpenTelemetry collector (`public.ecr.aws/aws-observability/aws-otel-collector:latest`) with the following pipeline configuration, defined in `infrastructure/otel-config.yaml`:
### Receivers

| Receiver | Protocol | Endpoint |
|---|---|---|
| OTLP | gRPC | localhost:4317 |
| OTLP | HTTP | localhost:4318 |
### Processors

| Processor | Configuration |
|---|---|
| memory_limiter | check_interval: 1s, limit_mib: 100 |
| batch | timeout: 5s, send_batch_size: 512 |
| resource | Upserts service.name = ai-gateway |
### Exporters and Pipelines

| Pipeline | Processors | Exporter | Destination |
|---|---|---|---|
| Traces | memory_limiter, batch | awsxray | AWS X-Ray |
| Metrics | memory_limiter, batch | awsemf | CloudWatch Metrics (namespace: AIGateway, log group: /ecs/ai-gateway/metrics) |
| Logs | memory_limiter, batch | awscloudwatchlogs | CloudWatch Logs (/ecs/ai-gateway/otel-logs) |
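Pulled together from the tables above, a minimal collector configuration consistent with this pipeline might look like the sketch below. This is not the shipped `infrastructure/otel-config.yaml`: the `log_stream_name` and the `resource` processor's pipeline placement are assumptions.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 100
  batch:
    timeout: 5s
    send_batch_size: 512
  resource:                       # defined per the Processors table;
    attributes:                   # pipeline placement not specified above
      - key: service.name
        value: ai-gateway
        action: upsert

exporters:
  awsxray: {}
  awsemf:
    namespace: AIGateway
    log_group_name: /ecs/ai-gateway/metrics
  awscloudwatchlogs:
    log_group_name: /ecs/ai-gateway/otel-logs
    log_stream_name: otel-logs    # assumption; the exporter requires a stream name

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [awsemf]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [awscloudwatchlogs]
```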
## Saved CloudWatch Logs Insights Queries

Four pre-built queries are deployed as CloudWatch saved queries targeting the gateway log group. All four run against the pino JSON logs emitted by the Portkey Fastify server.
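These queries key off a handful of fields in the pino JSON. A hypothetical sample line, inspected locally with `jq`, shows just the fields the queries aggregate on (real log lines carry more fields):

```shell
line='{"level":30,"responseTime":412,"req":{"url":"/v1/chat/completions","headers":{"x-portkey-provider":"openai"}},"res":{"statusCode":200}}'

# Pull out the fields the saved queries reference:
echo "$line" | jq -r '[.responseTime, .res.statusCode, .req.url, .req.headers["x-portkey-provider"]] | @tsv'
# -> 412	200	/v1/chat/completions	openai
```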
### 1. Requests per Hour by Provider

Saved as: `ai-gateway/requests-per-hour-by-provider`

```
fields @timestamp, @message
| filter ispresent(responseTime)
| stats count(*) as requests by bin(1h), `req.headers.x-portkey-provider` as provider
| sort bin(1h) desc
```

### 2. Error Rate by Provider

Saved as: `ai-gateway/error-rate-by-provider`

```
fields @timestamp, @message
| filter ispresent(res.statusCode)
| stats count(*) as total, sum(res.statusCode >= 400) as errors, (sum(res.statusCode >= 400) / count(*)) * 100 as error_pct by `req.headers.x-portkey-provider` as provider
| sort error_pct desc
```

### 3. Latency Percentiles by Provider

Saved as: `ai-gateway/latency-percentiles-by-provider`

```
fields @timestamp, responseTime
| filter ispresent(responseTime)
| stats pct(responseTime, 50) as p50, pct(responseTime, 95) as p95, pct(responseTime, 99) as p99, avg(responseTime) as avg_ms by `req.headers.x-portkey-provider` as provider
| sort p99 desc
```

### 4. Requests by Endpoint

Saved as: `ai-gateway/requests-by-endpoint`

```
fields @timestamp, req.url
| filter ispresent(req.url)
| stats count(*) as requests by `req.url` as endpoint
| sort requests desc
| limit 20
```

## CloudWatch Dashboard

The dashboard `ai-gateway-{environment}` contains four widgets arranged in a 2×2 grid:
| Position | Widget | Type | Data Source |
|---|---|---|---|
| Top-left | Requests per Hour by Provider | Time series | Gateway log group (Logs Insights) |
| Top-right | Error Rate by Provider | Table | Gateway log group (Logs Insights) |
| Bottom-left | Latency Percentiles by Provider (ms) | Table | Gateway log group (Logs Insights) |
| Bottom-right | Top Endpoints by Request Count | Table | Gateway log group (Logs Insights) |
## Running Queries via CLI

The scripts/cw-queries.sh script provides a convenient way to run the saved queries from the command line:

```sh
# Run all 4 queries (default: last 1 hour)
./scripts/cw-queries.sh

# Run a specific query
./scripts/cw-queries.sh requests
./scripts/cw-queries.sh errors
./scripts/cw-queries.sh latency
./scripts/cw-queries.sh endpoints
```
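Under the hood, a wrapper like this typically pairs `aws logs start-query` with `aws logs get-query-results` and polls until the query completes. A hypothetical sketch of that loop (not the actual scripts/cw-queries.sh; requires AWS CLI v2 and credentials to run):

```shell
cwq_run() {
  local query_string=$1
  local log_group=${LOG_GROUP:-/ecs/ai-gateway/gateway}
  local start=${START_TIME:-$(( $(date +%s) - 3600 ))}
  local end=${END_TIME:-$(date +%s)}

  # Kick off the Logs Insights query and capture its ID
  local query_id
  query_id=$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start" --end-time "$end" \
    --query-string "$query_string" \
    --output text --query queryId)

  # Poll until the query finishes (status: Scheduled -> Running -> Complete)
  local status=Scheduled
  while [ "$status" = "Scheduled" ] || [ "$status" = "Running" ]; do
    sleep 2
    status=$(aws logs get-query-results --query-id "$query_id" \
      --output text --query status)
  done
  aws logs get-query-results --query-id "$query_id" --output json --query results
}
# Usage: cwq_run 'fields @timestamp | limit 5'
```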
### Environment Variables

| Variable | Default | Description |
|---|---|---|
| LOG_GROUP | /ecs/ai-gateway/gateway | Target CloudWatch log group |
| START_TIME | 1 hour ago (epoch seconds) | Query start time |
| END_TIME | Now (epoch seconds) | Query end time |
### Examples

```sh
# Query the last 24 hours (GNU date; on macOS use: date -v-24H +%s)
START_TIME=$(date -d '24 hours ago' +%s) ./scripts/cw-queries.sh

# Query a different log group
LOG_GROUP=/ecs/ai-gateway/otel ./scripts/cw-queries.sh

# Query a specific time range
START_TIME=1711000000 END_TIME=1711003600 ./scripts/cw-queries.sh errors
```

## Key Metrics to Watch

### Operational Health
| Metric | Source | Healthy Range | Action if Breached |
|---|---|---|---|
| Request rate | Logs Insights (requests/hour) | Baseline +/- 50% | Investigate traffic spikes; check if autoscaling is responding |
| Error rate (4xx + 5xx) | Logs Insights (error_pct) | < 5% | Check provider API status; review error logs for patterns |
| p50 latency | Logs Insights (latency_percentiles) | < 500ms | Normal range varies by model; investigate if it increases suddenly |
| p99 latency | Logs Insights (latency_percentiles) | < 5000ms | May indicate provider throttling or network issues |
### Infrastructure Health

| Metric | Source | Healthy Range | Action if Breached |
|---|---|---|---|
| ECS running task count | CloudWatch ECS metrics | >= autoscaling_min_capacity | Check ECS events for task failures; verify health checks |
| CPU utilization | CloudWatch ECS metrics | < 70% (autoscaling target) | Autoscaling should handle; increase autoscaling_max_capacity if at limit |
| ALB request count | CloudWatch ALB metrics | < 500/target (autoscaling target) | Autoscaling should handle; review if tasks are scaling appropriately |
| ALB 5xx count | CloudWatch ALB metrics | 0 | Check ECS task health; review gateway logs for crashes |
| WAF blocked requests | CloudWatch WAF metrics | Low, non-zero | Review WAF logs for false positives; adjust rules if legitimate traffic is blocked |
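Several of these thresholds map directly onto CloudWatch alarms. As one hedged example, a hypothetical helper for the "ALB 5xx count should be 0" row (the LoadBalancer dimension value is a placeholder; substitute the full name of your ALB, e.g. from `aws elbv2 describe-load-balancers`):

```shell
alarm_alb_5xx() {
  # Alarm whenever the ALB serves any 5xx in a 5-minute window
  aws cloudwatch put-metric-alarm \
    --alarm-name ai-gateway-alb-5xx \
    --namespace AWS/ApplicationELB \
    --metric-name HTTPCode_ELB_5XX_Count \
    --dimensions Name=LoadBalancer,Value="$1" \
    --statistic Sum --period 300 \
    --threshold 0 --comparison-operator GreaterThanThreshold \
    --evaluation-periods 1 \
    --treat-missing-data notBreaching
}
# Usage: alarm_alb_5xx app/ai-gateway/0123456789abcdef
```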
## Where to Look

| Signal | First Check | Deep Dive |
|---|---|---|
| High error rate | Dashboard “Error Rate by Provider” widget | Logs Insights: filter by status code and provider |
| Slow responses | Dashboard “Latency Percentiles” widget | X-Ray traces: look for slow spans |
| No traffic | ALB target group health in ECS console | ECS task events and container health checks |
| Task crashes | ECS service events tab | Gateway log group: look for fatal/error level logs |
| WAF blocking | WAF metrics in CloudWatch | WAF log group: aws-waf-logs-ai-gateway-{env} |