Model Pricing & Cost Attribution

Cost attribution turns gateway token usage into per-team cost. Prices resolve in two layers: a static fallback table baked into the cost-attribution Lambda, and a runtime DynamoDB overlay that overrides it. Custom agreements, volume discounts, and regional uplifts belong in the overlay — never in the code.

  1. The Lambda loads PRICING_TABLE (static defaults) on cold start.
  2. If PRICING_TABLE_NAME is set, it scans that DynamoDB table, merges the entries on top of the static table (DynamoDB wins), and caches the merged result for 5 minutes.
  3. get_cost(provider, model, …) looks up (provider, model). A miss logs a WARNING and emits the UnknownModelPrice CloudWatch metric while still returning the _DEFAULT_PRICE estimate — so an unpriced model is visible and alarmable, never silently mis-billed.

So the static table is only a fallback. Anything you put in the overlay takes precedence within 5 minutes, with no redeploy.

Each pricing row is one DynamoDB item:

| Attribute | Value |
| --- | --- |
| PK | PRICE#{provider}#{model} (e.g. PRICE#bedrock#openai.gpt-5.5) |
| SK | CONFIG |
| provider, model | the lookup key |
| input_per_1k, output_per_1k | USD per 1K tokens |
| cache_read_per_1k, cache_write_per_1k | optional; omit if the model has no cache discount |
| cache_supported | optional bool; set false for models with no cache-billing lane (e.g. gpt-oss) so savings report 0 instead of a default 10%-of-input estimate |
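
As an illustration of the item shape, a small builder might look like the sketch below. The helper name `pricing_item` is hypothetical, and storing rates as `Decimal` is an assumption based on how the boto3 DynamoDB resource layer serializes numbers; the real writer may differ.

```python
from decimal import Decimal


def pricing_item(provider, model, input_per_1k, output_per_1k,
                 cache_read_per_1k=None, cache_write_per_1k=None,
                 cache_supported=None):
    """Build one overlay row; optional attributes are omitted rather than stored as null."""
    item = {
        "PK": f"PRICE#{provider}#{model}",
        "SK": "CONFIG",
        "provider": provider,
        "model": model,
        # Going through str() avoids binary float noise in the stored Decimal.
        "input_per_1k": Decimal(str(input_per_1k)),
        "output_per_1k": Decimal(str(output_per_1k)),
    }
    if cache_read_per_1k is not None:
        item["cache_read_per_1k"] = Decimal(str(cache_read_per_1k))
    if cache_write_per_1k is not None:
        item["cache_write_per_1k"] = Decimal(str(cache_write_per_1k))
    if cache_supported is not None:
        item["cache_supported"] = bool(cache_supported)
    return item


# A model with no cache-billing lane: no cache rates, cache_supported set to false.
row = pricing_item("bedrock", "openai.gpt-oss-120b", "0.00015", "0.0006",
                   cache_supported=False)
```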

Manage these through the pricing admin API (/pricing, behind the admin-plane Cognito authorizer) — GET/PUT/DELETE per provider/model. That is the supported path for an operator to record an Epic-specific negotiated rate or a volume tier without touching code.

The static OpenAI-on-Bedrock defaults were verified 2026-06-11 against the AWS Bedrock pricing page and the AWS Price List bulk API:

| Model | Input /1K | Output /1K | Cache read /1K | Notes |
| --- | --- | --- | --- | --- |
| bedrock / openai.gpt-5.5 | $0.0055 | $0.033 | $0.00055 | AWS rate (~10% over OpenAI 1st-party list) |
| bedrock / openai.gpt-5.4 | $0.00275 | $0.0165 | $0.000275 | AWS rate; GovCloud differs |
| bedrock / openai.gpt-oss-120b | $0.00015 | $0.0006 | | no cache lane (cache_supported=false) |
| bedrock / openai.gpt-oss-20b | $0.00007 | $0.0003 | | no cache lane |

Anthropic Claude on Bedrock (verified 2026-06-11)

The rates below are the standard (regional) rates, keyed on the base model ID; the global.-prefixed inference profile is ~10% cheaper and has its own row. Published cache-read is exactly 10% of the input rate and 5-minute cache-write is exactly 125% of it, so the defaults compute correct cache savings without per-row cache fields.

| Model | Base ID | Input /1K | Output /1K |
| --- | --- | --- | --- |
| Opus 4.8 | anthropic.claude-opus-4-8 | $0.0055 | $0.0275 |
| Opus 4.7 | anthropic.claude-opus-4-7 | $0.0055 | $0.0275 |
| Opus 4.6 | anthropic.claude-opus-4-6-v1 | $0.0055 | $0.0275 |
| Opus 4.5 | anthropic.claude-opus-4-5-20251101-v1:0 | $0.0055 | $0.0275 |
| Opus 4.1 / 4 | anthropic.claude-opus-4-1-... / -4-... | $0.015 | $0.075 |
| Sonnet 4.6 | anthropic.claude-sonnet-4-6 | $0.0033 | $0.0165 |
| Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | $0.0033 | $0.0165 |
| Sonnet 4 | anthropic.claude-sonnet-4-20250514-v1:0 | $0.003 | $0.015 |
| Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | $0.0011 | $0.0055 |
| Fable 5 | anthropic.claude-fable-5 | $0.011 | $0.055 |
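
To make the cache defaults concrete, here is a rough sketch of how a cost and a cache saving could be derived from a row that carries only input and output rates. `estimate_cost` is a hypothetical helper, not the Lambda's get_cost. It assumes the four token counts are disjoint, that "savings" means what the cached reads would have cost at the full input rate minus what they did cost, and that a model with cache_supported set to false bills cached tokens at the plain input rate.

```python
def estimate_cost(price, input_tokens, output_tokens,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Return (cost_usd, cache_savings_usd) for one request."""
    inp = price["input_per_1k"]
    if price.get("cache_supported") is False:
        # No cache-billing lane: cached tokens cost the same as input, so savings are 0.
        read_rate = write_rate = inp
    else:
        # Fall back to the published ratios when the row has no explicit cache rates.
        read_rate = price.get("cache_read_per_1k", inp * 0.10)
        write_rate = price.get("cache_write_per_1k", inp * 1.25)
    cost = (
        input_tokens * inp
        + output_tokens * price["output_per_1k"]
        + cache_read_tokens * read_rate
        + cache_write_tokens * write_rate
    ) / 1000
    savings = cache_read_tokens * (inp - read_rate) / 1000
    return cost, savings


# Sonnet 4.6 rates from the table above, with no per-row cache fields.
sonnet = {"input_per_1k": 0.0033, "output_per_1k": 0.0165}
cost, savings = estimate_cost(sonnet, 1000, 1000, cache_read_tokens=1000)
```

With these inputs the cached reads are charged at a tenth of the input rate, so the saving is the remaining nine tenths of what 1,000 uncached input tokens would have cost.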

AWS exposes per-model token pricing programmatically via the Price List bulk API (offer codes AmazonBedrock and AmazonBedrockFoundationModels) and the pricing:GetProducts query API. A scheduled job can refresh the overlay from these rather than hand-editing — but it must tolerate a missing row (the bulk API lags new-model GA) by keeping the last-known rate and alerting, not failing.
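
One way to structure the tolerant refresh is sketched below. Everything here is illustrative: `refresh_overlay` is a hypothetical name, `fetch_price` stands in for whatever wraps the Price List lookup and is assumed to return None when AWS has no row yet, and alerting is left to the caller.

```python
def refresh_overlay(tracked, fetch_price, last_known):
    """Refresh prices for tracked (provider, model) keys.

    Returns (refreshed, missing): refreshed maps each key to its new or
    last-known price; missing lists keys the price source had no row for.
    """
    refreshed = {}
    missing = []
    for key in tracked:
        price = fetch_price(*key)
        if price is None:
            # The bulk API can lag a new-model GA: record the gap, keep the old rate.
            missing.append(key)
            if key in last_known:
                refreshed[key] = last_known[key]
        else:
            refreshed[key] = price
    return refreshed, missing


# Example with a stubbed price source that does not yet know one of the models.
known = {("bedrock", "anthropic.claude-fable-5"): {"input_per_1k": 0.011, "output_per_1k": 0.055}}
source = {("bedrock", "anthropic.claude-sonnet-4-6"): {"input_per_1k": 0.0033, "output_per_1k": 0.0165}}
refreshed, missing = refresh_overlay(
    [("bedrock", "anthropic.claude-sonnet-4-6"), ("bedrock", "anthropic.claude-fable-5")],
    lambda provider, model: source.get((provider, model)),
    known,
)
```

The job would then write `refreshed` to the overlay and raise an alert if `missing` is non-empty, instead of aborting the run.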