Model Pricing & Cost Attribution

Cost attribution turns gateway token usage into per-team cost. Prices resolve in two layers: a static fallback table baked into the cost-attribution Lambda, and a runtime DynamoDB overlay that overrides it. Custom agreements, volume discounts, and regional uplifts belong in the overlay — never in the code.

How a price resolves

The Lambda loads PRICING_TABLE (static defaults) on cold start.
If PRICING_TABLE_NAME is set, it scans that DynamoDB table and merges the entries on top of the static table (DynamoDB wins), cached for 5 minutes.
get_cost(provider, model, …) looks up (provider, model). A miss logs a WARNING and emits the UnknownModelPrice CloudWatch metric while still returning the _DEFAULT_PRICE estimate — so an unpriced model is visible and alarmable, never silently mis-billed.

So the static table is only a fallback. Anything you put in the overlay takes precedence within 5 minutes, with no redeploy.

Setting custom prices at runtime

Each pricing row is one DynamoDB item:

Attribute	Value
`PK`	`PRICE#{provider}#{model}` (e.g. `PRICE#bedrock#openai.gpt-5.5`)
`SK`	`CONFIG`
`provider`, `model`	the lookup key
`input_per_1k`, `output_per_1k`	USD per 1K tokens
`cache_read_per_1k`, `cache_write_per_1k`	optional; omit if the model has no cache discount
`cache_supported`	optional bool; set `false` for models with no cache-billing lane (e.g. gpt-oss) so savings report 0 instead of a default 10%-of-input estimate

Manage these through the pricing admin API (/pricing, behind the admin-plane Cognito authorizer) — GET/PUT/DELETE per provider/model. That is the supported path for an operator to record an Epic-specific negotiated rate or a volume tier without touching code.

Verified default rates (fallback)

The static OpenAI-on-Bedrock defaults were verified 2026-06-11 against the AWS Bedrock pricing page and the AWS Price List bulk API:

Model	Input /1K	Output /1K	Cache read /1K	Notes
`bedrock / openai.gpt-5.5`	$0.0055	$0.033	$0.00055	AWS rate (~10% over OpenAI 1st-party list)
`bedrock / openai.gpt-5.4`	$0.00275	$0.0165	$0.000275	AWS rate; GovCloud differs
`bedrock / openai.gpt-oss-120b`	$0.00015	$0.0006	—	no cache lane (`cache_supported=false`)
`bedrock / openai.gpt-oss-20b`	$0.00007	$0.0003	—	no cache lane

Anthropic Claude on Bedrock (verified 2026-06-11)

Standard (regional) rate on the base model ID; the global.-prefixed inference profile is ~10% cheaper and has its own row. Published cache-read is exactly 10% of input and cache-write (5-min) exactly 125%, so the defaults compute correct cache savings without per-row cache fields.

Model	Base ID	Input /1K	Output /1K
Opus 4.8	`anthropic.claude-opus-4-8`	$0.0055	$0.0275
Opus 4.7	`anthropic.claude-opus-4-7`	$0.0055	$0.0275
Opus 4.6	`anthropic.claude-opus-4-6-v1`	$0.0055	$0.0275
Opus 4.5	`anthropic.claude-opus-4-5-20251101-v1:0`	$0.0055	$0.0275
Opus 4.1 / 4	`anthropic.claude-opus-4-1-...` / `-4-...`	$0.015	$0.075
Sonnet 4.6	`anthropic.claude-sonnet-4-6`	$0.0033	$0.0165
Sonnet 4.5	`anthropic.claude-sonnet-4-5-20250929-v1:0`	$0.0033	$0.0165
Sonnet 4	`anthropic.claude-sonnet-4-20250514-v1:0`	$0.003	$0.015
Haiku 4.5	`anthropic.claude-haiku-4-5-20251001-v1:0`	$0.0011	$0.0055
Fable 5	`anthropic.claude-fable-5`	$0.011	$0.055

Auto-refresh (future)

AWS exposes per-model token pricing programmatically via the Price List bulk API (offer codes AmazonBedrock and AmazonBedrockFoundationModels) and the pricing:GetProducts query API. A scheduled job can refresh the overlay from these rather than hand-editing — but it must tolerate a missing row (the bulk API lags new-model GA) by keeping the last-known rate and alerting, not failing.