
ADR-011: Bedrock Guardrails Integration

Status: Accepted
Date: 2026-03-20
Deciders: AI Engineering NAMER

The AI Gateway currently has no content safety layer. Enterprise teams routing LLM traffic through the gateway need configurable content filtering to:

  • Block harmful content (hate speech, violence, sexual content, insults, misconduct)
  • Prevent PII leakage (SSNs, credit card numbers, phone numbers, email addresses)
  • Enforce topic restrictions (e.g., block discussions of competitor products or internal financials)
  • Filter specific words or phrases per organizational policy

Without a guardrail layer, individual teams must implement their own content moderation, leading to inconsistent enforcement and duplicated effort.

Integrate Amazon Bedrock Guardrails as a Terraform module (infrastructure/modules/guardrails/) that creates an aws_bedrock_guardrail resource, publishes immutable versions of it, and exposes configurable policies for content filtering, PII blocking, topic denial, and word filtering.
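The module itself is Terraform, but the shape of the topic, PII, and word policies it configures can be sketched in Python. This is a minimal, hypothetical illustration: `build_policies` is not part of the module, the field names follow the Bedrock `CreateGuardrail` API request shape (the Terraform resource exposes roughly equivalent snake_case nested blocks), and the denied topic, PII types, and blocked phrase are example values only.

```python
"""Hypothetical sketch of the topic, PII, and word policies in a guardrail.

Field names follow the Bedrock CreateGuardrail request shape. The topic,
PII types, and blocked phrase below are illustrative, not module defaults.
"""


def build_policies(denied_topics, pii_types, blocked_words):
    """Return topic, PII, and word policy dicts for a guardrail definition."""
    return {
        "topicPolicyConfig": {
            "topicsConfig": [
                # "DENY" is the topic type Bedrock uses for denied topics.
                {"name": name, "definition": definition, "type": "DENY"}
                for name, definition in denied_topics.items()
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                # BLOCK rejects the text; ANONYMIZE would mask the entity instead.
                {"type": pii_type, "action": "BLOCK"}
                for pii_type in pii_types
            ]
        },
        "wordPolicyConfig": {
            "wordsConfig": [{"text": word} for word in blocked_words],
            # Bedrock's managed profanity list, in addition to custom words.
            "managedWordListsConfig": [{"type": "PROFANITY"}],
        },
    }


policies = build_policies(
    denied_topics={
        "InternalFinancials": "Questions about unreleased revenue, margins, or forecasts.",
    },
    pii_types=["US_SOCIAL_SECURITY_NUMBER", "CREDIT_DEBIT_CARD_NUMBER", "PHONE", "EMAIL"],
    blocked_words=["Project Falcon"],  # hypothetical internal codename
)
```

Keeping each policy type as a separate input mirrors the four requirements listed in the context, so a team can enable topic denial without touching PII rules, for example.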

The module is gated by enable_guardrails (default false) so existing deployments are unaffected. ECS task IAM roles are extended with bedrock:ApplyGuardrail and bedrock:GetGuardrail permissions.
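A minimal sketch of the IAM statement this implies for the ECS task role, assuming the permissions are added only when `enable_guardrails` is true and are scoped to a single guardrail ARN. The ADR does not state how the permissions are scoped; `guardrail_iam_statement` and the example region, account, and guardrail IDs are hypothetical.

```python
import json


def guardrail_iam_statement(enable_guardrails, region, account_id, guardrail_id):
    """Return an IAM policy statement for the ECS task role, or None if disabled."""
    if not enable_guardrails:
        # Assumption: with the flag off (the default), no permissions are added.
        return None
    return {
        "Sid": "AllowBedrockGuardrail",
        "Effect": "Allow",
        "Action": ["bedrock:ApplyGuardrail", "bedrock:GetGuardrail"],
        # Assumption: scoped to one guardrail ARN rather than "*".
        "Resource": f"arn:aws:bedrock:{region}:{account_id}:guardrail/{guardrail_id}",
    }


statement = guardrail_iam_statement(True, "us-east-1", "123456789012", "abc123example")
print(json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2))
```

Returning `None` for the disabled case reflects the intent that existing deployments are unaffected: the task role only changes when a deployment opts in.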

Alternative: custom Lambda filter. A Lambda function invoked before and after model calls would scan for PII and harmful content. Rejected because it requires building and maintaining custom ML models or regex-based detection, which would not match the coverage of Bedrock’s built-in classifiers.

Alternative: Portkey Cloud guardrails. Portkey offers guardrail hooks, but they require Portkey Cloud (SaaS) rather than the self-hosted OSS gateway we deploy. Rejected because they are not available in our architecture.

Alternative: third-party moderation APIs, such as the Perspective API or the OpenAI Moderation endpoint. Rejected because they add an external dependency on a non-AWS service, incur egress costs, and add cross-provider network latency, conflicting with our AWS-native design principle.

Positive consequences:

  • AWS-native solution requiring no additional infrastructure beyond the guardrail resource itself
  • Configurable per deployment environment (e.g., dev may use lower filter strengths than prod)
  • Immutable versioning via aws_bedrock_guardrail_version for audit and rollback
  • PII detection uses Bedrock’s built-in classifiers (no custom model training)
  • Prompt attack detection included as a content filter category
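Per-environment configuration and the prompt-attack filter can be illustrated with the content-filter portion of a guardrail definition. In this sketch the `dev`/`prod` strength choices, `ENV_STRENGTH`, and both helper functions are hypothetical; the category names and the `NONE`/`LOW`/`MEDIUM`/`HIGH` strength levels are Bedrock's. Because Bedrock evaluates prompt attacks on input only, the `PROMPT_ATTACK` filter's output strength is set to `NONE`.

```python
# Bedrock content-filter strengths, weakest to strongest.
STRENGTH_ORDER = ["NONE", "LOW", "MEDIUM", "HIGH"]
RANK = {s: i for i, s in enumerate(STRENGTH_ORDER)}
CATEGORIES = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]

# Hypothetical per-environment choice: looser filtering in dev than in prod.
ENV_STRENGTH = {"dev": "LOW", "prod": "HIGH"}


def content_filters(env):
    """Build a contentPolicyConfig-style dict for the given environment."""
    strength = ENV_STRENGTH[env]
    filters = [
        {"type": c, "inputStrength": strength, "outputStrength": strength}
        for c in CATEGORIES
    ]
    # Prompt attacks are evaluated on input only, so output strength is NONE.
    filters.append(
        {"type": "PROMPT_ATTACK", "inputStrength": strength, "outputStrength": "NONE"}
    )
    return {"filtersConfig": filters}


def at_least_as_strict(candidate, baseline):
    """True if every filter in `candidate` is >= the same filter in `baseline`."""
    base = {f["type"]: f for f in baseline["filtersConfig"]}
    return all(
        RANK[f[k]] >= RANK[base[f["type"]][k]]
        for f in candidate["filtersConfig"]
        for k in ("inputStrength", "outputStrength")
    )


dev, prod = content_filters("dev"), content_filters("prod")
assert at_least_as_strict(prod, dev)  # prod never filters less than dev
```

A check like `at_least_as_strict` could run in CI so that a looser dev configuration never leaks into prod by accident.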

Negative consequences:

  • Adds per-request latency when guardrails are applied (one Bedrock API call per evaluation)
  • Bedrock Guardrails pricing applies per text unit processed
  • Content filter categories and PII types are limited to what Bedrock supports
  • Guardrail must be explicitly applied by the calling application; it does not auto-intercept gateway traffic
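Because the guardrail does not intercept gateway traffic on its own, the calling code has to invoke it. A minimal sketch using the Bedrock Runtime `ApplyGuardrail` operation (boto3 `apply_guardrail`), assuming the caller passes in a client so the function can be exercised without AWS credentials; `screen_text` and `estimate_text_units` are hypothetical helpers. The estimate uses Bedrock's text-unit size of up to 1,000 characters and does not model actual prices.

```python
import math

# Bedrock Guardrails counts text in units of up to 1,000 characters.
TEXT_UNIT_CHARS = 1000


def estimate_text_units(text):
    """Estimate text units for evaluating one string (0 for empty input)."""
    return math.ceil(len(text) / TEXT_UNIT_CHARS)


def screen_text(client, guardrail_id, guardrail_version, text, source="INPUT"):
    """Run `text` through the guardrail and return (intervened, text_to_forward).

    `client` is expected to be a boto3 "bedrock-runtime" client or any object
    with a compatible apply_guardrail method. `source` is "INPUT" for prompts
    and "OUTPUT" for model responses.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        # Prefer a numbered, immutable version over the mutable "DRAFT".
        guardrailVersion=guardrail_version,
        source=source,
        content=[{"text": {"text": text}}],
    )
    if response["action"] == "GUARDRAIL_INTERVENED":
        # Outputs carry the replacement text: the configured blocked message,
        # or masked text when a PII policy anonymizes instead of blocking.
        outputs = response.get("outputs", [])
        return True, (outputs[0]["text"] if outputs else "")
    # No intervention: forward the original text unchanged.
    return False, text
```

In the gateway, `client` would be `boto3.client("bedrock-runtime")`. Screening both the prompt (`source="INPUT"`) and the model response (`source="OUTPUT"`) means two evaluations per request, which is where both the added latency and the per-text-unit cost come from.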