Skip to content

Infrastructure Domain -- Tech Stack Research Report

Project: LLM API Gateway (Portkey OSS) Date: 2026-03-18 Researcher Domain: Infrastructure Locked-In: Python 3.13, uv, ECS Fargate, ALB, Terraform


  1. Containerization — Default Confirmed
  2. Orchestration — Skipped (Locked In)
  3. Infrastructure as Code — Skipped (Locked In) + Terraform Module Research
  4. Docker Image Hardening — Full Research
  5. CI/CD — Default Confirmed
  6. Observability — Default Confirmed + OTEL Collector Research
  7. Secret Management — Full Research
  8. Container Security Pipeline — Default Tools + Wiring Research
  9. Dev Tool Management — Default Confirmed
  10. Service Mesh — Skipped
  11. Domain-Specific Artifacts
  12. Compatibility Notes
  13. Sources

  • Version: Docker Engine 27.x / BuildKit 0.17.0 (Q1 2026)
  • Why: Docker with BuildKit is the universal container build standard. BuildKit is default since Docker 23.0+ and provides parallel stage execution, cache mounts, secret mounts, and SSH forwarding. For a Python/uv multi-stage build, BuildKit’s cache mount feature (--mount=type=cache) dramatically speeds up dependency installation.
  • Health: HEALTHY
  • Version: BuildKit 0.17.0 (Q1 2026)
  • Activity: Actively developed, part of Docker Engine and Moby project
  • Maintainers: Docker Inc. + large open-source community
  • Stars: 8k+ (moby/buildkit) | License: Apache 2.0
  • Notes: Default builder in Docker Desktop and Engine 23.0+. The 0.17.0 release includes optimized backend for performance and memory usage, better isolation, and rootless execution support.

ECS Fargate is locked in. No comparison needed. Kubernetes is on the avoid list.

Compatibility note: ECS Fargate integrates natively with ALB (also locked in) via target group IP mode. Terraform modules cover this pattern well (see Section 3).


3. Infrastructure as Code — Skipped (Locked In) + Terraform Module Research

Section titled “3. Infrastructure as Code — Skipped (Locked In) + Terraform Module Research”

Terraform is locked in. No IaC comparison needed.

License note: Terraform switched from MPL 2.0 to BSL 1.1 (Business Source License) in August 2023. This restricts commercial competitors of HashiCorp from using it. For internal infrastructure use (which is this project’s case), there is no licensing impact. OpenTofu exists as an MPL-licensed fork if needed in the future. [1]

Terraform Module Research — Full Research

Section titled “Terraform Module Research — Full Research”

For this ECS Fargate + ALB deployment, three community modules from terraform-aws-modules form the foundation.

terraform-aws-modules/terraform-aws-ecs — HEALTHY

Section titled “terraform-aws-modules/terraform-aws-ecs — HEALTHY”
  • Version: 5.12.1 (released 2025-04-18)
  • Activity: Frequent releases, bug fixes and features through 2025
  • Maintainers: Anton Babenko + terraform-aws-modules org (community-maintained, AWS-endorsed)
  • Stars: 1,700+ | License: Apache 2.0
  • Structure: Root module + 3 sub-modules:
    • cluster — ECS cluster + capacity providers + CloudWatch log groups
    • service — ECS service + task definition + IAM roles + autoscaling + load balancer integration
    • container-definition — Container properties, port mappings, env vars, logging
  • Key features: Fargate-first design, built-in autoscaling, service connect, service discovery, ALB integration, CloudWatch logging, IAM role management
  • Notes: The service sub-module defaults to FARGATE launch type and creates autoscaling by default. The container-definition sub-module handles OTEL sidecar containers cleanly.

terraform-aws-modules/terraform-aws-alb — HEALTHY

Section titled “terraform-aws-modules/terraform-aws-alb — HEALTHY”
  • Version: 9.16.0 (released 2025-04-21)
  • Activity: Very active, multiple releases in 2025
  • Maintainers: Anton Babenko + terraform-aws-modules org
  • Stars: 900+ | License: Apache 2.0
  • Key features: ALB + NLB support, target groups with create_attachment = false for ECS integration, HTTPS listeners, mutual TLS, connection logging, health checks, blue-green deployment support
  • ECS Integration: Create target groups without attachments; ECS registers task IPs automatically. This is the documented pattern in docs/patterns.md.
  • Notes: Supports zonal shift, anomaly mitigation, trust stores, and HTTP response headers.

terraform-aws-modules/terraform-aws-vpc — HEALTHY

Section titled “terraform-aws-modules/terraform-aws-vpc — HEALTHY”
  • Version: 5.21.0 (released 2025-04-21)
  • Activity: Very active, frequent releases
  • Maintainers: Anton Babenko + terraform-aws-modules org
  • Stars: 3,000+ | License: Apache 2.0
  • Key features: Public/private subnets, NAT gateways, VPC endpoints (including ECS endpoints for private Fargate tasks), flow logs, multi-AZ
  • Notes: For Fargate, use private subnets with NAT gateway for outbound internet access. VPC endpoints for ECR, S3, and CloudWatch Logs reduce NAT costs and improve latency.
infra/
modules/ # Local wrappers if needed
environments/
dev/
main.tf # Uses terraform-aws-modules/*
variables.tf
outputs.tf
terraform.tfvars
prod/
main.tf
variables.tf
outputs.tf
terraform.tfvars
shared/
ecr.tf # ECR repository (shared across envs)
state.tf # S3 backend + DynamoDB lock table

Module pinning: Always pin modules to exact versions:

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.21.0"
}
module "alb" {
source = "terraform-aws-modules/alb/aws"
version = "9.16.0"
}
module "ecs" {
source = "terraform-aws-modules/ecs/aws"
version = "5.12.1"
}

4. Docker Image Hardening — Full Research

Section titled “4. Docker Image Hardening — Full Research”

Recommendation: python:3.13-slim with multi-stage hardening (primary), with Chainguard upgrade path

Section titled “Recommendation: python:3.13-slim with multi-stage hardening (primary), with Chainguard upgrade path”
Criteria (weight)Chainguard python:latestGoogle distroless/python3python:3.13-slim + hardening
Security posture (0.25)10/10 (zero CVEs)8/10 (low CVEs)6/10 (some CVEs, mitigated)
Python 3.13 support (0.20)7/10 (paid for 3.13 tag)3/10 (no 3.13)10/10 (native)
uv compatibility (0.15)8/10 (works with multi-stage)5/10 (complex setup)10/10 (native)
Image size (0.10)10/10 (~23 MB)8/10 (~50 MB)7/10 (~150 MB)
Debuggability (0.10)3/10 (no shell)2/10 (no shell)9/10 (has shell)
Free/no vendor lock (0.10)5/10 (free=latest only)10/10 (free)10/10 (free)
Team familiarity (0.10)5/104/1010/10
Weighted Score7.055.258.30

Chainguard (cgr.dev/chainguard/python): Best security posture with zero CVEs, SBOM, and Sigstore signatures built in. Nightly automated rebuilds. However, the free tier only provides :latest and :latest-dev tags, which currently point to Python 3.14. The Python 3.13 tag (3.13, 3.13.12) requires a paid subscription (“Contact us for access”). Image is ~23 MB compressed. [2][3]

Google distroless (gcr.io/distroless/python3): No Python 3.13 support — the image is significantly behind on Python versions. Not actively maintained for Python specifically. Lacks the nightly rebuild cadence of Chainguard. Effectively deprecated for Python use cases. [4]

python:3.13-slim with hardening: The official Python slim image is ~150 MB but provides native Python 3.13 support, full uv compatibility, and easy debugging. With multi-stage builds, non-root user, and readonly filesystem, the security posture improves dramatically. The remaining CVEs are in OS packages that typically do not affect Python application code. [5]

For a small team (2-5) that needs Python 3.13 specifically and uses uv, python:3.13-slim with a hardened multi-stage build is the pragmatic choice. It provides the best developer experience, zero friction with uv, and well-understood tooling. The security gap is closed through:

  • Multi-stage builds (no build tools in production image)
  • Non-root user
  • Read-only filesystem where possible
  • Trivy/Grype scanning in CI to catch and triage CVEs

Upgrade path: When the team is ready to invest in Chainguard’s paid tier for pinned Python 3.13 tags, the migration is straightforward — change the FROM line in the runtime stage. The multi-stage build pattern works identically with both.

  • Chainguard Python versions page: Python 3.13 tag requires paid access [3]
  • Chainguard zero-CVE benchmarks show equivalent performance to upstream [6]
  • Hynek Schlawack’s production-ready uv Docker guide is the definitive reference for multi-stage uv builds [7]

  • Version: Current (actions/runner 2.x, ubuntu-latest)
  • Why: Default for GitHub-hosted repos. Excellent ecosystem of security scanning actions (trivy-action, hadolint-action, cosign). Native OIDC for keyless signing with Sigstore. Free tier sufficient for small teams.
  • Health: HEALTHY
  • Version: N/A (managed service, continuously updated)
  • Activity: GitHub Ships updates continuously
  • Maintainers: GitHub (Microsoft) — large dedicated team
  • Stars: N/A (managed service) | License: Proprietary (free tier for public repos, included with GitHub plans)
  • Notes: Native OIDC token support is critical for Cosign keyless signing in the container security pipeline. id-token: write permission enables this.

6. Observability — Default Confirmed + OTEL Collector Research

Section titled “6. Observability — Default Confirmed + OTEL Collector Research”

Recommendation: OpenTelemetry Python SDK + AWS Distro for OpenTelemetry (ADOT) Collector sidecar

Section titled “Recommendation: OpenTelemetry Python SDK + AWS Distro for OpenTelemetry (ADOT) Collector sidecar”
  • Version: 1.39.0 (released 2025-12-03), with SDK release on PyPI dated 2026-03-04
  • Activity: Very active — weekly community meetings, frequent releases
  • Maintainers: 3 maintainers (Aaron Abbott, Leighton Chen, Riccardo Magliocchetti) + 10 approvers
  • Stars: 1,900+ | License: Apache 2.0
  • Stability: Traces and Metrics are stable. Logs are under active development with breaking changes.
  • Notes: Supports Python 3.9-3.14. The SDK provides auto-instrumentation for common frameworks. [8]

OTEL Collector on ECS Fargate — Sidecar Pattern

Section titled “OTEL Collector on ECS Fargate — Sidecar Pattern”

ECS Fargate has no DaemonSet concept. The two deployment patterns are:

  1. Sidecar pattern (recommended): OTEL Collector runs as a second container in the same task definition. Application sends telemetry to localhost:4317 (gRPC) or localhost:4318 (HTTP). [9]
  2. Standalone service pattern: Collector runs as a separate ECS service. Application sends telemetry over the network. More complex, but allows centralized config.

Recommendation: Use the sidecar pattern with ADOT Collector (public.ecr.aws/aws-observability/aws-otel-collector).

Why ADOT over upstream OTEL Collector: ADOT is AWS’s production-tested, supported distribution. It comes pre-configured with AWS-specific exporters (X-Ray, CloudWatch, AMP). It is tested and validated against ECS. AWS provides official ECS task definition examples. [10]

Resource allocation: 128 MB RAM and 0.25 vCPU is sufficient for the sidecar collector in most cases. [9]

Collector config sends to:

  • AWS X-Ray — distributed traces (native ECS integration)
  • Amazon CloudWatch — metrics and logs
  • Optionally: Amazon Managed Service for Prometheus for Grafana-based dashboards

Recommendation: AWS Secrets Manager (for credentials) + SSM Parameter Store (for config)

Section titled “Recommendation: AWS Secrets Manager (for credentials) + SSM Parameter Store (for config)”

Use both services for their respective strengths rather than picking one.

Criteria (weight)AWS Secrets ManagerSSM Parameter StoreHashiCorp Vault
ECS integration (0.25)10/10 (native)10/10 (native)5/10 (sidecar needed)
Auto-rotation (0.20)10/10 (built-in RDS)3/10 (custom Lambda)8/10 (dynamic secrets)
Cost efficiency (0.15)5/10 ($0.40/secret/mo)9/10 ($0.05/10K calls)3/10 (self-managed)
Simplicity (0.15)9/1010/103/10 (complex)
Cross-account (0.10)9/10 (native)7/10 (supported)8/10
Versioning (0.10)9/10 (staging labels)8/10 (version history)7/10
Avoid “heavy infra” (0.05)10/10 (managed)10/10 (managed)2/10 (self-managed)
Weighted Score8.457.804.75

AWS Secrets Manager: Best for credentials that need rotation (database passwords, API keys to external LLM providers). Native ECS integration via secrets in container definitions — ECS Task Execution Role fetches secrets at task startup. Automatic rotation built in for RDS, Redshift, DocumentDB. Cross-region replication for DR. Cost is $0.40/secret/month + $0.05/10K API calls. [11][12]

SSM Parameter Store: Best for application configuration (feature flags, endpoint URLs, non-sensitive config). Free tier for standard parameters (up to 10,000). SecureString parameters use KMS encryption. Hierarchical organization (/prod/ai-gateway/model-config). Cost is $0.05/10K API calls for advanced parameters. [11][12]

HashiCorp Vault: Eliminated — violates the “avoid heavy/complex infra” constraint. Self-managed, requires its own cluster, adds operational burden for a small team.

Secret TypeServiceExample
LLM provider API keysSecrets ManagerOpenAI key, Anthropic key
Database credentialsSecrets ManagerAurora credentials (auto-rotated)
Internal service tokensSecrets ManagerInter-service auth tokens
Feature flagsParameter Store/prod/ai-gateway/enable-caching
Model routing configParameter Store/prod/ai-gateway/default-model
Non-sensitive endpoint URLsParameter Store/prod/ai-gateway/upstream-url

Both services inject natively into ECS containers:

{
"containerDefinitions": [{
"secrets": [
{
"name": "OPENAI_API_KEY",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:openai-key"
},
{
"name": "MODEL_CONFIG",
"valueFrom": "arn:aws:ssm:us-east-1:123456789:parameter/prod/ai-gateway/model-config"
}
]
}]
}

Secrets are fetched by the ECS agent at task startup using the Task Execution Role. They are NOT baked into the image or task definition. Secret rotation requires a new task deployment (ECS rolling update). [13]


8. Container Security Pipeline — Default Tools + Wiring Research

Section titled “8. Container Security Pipeline — Default Tools + Wiring Research”

All six tools from the defaults table are confirmed healthy and recommended. Here is how they wire together in the correct order.

Phase 1: PRE-BUILD (on every PR)
1. hadolint -- Lint Dockerfile for best practices
2. checkov -- Scan Terraform/IaC for misconfigurations
Phase 2: POST-BUILD (after docker build)
3. trivy image -- Scan image for vulns + misconfigs + secrets
4. grype -- Secondary scan with EPSS/KEV risk scoring
5. syft -- Generate SBOM (CycloneDX + SPDX)
Phase 3: POST-SCAN (before push to registry)
6. cosign -- Keyless sign the image with Sigstore OIDC
  • hadolint first: Catches Dockerfile issues before building. Fast feedback. No image needed.
  • checkov alongside hadolint: Scans IaC files in the PR. Also scans Dockerfiles for additional checks.
  • trivy before grype: Trivy is the broadest scanner (vulns + misconfig + secrets + licenses). Grype provides complementary risk scoring with EPSS (Exploit Prediction Scoring System) and KEV (Known Exploited Vulnerabilities catalog) data that trivy does not emphasize.
  • syft after scanning: Generate the SBOM from the scanned image. The SBOM documents what was scanned and shipped.
  • cosign last: Sign only after all gates pass. The signature attests “this image passed all security checks in this pipeline.” Uses keyless OIDC signing tied to the GitHub Actions workflow identity.
  • Version: 3.0.0 (released ~2025)
  • Activity: Less frequent releases, but stable
  • Maintainers: 1 primary (Lukas Martinelli)
  • Stars: 10k+ | License: GPL-3.0
  • Issues: Single maintainer (bus factor of 1). However, the tool is mature and stable — Dockerfile best practices do not change rapidly.
  • Notes: GPL-3.0 license applies to the tool binary only, not to your Dockerfiles. No licensing concern for CI usage.
  • Version: 0.69.2 (March 2026, post-incident hotfix)
  • Activity: Very active — Aqua Security funded
  • Maintainers: Aqua Security team (corporate-backed)
  • Stars: 31k+ | License: Apache 2.0
  • CRITICAL NOTE: In late February / early March 2026, Trivy experienced a supply chain attack (CVE-2026-28353). An AI-powered bot exploited a pull_request_target misconfiguration in GitHub Actions, stole a PAT, deleted 178 releases, and pushed a malicious VS Code extension. The core Trivy CLI binary was NOT compromised, but release assets from the affected window (Feb 27 - Mar 1) cannot be verified. Aqua republished v0.69.2 with clean assets. [14][15]
  • Mitigation: Always verify Trivy installation via cosign signatures. Use the official aquasecurity/trivy-action GitHub Action rather than get.trivy.dev install script. Pin to specific versions in CI.
  • Version: 3.2.382 (released 2026-03-06)
  • Activity: Extremely active — multiple releases per week
  • Maintainers: Prisma Cloud (Palo Alto Networks) — corporate-backed
  • Stars: 7k+ | License: Apache 2.0
  • Notes: 1000+ built-in policies. Graph-based cross-resource analysis for Terraform. Also scans Dockerfiles (CKV_DOCKER checks).
  • Version: Latest release available on GitHub (active releases through 2026)
  • Activity: Active, automated release pipeline
  • Maintainers: Anchore Inc. — corporate-backed
  • Stars: 9k+ | License: Apache 2.0
  • Notes: Uses Syft internally for SBOM generation. Provides composite CVSS + EPSS + KEV risk scoring that trivy lacks.
  • Version: 1.42.0 (2026)
  • Activity: Active development by Anchore
  • Maintainers: Anchore Inc. — corporate-backed, 219+ contributors
  • Stars: 8.4k+ | License: Apache 2.0
  • Notes: Supports CycloneDX 1.6 and SPDX 2.3 output. Pairs with Grype for scan-from-SBOM workflow. Supports signed SBOM attestations via in-toto.
  • Version: 2.4.3
  • Activity: Active development
  • Maintainers: Sigstore project (Linux Foundation), 7+ regular contributors
  • Stars: 4k+ | License: Apache 2.0
  • Notes: Keyless signing via Fulcio CA + Rekor transparency log. GitHub Actions OIDC integration is first-class. Future development moving to sigstore-go for a major version. v2.x remains stable and supported.
Capabilityhadolinttrivygrypecheckovsyftcosign
Dockerfile lintingYESpartial
Image vuln scanningYESYES
Image misconfig scanningYES
Image secret detectionYES
IaC scanningYESYES
SBOM generationYESYES
EPSS/KEV risk scoringYES
Image signingYES
License complianceYESYES

Why both trivy AND grype? Trivy is the broadest single tool, but Grype’s EPSS/KEV scoring provides better prioritization of which vulns actually matter. Running both catches edge cases where one scanner’s vulnerability database is ahead of the other. The overhead is minimal in CI.


9. Dev Tool Management — Default Confirmed

Section titled “9. Dev Tool Management — Default Confirmed”
  • Version: 2026.3.9 (released 2026-03-13)
  • Why: Default for the team. Manages Python versions, tool versions, environment variables, and project tasks. Pairs with uv for a complete Python dev environment.
  • Health: HEALTHY
  • Version: 2026.3.9 (released 2026-03-13)
  • Activity: Extremely active — multiple releases per month (CalVer)
  • Maintainers: jdx (Jeff Dickey) + community contributors
  • Stars: 12k+ | License: MIT
  • Notes: CalVer versioning reflects release dates. Very healthy contribution pattern with new contributors each release.

Single service / monolith (Portkey OSS proxy). No microservices architecture. Service mesh is not applicable per the domain conditional logic.


# =============================================================================
# Stage 1: Builder -- installs dependencies with full build tools
# =============================================================================
FROM python:3.13-slim AS builder
# Install uv from the official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Configure uv for container builds
ENV UV_LINK_MODE=copy \
UV_COMPILE_BYTECODE=1 \
UV_PYTHON_DOWNLOADS=never
WORKDIR /app
# Install dependencies first (cached layer)
COPY pyproject.toml uv.lock ./
RUN uv sync --no-dev --locked --no-editable --no-install-project
# Copy source code and install the project
COPY src/ ./src/
COPY README.md ./
RUN uv sync --no-dev --locked --no-editable
# =============================================================================
# Stage 2: Runtime -- minimal production image
# =============================================================================
FROM python:3.13-slim AS runtime
# Security: install only runtime OS deps, clean up
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
tini \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get purge -y --auto-remove
# Security: create non-root user
RUN groupadd --gid 1001 appuser && \
useradd --uid 1001 --gid 1001 --shell /usr/sbin/nologin --create-home appuser
WORKDIR /app
# Copy only the virtual environment from builder
COPY --from=builder --chown=appuser:appuser /app/.venv /app/.venv
# Copy application source
COPY --from=builder --chown=appuser:appuser /app/src /app/src
# Set PATH to use the venv
ENV PATH="/app/.venv/bin:$PATH" \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
# Security: switch to non-root user
USER appuser
# Health check for ECS
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8787/health')" || exit 1
# Use tini as init process (reaps zombies, forwards signals)
ENTRYPOINT ["tini", "--"]
# Portkey gateway default port
EXPOSE 8787
CMD ["python", "-m", "portkey_gateway"]
[tools]
python = "3.13"
[env]
_.python.venv = { path = ".venv", create = true }
[tasks]
install = "uv sync"
dev = "uv run python -m portkey_gateway"
test = "uv run pytest tests/"
lint = "uvx ruff check ."
format = "uvx ruff format ."
typecheck = "uvx pyright"
# Container tasks
build = "docker build -t ai-gateway:dev ."
scan = "trivy image --severity HIGH,CRITICAL ai-gateway:dev"
lint-docker = "hadolint Dockerfile"
sbom = "syft ai-gateway:dev -o cyclonedx-json > sbom.json"
# IaC tasks
tf-plan = { run = "terraform plan", dir = "infra/environments/dev" }
tf-apply = { run = "terraform apply", dir = "infra/environments/dev" }
tf-lint = "checkov -d infra/ --framework terraform"
.github/workflows/ci.yml
name: CI/CD Pipeline
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
IMAGE_NAME: ai-gateway
AWS_REGION: us-east-1
permissions:
contents: read
id-token: write # Required for cosign keyless signing
security-events: write # Required for SARIF upload
jobs:
# ===== Phase 1: Pre-Build Checks =====
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Lint Dockerfile
uses: hadolint/hadolint-action@v3.1.0
with:
dockerfile: Dockerfile
failure-threshold: warning
- name: Scan IaC with Checkov
uses: bridgecrewio/checkov-action@v12
with:
directory: infra/
framework: terraform
soft_fail: false
# ===== Phase 2: Build + Security Scan =====
build-and-scan:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: ${{ env.AWS_REGION }}
- name: Login to ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Build image
run: |
docker build \
-t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
-t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest \
.
- name: Trivy vulnerability scan
uses: aquasecurity/trivy-action@0.28.0
with:
image-ref: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
format: sarif
output: trivy-results.sarif
severity: CRITICAL,HIGH
exit-code: 1
- name: Upload Trivy SARIF
if: always()
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: trivy-results.sarif
- name: Grype vulnerability scan
uses: anchore/scan-action@v4
with:
image: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
fail-build: true
severity-cutoff: high
- name: Generate SBOM with Syft
uses: anchore/sbom-action@v0
with:
image: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
format: cyclonedx-json
output-file: sbom.cyclonedx.json
- name: Upload SBOM artifact
uses: actions/upload-artifact@v4
with:
name: sbom
path: sbom.cyclonedx.json
# ===== Phase 3: Sign + Push =====
- name: Push image to ECR
if: github.ref == 'refs/heads/main'
run: |
docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
- name: Sign image with Cosign
if: github.ref == 'refs/heads/main'
uses: sigstore/cosign-installer@v3
- run: |
cosign sign --yes \
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
# ===== Phase 4: Deploy =====
deploy:
runs-on: ubuntu-latest
needs: build-and-scan
if: github.ref == 'refs/heads/main'
environment: production
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: ${{ env.AWS_REGION }}
- name: Deploy to ECS
run: |
aws ecs update-service \
--cluster ai-gateway \
--service ai-gateway \
--force-new-deployment

11.4 OTEL Collector Sidecar — Terraform Snippet

Section titled “11.4 OTEL Collector Sidecar — Terraform Snippet”
# Within the ECS service module container_definitions
module "ecs_service" {
source = "terraform-aws-modules/ecs/aws//modules/service"
version = "5.12.1"
name = "ai-gateway"
cluster_arn = module.ecs_cluster.arn
container_definitions = {
# Main application container
ai-gateway = {
essential = true
image = "${aws_ecr_repository.ai_gateway.repository_url}:latest"
port_mappings = [{
containerPort = 8787
protocol = "tcp"
}]
environment = [
{ name = "OTEL_EXPORTER_OTLP_ENDPOINT", value = "http://localhost:4317" },
{ name = "OTEL_SERVICE_NAME", value = "ai-gateway" },
]
secrets = [
{ name = "OPENAI_API_KEY", valueFrom = aws_secretsmanager_secret.openai_key.arn },
]
}
# OTEL Collector sidecar
otel-collector = {
essential = true
image = "public.ecr.aws/aws-observability/aws-otel-collector:latest"
cpu = 256 # 0.25 vCPU
memory = 128 # 128 MB
environment = [
{ name = "AOT_CONFIG_CONTENT", value = file("${path.module}/otel-collector-config.yaml") },
]
}
}
# Task-level resources
cpu = 1024 # 1 vCPU total
memory = 2048 # 2 GB total
# ... ALB target group, IAM, etc.
}

  1. Terraform BSL 1.1: No impact for internal use. If the project is ever open-sourced as a reusable infrastructure module, consider OpenTofu as an alternative. The terraform-aws-modules are Apache 2.0 licensed and work with both Terraform and OpenTofu.

  2. Python 3.13 + uv + Docker: The multi-stage Dockerfile uses python:3.13-slim as both builder and runtime base. uv is copied from ghcr.io/astral-sh/uv:latest into the builder stage only. The runtime stage contains only the .venv with compiled bytecode.

  3. ADOT Collector + Portkey OSS: Portkey gateway needs to be instrumented with OpenTelemetry Python SDK. The auto-instrumentation agent (opentelemetry-instrument) can wrap the Portkey process. OTLP exporter sends to localhost:4317 where the ADOT sidecar listens.

  4. Secrets Manager + ECS Fargate: Secrets are injected as environment variables at task startup. If an LLM provider key rotates, a new ECS deployment is needed (or use the S3 sidecar pattern for hot-reload). For most use cases, the environment variable injection is sufficient.

  5. Trivy post-incident (CVE-2026-28353): Pin trivy-action to a specific version (@0.28.0 or later) rather than @master. Verify the action’s provenance. The CLI tool itself was not compromised — the attack vector was the VS Code extension and release assets.

  6. hadolint GPL-3.0: This license applies to the hadolint binary. Running it as a CI tool does not impose GPL obligations on your codebase. It lints Dockerfiles; it does not link with your application.


  1. Spacelift — “Terraform License Change (BSL) — Impact on Users and Providers” https://spacelift.io/blog/terraform-license-change
  2. Chainguard — “Best Python Docker image: Top options compared” https://www.chainguard.dev/supply-chain-security-101/best-python-docker-image-top-options-compared
  3. Chainguard — Python Container Image Versions https://images.chainguard.dev/directory/image/python/versions
  4. OneUptime — “How to Build Minimal Container Images with Distroless and Chainguard” https://oneuptime.com/blog/post/2026-02-09-distroless-chainguard-minimal-images/view
  5. Hynek Schlawack — “Production-ready Python Docker Containers with uv” https://hynek.me/articles/docker-uv/
  6. Chainguard — “Zero CVEs and just as fast: Python and Go Images” https://chainguard.dev/unchained/zero-cves-and-just-as-fast-chainguards-python-go-images
  7. Digon.IO — “Build Multistage Python Docker Images Using UV” https://digon.io/en/blog/2025_07_28_python_docker_images_with_uv
  8. OpenTelemetry Python SDK — PyPI https://pypi.org/project/opentelemetry-sdk/
  9. OneUptime — “How to Configure OpenTelemetry for AWS ECS with Sidecar Collector” https://oneuptime.com/blog/post/2026-02-06-opentelemetry-aws-ecs-sidecar-collector/view
  10. AWS — “Deployment patterns for the ADOT Collector with Amazon ECS” https://aws.amazon.com/blogs/opensource/deployment-patterns-for-the-aws-distro-for-opentelemetry-collector-with-amazon-elastic-container-service/
  11. Cloud Kiln — “Managing Secrets in ECS: Parameter Store vs. Secrets Manager” https://cloudkiln.com/blog/ecs-secrets-management
  12. Doppler — “AWS Secrets Manager vs. Parameter Store” https://www.doppler.com/guides/aws-guides/aws-secrets-manager-vs-parameter-store
  13. AWS Docs — “Best practices for secrets management in Amazon ECS” https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-secrets-management.html
  14. Reddit r/devops — “CVE-2026-28353 the Trivy security incident” https://www.reddit.com/r/devops/comments/1rqmrhi/ve202628353_the_trivy_security_incident_nobody_is/
  15. The Hacker News — “Five Malicious Rust Crates and AI Bot Exploit CI/CD Pipelines” https://thehackernews.com/2026/03/five-malicious-rust-crates-and-ai-bot.html
  16. AWS Docs — “AWS Distro for OpenTelemetry and AWS X-Ray” https://docs.aws.amazon.com/xray/latest/devguide/xray-services-adot.html
  17. terraform-aws-modules/terraform-aws-ecs — DeepWiki analysis https://deepwiki.com/terraform-aws-modules/terraform-aws-ecs
  18. terraform-aws-modules/terraform-aws-alb — DeepWiki analysis https://deepwiki.com/terraform-aws-modules/terraform-aws-alb
  19. terraform-aws-modules/terraform-aws-vpc — DeepWiki analysis https://deepwiki.com/terraform-aws-modules/terraform-aws-vpc
  20. AppSec Santa — “Trivy 2026: All-in-One Security Scanner” https://appsecsanta.com/trivy
  21. AppSec Santa — “Syft Review 2026: Open-Source SBOM Generator” https://appsecsanta.com/syft
  22. Rutagon — “Container Security in Production CI/CD” https://rutagon.com/insights/container-security-production-cicd/
  23. AWS Docs — “Pass sensitive data to an Amazon ECS container” https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html
  24. AppSec Santa — “Chainguard: Zero-CVE Container Images” https://appsecsanta.com/chainguard

  • Every recommendation has a health check
  • Every RESEARCH category has a comparison matrix (Docker Image Hardening, Secret Management)
  • Opinionated defaults are confirmed via search, not just assumed (Docker, GitHub Actions, OTEL, mise)
  • Dependency versions are current (verified via DeepWiki, PyPI, Tavily, Brave)
  • Sources are cited for all factual claims (24 sources)
  • Recommendations are coherent (all technologies work together on ECS Fargate)
  • User constraints respected: no Kubernetes, no LiteLLM, no heavy infra, all must-haves included
  • Conditional logic applied: service mesh skipped (single service), IaC comparison skipped (locked in)
  • License implications noted (Terraform BSL, hadolint GPL-3.0)
  • Container security tools included with wiring guidance
  • Trivy security incident flagged with mitigation guidance