Infrastructure Domain -- Tech Stack Research Report
Project: LLM API Gateway (Portkey OSS)
Date: 2026-03-18
Researcher Domain: Infrastructure
Locked-In: Python 3.13, uv, ECS Fargate, ALB, Terraform
Table of Contents
- Containerization — Default Confirmed
- Orchestration — Skipped (Locked In)
- Infrastructure as Code — Skipped (Locked In) + Terraform Module Research
- Docker Image Hardening — Full Research
- CI/CD — Default Confirmed
- Observability — Default Confirmed + OTEL Collector Research
- Secret Management — Full Research
- Container Security Pipeline — Default Tools + Wiring Research
- Dev Tool Management — Default Confirmed
- Service Mesh — Skipped
- Domain-Specific Artifacts
- Compatibility Notes
- Sources
1. Containerization — Default Confirmed
Recommendation: Docker with BuildKit
- Version: Docker Engine 27.x / BuildKit 0.17.0 (Q1 2026)
- Why: Docker with BuildKit is the universal container build standard. BuildKit has been the default builder since Docker Engine 23.0 and provides parallel stage execution, cache mounts, secret mounts, and SSH forwarding. For a Python/uv multi-stage build, BuildKit’s cache mount feature (--mount=type=cache) lets uv reuse its download cache across builds, which speeds up dependency installation.
- Health: HEALTHY
Docker / BuildKit — HEALTHY
- Version: BuildKit 0.17.0 (Q1 2026)
- Activity: Actively developed, part of Docker Engine and Moby project
- Maintainers: Docker Inc. + large open-source community
- Stars: 8k+ (moby/buildkit) | License: Apache 2.0
- Notes: Default builder in Docker Desktop and Engine 23.0+. The 0.17.0 release includes an optimized backend for performance and memory usage, better isolation, and rootless execution support.
2. Orchestration — Skipped (Locked In)
ECS Fargate is locked in. No comparison needed. Kubernetes is on the avoid list.
Compatibility note: ECS Fargate integrates natively with ALB (also locked in) via target group IP mode. Terraform modules cover this pattern well (see Section 3).
3. Infrastructure as Code — Skipped (Locked In) + Terraform Module Research
Terraform is locked in. No IaC comparison needed.
License note: Terraform switched from MPL 2.0 to BSL 1.1 (Business Source License) in August 2023. This restricts commercial competitors of HashiCorp from using it. For internal infrastructure use (which is this project’s case), there is no licensing impact. OpenTofu exists as an MPL-licensed fork if needed in the future. [1]
Terraform Module Research — Full Research
For this ECS Fargate + ALB deployment, three community modules from terraform-aws-modules form the foundation.
terraform-aws-modules/terraform-aws-ecs — HEALTHY
- Version: 5.12.1 (released 2025-04-18)
- Activity: Frequent releases, bug fixes and features through 2025
- Maintainers: Anton Babenko + terraform-aws-modules org (community-maintained, AWS-endorsed)
- Stars: 1,700+ | License: Apache 2.0
- Structure: Root module + 3 sub-modules:
  - cluster: ECS cluster + capacity providers + CloudWatch log groups
  - service: ECS service + task definition + IAM roles + autoscaling + load balancer integration
  - container-definition: Container properties, port mappings, env vars, logging
- Key features: Fargate-first design, built-in autoscaling, service connect, service discovery, ALB integration, CloudWatch logging, IAM role management
- Notes: The service sub-module defaults to the FARGATE launch type and creates autoscaling by default. The container-definition sub-module handles OTEL sidecar containers cleanly.
terraform-aws-modules/terraform-aws-alb — HEALTHY
- Version: 9.16.0 (released 2025-04-21)
- Activity: Very active, multiple releases in 2025
- Maintainers: Anton Babenko + terraform-aws-modules org
- Stars: 900+ | License: Apache 2.0
- Key features: ALB + NLB support, target groups with create_attachment = false for ECS integration, HTTPS listeners, mutual TLS, connection logging, health checks, blue-green deployment support
- ECS Integration: Create target groups without attachments; ECS registers task IPs automatically. This is the documented pattern in docs/patterns.md.
- Notes: Supports zonal shift, anomaly mitigation, trust stores, and HTTP response headers.
terraform-aws-modules/terraform-aws-vpc — HEALTHY
- Version: 5.21.0 (released 2025-04-21)
- Activity: Very active, frequent releases
- Maintainers: Anton Babenko + terraform-aws-modules org
- Stars: 3,000+ | License: Apache 2.0
- Key features: Public/private subnets, NAT gateways, VPC endpoints (including ECS endpoints for private Fargate tasks), flow logs, multi-AZ
- Notes: For Fargate, use private subnets with NAT gateway for outbound internet access. VPC endpoints for ECR, S3, and CloudWatch Logs reduce NAT costs and improve latency.
Recommended Terraform Module Structure
    infra/
      modules/                # Local wrappers if needed
      environments/
        dev/
          main.tf             # Uses terraform-aws-modules/*
          variables.tf
          outputs.tf
          terraform.tfvars
        prod/
          main.tf
          variables.tf
          outputs.tf
          terraform.tfvars
      shared/
        ecr.tf                # ECR repository (shared across envs)
        state.tf              # S3 backend + DynamoDB lock table

Module pinning: Always pin modules to exact versions:

    module "vpc" {
      source  = "terraform-aws-modules/vpc/aws"
      version = "5.21.0"
    }

    module "alb" {
      source  = "terraform-aws-modules/alb/aws"
      version = "9.16.0"
    }

    module "ecs" {
      source  = "terraform-aws-modules/ecs/aws"
      version = "5.12.1"
    }

4. Docker Image Hardening — Full Research
Recommendation: python:3.13-slim with multi-stage hardening (primary), with Chainguard upgrade path
Comparison Matrix
| Criteria (weight) | Chainguard python:latest | Google distroless/python3 | python:3.13-slim + hardening |
|---|---|---|---|
| Security posture (0.25) | 10/10 (zero CVEs) | 8/10 (low CVEs) | 6/10 (some CVEs, mitigated) |
| Python 3.13 support (0.20) | 7/10 (paid for 3.13 tag) | 3/10 (no 3.13) | 10/10 (native) |
| uv compatibility (0.15) | 8/10 (works with multi-stage) | 5/10 (complex setup) | 10/10 (native) |
| Image size (0.10) | 10/10 (~23 MB) | 8/10 (~50 MB) | 7/10 (~150 MB) |
| Debuggability (0.10) | 3/10 (no shell) | 2/10 (no shell) | 9/10 (has shell) |
| Free/no vendor lock (0.10) | 5/10 (free=latest only) | 10/10 (free) | 10/10 (free) |
| Team familiarity (0.10) | 5/10 | 4/10 | 10/10 |
| Weighted Score | 7.40 | 5.75 | 8.60 |
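The weighted totals can be recomputed directly from the matrix. The sketch below is illustrative only: the dictionary keys and function names are made up for this example, the weights are stored as integer percentages to keep the arithmetic exact, and the per-criterion scores are copied from the rows above.

```python
# Weights as integer percentages (they must sum to 100).
WEIGHTS = {
    "security_posture": 25,
    "python_313_support": 20,
    "uv_compatibility": 15,
    "image_size": 10,
    "debuggability": 10,
    "free_no_vendor_lock": 10,
    "team_familiarity": 10,
}

# Per-criterion scores (0-10), in the same order as WEIGHTS.
SCORES = {
    "chainguard": [10, 7, 8, 10, 3, 5, 5],
    "distroless": [8, 3, 5, 8, 2, 10, 4],
    "python_slim_hardened": [6, 10, 10, 7, 9, 10, 10],
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted total on a 0-10 scale."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    if len(scores) != len(weights):
        raise ValueError("one score per criterion is required")
    total = sum(s * w for s, w in zip(scores, weights.values()))
    return total / 100


def rank(options=SCORES):
    """Return (name, score) pairs sorted best-first."""
    scored = [(name, weighted_score(s)) for name, s in options.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank():
        print(f"{name}: {score:.2f}")
```

Keeping the weights in one place makes it easy to test how sensitive the ranking is, for example by shifting weight from team familiarity to security posture and re-running `rank()`.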
Analysis
Chainguard (cgr.dev/chainguard/python): Best security posture with zero CVEs, SBOM, and Sigstore signatures built in. Nightly automated rebuilds. However, the free tier only provides :latest and :latest-dev tags, which currently point to Python 3.14. The Python 3.13 tag (3.13, 3.13.12) requires a paid subscription (“Contact us for access”). Image is ~23 MB compressed. [2][3]
Google distroless (gcr.io/distroless/python3): No Python 3.13 support — the image is significantly behind on Python versions. Not actively maintained for Python specifically. Lacks the nightly rebuild cadence of Chainguard. Effectively deprecated for Python use cases. [4]
python:3.13-slim with hardening: The official Python slim image is ~150 MB but provides native Python 3.13 support, full uv compatibility, and easy debugging. With a multi-stage build, a non-root user, and a read-only filesystem, the attack surface shrinks considerably. The remaining CVEs are in OS packages that typically do not affect Python application code. [5]
Recommendation rationale
For a small team (2-5) that needs Python 3.13 specifically and uses uv, python:3.13-slim with a hardened multi-stage build is the pragmatic choice. It provides the best developer experience, zero friction with uv, and well-understood tooling. The security gap is narrowed through:
- Multi-stage builds (no build tools in production image)
- Non-root user
- Read-only filesystem where possible
- Trivy/Grype scanning in CI to catch and triage CVEs
Upgrade path: When the team is ready to invest in Chainguard’s paid tier for pinned Python 3.13 tags, the migration is straightforward — change the FROM line in the runtime stage. The multi-stage build pattern works identically with both.
Evidence
- Chainguard Python versions page: Python 3.13 tag requires paid access [3]
- Chainguard zero-CVE benchmarks show equivalent performance to upstream [6]
- Hynek Schlawack’s production-ready uv Docker guide is the definitive reference for multi-stage uv builds [5]
5. CI/CD — Default Confirmed
Recommendation: GitHub Actions
- Version: Current (actions/runner 2.x, ubuntu-latest)
- Why: Default for GitHub-hosted repos. Excellent ecosystem of security scanning actions (trivy-action, hadolint-action, cosign). Native OIDC for keyless signing with Sigstore. Free tier sufficient for small teams.
- Health: HEALTHY
GitHub Actions — HEALTHY
- Version: N/A (managed service, continuously updated)
- Activity: GitHub ships updates continuously
- Maintainers: GitHub (Microsoft) — large dedicated team
- Stars: N/A (managed service) | License: Proprietary (free tier for public repos, included with GitHub plans)
- Notes: Native OIDC token support is critical for Cosign keyless signing in the container security pipeline. The id-token: write permission enables this.
6. Observability — Default Confirmed + OTEL Collector Research
Recommendation: OpenTelemetry Python SDK + AWS Distro for OpenTelemetry (ADOT) Collector sidecar
OpenTelemetry Python SDK — HEALTHY
- Version: 1.39.0 (released 2025-12-03), with SDK release on PyPI dated 2026-03-04
- Activity: Very active — weekly community meetings, frequent releases
- Maintainers: 3 maintainers (Aaron Abbott, Leighton Chen, Riccardo Magliocchetti) + 10 approvers
- Stars: 1,900+ | License: Apache 2.0
- Stability: Traces and Metrics are stable. Logs are under active development with breaking changes.
- Notes: Supports Python 3.9-3.14. The SDK provides auto-instrumentation for common frameworks. [8]
OTEL Collector on ECS Fargate — Sidecar Pattern
ECS Fargate has no DaemonSet concept. The two deployment patterns are:
- Sidecar pattern (recommended): OTEL Collector runs as a second container in the same task definition. Application sends telemetry to localhost:4317 (gRPC) or localhost:4318 (HTTP). [9]
- Standalone service pattern: Collector runs as a separate ECS service. Application sends telemetry over the network. More complex, but allows centralized config.
Recommendation: Use the sidecar pattern with ADOT Collector (public.ecr.aws/aws-observability/aws-otel-collector).
Why ADOT over upstream OTEL Collector: ADOT is AWS’s production-tested, supported distribution. It comes pre-configured with AWS-specific exporters (X-Ray, CloudWatch, AMP). It is tested and validated against ECS. AWS provides official ECS task definition examples. [10]
Resource allocation: 128 MB RAM and 0.25 vCPU is sufficient for the sidecar collector in most cases. [9]
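To make the sidecar layout and its resource split concrete, here is a minimal sketch that builds the two container definitions as plain dictionaries. The function name and the image tags are hypothetical; the 1 vCPU / 2 GB task size is taken from the Terraform snippet in Section 11.4, and giving the application whatever the sidecar does not use is an assumption of this example, not an AWS requirement.

```python
import json

# Sidecar budget quoted above: 0.25 vCPU (256 CPU units) and 128 MB.
COLLECTOR_CPU = 256
COLLECTOR_MEMORY_MB = 128


def build_container_definitions(app_image, collector_image,
                                task_cpu=1024, task_memory_mb=2048):
    """Return [app, collector] container definitions for one Fargate task."""
    app = {
        "name": "ai-gateway",
        "image": app_image,
        "essential": True,
        # Assumption: the app gets what is left after the sidecar.
        "cpu": task_cpu - COLLECTOR_CPU,
        "memory": task_memory_mb - COLLECTOR_MEMORY_MB,
        "portMappings": [{"containerPort": 8787, "protocol": "tcp"}],
        "environment": [
            # Containers in the same task share localhost, so the app
            # reaches the collector's OTLP gRPC port directly.
            {"name": "OTEL_EXPORTER_OTLP_ENDPOINT",
             "value": "http://localhost:4317"},
            {"name": "OTEL_SERVICE_NAME", "value": "ai-gateway"},
        ],
    }
    collector = {
        "name": "otel-collector",
        "image": collector_image,
        "essential": True,
        "cpu": COLLECTOR_CPU,
        "memory": COLLECTOR_MEMORY_MB,
    }
    return [app, collector]


if __name__ == "__main__":
    definitions = build_container_definitions(
        "ai-gateway:example",  # hypothetical tag
        "public.ecr.aws/aws-observability/aws-otel-collector:latest",
    )
    print(json.dumps({"containerDefinitions": definitions}, indent=2))
```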
Collector config sends to:
- AWS X-Ray — distributed traces (native ECS integration)
- Amazon CloudWatch — metrics and logs
- Optionally: Amazon Managed Service for Prometheus for Grafana-based dashboards
7. Secret Management — Full Research
Recommendation: AWS Secrets Manager (for credentials) + SSM Parameter Store (for config)
Use both services for their respective strengths rather than picking one.
Comparison Matrix
| Criteria (weight) | AWS Secrets Manager | SSM Parameter Store | HashiCorp Vault |
|---|---|---|---|
| ECS integration (0.25) | 10/10 (native) | 10/10 (native) | 5/10 (sidecar needed) |
| Auto-rotation (0.20) | 10/10 (built-in RDS) | 3/10 (custom Lambda) | 8/10 (dynamic secrets) |
| Cost efficiency (0.15) | 5/10 ($0.40/secret/mo) | 9/10 ($0.05/10K calls) | 3/10 (self-managed) |
| Simplicity (0.15) | 9/10 | 10/10 | 3/10 (complex) |
| Cross-account (0.10) | 9/10 (native) | 7/10 (supported) | 8/10 |
| Versioning (0.10) | 9/10 (staging labels) | 8/10 (version history) | 7/10 |
| Avoid “heavy infra” (0.05) | 10/10 (managed) | 10/10 (managed) | 2/10 (self-managed) |
| Weighted Score | 8.90 | 7.95 | 5.35 |
Analysis
AWS Secrets Manager: Best for credentials that need rotation (database passwords, API keys to external LLM providers). Native ECS integration via secrets in container definitions: the ECS Task Execution Role fetches secrets at task startup. Automatic rotation is built in for RDS, Redshift, and DocumentDB. Cross-region replication for DR. Cost is $0.40/secret/month + $0.05/10K API calls. [11][12]
SSM Parameter Store: Best for application configuration (feature flags, endpoint URLs, non-sensitive config). Standard parameters are free (up to 10,000 per account per Region). SecureString parameters use KMS encryption. Hierarchical organization (/prod/ai-gateway/model-config). Advanced parameters cost $0.05 per parameter per month, and API calls are billed at $0.05/10K only when higher throughput is enabled. [11][12]
HashiCorp Vault: Eliminated — violates the “avoid heavy/complex infra” constraint. Self-managed, requires its own cluster, adds operational burden for a small team.
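For a sense of scale, the Secrets Manager prices quoted above can be turned into a monthly estimate. This is a rough sketch: the function name is made up, the workload figures in the usage example are hypothetical, and it ignores KMS charges, free-trial periods, and regional price differences.

```python
from decimal import Decimal

SECRET_PER_MONTH = Decimal("0.40")   # USD per secret per month
PER_10K_API_CALLS = Decimal("0.05")  # USD per 10,000 API calls


def secrets_manager_monthly_cost(num_secrets, api_calls_per_month):
    """Estimate the monthly Secrets Manager bill in USD."""
    storage = SECRET_PER_MONTH * num_secrets
    requests = PER_10K_API_CALLS * Decimal(api_calls_per_month) / Decimal(10_000)
    return (storage + requests).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical workload: 5 secrets, 200,000 fetches per month.
    print(secrets_manager_monthly_cost(5, 200_000))
```

Because ECS fetches secrets once per task start rather than per request, the API-call term stays small for this gateway unless tasks churn heavily.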
Recommended Split
| Secret Type | Service | Example |
|---|---|---|
| LLM provider API keys | Secrets Manager | OpenAI key, Anthropic key |
| Database credentials | Secrets Manager | Aurora credentials (auto-rotated) |
| Internal service tokens | Secrets Manager | Inter-service auth tokens |
| Feature flags | Parameter Store | /prod/ai-gateway/enable-caching |
| Model routing config | Parameter Store | /prod/ai-gateway/default-model |
| Non-sensitive endpoint URLs | Parameter Store | /prod/ai-gateway/upstream-url |
ECS Task Definition Integration
Both services inject natively into ECS containers:

    {
      "containerDefinitions": [{
        "secrets": [
          {
            "name": "OPENAI_API_KEY",
            "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:openai-key"
          },
          {
            "name": "MODEL_CONFIG",
            "valueFrom": "arn:aws:ssm:us-east-1:123456789:parameter/prod/ai-gateway/model-config"
          }
        ]
      }]
    }

Secrets are fetched by the ECS agent at task startup using the Task Execution Role. The task definition stores only the ARN reference, so secret values are NOT baked into the image or task definition. Running tasks do not pick up a rotated value; rotation requires a new task deployment (ECS rolling update). [13]
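On the application side, the injected values arrive as ordinary environment variables. A minimal sketch of reading them at startup and failing fast when one is missing follows; the helper and exception names are hypothetical, and the variable names match the task definition example above.

```python
import os

# Names injected by ECS via the "secrets" block of the container definition.
REQUIRED_VARS = ("OPENAI_API_KEY", "MODEL_CONFIG")


class MissingSecretError(RuntimeError):
    """Raised when an expected injected variable is absent or empty."""


def load_injected_settings(environ=None):
    """Return the required settings, or raise if any are missing."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Report names only; never log secret values.
        raise MissingSecretError(
            "missing injected variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup surfaces a bad ARN or a missing IAM permission as a stopped task during deployment rather than as an error on the first proxied request.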
8. Container Security Pipeline — Default Tools + Wiring Research
All six tools from the defaults table are recommended: four are rated HEALTHY and two (hadolint, trivy) CAUTION. Here is how they wire together in the correct order.
Pipeline Execution Order
    Phase 1: PRE-BUILD (on every PR)
      1. hadolint    -- Lint Dockerfile for best practices
      2. checkov     -- Scan Terraform/IaC for misconfigurations

    Phase 2: POST-BUILD (after docker build)
      3. trivy image -- Scan image for vulns + misconfigs + secrets
      4. grype       -- Secondary scan with EPSS/KEV risk scoring
      5. syft        -- Generate SBOM (CycloneDX + SPDX)

    Phase 3: POST-SCAN (after push to registry)
      6. cosign      -- Keyless sign the image with Sigstore OIDC

Why this order matters
- hadolint first: Catches Dockerfile issues before building. Fast feedback. No image needed.
- checkov alongside hadolint: Scans IaC files in the PR. Also scans Dockerfiles for additional checks.
- trivy before grype: Trivy is the broadest scanner (vulns + misconfig + secrets + licenses). Grype provides complementary risk scoring with EPSS (Exploit Prediction Scoring System) and KEV (Known Exploited Vulnerabilities catalog) data that trivy does not emphasize.
- syft after scanning: Generate the SBOM from the scanned image. The SBOM documents what was scanned and shipped.
- cosign last: Sign only after all gates pass. The signature attests “this image passed all security checks in this pipeline.” Uses keyless OIDC signing tied to the GitHub Actions workflow identity.
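The gating logic in this list can be sketched as a small runner that executes the tools in order and stops at the first failure, so cosign is never reached for an image that failed a gate. The function names are hypothetical, the image build and push steps are omitted, and the CLI flags are the commonly documented ones (verify them against the pinned tool versions before relying on this).

```python
import subprocess


def pipeline_commands(image, dockerfile="Dockerfile", iac_dir="infra/"):
    """Return (gate name, argv) pairs in pipeline order."""
    return [
        ("hadolint", ["hadolint", dockerfile]),
        ("checkov", ["checkov", "-d", iac_dir, "--framework", "terraform"]),
        ("trivy", ["trivy", "image", "--severity", "HIGH,CRITICAL",
                   "--exit-code", "1", image]),
        ("grype", ["grype", image, "--fail-on", "high"]),
        ("syft", ["syft", image, "-o", "cyclonedx-json"]),
        ("cosign", ["cosign", "sign", "--yes", image]),
    ]


def run_pipeline(image, execute=None):
    """Run each gate in order; return (completed gates, failed gate or None)."""
    if execute is None:
        # Default executor: run the real CLI and use its exit code.
        execute = lambda argv: subprocess.run(argv).returncode
    completed = []
    for name, argv in pipeline_commands(image):
        if execute(argv) != 0:
            return completed, name  # stop: later gates (and signing) are skipped
        completed.append(name)
    return completed, None
```

Passing a fake `execute` callable makes the ordering testable without installing any of the scanners.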
Tool Health Checks
Section titled “Tool Health Checks”hadolint — CAUTION
- Version: 3.0.0 (released ~2025)
- Activity: Less frequent releases, but stable
- Maintainers: 1 primary (Lukas Martinelli)
- Stars: 10k+ | License: GPL-3.0
- Issues: Single maintainer (bus factor of 1). However, the tool is mature and stable — Dockerfile best practices do not change rapidly.
- Notes: GPL-3.0 license applies to the tool binary only, not to your Dockerfiles. No licensing concern for CI usage.
trivy — CAUTION
- Version: 0.69.2 (March 2026, post-incident hotfix)
- Activity: Very active — Aqua Security funded
- Maintainers: Aqua Security team (corporate-backed)
- Stars: 31k+ | License: Apache 2.0
- CRITICAL NOTE: In late February / early March 2026, Trivy experienced a supply chain attack (CVE-2026-28353). An AI-powered bot exploited a pull_request_target misconfiguration in GitHub Actions, stole a PAT, deleted 178 releases, and pushed a malicious VS Code extension. The core Trivy CLI binary was NOT compromised, but release assets from the affected window (Feb 27 - Mar 1) cannot be verified. Aqua republished v0.69.2 with clean assets. [14][15]
- Mitigation: Always verify Trivy installation via cosign signatures. Use the official aquasecurity/trivy-action GitHub Action rather than the get.trivy.dev install script. Pin to specific versions in CI.
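The "pin to specific versions" advice can be enforced with a small check over workflow files. The sketch below is a hypothetical helper, not an existing tool: it uses a regular expression instead of a YAML parser and only flags missing refs and the floating refs listed in `FLOATING_REFS`. Version tags are themselves mutable, so pinning to a full commit SHA is the stricter option.

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading "- ".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
FLOATING_REFS = {"master", "main", "latest"}


def unpinned_actions(workflow_text):
    """Return the action references that are not pinned to a version."""
    problems = []
    for ref in USES_RE.findall(workflow_text):
        if ref.startswith("./"):
            continue  # local actions live in the repo and carry no ref
        _action, _, version = ref.partition("@")
        if not version or version in FLOATING_REFS:
            problems.append(ref)
    return problems
```

Running this over `.github/workflows/*.yml` in the lint job and failing on a non-empty result is one way to keep the pinning rule from eroding.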
checkov — HEALTHY
- Version: 3.2.382 (released 2026-03-06)
- Activity: Extremely active — multiple releases per week
- Maintainers: Prisma Cloud (Palo Alto Networks) — corporate-backed
- Stars: 7k+ | License: Apache 2.0
- Notes: 1000+ built-in policies. Graph-based cross-resource analysis for Terraform. Also scans Dockerfiles (CKV_DOCKER checks).
grype — HEALTHY
- Version: Latest release available on GitHub (active releases through 2026)
- Activity: Active, automated release pipeline
- Maintainers: Anchore Inc. — corporate-backed
- Stars: 9k+ | License: Apache 2.0
- Notes: Uses Syft internally for SBOM generation. Provides composite CVSS + EPSS + KEV risk scoring that trivy lacks.
syft — HEALTHY
- Version: 1.42.0 (2026)
- Activity: Active development by Anchore
- Maintainers: Anchore Inc. — corporate-backed, 219+ contributors
- Stars: 8.4k+ | License: Apache 2.0
- Notes: Supports CycloneDX 1.6 and SPDX 2.3 output. Pairs with Grype for scan-from-SBOM workflow. Supports signed SBOM attestations via in-toto.
cosign — HEALTHY
- Version: 2.4.3
- Activity: Active development
- Maintainers: Sigstore project (Linux Foundation), 7+ regular contributors
- Stars: 4k+ | License: Apache 2.0
- Notes: Keyless signing via Fulcio CA + Rekor transparency log. GitHub Actions OIDC integration is first-class. Future development is moving to sigstore-go for the next major version. v2.x remains stable and supported.
Tool Overlap and Differentiation
| Capability | hadolint | trivy | grype | checkov | syft | cosign |
|---|---|---|---|---|---|---|
| Dockerfile linting | YES | — | — | partial | — | — |
| Image vuln scanning | — | YES | YES | — | — | — |
| Image misconfig scanning | — | YES | — | — | — | — |
| Image secret detection | — | YES | — | — | — | — |
| IaC scanning | — | YES | — | YES | — | — |
| SBOM generation | — | YES | — | — | YES | — |
| EPSS/KEV risk scoring | — | — | YES | — | — | — |
| Image signing | — | — | — | — | — | YES |
| License compliance | — | YES | — | — | YES | — |
Why both trivy AND grype? Trivy is the broadest single tool, but Grype’s EPSS/KEV scoring provides better prioritization of which vulns actually matter. Running both catches edge cases where one scanner’s vulnerability database is ahead of the other. The overhead is minimal in CI.
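One way to see where the two scanners disagree is to diff the vulnerability IDs in their JSON reports. The sketch below assumes the report shapes as commonly produced by `trivy image --format json` (a `Results` list with `Vulnerabilities[].VulnerabilityID`) and `grype -o json` (a `matches` list with `vulnerability.id`); treat those field names as assumptions and check them against the pinned tool versions. The function names are hypothetical.

```python
def trivy_ids(report):
    """Collect vulnerability IDs from a parsed Trivy JSON report."""
    return {
        vuln["VulnerabilityID"]
        for result in report.get("Results", [])
        # "Vulnerabilities" may be absent or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []
    }


def grype_ids(report):
    """Collect vulnerability IDs from a parsed Grype JSON report."""
    return {match["vulnerability"]["id"] for match in report.get("matches", [])}


def compare_findings(trivy_report, grype_report):
    """Split findings into those seen by both scanners or by only one."""
    t, g = trivy_ids(trivy_report), grype_ids(grype_report)
    return {
        "both": sorted(t & g),
        "trivy_only": sorted(t - g),
        "grype_only": sorted(g - t),
    }
```

The two tools do not always use the same identifier for one issue (for example a GHSA ID versus a CVE ID), so entries in the "only" buckets are candidates for review rather than proof that one scanner missed something.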
9. Dev Tool Management — Default Confirmed
Recommendation: mise
- Version: 2026.3.9 (released 2026-03-13)
- Why: Default for the team. Manages Python versions, tool versions, environment variables, and project tasks. Pairs with uv for a complete Python dev environment.
- Health: HEALTHY
mise — HEALTHY
- Version: 2026.3.9 (released 2026-03-13)
- Activity: Extremely active — multiple releases per month (CalVer)
- Maintainers: jdx (Jeff Dickey) + community contributors
- Stars: 12k+ | License: MIT
- Notes: CalVer versioning reflects release dates. Very healthy contribution pattern with new contributors each release.
10. Service Mesh — Skipped
Single service / monolith (Portkey OSS proxy). No microservices architecture. Service mesh is not applicable per the domain conditional logic.
11. Domain-Specific Artifacts
11.1 Dockerfile Skeleton

    # =============================================================================
    # Stage 1: Builder -- installs dependencies with full build tools
    # =============================================================================
    FROM python:3.13-slim AS builder

    # Install uv from the official image
    COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

    # Configure uv for container builds
    ENV UV_LINK_MODE=copy \
        UV_COMPILE_BYTECODE=1 \
        UV_PYTHON_DOWNLOADS=never

    WORKDIR /app

    # Install dependencies first (cached layer)
    COPY pyproject.toml uv.lock ./
    RUN uv sync --no-dev --locked --no-editable --no-install-project

    # Copy source code and install the project
    COPY src/ ./src/
    COPY README.md ./
    RUN uv sync --no-dev --locked --no-editable

    # =============================================================================
    # Stage 2: Runtime -- minimal production image
    # =============================================================================
    FROM python:3.13-slim AS runtime

    # Security: install only runtime OS deps, then clean up
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            ca-certificates \
            tini \
        && apt-get purge -y --auto-remove \
        && rm -rf /var/lib/apt/lists/*

    # Security: create non-root user
    RUN groupadd --gid 1001 appuser && \
        useradd --uid 1001 --gid 1001 --shell /usr/sbin/nologin --create-home appuser

    WORKDIR /app

    # Copy only the virtual environment from builder
    COPY --from=builder --chown=appuser:appuser /app/.venv /app/.venv

    # Copy application source
    COPY --from=builder --chown=appuser:appuser /app/src /app/src

    # Set PATH to use the venv
    ENV PATH="/app/.venv/bin:$PATH" \
        PYTHONUNBUFFERED=1 \
        PYTHONDONTWRITEBYTECODE=1

    # Security: switch to non-root user
    USER appuser

    # Health check for local Docker runs. ECS ignores the image-level HEALTHCHECK,
    # so mirror this command in the task definition's healthCheck block.
    HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
        CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8787/health')" || exit 1

    # Use tini as init process (reaps zombies, forwards signals)
    ENTRYPOINT ["tini", "--"]

    # Portkey gateway default port
    EXPOSE 8787
    CMD ["python", "-m", "portkey_gateway"]

11.2 mise.toml
    [tools]
    python = "3.13"

    [env]
    _.python.venv = { path = ".venv", create = true }

    [tasks]
    install = "uv sync"
    dev = "uv run python -m portkey_gateway"
    test = "uv run pytest tests/"
    lint = "uvx ruff check ."
    format = "uvx ruff format ."
    typecheck = "uvx pyright"

    # Container tasks
    build = "docker build -t ai-gateway:dev ."
    scan = "trivy image --severity HIGH,CRITICAL ai-gateway:dev"
    lint-docker = "hadolint Dockerfile"
    sbom = "syft ai-gateway:dev -o cyclonedx-json > sbom.json"

    # IaC tasks
    tf-plan = { run = "terraform plan", dir = "infra/environments/dev" }
    tf-apply = { run = "terraform apply", dir = "infra/environments/dev" }
    tf-lint = "checkov -d infra/ --framework terraform"

11.3 CI/CD Workflow Skeleton
    name: CI/CD Pipeline

    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]

    env:
      REGISTRY: <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
      IMAGE_NAME: ai-gateway
      AWS_REGION: us-east-1

    permissions:
      contents: read
      id-token: write          # Required for cosign keyless signing and AWS OIDC role assumption
      security-events: write   # Required for SARIF upload

    jobs:
      # ===== Phase 1: Pre-Build Checks =====
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Lint Dockerfile
            uses: hadolint/hadolint-action@v3.1.0
            with:
              dockerfile: Dockerfile
              failure-threshold: warning

          - name: Scan IaC with Checkov
            uses: bridgecrewio/checkov-action@v12
            with:
              directory: infra/
              framework: terraform
              soft_fail: false

      # ===== Phase 2: Build + Security Scan =====
      build-and-scan:
        runs-on: ubuntu-latest
        needs: lint
        steps:
          - uses: actions/checkout@v4

          - name: Configure AWS credentials
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
              aws-region: ${{ env.AWS_REGION }}

          - name: Login to ECR
            uses: aws-actions/amazon-ecr-login@v2

          - name: Build image
            run: |
              docker build \
                -t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
                -t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest \
                .

          - name: Trivy vulnerability scan
            uses: aquasecurity/trivy-action@0.28.0
            with:
              image-ref: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
              format: sarif
              output: trivy-results.sarif
              severity: CRITICAL,HIGH
              exit-code: 1

          - name: Upload Trivy SARIF
            if: always()
            uses: github/codeql-action/upload-sarif@v3
            with:
              sarif_file: trivy-results.sarif

          - name: Grype vulnerability scan
            uses: anchore/scan-action@v4
            with:
              image: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
              fail-build: true
              severity-cutoff: high

          - name: Generate SBOM with Syft
            uses: anchore/sbom-action@v0
            with:
              image: "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"
              format: cyclonedx-json
              output-file: sbom.cyclonedx.json

          - name: Upload SBOM artifact
            uses: actions/upload-artifact@v4
            with:
              name: sbom
              path: sbom.cyclonedx.json

          # ===== Phase 3: Push + Sign =====
          - name: Push image to ECR
            if: github.ref == 'refs/heads/main'
            run: |
              docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
              docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

          - name: Install Cosign
            if: github.ref == 'refs/heads/main'
            uses: sigstore/cosign-installer@v3

          # The sign step carries its own condition so it never runs on pull requests,
          # where the image has not been pushed.
          - name: Sign image with Cosign
            if: github.ref == 'refs/heads/main'
            run: |
              cosign sign --yes \
                ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

      # ===== Phase 4: Deploy =====
      deploy:
        runs-on: ubuntu-latest
        needs: build-and-scan
        if: github.ref == 'refs/heads/main'
        environment: production
        steps:
          - name: Configure AWS credentials
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
              aws-region: ${{ env.AWS_REGION }}

          - name: Deploy to ECS
            run: |
              aws ecs update-service \
                --cluster ai-gateway \
                --service ai-gateway \
                --force-new-deployment

11.4 OTEL Collector Sidecar — Terraform Snippet
    # Within the ECS service module container_definitions
    module "ecs_service" {
      source  = "terraform-aws-modules/ecs/aws//modules/service"
      version = "5.12.1"

      name        = "ai-gateway"
      cluster_arn = module.ecs_cluster.arn

      container_definitions = {
        # Main application container
        ai-gateway = {
          essential = true
          image     = "${aws_ecr_repository.ai_gateway.repository_url}:latest"
          port_mappings = [{
            containerPort = 8787
            protocol      = "tcp"
          }]
          environment = [
            { name = "OTEL_EXPORTER_OTLP_ENDPOINT", value = "http://localhost:4317" },
            { name = "OTEL_SERVICE_NAME", value = "ai-gateway" },
          ]
          secrets = [
            { name = "OPENAI_API_KEY", valueFrom = aws_secretsmanager_secret.openai_key.arn },
          ]
        }

        # OTEL Collector sidecar
        otel-collector = {
          essential = true
          image     = "public.ecr.aws/aws-observability/aws-otel-collector:latest"
          cpu       = 256 # 0.25 vCPU
          memory    = 128 # 128 MB

          environment = [
            { name = "AOT_CONFIG_CONTENT", value = file("${path.module}/otel-collector-config.yaml") },
          ]
        }
      }

      # Task-level resources
      cpu    = 1024 # 1 vCPU total
      memory = 2048 # 2 GB total

      # ... ALB target group, IAM, etc.
    }

12. Compatibility Notes
- Terraform BSL 1.1: No impact for internal use. If the project is ever open-sourced as a reusable infrastructure module, consider OpenTofu as an alternative. The terraform-aws-modules are Apache 2.0 licensed and work with both Terraform and OpenTofu.
- Python 3.13 + uv + Docker: The multi-stage Dockerfile uses python:3.13-slim as both builder and runtime base. uv is copied from ghcr.io/astral-sh/uv:latest into the builder stage only. The runtime stage contains the .venv (with compiled bytecode) and the application source, with no build tooling.
- ADOT Collector + Portkey OSS: Portkey gateway needs to be instrumented with the OpenTelemetry Python SDK. The auto-instrumentation agent (opentelemetry-instrument) can wrap the Portkey process. The OTLP exporter sends to localhost:4317, where the ADOT sidecar listens.
- Secrets Manager + ECS Fargate: Secrets are injected as environment variables at task startup. If an LLM provider key rotates, a new ECS deployment is needed (or use the S3 sidecar pattern for hot-reload). For most use cases, the environment variable injection is sufficient.
- Trivy post-incident (CVE-2026-28353): Pin trivy-action to a specific version (@0.28.0 or later) rather than @master. Verify the action’s provenance. The CLI tool itself was not compromised; the attack vector was the VS Code extension and release assets.
- hadolint GPL-3.0: This license applies to the hadolint binary. Running it as a CI tool does not impose GPL obligations on your codebase. It lints Dockerfiles; it does not link with your application.
13. Sources
1. Spacelift: “Terraform License Change (BSL) — Impact on Users and Providers” https://spacelift.io/blog/terraform-license-change
2. Chainguard: “Best Python Docker image: Top options compared” https://www.chainguard.dev/supply-chain-security-101/best-python-docker-image-top-options-compared
3. Chainguard: Python Container Image Versions https://images.chainguard.dev/directory/image/python/versions
4. OneUptime: “How to Build Minimal Container Images with Distroless and Chainguard” https://oneuptime.com/blog/post/2026-02-09-distroless-chainguard-minimal-images/view
5. Hynek Schlawack: “Production-ready Python Docker Containers with uv” https://hynek.me/articles/docker-uv/
6. Chainguard: “Zero CVEs and just as fast: Python and Go Images” https://chainguard.dev/unchained/zero-cves-and-just-as-fast-chainguards-python-go-images
7. Digon.IO: “Build Multistage Python Docker Images Using UV” https://digon.io/en/blog/2025_07_28_python_docker_images_with_uv
8. OpenTelemetry Python SDK: PyPI https://pypi.org/project/opentelemetry-sdk/
9. OneUptime: “How to Configure OpenTelemetry for AWS ECS with Sidecar Collector” https://oneuptime.com/blog/post/2026-02-06-opentelemetry-aws-ecs-sidecar-collector/view
10. AWS: “Deployment patterns for the ADOT Collector with Amazon ECS” https://aws.amazon.com/blogs/opensource/deployment-patterns-for-the-aws-distro-for-opentelemetry-collector-with-amazon-elastic-container-service/
11. Cloud Kiln: “Managing Secrets in ECS: Parameter Store vs. Secrets Manager” https://cloudkiln.com/blog/ecs-secrets-management
12. Doppler: “AWS Secrets Manager vs. Parameter Store” https://www.doppler.com/guides/aws-guides/aws-secrets-manager-vs-parameter-store
13. AWS Docs: “Best practices for secrets management in Amazon ECS” https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-secrets-management.html
14. Reddit r/devops: “CVE-2026-28353 the Trivy security incident” https://www.reddit.com/r/devops/comments/1rqmrhi/ve202628353_the_trivy_security_incident_nobody_is/
15. The Hacker News: “Five Malicious Rust Crates and AI Bot Exploit CI/CD Pipelines” https://thehackernews.com/2026/03/five-malicious-rust-crates-and-ai-bot.html
16. AWS Docs: “AWS Distro for OpenTelemetry and AWS X-Ray” https://docs.aws.amazon.com/xray/latest/devguide/xray-services-adot.html
17. terraform-aws-modules/terraform-aws-ecs: DeepWiki analysis https://deepwiki.com/terraform-aws-modules/terraform-aws-ecs
18. terraform-aws-modules/terraform-aws-alb: DeepWiki analysis https://deepwiki.com/terraform-aws-modules/terraform-aws-alb
19. terraform-aws-modules/terraform-aws-vpc: DeepWiki analysis https://deepwiki.com/terraform-aws-modules/terraform-aws-vpc
20. AppSec Santa: “Trivy 2026: All-in-One Security Scanner” https://appsecsanta.com/trivy
21. AppSec Santa: “Syft Review 2026: Open-Source SBOM Generator” https://appsecsanta.com/syft
22. Rutagon: “Container Security in Production CI/CD” https://rutagon.com/insights/container-security-production-cicd/
23. AWS Docs: “Pass sensitive data to an Amazon ECS container” https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html
24. AppSec Santa: “Chainguard: Zero-CVE Container Images” https://appsecsanta.com/chainguard
Quality Checklist
- Every recommendation has a health check
- Every RESEARCH category has a comparison matrix (Docker Image Hardening, Secret Management)
- Opinionated defaults are confirmed via search, not just assumed (Docker, GitHub Actions, OTEL, mise)
- Dependency versions are current (verified via DeepWiki, PyPI, Tavily, Brave)
- Sources are cited for all factual claims (24 sources)
- Recommendations are coherent (all technologies work together on ECS Fargate)
- User constraints respected: no Kubernetes, no LiteLLM, no heavy infra, all must-haves included
- Conditional logic applied: service mesh skipped (single service), IaC comparison skipped (locked in)
- License implications noted (Terraform BSL, hadolint GPL-3.0)
- Container security tools included with wiring guidance
- Trivy security incident flagged with mitigation guidance