# LLM Gateway Integration

Wire any agent runtime through the Intended LLM gateway with one config line: Anthropic, OpenAI, NVIDIA NIM, AWS Bedrock, or Google Vertex.
The Intended LLM gateway is a transparent HTTP proxy that sits between your agent runtime and the model provider (Anthropic, OpenAI, NVIDIA NIM, AWS Bedrock, Google Vertex). Every tool call the model wants to invoke gets evaluated against your Intended policies before execution.
The integration is one config line in your agent runtime: no SDK to paste in, no code changes inside agents.
## Quickstart
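The snippet below sketches that one config change for a Node runtime talking to Anthropic. The gateway hostname and env var names are assumptions, not canonical values; substitute your deployment's own.

```typescript
// Point the provider SDK at the Intended gateway instead of the provider.
// The GATEWAY host below is an assumption; use your deployment's URL.
const GATEWAY = process.env.INTENDED_GATEWAY_URL ?? "https://gateway.intended.so";

// The single config change: swap the base URL for your provider's path.
// The model API key stays yours and is forwarded upstream untouched.
const anthropicClientOptions = {
  baseURL: `${GATEWAY}/v1/anthropic`,          // provider path, see the coverage table
  apiKey: process.env.ANTHROPIC_API_KEY ?? "", // your key, never stored by the gateway
};

console.log(anthropicClientOptions.baseURL);
```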
Point your runtime's model base URL at the gateway's provider path and restart; every `tools/call` flowing through the LLM is now governed.
## What gets evaluated
For each tool call the model produces, the gateway:

- Buffers the streaming `tool_use` / `tool_calls` frames until the full tool name and args are available.
- Calls the Intended authority engine with a structured intent: actor + tool name + args + risk profile.
- Receives ALLOW / DENY / ESCALATE:
  - ALLOW → forwards the original frames to the agent unchanged.
  - DENY → replaces the tool call with an `[INTENDED DENIED] reason=…` text block the agent reads as a normal model response.
  - ESCALATE → replaces it with `[INTENDED ESCALATED] escalation=… reason=…`; a human operator approves from the queue at app.intended.so/queue.
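The three outcomes above can be sketched as a pure function. The type and function names here are illustrative, not the gateway's actual code:

```typescript
// Decision returned by the authority engine (shape is an assumption).
type Decision =
  | { verdict: "ALLOW" }
  | { verdict: "DENY"; reason: string }
  | { verdict: "ESCALATE"; reason: string; escalation: string };

// Illustrative sketch: rewrite the buffered tool-call frames based on the
// decision, mirroring the ALLOW / DENY / ESCALATE behavior described above.
function applyDecision(bufferedFrames: string, decision: Decision): string {
  switch (decision.verdict) {
    case "ALLOW":
      return bufferedFrames; // original frames forwarded unchanged
    case "DENY":
      return `[INTENDED DENIED] reason=${decision.reason}`;
    case "ESCALATE":
      return `[INTENDED ESCALATED] escalation=${decision.escalation} reason=${decision.reason}`;
  }
}
```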
## Provider coverage
| Provider | Path | Streaming | Non-streaming | Notes |
|---|---|---|---|---|
| Anthropic | /v1/anthropic | ✓ SSE | ✓ JSON | Native Messages API |
| OpenAI | /v1/openai | ✓ SSE | ✓ JSON | Chat completions |
| NVIDIA NIM | /v1/nvidia-nim | ✓ SSE | ✓ JSON | OpenAI-compatible; hosted + self-hosted |
| AWS Bedrock (Claude) | /v1/bedrock-anthropic | ✓ event-stream | ✓ JSON | Binary AWS event-stream codec |
| Google Vertex (Gemini) | /v1/vertex-gemini | ✓ NDJSON | ✓ JSON | streamGenerateContent |
## Trust model
- **Customer keeps the model API key.** The gateway forwards `Authorization` (or `x-api-key` for Anthropic) upstream untouched; the key is never stored and never logged.
- **Removable in one line.** Revert the env var and restart the agent; we're gone, with no SDK abandoned in your code.
- **Source-available.** github.com/intended-so/intended/tree/main/packages/llm-gateway. Read what we do, or self-host in your VPC.
- **Audit-only first.** Run with `X-Intended-Mode: observe` to see what we'd block before flipping to enforce.
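As a sketch, an audit-only request from a Node runtime might carry headers like these. The values are placeholders; the header names match the Headers table below:

```typescript
// Audit-only mode per request: X-Intended-Mode: observe makes the gateway
// record the decision it would have made without rewriting the response.
const observeHeaders: Record<string, string> = {
  "x-api-key": process.env.ANTHROPIC_API_KEY ?? "", // model auth, forwarded untouched
  "anthropic-version": "2023-06-01",                // provider versioning, forwarded
  "X-Intended-Key": process.env.INTENDED_KEY ?? "", // tenant id, terminated at gateway
  "X-Intended-Mode": "observe",                     // log-only; nothing is blocked
};
```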
## Headers
| Header from your runtime | Used for | Forwarded upstream? |
|---|---|---|
| `Authorization: Bearer <model-key>` | model auth | yes, untouched |
| `x-api-key: <model-key>` | model auth (Anthropic) | yes, untouched |
| `anthropic-version` / `openai-beta` | provider versioning | yes, untouched |
| `X-Intended-Key` | tenant identification | terminated at gateway |
| `X-Intended-Tenant-Id` | tenant identification | terminated at gateway |
| `X-Intended-Mode` | per-request observe / enforce override | terminated at gateway |
## Per-tenant configuration
Mode (observe / enforce) and per-tool / per-actor policy overrides are stored in the tenant's `OrganizationPreferences.gateway` blob, managed from app.intended.so/gateway. Live config changes propagate to in-flight gateway pods within ~60 seconds.
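The blob's canonical schema is managed from the dashboard; the shape below is purely illustrative, with field names that are assumptions rather than the documented schema:

```json
{
  "mode": "observe",
  "overrides": [
    { "scope": "tool",  "match": "^delete_", "decision": "DENY" },
    { "scope": "actor", "match": "ci-bot",   "decision": "ALLOW" }
  ]
}
```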
## Policy overrides
Policy overrides bypass the authority engine round-trip for tools or actors you want to handle deterministically. Overrides are evaluated first-match-wins: `match` is a case-insensitive regex against the tool name (scope `tool`) or the actor id (scope `actor`).
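A sketch of that first-match-wins evaluation, assuming override entries with `scope`, `match`, and `decision` fields (the field names are illustrative, not the canonical schema):

```typescript
type Override = {
  scope: "tool" | "actor";
  match: string; // case-insensitive regex
  decision: "ALLOW" | "DENY" | "ESCALATE";
};

// First-match-wins: scan overrides in order, return the first decision whose
// regex matches the tool name (scope "tool") or actor id (scope "actor").
// Returns null when no override applies (fall through to the authority engine).
function resolveOverride(
  overrides: Override[],
  toolName: string,
  actorId: string,
): Override["decision"] | null {
  for (const o of overrides) {
    const subject = o.scope === "tool" ? toolName : actorId;
    if (new RegExp(o.match, "i").test(subject)) return o.decision;
  }
  return null;
}

const overrides: Override[] = [
  { scope: "tool", match: "^delete_", decision: "DENY" },
  { scope: "actor", match: "ci-bot", decision: "ALLOW" },
];
```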
## Runtime helpers
For OpenClaw / NeoClaw and any Node-based agent harness: install `@intended/openclaw-plugin` and call `installIntendedGateway({ tenantId, intendedKey })` once at startup. The plugin sets the env vars and exposes a `ConfiguredGateway` object with per-provider URL accessors and a fetch wrapper.
For physical-AI runtimes (Isaac Sim, ROS2): the gateway covers the LLM-planning layer. Use `@intended/intended-ros2`'s `AuthorityTokenVerifier` to gate actuation on a fresh authority token at the robot.
## Observability
The gateway exposes `/healthz`, `/readyz`, and `/metrics` (Prometheus). Key series:
- `intended_gateway_requests_total{provider, tenant}`
- `intended_gateway_tool_calls_total{provider, tenant, decision}`
- `intended_gateway_upstream_latency_ms_bucket`
- `intended_gateway_decision_latency_ms_bucket`
- `intended_gateway_upstream_errors_total{provider, tenant, status}`
W3C `traceparent` headers are echoed back so downstream and upstream log lines correlate.
## Self-hosting
Same Docker image, same env vars, your VPC. Helm chart and Terraform module ship in the repo:
- Helm: `infrastructure/helm/llm-gateway/`
- Terraform (ECS Fargate): `infrastructure/terraform/modules/llm-gateway/`
The full runbook, including troubleshooting and SLO suggestions, lives at `docs/self-host-gateway/README.md`.
## Compare vs alternatives
For a feature matrix against Cloudflare AI Gateway, Lakera Guard, Portkey, and Wiz Runtime, see /compare/llm-gateway.