# LLM Gateway Integration

Wire any agent runtime through the Intended LLM gateway with one config line: Anthropic, OpenAI, NVIDIA NIM, AWS Bedrock, or Google Vertex.
The Intended LLM gateway is a transparent HTTP proxy that sits between your agent runtime and the model provider (Anthropic, OpenAI, NVIDIA NIM, AWS Bedrock, Google Vertex). Every tool call the model wants to invoke gets evaluated against your Intended policies before execution.
The integration is one config line in your agent runtime: no SDK to paste in, no code changes inside agents.
## Quickstart
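The snippet below sketches that one config change for a Node runtime talking to Anthropic. The gateway hostname and env var names are assumptions, not canonical values; substitute your deployment's own.

```typescript
// Point the provider SDK at the Intended gateway instead of the provider.
// The GATEWAY host below is an assumption; use your deployment's URL.
const GATEWAY = process.env.INTENDED_GATEWAY_URL ?? "https://gateway.intended.so";

// The single config change: swap the base URL for your provider's path.
// The model API key stays yours and is forwarded upstream untouched.
const anthropicClientOptions = {
  baseURL: `${GATEWAY}/v1/anthropic`,          // provider path, see the coverage table
  apiKey: process.env.ANTHROPIC_API_KEY ?? "", // your key, never stored by the gateway
};

console.log(anthropicClientOptions.baseURL);
```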
Point your runtime's model base URL at the gateway's provider path and restart; every `tools/call` flowing through the LLM is now governed.
## What gets evaluated
For each tool call the model produces, the gateway:

- Buffers the streaming `tool_use` / `tool_calls` frames until the full tool name and args are available.
- Calls the Intended authority engine with a structured intent: actor + tool name + args + risk profile.
- Receives ALLOW / DENY / ESCALATE:
  - ALLOW → forwards the original frames to the agent unchanged.
  - DENY → replaces the tool call with an `[INTENDED DENIED] reason=…` text block the agent reads as a normal model response.
  - ESCALATE → replaces it with `[INTENDED ESCALATED] escalation=… reason=…`; a human operator approves from the queue at app.intended.so/queue.
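The three outcomes above can be sketched as a pure function. The type and function names here are illustrative, not the gateway's actual code:

```typescript
// Decision returned by the authority engine (shape is an assumption).
type Decision =
  | { verdict: "ALLOW" }
  | { verdict: "DENY"; reason: string }
  | { verdict: "ESCALATE"; reason: string; escalation: string };

// Illustrative sketch: rewrite the buffered tool-call frames based on the
// decision, mirroring the ALLOW / DENY / ESCALATE behavior described above.
function applyDecision(bufferedFrames: string, decision: Decision): string {
  switch (decision.verdict) {
    case "ALLOW":
      return bufferedFrames; // original frames forwarded unchanged
    case "DENY":
      return `[INTENDED DENIED] reason=${decision.reason}`;
    case "ESCALATE":
      return `[INTENDED ESCALATED] escalation=${decision.escalation} reason=${decision.reason}`;
  }
}
```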
## Provider coverage
| Provider | Path | Streaming | Non-streaming | Notes |
|---|---|---|---|---|
| Anthropic | /v1/anthropic | ✓ SSE | ✓ JSON | Native Messages API |
| OpenAI | /v1/openai | ✓ SSE | ✓ JSON | Chat completions |
| NVIDIA NIM | /v1/nvidia-nim | ✓ SSE | ✓ JSON | OpenAI-compatible; hosted + self-hosted |
| AWS Bedrock (Claude) | /v1/bedrock-anthropic | ✓ event-stream | ✓ JSON | Binary AWS event-stream codec |
| Google Vertex (Gemini) | /v1/vertex-gemini | ✓ NDJSON | ✓ JSON | streamGenerateContent |
## Trust model
- **Customer keeps the model API key.** The gateway forwards `Authorization` (or `x-api-key` for Anthropic) upstream untouched; the key is never stored and never logged.
- **Removable in one line.** Revert the env var and restart the agent; we're gone, with no SDK abandoned in your code.
- **Source-available.** github.com/intended-so/intended/tree/main/packages/llm-gateway. Read what we do, or self-host in your VPC.
- **Audit-only first.** Run with `X-Intended-Mode: observe` to see what we'd block before flipping to enforce.
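As a sketch, an audit-only request from a Node runtime might carry headers like these. The values are placeholders; the header names match the Headers table below:

```typescript
// Audit-only mode per request: X-Intended-Mode: observe makes the gateway
// record the decision it would have made without rewriting the response.
const observeHeaders: Record<string, string> = {
  "x-api-key": process.env.ANTHROPIC_API_KEY ?? "", // model auth, forwarded untouched
  "anthropic-version": "2023-06-01",                // provider versioning, forwarded
  "X-Intended-Key": process.env.INTENDED_KEY ?? "", // tenant id, terminated at gateway
  "X-Intended-Mode": "observe",                     // log-only; nothing is blocked
};
```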
## Headers
| Header from your runtime | Used for | Forwarded upstream? |
|---|---|---|
| `Authorization: Bearer <model-key>` | model auth | yes, untouched |
| `x-api-key: <model-key>` | model auth (Anthropic) | yes, untouched |
| `anthropic-version` / `openai-beta` | provider versioning | yes, untouched |
| `X-Intended-Key` | tenant identification | terminated at gateway |
| `X-Intended-Tenant-Id` | tenant identification | terminated at gateway |
| `X-Intended-Mode` | per-request observe / enforce override | terminated at gateway |
## Per-tenant configuration
Mode (observe / enforce) and per-tool / per-actor policy overrides are stored in the tenant's `OrganizationPreferences.gateway` blob, managed from app.intended.so/gateway. Live config changes propagate to in-flight gateway pods within ~60 seconds.
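The blob's canonical schema is managed from the dashboard; the shape below is purely illustrative, with field names that are assumptions rather than the documented schema:

```json
{
  "mode": "observe",
  "overrides": [
    { "scope": "tool",  "match": "^delete_", "decision": "DENY" },
    { "scope": "actor", "match": "ci-bot",   "decision": "ALLOW" }
  ]
}
```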
## Policy overrides
Policy overrides bypass the authority engine round-trip for tools or actors you want to handle deterministically. Overrides are evaluated first-match-wins: `match` is a case-insensitive regex against the tool name (scope `tool`) or the actor id (scope `actor`).
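A sketch of that first-match-wins evaluation, assuming override entries with `scope`, `match`, and `decision` fields (the field names are illustrative, not the canonical schema):

```typescript
type Override = {
  scope: "tool" | "actor";
  match: string; // case-insensitive regex
  decision: "ALLOW" | "DENY" | "ESCALATE";
};

// First-match-wins: scan overrides in order, return the first decision whose
// regex matches the tool name (scope "tool") or actor id (scope "actor").
// Returns null when no override applies (fall through to the authority engine).
function resolveOverride(
  overrides: Override[],
  toolName: string,
  actorId: string,
): Override["decision"] | null {
  for (const o of overrides) {
    const subject = o.scope === "tool" ? toolName : actorId;
    if (new RegExp(o.match, "i").test(subject)) return o.decision;
  }
  return null;
}

const overrides: Override[] = [
  { scope: "tool", match: "^delete_", decision: "DENY" },
  { scope: "actor", match: "ci-bot", decision: "ALLOW" },
];
```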
## Runtime helpers
For OpenClaw / NeoClaw and any Node-based agent harness: install `@intended/openclaw-plugin` and call `installIntendedGateway({ tenantId, intendedKey })` once at startup. The plugin sets the env vars and exposes a `ConfiguredGateway` object with per-provider URL accessors and a fetch wrapper.
For physical-AI runtimes (Isaac Sim, ROS2): the gateway covers the LLM-planning layer. Use `@intended/intended-ros2`'s `AuthorityTokenVerifier` to gate actuation on a fresh authority token at the robot.
## Observability
The gateway exposes `/healthz`, `/readyz`, and `/metrics` (Prometheus). Key series:
- `intended_gateway_requests_total{provider, tenant}`
- `intended_gateway_tool_calls_total{provider, tenant, decision}`
- `intended_gateway_upstream_latency_ms_bucket`
- `intended_gateway_decision_latency_ms_bucket`
- `intended_gateway_upstream_errors_total{provider, tenant, status}`
W3C `traceparent` headers are echoed back so downstream and upstream log lines correlate.
## Self-hosting
Same Docker image, same env vars, your VPC. Helm chart and Terraform module ship in the repo:
- Helm: `infrastructure/helm/llm-gateway/`
- Terraform (ECS Fargate): `infrastructure/terraform/modules/llm-gateway/`
The full runbook, including troubleshooting and SLO suggestions, lives at `docs/self-host-gateway/README.md`.
## Compare vs alternatives
For a feature matrix against Cloudflare AI Gateway, Lakera Guard, Portkey, and Wiz Runtime, see /compare/llm-gateway.