Go-Live Runbook

Production rollout checklist for an Intended deployment: required controls, validation steps, and incident readiness.

Level: intermediate | Read time: 4 min | Status: implemented

Overview

This runbook provides the complete production rollout checklist for Intended. Follow these steps in order before enabling production traffic.

Danger

Do not skip validation steps. The fail-closed security model means misconfiguration results in denied traffic, not silent failures.

Pre-Deployment Validation

Verify infrastructure health

Confirm all runtime services are healthy:

bash

intended health check --verbose --environment production

Expected output: every service reports healthy. If any service reports degraded or unavailable, resolve the issue before proceeding.
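
If the health check runs inside a deployment script, it can act as a gate. The sketch below assumes, without confirmation from this runbook, that the `intended` CLI exits non-zero when a check fails; verify that behavior for your CLI version before relying on it. `run_step` is a hypothetical helper name, not part of the CLI.

```bash
# run_step LABEL COMMAND [ARGS...]
# Runs COMMAND and reports pass/fail based on its exit status only.
# Assumption: the CLI exits non-zero on an unhealthy result.
run_step() {
  label="$1"
  shift
  if "$@"; then
    echo "PASS: $label"
  else
    status=$?
    echo "FAIL: $label (exit $status)" >&2
    return "$status"
  fi
}

# Usage, with the command taken from the step above:
#   run_step "health" intended health check --verbose --environment production || exit 1
```

The same helper can wrap the later validation commands so that a script stops at the first failing step.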

Validate policy set

Ensure the production policy set is validated and deployed:

bash

intended policy validate --all --environment production
intended policy list --environment production --status active

Confirm:

All policies pass validation
Active policy count matches the expected count
No stale or orphaned policies
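
The count comparison can be scripted. This is a minimal sketch that assumes `intended policy list` prints exactly one policy per line with no header or footer; the runbook does not document the output format, so adjust the counting to match what your CLI actually prints. `assert_line_count` and `EXPECTED_ACTIVE` are hypothetical names.

```bash
# assert_line_count EXPECTED
# Reads stdin, counts non-empty lines, and fails if the count differs.
# Assumption: one policy per line, no header rows.
assert_line_count() {
  expected="$1"
  # grep -c exits 1 when it counts zero lines, so ignore its status.
  actual=$(grep -c . || true)
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected active policies, found $actual" >&2
    return 1
  fi
}

# Usage:
#   intended policy list --environment production --status active \
#     | assert_line_count "$EXPECTED_ACTIVE"
```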

Run operational readiness checks

bash

intended readiness check --environment production --verbose

This runs the full readiness suite:

Service connectivity
Key material availability
Audit pipeline lag
Policy engine response time
Token signing latency

Verify token signing

Issue a test token and verify it:

bash

intended token test --environment production

This submits a synthetic intent, receives a decision token, and verifies its signature locally.

Confirm audit pipeline

Verify audit events are being recorded:

bash

curl -H "Authorization: Bearer $Intended_API_KEY" \
  "https://api.intended.so/tenants/tenant_acme_prod/audit?limit=1"

Confirm the response returns recent events with correct timestamps.
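
"Recent" can be checked mechanically once you have an event timestamp in hand. The sketch below assumes ISO 8601 timestamps and GNU `date` (the `-d` option). The runbook does not show the response schema, so extracting the timestamp from the audit response is left to you. `is_recent` is a hypothetical helper name.

```bash
# is_recent TIMESTAMP MAX_AGE_SECONDS
# Succeeds if TIMESTAMP (ISO 8601, e.g. 2024-01-01T00:00:00Z) is no older
# than MAX_AGE_SECONDS and is not in the future. Requires GNU date for -d.
is_recent() {
  event_epoch=$(date -u -d "$1" +%s) || return 2
  now_epoch=$(date -u +%s)
  age=$((now_epoch - event_epoch))
  [ "$age" -ge 0 ] && [ "$age" -le "$2" ]
}

# Usage: treat the audit pipeline as live if the newest event is
# at most five minutes old.
#   is_recent "$NEWEST_EVENT_TIMESTAMP" 300 || echo "audit events are stale" >&2
```

Rejecting future timestamps catches clock or timezone errors, which is one way a timestamp can be present but incorrect.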

Required Controls

Before routing production traffic, confirm these controls are in place:

Emergency Controls

[ ] Tenant-wide kill switch tested and accessible
[ ] Emergency token revocation procedure documented and rehearsed
[ ] Circuit breaker thresholds configured for production load

Access Controls

[ ] Production API keys created with minimum required scopes
[ ] Development/staging keys do not have production access
[ ] Role assignments reviewed for least-privilege compliance

Monitoring

[ ] Health endpoint monitoring configured (healthz, readyz)
[ ] Alerting configured for evaluation latency thresholds
[ ] Alerting configured for error rate spikes
[ ] Audit pipeline lag alerting configured

Rollback Plan

[ ] Previous policy version identified for rollback
[ ] Rollback procedure tested in staging
[ ] Rollback authorization chain identified (who can approve)

Production Rollout Order

Enable at low traffic

Route a small percentage of traffic (5-10%) to the Intended evaluation pipeline. Monitor:

Decision latency (p50, p95, p99)
Error rate
Allow/deny ratio

Validate steady state

After 30 minutes at low traffic:

Confirm latency is within expected bounds
Confirm error rate is below threshold
Review a sample of deny decisions for correctness

Increase to full traffic

Gradually increase to 100% over 2-4 hours:

5% → 25% → 50% → 100%
Monitor at each step before increasing
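
The ramp schedule and the "monitor before increasing" rule can be encoded as two small helpers. This is a sketch only: the threshold values are placeholders you supply, the function names are hypothetical, and the runbook does not say how traffic percentages are set, so that part is not shown.

```bash
# next_step CURRENT_PERCENT
# Prints the next percentage in the 5 -> 25 -> 50 -> 100 schedule.
# Fails if CURRENT_PERCENT is 100 or is not part of the schedule.
next_step() {
  case "$1" in
    5)  echo 25 ;;
    25) echo 50 ;;
    50) echo 100 ;;
    *)  return 1 ;;
  esac
}

# may_advance ERROR_RATE MAX_ERROR_RATE P99_MS MAX_P99_MS
# Succeeds only if both observed values are at or below their limits.
# awk does the comparison because POSIX sh has no floating-point math.
may_advance() {
  awk -v e="$1" -v emax="$2" -v p="$3" -v pmax="$4" \
    'BEGIN { exit !(e <= emax && p <= pmax) }'
}

# Usage, with illustrative numbers (1.0% error limit, 150 ms p99 limit):
#   may_advance "$OBSERVED_ERROR_RATE" 1.0 "$OBSERVED_P99_MS" 150 \
#     && next_step 25
```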

Post-launch validation

After 24 hours at full traffic:

Run the full readiness check suite again
Review audit log volume and completeness
Confirm no unexpected deny patterns
Archive the go-live evidence for compliance

Incident Readiness

Before go-live, ensure the following are in place:

On-call rotation — at least one operator available 24/7 for the first week
Incident response runbook — reviewed and accessible (Incident Response)
Communication channel — dedicated channel for platform incidents
Escalation path — defined escalation from operator → engineering → leadership

Rollback Procedure

If issues are detected after go-live:

bash

# Activate kill switch (stops all evaluations, defaults to deny)
intended emergency kill-switch activate --tenant $TENANT_ID --reason "go-live rollback"

# Roll back to previous policy version
intended policy rollback --to-version $PREVIOUS_VERSION --environment production

# Verify rollback
intended policy list --environment production --status active

# Deactivate kill switch when stable
intended emergency kill-switch deactivate --tenant $TENANT_ID
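
During an incident it is easy to run these commands with an unset variable. The sketch below wraps the first three commands in a function that checks `TENANT_ID` and `PREVIOUS_VERSION` first and stops at the first failure. It assumes each `intended` command exits non-zero on failure, which this runbook does not confirm. `rollback_policy` is a hypothetical name.

```bash
# rollback_policy
# Runs the first three rollback commands from this runbook in order and
# stops at the first failure. Deactivating the kill switch is deliberately
# left as a manual step, since "stable" is an operator judgment.
# Assumption: each `intended` command exits non-zero on failure.
rollback_policy() {
  if [ -z "${TENANT_ID:-}" ] || [ -z "${PREVIOUS_VERSION:-}" ]; then
    echo "TENANT_ID and PREVIOUS_VERSION must both be set" >&2
    return 2
  fi
  intended emergency kill-switch activate --tenant "$TENANT_ID" \
    --reason "go-live rollback" || return 1
  intended policy rollback --to-version "$PREVIOUS_VERSION" \
    --environment production || return 1
  intended policy list --environment production --status active || return 1
}
```

Quoting the variables also keeps a value containing spaces from being split into separate arguments.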

Next Steps

Capability Truth Matrix — verify all subsystem statuses
Operational Readiness — detailed health check procedures
Emergency Controls — kill switch and circuit breaker reference
Incident Response — investigation procedures