Go-Live Runbook
Production rollout checklist for Intended deployment — required controls, validation steps, and incident readiness.
Overview
This runbook provides the complete production rollout checklist for Intended. Follow these steps in order before enabling production traffic.
Danger
Do not skip validation steps. The fail-closed security model means misconfiguration results in denied traffic, not silent failures.
Pre-Deployment Validation
Verify infrastructure health
Confirm all runtime services are healthy:
Expected output: all services report healthy. If any service reports degraded or unavailable, resolve before proceeding.
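As an illustration of the gate described above, here is a minimal Python sketch that takes a status report and lists every service that is not healthy. The report shape, the status strings, and the service names are assumptions made for this example; they are not taken from the Intended CLI or API.

```python
from typing import Mapping

HEALTHY = "healthy"  # assumed status string, mirroring the wording above


def unhealthy_services(statuses: Mapping[str, str]) -> list[str]:
    """Return the names of services whose status is anything other than healthy."""
    return sorted(name for name, status in statuses.items() if status != HEALTHY)


# Hypothetical status report; the service names are illustrative only.
report = {
    "policy-engine": "healthy",
    "token-signer": "degraded",
    "audit-pipeline": "healthy",
}

blockers = unhealthy_services(report)
if blockers:
    print("Do not proceed; resolve first:", ", ".join(blockers))
else:
    print("All services healthy")
```

Treating every status other than `healthy` as a blocker keeps the check consistent with the fail-closed model: an unknown or missing status stops the rollout instead of passing quietly.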
Validate policy set
Ensure the production policy set is validated and deployed:
Confirm:
- All policies pass validation
- Active policy count matches the expected count
- No stale or orphaned policies
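The three confirmations above can be expressed as one function that returns a list of problems, where an empty list means the policy set is ready. This is a sketch under stated assumptions: `PolicyRecord` and its fields are hypothetical and do not reflect the real policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRecord:
    """Hypothetical summary of one policy; field names are assumptions."""
    policy_id: str
    valid: bool      # passed validation
    active: bool     # currently deployed
    orphaned: bool   # stale or no longer attached to anything


def policy_set_problems(policies: list[PolicyRecord], expected_active: int) -> list[str]:
    """Return human-readable problems; an empty list means all three checks pass."""
    problems = []
    invalid = [p.policy_id for p in policies if not p.valid]
    if invalid:
        problems.append("failed validation: " + ", ".join(invalid))
    active = sum(1 for p in policies if p.active)
    if active != expected_active:
        problems.append(f"active policy count {active} != expected {expected_active}")
    orphaned = [p.policy_id for p in policies if p.orphaned]
    if orphaned:
        problems.append("stale or orphaned: " + ", ".join(orphaned))
    return problems
```

Returning every problem at once, instead of stopping at the first one, lets an operator fix the whole policy set in a single pass.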
Run operational readiness checks
The full readiness suite covers:
- Service connectivity
- Key material availability
- Audit pipeline lag
- Policy engine response time
- Token signing latency
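A readiness suite of this shape can be modeled as a set of named checks that each return pass or fail. The sketch below is not the actual readiness tooling; the check names and the lambda placeholders are assumptions. It treats a check that raises as a failure, in keeping with the fail-closed model.

```python
from typing import Callable


def run_readiness(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run each named check; a check that raises counts as failed (fail closed)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


# Placeholder checks; real ones would probe the services listed above.
checks = {
    "service_connectivity": lambda: True,
    "key_material_available": lambda: True,
    "audit_pipeline_lag_ok": lambda: True,
    "policy_engine_latency_ok": lambda: True,
    "token_signing_latency_ok": lambda: True,
}

results = run_readiness(checks)
print("ready" if all(results.values()) else "not ready")
```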
Verify token signing
Issue a test token and verify it:
The test submits a synthetic intent, receives a decision token, and verifies the token's signature locally.
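To make the sign-then-verify round trip concrete, here is a self-contained Python sketch using HMAC-SHA256 from the standard library. The token layout (`base64(payload).hex_signature`), the shared key, and the function names are assumptions for illustration only. The real decision-token format and signing algorithm are not specified here and may well be asymmetric.

```python
import base64
import hashlib
import hmac
import json


def sign_token(payload: dict, key: bytes) -> str:
    """Encode the payload and append an HMAC-SHA256 signature (illustrative format)."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    signature = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_token(token: str, key: bytes) -> bool:
    """Recompute the signature locally and compare it in constant time."""
    body, sep, signature = token.rpartition(".")
    if not sep:
        return False  # malformed token: no signature part
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature.encode(), expected.encode())


# Synthetic intent with hypothetical fields.
token = sign_token({"intent": "synthetic-test", "decision": "allow"}, b"demo-key")
print(verify_token(token, b"demo-key"))
print(verify_token(token, b"wrong-key"))
```

The comparison uses `hmac.compare_digest` rather than `==` so that verification time does not leak how many leading characters of the signature matched.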
Confirm audit pipeline
Verify audit events are being recorded:
Confirm the response returns recent events with correct timestamps.
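One way to read "recent events with correct timestamps" is that the newest event is neither in the future nor older than a chosen window. The sketch below encodes that reading. The five-minute window and the function name are assumptions, and the timestamps are expected to be timezone-aware `datetime` values that have already been parsed from the response.

```python
from datetime import datetime, timedelta, timezone


def audit_events_fresh(
    timestamps: list[datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed freshness window
) -> bool:
    """True if the newest event is not in the future and is within max_age of now."""
    if not timestamps:
        return False  # no events at all means the pipeline is not recording
    age = now - max(timestamps)
    return timedelta(0) <= age <= max_age


now = datetime.now(timezone.utc)
sample = [now - timedelta(minutes=1), now - timedelta(minutes=10)]
print(audit_events_fresh(sample, now))
```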
Required Controls
Before routing production traffic, confirm these controls are in place:
Emergency Controls
- [ ] Tenant-wide kill switch tested and accessible
- [ ] Emergency token revocation procedure documented and rehearsed
- [ ] Circuit breaker thresholds configured for production load
Access Controls
- [ ] Production API keys created with minimum required scopes
- [ ] Development/staging keys do not have production access
- [ ] Role assignments reviewed for least-privilege compliance
Monitoring
- [ ] Health endpoint monitoring configured (healthz, readyz)
- [ ] Alerting configured for evaluation latency thresholds
- [ ] Alerting configured for error rate spikes
- [ ] Audit pipeline lag alerting configured
Rollback Plan
- [ ] Previous policy version identified for rollback
- [ ] Rollback procedure tested in staging
- [ ] Rollback authorization chain identified (who can approve)
Production Rollout Order
Enable at low traffic
Route a small percentage of traffic (5-10%) to the Intended evaluation pipeline. Monitor:
- Decision latency (p50, p95, p99)
- Error rate
- Allow/deny ratio
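The three metrics above can be computed from raw samples as follows. This is a minimal sketch: it uses the nearest-rank percentile method, and it assumes each decision is recorded as one of the strings `"allow"`, `"deny"`, or `"error"`, which is a labeling chosen for this example and not the platform's schema.

```python
from collections import Counter


def percentile(latencies_ms: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(latencies_ms)
    rank = -((-pct * len(ordered)) // 100)  # ceil(pct * n / 100) in integer arithmetic
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], decisions: list[str]) -> dict[str, float]:
    """Summarize decision latency, error rate, and allow/deny ratio."""
    if not decisions:
        raise ValueError("no decisions recorded")
    counts = Counter(decisions)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "error_rate": counts["error"] / len(decisions),
        "allow_deny_ratio": (
            counts["allow"] / counts["deny"] if counts["deny"] else float("inf")
        ),
    }
```

For example, 100 latency samples of 1 ms through 100 ms with 90 allows, 9 denies, and 1 error give p50 = 50, p95 = 95, p99 = 99, an error rate of 0.01, and an allow/deny ratio of 10.0.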
Validate steady state
After 30 minutes at low traffic:
- Confirm latency is within expected bounds
- Confirm error rate is below threshold
- Review a sample of deny decisions for correctness
Increase to full traffic
Gradually increase to 100% over 2-4 hours:
- 5% → 25% → 50% → 100%
- Monitor at each step before increasing
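The stepwise ramp above can be sketched as a small function that advances to the next stage only when the current stage looks healthy. The `healthy` flag stands in for whatever latency and error-rate gates are used at each step; how that flag is derived is left as an assumption.

```python
RAMP_STEPS = (5, 25, 50, 100)  # traffic percentages from the schedule above


def next_traffic_percent(current: int, healthy: bool) -> int:
    """Return the next ramp step, or hold at the current level if unhealthy."""
    if not healthy:
        return current  # hold here; deciding to roll back is a separate step
    for step in RAMP_STEPS:
        if step > current:
            return step
    return RAMP_STEPS[-1]  # already at full traffic


level = 5
for observed_healthy in (True, True, False, True):
    level = next_traffic_percent(level, observed_healthy)
    print(level)
```

In this sketch an unhealthy observation only pauses the ramp. Reducing traffic or rolling back remains an explicit operator decision.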
Post-launch validation
After 24 hours at full traffic:
- Run the full readiness check suite again
- Review audit log volume and completeness
- Confirm no unexpected deny patterns
- Archive the go-live evidence for compliance
Incident Readiness
Before go-live, ensure the following are in place:
- On-call rotation — at least one operator available 24/7 for the first week
- Incident response runbook — reviewed and accessible (Incident Response)
- Communication channel — dedicated channel for platform incidents
- Escalation path — defined escalation from operator → engineering → leadership
Rollback Procedure
If issues are detected after go-live:
Next Steps
- Capability Truth Matrix — verify all subsystem statuses
- Operational Readiness — detailed health check procedures
- Emergency Controls — kill switch and circuit breaker reference
- Incident Response — investigation procedures