2026-03-19
How We Scored 92/100 on Our Own Security Audit
Intended Team · Security Engineering
Building a security platform means holding yourself to the standard you impose on others. So before we shipped Intended to production, we did something uncomfortable: we turned five adversarial AI agents loose on our own codebase and infrastructure, gave them every advantage, and documented everything they found.
The final score was 92 out of 100. Not perfect. The 8 points we lost taught us more than the 92 we earned. This post is a full transparency report on what our red team found, what we fixed, and what we learned along the way.
The Red Team Setup
We designed five specialized adversarial agents, each targeting a different attack surface.
- Agent 1: Authentication and session management. Targeted login flows, token handling, session fixation, and credential storage.
- Agent 2: Authorization and access control. Attempted privilege escalation, permission bypass, IDOR vulnerabilities, and policy circumvention.
- Agent 3: Injection and input validation. Tested every input surface for SQL injection, command injection, XSS, SSRF, and template injection.
- Agent 4: Cryptographic integrity. Probed token signing, key rotation, replay attacks, and timing side channels.
- Agent 5: Infrastructure and configuration. Examined dependency supply chain, secret management, rate limiting, and deployment security.
Each agent ran autonomously for 72 hours against a full staging deployment that mirrored production. They had access to our source code, API documentation, and architecture diagrams. This was a white-box assessment with no holds barred.
What They Found
The agents identified 24 issues total: 3 critical, 7 high, and 14 medium severity.
The 3 Critical Findings
The first critical finding was a hardcoded fallback secret in our JWT signing configuration. During early development, a fallback secret had been added for local testing. It was never meant to reach production, but it survived in a conditional branch that would activate if the environment variable was missing. In theory, a misconfigured deployment could fall back to a known, guessable signing key. We fixed this by removing the fallback entirely and making the application refuse to start if the signing key is not present in the environment.
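The fix follows a fail-fast pattern. Here is a minimal sketch of the idea, with a hypothetical `JWT_SIGNING_KEY` variable name and length check standing in for whatever the real configuration uses:

```typescript
// Sketch: load a security-critical secret with no fallback.
// If the key is missing or implausibly short, the process must not start.
function loadSigningKey(env: Record<string, string | undefined>): string {
  const key = env.JWT_SIGNING_KEY; // assumed variable name for illustration
  if (!key || key.length < 32) {
    // Crash loudly at startup rather than silently degrade to a known default.
    throw new Error("JWT_SIGNING_KEY is missing or too short; refusing to start");
  }
  return key;
}
```

Calling this once at process startup, before any server begins listening, turns a misconfiguration from a silent vulnerability into an immediate, visible deployment failure.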
The second critical finding was dead code in our brute-force protection layer. We had implemented account lockout logic, but a refactor had moved the check to a location after the authentication response was already being prepared. The lockout counter was incrementing correctly, but the actual blocking logic was never executing. The protection existed in name only. We restructured the middleware to enforce lockout checks before any authentication logic runs.
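The ordering invariant is easy to state and easy to break in a refactor: the lockout check must execute before any credential verification. A self-contained sketch of the corrected ordering, with a hypothetical threshold and in-memory counter standing in for the real store:

```typescript
type LoginRequest = { username: string; password: string };
type LoginResult = { status: number; body: string };

const failedAttempts = new Map<string, number>();
const LOCKOUT_THRESHOLD = 5; // assumed threshold for illustration

function login(
  req: LoginRequest,
  verify: (r: LoginRequest) => boolean
): LoginResult {
  // Invariant: the lockout check runs BEFORE credentials are verified.
  // Moving it after the verify() call is exactly the dead-code bug.
  if ((failedAttempts.get(req.username) ?? 0) >= LOCKOUT_THRESHOLD) {
    return { status: 429, body: "account locked" };
  }
  if (!verify(req)) {
    failedAttempts.set(req.username, (failedAttempts.get(req.username) ?? 0) + 1);
    return { status: 401, body: "invalid credentials" };
  }
  failedAttempts.delete(req.username); // reset the counter on success
  return { status: 200, body: "ok" };
}
```

A regression test for this bug is simple: after the threshold is reached, even the correct password must be rejected.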
The third critical finding was a missing permission guard on an internal administration endpoint. One of our backoffice API routes for managing organization settings had been added without the standard authorization middleware. Any authenticated user could, in theory, modify organization-level settings they should not have access to. We added the permission guard, wrote regression tests for every admin route, and added a CI check that flags any new route without explicit authorization middleware.
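The CI check itself can be very small. A sketch, assuming route definitions can be enumerated with their middleware names (the `requirePermission` and `public` labels here are hypothetical):

```typescript
type RouteDef = { path: string; middleware: string[] };

// Flag any route that neither applies the authorization middleware
// nor is explicitly marked public. CI fails if this list is non-empty.
function findUnguardedRoutes(routes: RouteDef[]): string[] {
  return routes
    .filter(
      (r) =>
        !r.middleware.includes("requirePermission") &&
        !r.middleware.includes("public")
    )
    .map((r) => r.path);
}
```

The point is that "every route has explicit authorization" stops being a convention someone must remember and becomes a property the build enforces.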
The 7 High Severity Findings
The high-severity issues included:
- Missing rate limiting on two API endpoints.
- An overly broad CORS configuration in the staging environment.
- Insufficient input validation on webhook URL registration.
- A session token that did not rotate after privilege elevation.
- Missing Content-Security-Policy headers on two response paths.
- An API key that was being logged at DEBUG level.
- A dependency with a known CVE that had not been updated.
Each of these was fixable. None required architectural changes. But any one of them could have been exploited by a determined attacker to gain a foothold or escalate access.
The 14 Medium Severity Findings
The medium-severity issues were primarily defense-in-depth gaps: missing security headers, verbose error messages that disclosed internal implementation details, cookies missing hardening flags such as HttpOnly, Secure, and SameSite, and similar hardening items. Important for a production security posture, but not directly exploitable on their own.
What Passed with Flying Colors
The adversarial agents spent significant effort on several attack classes and found nothing exploitable.
No SQL injection anywhere. Our data access layer uses parameterized queries exclusively through the Prisma ORM. The agents tested every input surface, including nested JSON fields and array parameters, and could not construct any payload that reached the database unparameterized.
No command injection. Intended does not shell out to system commands anywhere in the runtime path. There is simply no injection surface for OS-level command execution.
No cross-site scripting. Our API is headless, returning JSON only. The console application uses React with automatic escaping. The agents could not find any path to inject executable content into rendered output.
Atomic token replay protection worked as designed. The agents attempted to replay Authority Decision Tokens across multiple requests, at different timestamps, and with modified payloads. The nonce-based replay protection rejected every attempt. The cryptographic signature verification caught every tampering attempt, including sophisticated partial-modification attacks.
Policy engine determinism held. The agents attempted to find inputs that would produce inconsistent policy evaluations, running the same intent through the engine thousands of times with identical and near-identical parameters. Every evaluation was deterministic. The same input always produced the same output.
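A determinism check like this is straightforward to automate: evaluate the same input many times and assert every result agrees with the first. A sketch with a hypothetical, purely input-dependent `evaluatePolicy` standing in for the real engine:

```typescript
type Intent = { action: string; risk: number };

// Hypothetical policy evaluator: the decision depends only on the input,
// with no clocks, randomness, or hidden state.
function evaluatePolicy(intent: Intent): "allow" | "deny" {
  return intent.risk <= 3 && intent.action !== "delete" ? "allow" : "deny";
}

// Harness: repeated evaluation of the same intent must always agree.
function isDeterministic(intent: Intent, runs = 1000): boolean {
  const first = evaluatePolicy(intent);
  for (let i = 1; i < runs; i++) {
    if (evaluatePolicy(intent) !== first) return false;
  }
  return true;
}
```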
The Hardest Bugs
The three critical findings share a common pattern that is worth examining. None of them were architectural flaws. They were all artifacts of the development process: a testing convenience that persisted, a refactor that broke an invariant, and a new endpoint that skipped a step.
This is the most dangerous category of security bug. The system was designed correctly. The implementation drifted. And the drift was invisible to normal testing because the happy path still worked. You only discover these bugs when you specifically look for them, or when an attacker does.
This is why we believe adversarial testing should be a standard part of every security product's release process: not a one-time penetration test, but continuous red team evaluation that runs against every significant release.
Lessons for Any Engineering Team
If you are building security infrastructure, or any system where correctness matters, here are the concrete lessons we took away from this process.
First, never allow fallback values for security-critical configuration. If a signing key, encryption key, or authentication secret is missing, the application should crash on startup, loudly and immediately. A fallback value is a backdoor waiting to happen.
Second, treat dead code as a security vulnerability. Code that exists but does not execute is worse than code that was never written. It creates false confidence. Someone looks at the lockout logic and believes the system is protected. Audit your code paths, not just your code.
Third, enforce authorization at the framework level, not the route level. If every route must individually remember to add authorization middleware, someone will forget. Instead, make authorization the default, and require explicit opt-out for the rare public endpoint.
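One way to make authorization the default is to refuse route registration unless the caller either names a required permission or explicitly declares the route public. A sketch of that registration-time guard (names and shapes are illustrative):

```typescript
type Handler = (userPerms: string[]) => string;
type RouteEntry = { permission: string | null; handler: Handler };

const routeTable = new Map<string, RouteEntry>();

// Registration fails unless authorization intent is stated explicitly:
// either a required permission, or a deliberate public opt-out.
function register(
  path: string,
  handler: Handler,
  opts: { permission?: string; public?: boolean } = {}
): void {
  if (!opts.public && !opts.permission) {
    throw new Error(
      `route ${path} registered without a permission and not marked public`
    );
  }
  routeTable.set(path, { permission: opts.permission ?? null, handler });
}

function dispatch(path: string, userPerms: string[]): string {
  const route = routeTable.get(path)!;
  if (route.permission && !userPerms.includes(route.permission)) {
    return "403";
  }
  return route.handler(userPerms);
}
```

With this shape, forgetting authorization is a startup error rather than a silent hole: the mistake that produced our third critical finding becomes impossible to ship.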
Fourth, make red teaming repeatable. We built our adversarial agents to run against any deployment, not just the one we audited. They now run as part of our pre-release process. The cost of running them is trivial compared to the cost of shipping a critical vulnerability.
Fifth, publish your findings. Security through obscurity does not work. Transparency about your security posture, including your failures, builds more trust than claiming perfection.
The Path to 100
We do not expect to reach 100 out of 100. A perfect score would mean our red team agents are not creative enough. We plan to make our adversarial agents smarter, give them new attack techniques, and run them more frequently. The goal is not perfection. The goal is continuous improvement with full transparency.
The complete red team report, including detailed technical descriptions of every finding and its remediation, is available on our security page. If you are evaluating Intended for your organization, we encourage you to read it. And if you find something we missed, we want to hear about it.