The Blog Post, as a Tool
After writing the 'Designing Backends for Agents' post, the obvious next move was to turn its thesis into something a reader could use on their own API. Most teams I've talked to know their spec isn't quite agent-ready but can't point to the specific gaps. A deterministic linter is the shortest path from that feeling to a concrete punch list.
The constraint was strict: no AI in the tool itself. The whole point of the blog is that good agent ergonomics is an architectural discipline, not a model capability. Using an LLM to evaluate API specs would have undercut the argument.
Ten Rules, Each Worth Some Points
Paste an OpenAPI spec — JSON or YAML — and a rule engine scores it out of 100 across ten weighted checks: error shape, retryability, idempotency on writes, operation descriptions, verb-led names, parameter descriptions, enum documentation, pagination, rate limits, and per-operation auth scopes.
Each rule is a standalone function that walks the spec, produces findings with actionable messages, and returns a pass/fail plus a set of offending paths. The total score is the sum of passed weights over the total, giving you a single number that moves when you actually fix things.
Paste → Parse → Score → Drill-In
Split-pane layout. Paste on the left, report on the right. Score at the top of the report, followed by one row per rule — passing rules collapsed, failing rules expanded with rationale and per-endpoint findings.
Textarea accepting JSON or YAML, with a 'Load sample spec' button wired to a realistic OpenAPI 3.0 document.
Detects JSON vs YAML by first character, parses with js-yaml. Surfaces parse errors inline rather than crashing the UI.
Array of rule objects. Each rule has id, title, weight, rationale, and a check(spec) function returning { pass, findings }.
Total computed from passed weights. Each rule rendered as a collapsible row; failing rules expand by default with rationale and findings.
All parsing, scoring, and rendering is client-side. The textarea contents never leave the browser.
Under the Hood
Ten rules in a single file, each ~30 lines. New rules plug in by adding an entry to the array — weight, title, rationale, check function. Refactor cost for new rules is near zero.
Only dep needed at runtime (~30KB gzipped). Imported inside the linter page component so Gatsby's route-based code splitting keeps it out of the homepage bundle.
Lives at /agent-linter as a standalone page. Heavy work stays in its own chunk; the rest of the site is unaffected.
No randomness, no LLM. Same spec always produces the same score, which is the entire point — agent-readiness is a contract-design question, not a modeling one.
What Made It Hard
- Writing heuristics that catch real problems without triggering on well-designed specs. The 'operations are verb-led' rule originally flagged 'GET /users' because 'users' isn't a verb — loosened it to pass as long as the operationId or summary itself starts with a verb.
- Detecting idempotency support without a perfect signal. Settled on a dual check: a header parameter named Idempotency-Key, OR the word 'idempotent' appearing in the operation description. Catches most real-world patterns.
- Scoring behaviour when the spec has zero 4xx/5xx responses defined. Rather than failing silently or giving full credit, the rule surfaces the gap explicitly as a failed check with an 'undocumented errors' finding.