Right Answer, Wrong Work

detected 2026-02-28

trigger

"expect(result.status).toBe(400) — test passes, but the 400 comes from a different validation than the test claims to verify"

what it is

The LLM writes a test that asserts the correct outcome via the wrong causal path. The assertion passes. The gate is green. Every reviewer sees green and moves on. Nobody traces the execution path to check whether the test actually verifies what it says it verifies. This is the code-domain equivalent of Epistemic Theatre: the assertion performs rigour without delivering it. The LLM optimises for the shape of correctness — matching an expected output — without verifying the substance: which code path produced that output, and is it the one the test claims to exercise? This is the first slopodar entry that crosses the prose/code boundary. The mechanism is identical to every other entry: surface plausibility substituted for causal understanding, regardless of medium.

what it signals

"This is subtle, slow but inevitable death. Beware the Phantom Greenlights." — Captain. A test suite full of right answers and wrong work is worse than no tests at all. No tests is an honest zero. Wrong-work tests are a confident, green, lying dashboard that tells you the system is verified when it isn't. A hostile reviewer running a single test with .only() exposes the entire facade. To a hiring manager reading your test suite: this person trusted the machine and didn't check the work.

instead

Show your work. Every teacher knows the imperative. Every reviewer can ask it. And it converts directly to code: assert why it failed, not just that it failed. expect(result.status).toBe(400) is wrong work — it asserts the answer. expect(result.error.code).toBe('INVALID_JSON') shows the work — it asserts the reason. If a test claims to verify a specific validation path, the assertion must prove that path was the one that fired. Status codes are necessary but not sufficient. Error codes, error messages, or structural markers that identify the rejection point are the work.
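A minimal sketch of the difference, under stated assumptions. The handler `handleRequest`, the `Result` type, and the `BODY_NOT_OBJECT` code are hypothetical, invented here for illustration; only the 400 status and the `INVALID_JSON` code come from the text above. It uses Node's built-in `assert` rather than a test runner so it runs on its own. The scenario is an array body that is valid JSON but gets rejected by a later shape check, so a status-only assertion passes for the wrong reason.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical handler (an assumption, not the real code under test):
// two separate validations both answer 400, but each tags its own error code.
type Result = { status: number; error?: { code: string } };

function handleRequest(rawBody: string): Result {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    // Rejection point 1: the body is not parseable JSON.
    return { status: 400, error: { code: "INVALID_JSON" } };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    // Rejection point 2: valid JSON, but not a plain object.
    return { status: 400, error: { code: "BODY_NOT_OBJECT" } };
  }
  return { status: 200 };
}

// A test that claims to verify malformed-JSON handling but sends an array,
// which is perfectly valid JSON.
const arrayBody = handleRequest("[1, 2, 3]");

// Wrong work: this passes, yet the 400 came from the shape check.
assert.equal(arrayBody.status, 400);

// Showing the work exposes the mismatch. Asserting "INVALID_JSON" here
// would throw, because the rejection point was actually the shape check.
assert.equal(arrayBody.error?.code, "BODY_NOT_OBJECT");

// Corrected test: input that really reaches the JSON parser's failure path,
// with an assertion that proves which validation fired.
const malformed = handleRequest("{not json");
assert.equal(malformed.status, 400);
assert.equal(malformed.error?.code, "INVALID_JSON");
```

The status assertion stays in the corrected test; the error-code assertion is what ties the green result to the path the test name claims.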

refs

SD-190 (governance recursion — plausible-but-wrong tests named)
Bugbot finding on PR #386 V-03c (array body passes for wrong reason)
Holding deck: coincidental-pass-gate-blindness
