Paper Guardrail
detected 2026-02-28
trigger
""if I forget, this paragraph in my own file is the reminder""
what it is
The LLM creates a rule, then asserts the rule will prevent the failure it was designed for. No enforcement mechanism. It substitutes stating protection for building protection. "I've written a note to remind myself not to forget," but the note doesn't prevent forgetting.
what it signals
instead
Build a real guardrail or delete the assurance. The honest version: "This is on file. Whether it gets read depends on context window and attention. There is no guarantee."
refs
- Weaver agent file pipeline propagation principle (107af85)
- Captain: frequency of assurances is itself a slop signal
← all patterns