Building the Same System Three Times
This is Part 2 of Building the Same System Twice. If you haven’t read that, the short version: I built a carrier integration service (TypeScript, UPS API, OAuth 2.0) twice from the same spec. Once with outside-in TDD and 48 adversarial reviews over 18 hours. Once with a parallel agent swarm in 1 hour.
Then I built it a third time.
Why a third time?
The first two implementations answered “what does iterative review find that upfront design misses, and vice versa?” The third implementation asks a different question: what happens when the adversarial review runs before merge, not after?
Pidgeon ran 48 reviews after the code was already integrated. Pidgeon-swarm ran one review at the end. The third implementation — pidgeon-swarm-sortie-test — runs Sortie at every merge boundary. Four parallel workers build on separate branches. Each merge triggers a 3-model review (Claude, Gemini, Codex). Critical or major convergent findings block the merge until remediated. Max 2 remediation cycles.
Same spec. Same domain. New process variable: pre-merge adversarial gating.
The setup
Four workers with non-overlapping file ownership:
- Worker A: Domain types, validation, config
- Worker B: UPS auth + HTTP client
- Worker C: UPS mapper + carrier implementation
- Worker D: Service facade, registry, integration
Each worker builds on its own branch. When a branch is ready to merge, Sortie runs the pipeline: three models review the diff in parallel, a fourth model writes a debrief that synthesises the findings with convergence analysis, and triage evaluates severity against the configured blocking rules. The worker can’t merge until the verdict passes.
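The gating rule can be sketched roughly as follows. This is an illustration of the process described above, not Sortie's actual code: every type and function name here is hypothetical, the review is modelled as a synchronous callback for brevity, and "convergent" is assumed to mean "reported by at least two models".

```typescript
// Hypothetical types: none of these names come from Sortie itself.
type Severity = "critical" | "major" | "minor";

interface Finding {
  id: string;
  severity: Severity;
  flaggedBy: string[]; // models that independently reported this finding
}

type Verdict = { pass: true } | { pass: false; blocking: Finding[] };

// Assumption: a finding is "convergent" when at least two models report it.
const CONVERGENCE_THRESHOLD = 2;
const MAX_REMEDIATION_CYCLES = 2;

// Only critical or major findings that are also convergent block the merge.
function triage(findings: Finding[]): Verdict {
  const blocking = findings.filter(
    (f) =>
      (f.severity === "critical" || f.severity === "major") &&
      new Set(f.flaggedBy).size >= CONVERGENCE_THRESHOLD,
  );
  return blocking.length === 0 ? { pass: true } : { pass: false, blocking };
}

// Review once, then allow up to MAX_REMEDIATION_CYCLES rounds of fix + re-review.
// `review` and `remediate` are injected so the sketch stays self-contained.
function gateMerge(
  review: () => Finding[],
  remediate: (blocking: Finding[]) => void,
): { merged: boolean; cycles: number } {
  let verdict = triage(review());
  let cycles = 0;
  while (!verdict.pass && cycles < MAX_REMEDIATION_CYCLES) {
    remediate(verdict.blocking);
    cycles += 1;
    verdict = triage(review());
  }
  return { merged: verdict.pass, cycles };
}
```

Under these assumptions, a critical finding reported by only one model does not block, and a branch that still has blocking findings after two remediation cycles stays unmerged.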
The git history shows this structure clearly:
merge: worker-a/domain-types
merge: worker-b/ups-auth-client
merge: worker-c/ups-mapper-carrier
merge: worker-d/service-registry
15 commits total. 99 tests, 229 assertions.
What pre-merge review caught
The Sortie review cycles found two things that the post-hoc approach in pidgeon missed entirely across 48 review documents:
HTTPS URL validation on UPS endpoints. The config loader in pidgeon (and pidgeon-swarm) accepts any URL string for the UPS rating and token endpoints. Someone could configure http:// and send OAuth credentials over plaintext. The Sortie review flagged this as a security issue. The fix: validate that configured URLs use HTTPS on known UPS domains.
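A minimal sketch of that check, using only the standard URL class. The function name and result shape are hypothetical, and the host allowlist is illustrative and should be confirmed against UPS's own documentation. In these projects the check would more naturally live inside the Zod config schema; plain TypeScript is used here to keep the example dependency-free.

```typescript
// Illustrative allowlist; confirm the hosts against UPS's documentation.
const ALLOWED_UPS_HOSTS = new Set(["onlinetools.ups.com", "wwwcie.ups.com"]);

// Hypothetical result shape for this sketch.
type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

function validateUpsEndpoint(raw: string): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw); // throws a TypeError on malformed input
  } catch {
    return { ok: false, reason: `not a valid URL: ${raw}` };
  }
  // Reject plaintext transport so OAuth credentials never leave unencrypted.
  if (url.protocol !== "https:") {
    return { ok: false, reason: `expected https, got ${url.protocol}` };
  }
  // Reject hosts outside the allowlist, even over HTTPS.
  if (!ALLOWED_UPS_HOSTS.has(url.hostname)) {
    return { ok: false, reason: `unexpected host: ${url.hostname}` };
  }
  return { ok: true, url };
}
```

Running this at config load time means a misconfigured endpoint fails at startup rather than on the first token request.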
NaN guards in monetary parsing. UPS returns monetary values as strings. The mapper calls parseFloat() without checking the result. If UPS returns a malformed price string, NaN propagates silently into the rate quote — the caller sees a price of NaN instead of a structured error. The Sortie review caught this. The fix: NaN guard after every parseFloat() with a structured error if parsing fails.
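The guard can be sketched like this. The function name, the error code, and the result shape are all hypothetical; the projects' real error types may differ.

```typescript
// Hypothetical result shape; not taken from any of the three codebases.
type ParseResult =
  | { ok: true; value: number }
  | { ok: false; error: { code: "INVALID_MONETARY_VALUE"; field: string; raw: string } };

function parseMonetaryValue(raw: string, field: string): ParseResult {
  const value = Number.parseFloat(raw);
  // Number.isFinite rejects NaN as well as Infinity and -Infinity.
  if (!Number.isFinite(value)) {
    return { ok: false, error: { code: "INVALID_MONETARY_VALUE", field, raw } };
  }
  return { ok: true, value };
}
```

One caveat: parseFloat() accepts a numeric prefix, so a string like "12.50abc" still parses as 12.5 and passes this guard. Rejecting trailing garbage needs a stricter check on the raw string before parsing.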
Neither of these crashes anything. Neither would fail a test suite that uses well-formed fixtures. Both are the kind of silent correctness issue that surfaces in production when real data arrives in unexpected shapes. And neither was caught by pidgeon’s 48 post-hoc adversarial reviews.
Why post-hoc missed them
This isn’t a knock on post-hoc review. Pidgeon’s reviews found and fixed 8 tagged issues (F1-F8) including structural problems like the decorative Result type (D015) that changed the entire build order. Those were architectural findings that pre-merge gating wouldn’t have caught — the error boundary problem existed in the walking skeleton before any branch structure existed.
The difference is scope. Post-hoc review looks at the entire codebase and tends to find architectural and design issues. Pre-merge review looks at a single diff and tends to find implementation issues — the NaN guard, the HTTPS validation, the missing edge case. Different zoom levels catch different bugs.
The numbers
| Metric | pidgeon | pidgeon-swarm | pidgeon-swarm-sortie-test |
|---|---|---|---|
| Commits | 86 | 17 | 15 |
| Tests | 161 | 55 | 99 |
| Assertions | 412 | — | 229 |
| Impl LOC | ~1,454 | ~700 | ~850 |
| Test LOC | ~3,911 | ~600 | ~1,450 |
| Test:code ratio | 2.7:1 | 0.86:1 | 1.7:1 |
| Review method | 48 post-hoc reviews | 1 code review | Pre-merge Sortie (3 models per merge) |
| Time | ~18 hours | ~1 hour | ~2 hours |
What converged across all three
All three projects independently arrived at the same core decisions:
- Bun runtime with native bun:test
- Zod as single source of truth for validation + type inference
- Native fetch, no external HTTP client
- Carrier plugin pattern: interface + registry + service facade
- Domain boundary: UPS shapes never leak to callers
- In-memory token cache with 60-second refresh buffer
- Environment-based config with Zod validation
- Named exports with explicit return types
When three different build processes converge on the same choices, those choices are probably correct for the domain.
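Of the shared decisions, the token cache with its 60-second refresh buffer is the easiest to show in a few lines. The sketch below is an assumption about the general shape, not code from any of the three projects: the class, the fetcher signature, and the injected clock are all hypothetical.

```typescript
interface CachedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

// Refresh 60 seconds before the real expiry so no request carries a token
// that expires while in flight.
const REFRESH_BUFFER_MS = 60_000;

function isFresh(token: CachedToken, nowMs: number): boolean {
  return nowMs < token.expiresAtMs - REFRESH_BUFFER_MS;
}

// Hypothetical fetcher: resolves to a token and its lifetime in seconds.
type TokenFetcher = () => Promise<{ accessToken: string; expiresInSec: number }>;

class TokenCache {
  private cached: CachedToken | null = null;
  private readonly fetchToken: TokenFetcher;
  private readonly now: () => number;

  // The clock is injectable so the expiry logic can be tested without waiting.
  constructor(fetchToken: TokenFetcher, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  async get(): Promise<string> {
    const t = this.now();
    if (this.cached !== null && isFresh(this.cached, t)) {
      return this.cached.accessToken;
    }
    const fresh = await this.fetchToken();
    this.cached = {
      accessToken: fresh.accessToken,
      expiresAtMs: t + fresh.expiresInSec * 1000,
    };
    return fresh.accessToken;
  }
}
```

This version does not deduplicate concurrent refreshes, so two simultaneous calls on a stale cache would both fetch a token. Whether that matters depends on request volume.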
What diverged
| Decision | pidgeon | swarm variants |
|---|---|---|
| Error boundary | Result<T> (never throws) | Thrown CarrierError |
| Package structure | Monorepo (hard enforcement) | Flat src/ (convention) |
| Retry strategy | Exponential backoff middleware | 401 retry only |
| Carrier capabilities | Optional interface methods | Single getRates |
| TypeScript strictness | Maximum (3 extra flags) | Standard strict |
Pidgeon’s monorepo is the only one that enforces the dependency direction at the toolchain level. The flat projects rely on discipline — a domain file could import from carriers/ups/ without any build failure.
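To make the error-boundary row concrete, here is a rough sketch of the two styles side by side. Only the names Result<T> and CarrierError come from the projects; their shapes, the RateQuote type, and both functions are assumptions made for illustration.

```typescript
// Assumed shape; only the name CarrierError comes from the write-up.
class CarrierError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "CarrierError";
    this.code = code;
  }
}

// Assumed shape; only the name Result<T> comes from the write-up.
type Result<T> = { ok: true; value: T } | { ok: false; error: CarrierError };

interface RateQuote {
  carrier: string;
  amount: number;
  currency: string;
}

// Swarm-style boundary: failures are thrown and the caller needs try/catch.
function getRatesThrowing(weightKg: number): RateQuote[] {
  if (weightKg <= 0) {
    throw new CarrierError("INVALID_WEIGHT", "weight must be positive");
  }
  return [{ carrier: "UPS", amount: 12.5, currency: "USD" }];
}

// pidgeon-style boundary: failures are returned, so the type forces a check.
function getRatesResult(weightKg: number): Result<RateQuote[]> {
  try {
    return { ok: true, value: getRatesThrowing(weightKg) };
  } catch (e) {
    // Wrap anything unexpected so the function keeps its never-throws contract.
    const error = e instanceof CarrierError ? e : new CarrierError("UNKNOWN", String(e));
    return { ok: false, error };
  }
}
```

With the Result style the compiler refuses access to value until ok has been checked. With the thrown style, a missing try/catch compiles without complaint.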
The real finding
No single approach dominates. Each optimises for something different:
| Project | Optimises for |
|---|---|
| pidgeon | Longevity — future developers, extensibility, documented decisions |
| pidgeon-swarm | Clarity — minimal, readable, quick to understand |
| pidgeon-swarm-sortie-test | Process — demonstrates swarm + adversarial review workflow |
The ideal workflow combines pidgeon’s scope and documentation with Sortie’s pre-merge quality gating. Start with the walking skeleton and outside-in TDD to get the architecture right. Then use parallel workers with pre-merge review to fill in the implementation. Post-hoc review for architecture. Pre-merge review for correctness. Different tools for different zoom levels.
The insight isn’t “pre-merge is better than post-hoc.” It’s that review timing determines what you find. Move the review earlier and you catch implementation bugs before they integrate. Move it later and you catch architectural issues with full context. The question is where to put the gates, not whether to have them.
Source code: pidgeon (iterative TDD) · pidgeon-swarm (parallel swarm) · sortie (the review tool) · comparative review (full analysis)