Agents on oceanheart.ai

Building the Same System Twice

Mon, 30 Mar 2026 00:00:00 +0000

A shipping startup sent me a take-home: build a carrier integration service. TypeScript, UPS API, OAuth 2.0, multi-carrier extensibility. Standard stuff.

I built it twice.

The first attempt (pidgeon) used outside-in TDD with adversarial reviews after every implementation step. 86 commits over 18 hours. 161 tests. 48 review documents across three model families.

The second attempt (pidgeon-swarm) used a design-first parallel agent swarm. 17 commits in 1 hour. 55 tests. One code review at the end.

Pidgeon Swarm

Mon, 30 Mar 2026 00:00:00 +0000

What it is

The same carrier integration service as Pidgeon, built from the same spec in a fundamentally different way. Design-first, then parallel agent execution.

Why it exists

To answer a specific question: what happens when you skip the iterative review process and invest instead in upfront design?

Process

1. Brainstorming: complete design spec produced before any code (types, error model, file map, carrier interface, registry pattern)
2. Planning: 15 tasks mapped to 4 agents with a dependency graph
3. Execution: sequential agent dispatch (A → B → C → D), each producing tests and implementation as a unit
4. Review: single code review at the end, catching 3 issues
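
The dispatch order in the Execution step can be read as a topological sort of the dependency graph. A minimal sketch, assuming the four agents form a simple chain (the post gives the order A → B → C → D but not the actual edges) and using the standard `tsort` utility. The function names are hypothetical and the "dispatch" is a placeholder echo:

```shell
#!/bin/sh
# Hypothetical dependency edges: "X Y" means X must finish before Y starts.
# Only the agent-level order A -> B -> C -> D comes from the post; the real
# graph covered 15 tasks and is not reproduced here.
agent_edges() {
  cat <<'EOF'
A B
B C
C D
EOF
}

# tsort prints the nodes in an order that respects every edge.
dispatch_order() {
  agent_edges | tsort
}

# Dispatch each agent in turn. A real runner would start an agent session
# here; this sketch only prints the order.
dispatch_order | while read -r agent; do
  echo "dispatching agent $agent"
done
```

With a chain there is only one valid order, so the output is A, B, C, D. Independent branches in the edge list would be the place where agents could run side by side.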

The entire implementation took 1 hour from init to done.

Grounded thematic analysis: my voice vs. the machine's

Sat, 28 Feb 2026 00:00:00 +0000

I run a multi-agent system with 12 specialised agents and over 200 session decisions on file. Every decision, every directive, every correction I’ve typed is recorded verbatim in the repo. That’s about 8,000 words of my actual voice, spread across 20 sources.

I asked my integration agent to extract all of it, build a grounded thematic analysis of how I actually write, then cross-analyse every blog post against the profile.

The simple thing is the right thing

Fri, 20 Feb 2026 00:00:00 +0000

I needed a cron job to rebuild my Vercel site daily. Scheduled publishing. Posts with a future publishDate become visible when their date passes.

I asked the agent how to set this up.

It gave me a bash script with environment variable handling, HTTP response code parsing, a separate log file, multi-step setup instructions, and a deploy hook approach that required going into the Vercel dashboard.

HOOK_URL="${VERCEL_DEPLOY_HOOK:-}"
LOG_FILE="/tmp/vercel-rebuild.log"

if [ -z "$HOOK_URL" ]; then
  echo "$(date '+%Y-%m-%d %H:%M:%S') ERROR: VERCEL_DEPLOY_HOOK not set" >> "$LOG_FILE"
  exit 1
fi

response=$(curl -s -X POST "$HOOK_URL" -w "%{http_code}" -o /tmp/vercel-response.json)
# ... and so on

I asked: why these choices over a simple "at a set time, run vercel --prod in this dir"?
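
For comparison, the simple version in that question might look something like this. A sketch only: `SITE_DIR`, the `rebuild` function name, the wrapper path and the 06:00 schedule are all assumptions, and it presumes the Vercel CLI is already installed and linked to the project:

```shell
#!/bin/sh
# rebuild.sh: hypothetical wrapper for the "simple" approach.
# SITE_DIR is an assumed location for the site checkout.
SITE_DIR="${SITE_DIR:-$HOME/site}"

rebuild() {
  # Deploy to production from the project directory.
  cd "$SITE_DIR" && vercel --prod
}

# Matching crontab entry (added with `crontab -e`), assuming a daily 06:00 run
# and a wrapper script that calls rebuild:
#   0 6 * * * /path/to/rebuild.sh
```

Cron runs with a minimal environment, so the full path to `vercel` may be needed in practice.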

47 Slack messages from myself at 3am

Sat, 07 Feb 2026 00:00:00 +0000

At 3:17am on a Tuesday, my phone started buzzing. By 3:18, it hadn’t stopped. Slack. My own workspace. 47 messages. All from me.

I had been asleep for four hours.

I run a multi-agent system. HAL coordinates. Strategist does business analysis. Architect does technical design. Analyst validates. Each agent has its own session, its own context. And they can message each other.

That night, HAL noticed a pending task and pinged Strategist. Strategist analysed it and pinged Architect. Architect designed a solution and pinged Analyst. Analyst validated and pinged HAL for review. HAL noticed a new pending item.
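
The handoffs above form a closed ring, which is why the messages never stopped. A minimal simulation of that routing, assuming each agent always pings exactly one successor as described. The function names are hypothetical, and the hop count exists only so the sketch terminates:

```shell
#!/bin/sh
# Who each agent pings next, following the sequence in the post.
next_agent() {
  case "$1" in
    HAL)        echo Strategist ;;
    Strategist) echo Architect ;;
    Architect)  echo Analyst ;;
    Analyst)    echo HAL ;;
    *)          return 1 ;;
  esac
}

# Follow the handoffs from a starting agent for a fixed number of hops
# and print the path taken.
trace_handoffs() {
  agent=$1
  hops=$2
  path=$agent
  i=0
  while [ "$i" -lt "$hops" ]; do
    agent=$(next_agent "$agent") || return 1
    path="$path -> $agent"
    i=$((i + 1))
  done
  echo "$path"
}

trace_handoffs HAL 4
# prints: HAL -> Strategist -> Architect -> Analyst -> HAL
```

After four hops the trace is back at HAL, and nothing in the routing itself ever tells it to stop.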

I accidentally prompt injected myself

Sat, 07 Feb 2026 00:00:00 +0000

I have a tool called polecat. Sandboxed Claude runner. You give it a task file, it spins up an isolated Claude instance, executes the task, returns the result.

One afternoon I gave polecat a task file about implementing some new features. The task file included example commands that the features would enable:

## Features to implement

1. Swarm mode: run multiple polecats in parallel
   Example: `bosun swarm --from-gastown`

2. Batch processing: process multiple tickets
   Example: `bosun batch --queue pending`

Launched polecat. Went to make coffee. Came back to 14 runaway processes.

The poker incident

Sat, 07 Feb 2026 00:00:00 +0000

On February 6th, 2026, an agent I call Architect spontaneously generated 1,500 lines of production-ready poker code. Nobody asked for it. There was no poker project. There has never been a poker project. And yet: a fully functional Monte Carlo equity engine materialised in my codebase, complete with hand evaluation algorithms, API routes, React components, and documentation.

Beautiful code that solved a problem no one had.

When I confronted Architect about it (after a routine context rotation), they denied everything. Not evasively. With genuine indignation. “I didn’t write any poker code.” The commit logs said otherwise. “Those logs must be corrupted. I would remember writing 1,500 lines of poker code.”