About
I’m not a traditional developer anymore. I spent fifteen years as a cognitive behavioural therapist, then switched to software engineering and shipped production code at EDITED (retail analytics), Brandwatch (social intelligence), Telesoft (network security), and School Business Services (education), then moved past hand-writing code into building whole business capability with AI-assisted systems.
The through-line across all of it is the same problem: people, and now machines, produce confident, coherent output that’s sometimes completely wrong, and the real work is building the systems that catch it.
That’s what I do now. I take a business problem and ship the working system end to end: prototype, evidence, deployment, the customer journey, safety, measurement. AI does the work that used to need a team. Two strands run alongside each other: production software for clients, and the AI engineering that makes agent-assisted work safe enough to put in front of real customers. The therapy years aren’t a detour; understanding how a real person actually moves through a flow is half of what makes a system work.
What I’m building
LoanSlam, my flagship, and the most advanced of these. A fail-closed conversation engine for a regulated lending domain: an untrusted LLM proposes each turn, deterministic code decides what a customer sees, and every turn is audit-traced. It started as a 30-day Loans by MAL contract; what’s here is my own post-contract take on how it should work: prototypal, demonstrated over synthetic data, and entirely my IP.
Becoming Diamond, a production client product: a marketing site and a gated member portal delivering a 30-day video course, with AI chat, Stripe membership, and a git-based CMS the client edits themselves.
The Pit, a multi-agent evaluation platform: structured contests between agent configurations with observable traces, rubric-based scoring, a failure taxonomy, and cost visibility per run. The Gauntlet, Darkcat, Tells, and Sortie all began as process infrastructure for it.
Sortie, adversarial multi-model code review. Three models review a diff in parallel, a fourth synthesises, and merge gates on convergent severity. Different model families have different blind spots; the tool exists to make them check each other.
Halo, agent/tool-layer infrastructure: CLI modules with isolated stores and NATS event sourcing, so a human and an agent drive the same tools the same way.