Work

Two strands: client work (business problems shipped as working systems) and the AI engineering that makes agent-assisted software safe enough to put in front of customers.

Client work, shipped as working systems

LoanSlam

My own fail-closed conversation engine for a regulated lending domain, built after a 30-day contract with Loans by MAL as my vision of how such a system should work. An untrusted LLM proposes; deterministic code decides; every turn is audit-traced. A prototype, running over synthetic data.

prototypal live ↗ details →
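
A minimal sketch of the propose/decide split described for LoanSlam, not the engine's actual code. The action names, the `Engine` class, and the `propose` callable (a stand-in for the untrusted LLM) are all hypothetical. The point it illustrates is fail-closed behaviour: anything the deterministic checks cannot positively accept falls back to a safe action, and every turn is appended to an audit log.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action vocabulary; the real engine's actions are not described in the source.
ALLOWED_ACTIONS = {"ask_income", "ask_loan_amount", "handoff_to_human"}
FALLBACK_ACTION = "handoff_to_human"


@dataclass
class Decision:
    action: str
    accepted: bool
    reason: str


@dataclass
class Engine:
    propose: Callable[[str], dict]  # stand-in for the untrusted LLM
    audit_log: list = field(default_factory=list)

    def turn(self, user_text: str) -> Decision:
        try:
            proposal = self.propose(user_text)
            action = proposal["action"]
            if action not in ALLOWED_ACTIONS:
                raise ValueError(f"action not allowed: {action!r}")
            decision = Decision(action, True, "proposal passed checks")
        except Exception as exc:
            # Fail closed: malformed or disallowed proposals never reach the user.
            decision = Decision(FALLBACK_ACTION, False, f"rejected: {exc}")
        # Every turn is recorded, whether the proposal was accepted or not.
        self.audit_log.append({"input": user_text, "decision": decision})
        return decision
```

In this sketch the LLM can only ever select from a fixed set of actions; a proposal such as `{"action": "approve_loan"}` is rejected and logged rather than executed.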

Becoming Diamond

Paid client build, production and customer-facing: a marketing site plus a gated member portal delivering a 30-day video course, with AI chat, Stripe membership, and a git-based CMS for non-technical editing. Next.js, React 19, TypeScript, Stripe, Decap CMS.

production live ↗ details →

STA, Swanage Traffic

Paid client build: a community traffic campaign and registration platform with a built-in admin workflow, built for public use and live deployment. Astro, React, Vercel.

live ↗ details →

AI engineering, the depth that makes it trustworthy

The Pit

Multi-agent AI evaluation platform. Structured contests between agent configurations with observable traces, scoring, failure tagging, and cost visibility.

building eval engine live ↗ repo ↗ details →

Sortie

Async adversarial multi-model code review system. Parallel LLM fan-out, debrief synthesis with convergence analysis, severity-gated merge blocking. Python.

repo ↗ details →
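
As a rough illustration of the fan-out and severity gate described for Sortie, here is a sketch using only the standard library. It is not taken from the repo: the `Severity` levels, the `BLOCK_AT` threshold, and the `review_with` stand-in (which a real system would replace with an LLM API call) are assumptions, and debrief synthesis and convergence analysis are left out.

```python
import asyncio
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


BLOCK_AT = Severity.MAJOR  # assumed threshold for blocking a merge


async def review_with(model: str, diff: str) -> list[tuple[str, Severity]]:
    """Stand-in reviewer; a real one would call a model API here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    findings = []
    if "eval(" in diff:
        findings.append((f"{model}: eval on untrusted input", Severity.CRITICAL))
    return findings


async def fan_out(models: list[str], diff: str) -> list[tuple[str, Severity]]:
    # Run every reviewer concurrently and flatten their findings.
    results = await asyncio.gather(*(review_with(m, diff) for m in models))
    return [finding for result in results for finding in result]


def should_block(findings: list[tuple[str, Severity]]) -> bool:
    # Severity gate: any finding at or above the threshold blocks the merge.
    return any(severity >= BLOCK_AT for _, severity in findings)
```

A caller would run `asyncio.run(fan_out(["model-a", "model-b"], diff))` and pass the result to `should_block`; the model names here are placeholders.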

Halo

Agent/tool-layer infrastructure: CLI modules with isolated stores and NATS event sourcing, so humans and agents drive the same tools through the same surface. Python, Docker, Kubernetes.

personal infra repo ↗ details →