I turn business problems into working systems, fast, using AI as leverage.

I'm not a traditional developer anymore. I earned those stripes shipping production software, then moved past hand-writing code. Now I build whole business capability with AI-assisted systems: prototype, evidence, deployment, the customer journey, safety, measurement. A therapist for fifteen years before an engineer, so the human side gets the same rigour as the machine.

Selected work

all work →

My own fail-closed conversation engine for a regulated lending domain, built after a 30-day Loans by MAL contract, as my vision of how it should work. An untrusted LLM proposes; deterministic code decides; every turn is audit-traced. Prototypal, over synthetic data.

prototypal live ↗ details →

Multi-agent AI evaluation platform. Structured contests between agent configurations with observable traces, scoring, failure tagging, and cost visibility.

building eval engine live ↗ repo ↗ details →

Paid client build, production and customer-facing: a marketing site plus a gated member portal delivering a 30-day video course, with AI chat, Stripe membership, and a git-based CMS for non-technical editing. Next.js, React 19, TypeScript, Stripe, Decap CMS.

production live ↗ details →

Async adversarial multi-model code review system. Parallel LLM fan-out, debrief synthesis with convergence analysis, severity-gated merge blocking. Python.

Agent/tool-layer infrastructure: CLI modules with isolated stores and NATS event sourcing, so humans and agents drive the same tools through the same surface. Python, Docker, Kubernetes.

personal infra repo ↗ details →

How I work

I build the whole thing, not just the code. A prototype goes up fast, then has to earn its place: real evidence it works, deployed where people actually use it, with the customer journey and the safety boundary designed together, and measured, so I know it's holding. AI does most of the mechanical work; the architecture and the judgment calls are mine, and so are the small details that decide whether it holds up once real users touch it.

When the system is an LLM in a regulated domain, that judgment becomes architecture: the model proposes, deterministic code decides what a customer sees, and every decision is on the record, so "why did it say that?" always has an answer. Adversarial multi-model review, fail-closed validators, and per-turn audit traces aren't techniques for their own sake; they're what lets a client trust an AI in front of their customers.

The human side gets the same attention. Fifteen years as a therapist is why the parts of a system that meet someone in distress get designed deliberately, not bolted on afterwards: spotting vulnerability early, and handing off to a person instead of pushing an answer.