I focus on the parts of an AI product that aren't the model itself — the
surrounding system that makes it reliable, observable, and economical.
I work with teams either as a hands-on engineer or as an advisor reviewing
architecture and shipping with your team.
01
Agent system design
From single-purpose tool callers to multi-agent rooms with handoff and delegation.
- → Picking the right pattern — one agent + tools, planner+executor, multi-agent room, or ReAct loop
- → Designing tool calling that actually works — schemas, errors, retries, safety tiers
- → Memory and context strategy — sliding windows, summarization, RAG, structured memory
02
Model routing & cost / latency tuning
Stop overpaying. Get the cheapest model that hits your quality bar, and a fallback for when it doesn't.
- → Routing strategies across Claude, OpenAI, Ollama, and self-hosted models
- → Caching, prompt deduplication, and context-window economics
- → Latency budgets — streaming, parallel tool calls, speculative decoding where it helps
03
Tool calling & MCP integration
The part most agent projects get wrong. Schema design, execution sandboxing, and tier-based authorization.
- → MCP server design — what to expose, what to gate behind approval
- → Read / write / destructive tool tiers, audit logging, agent attribution
- → Tool result caching, parallel execution, error recovery patterns
04
Evaluation & observability
Agents that pass eyeball-tests in a notebook usually fail in prod. Build the feedback loop early.
- → Eval harness design — synthetic conversations, regression suites, scoring rubrics
- → Tracing — every tool call, every token, every decision the agent makes
- → Production telemetry — cost per request, drop-off, intervention rate
05
Self-hosted infrastructure
When you need data residency, cost ceilings, or just don't want to ship customer data to OpenAI.
- → Kubernetes (Talos, k3s) cluster design — auth, ingress, secrets, monitoring
- → GitOps workflows — Gitea, ArgoCD-style sync, agent-driven PR proposals
- → Self-hosted models via Ollama / vLLM, with OpenAI-compatible front doors