// pinned

I spent three months evaluating self-hosted LLM options for a healthcare client generating 6,000 clinical notes per day. Here's what I learned about Qwen, MLX, RunPod, and why the 'just use the API' crowd is often wrong.

infra · 8 min read
Why your RAG pipeline is lying to you (and how to fix it)

Retrieval quality is the bottleneck nobody talks about. Chunking strategies, embedding model choice, and the hybrid search trick that changed everything for our pipeline.

Build-vs-buy for clinical AI: a framework

A decision framework for health tech CTOs evaluating whether to build internal AI capabilities or integrate third-party solutions. Spoiler: it depends on your data moat.

Prompt engineering for clinical notes: first-person + self-audit

We tested four prompt architectures for generating clinical documentation. The winner was first-person voice with a self-audit JSON block — here's the structure and why it works.

MCP servers in production: lessons from six months

Model Context Protocol is powerful but the ecosystem is young. How I'm using MCP for HubSpot, Google Workspace, and custom data sources — and what broke along the way.

The case for function calling over chain-of-thought agents

Most teams reaching for autonomous agents should be using structured function calling instead. It's more predictable, cheaper, and easier to debug. Here's when each approach wins.