
The Harness Is the Moat

On AI, knowledge work, and a Sunday night in Bangalore that got away from me

It started, as most dangerous things do, with two YouTube videos on a Sunday night.

Bangalore in late April has this particular quality to it. The heat breaks sometime around 10 PM, rain arrives sideways and without warning, floods a road or two for good measure, and then disappears like it was never there — leaving behind air that is somehow both hotter and wetter than before. It’s the kind of weather that makes you appreciate Mumbai’s commitment to full-send monsoons, because at least that city doesn’t tease. I wasn’t going anywhere. I had my laptop, a half-formed thought about AI workflows, and apparently no sense of how late it was going to get.

I should mention here, as a point of personal honesty, that I am a recovering software engineer now doing an MBA and a consulting internship — which means I watch tech conference videos for fun on Sunday nights and call it research. The recovery is not going well.

The first video was Mario Zechner — the creator of pi.dev — presenting what he’d structured, with some self-awareness, as a tragedy in three acts. Act one: his frustrations building Pi, and why existing agent harnesses felt opaque, unobservable, and quietly disempowering. Act two: the age of “clankers” — his term for the wave of low-quality, automated pull requests flooding open-source repos, generated by agents that had no idea what they were doing and no one minding the store. Act three: a plea to slow down. Don’t let agents write code you can’t read. Humans must remain the bottleneck for decisions that matter. Build less. Understand more.

The second video was Lucas Meijer — one of the key people behind Unity — writing what he called a love letter to Pi. He talked about what good agentic building actually looks like in practice, and one idea in particular kept bouncing around in my head afterward: making an agent record a video of its own work before declaring a task complete. The agent watches itself. Reflects. Fixes what it notices. Then shows you the result. The agent as its own critic, before it becomes yours. I found that quietly brilliant — not because it’s technically complex, but because it forces a discipline that most AI workflows skip entirely. Make the system prove it understood what it did.

I’d been looking for both of these videos without knowing it.


Why I Was Even Looking

Here’s the honest version. I’m an MBA student on an internship at a strategy consulting firm, and I’ve been building an AI dashboard as a live project — an actual product, not a deck with screenshots. I’d been using Claude on the web, which is great right up until the moment you realise you’ve been pasting the same context into a new chat for the fourth time that week because nothing persists, nothing compounds, and the insight you landed on Tuesday is gone by Friday.

I knew enough about APIs to want one. I didn’t know enough about what I’d actually get for the cost to commit. And I didn’t want to lock into one model’s native tooling — Claude Code ties you to Anthropic, Codex ties you to OpenAI, and I had the SDE-brained suspicion that vendor lock-in at the harness layer was going to be the original sin of this whole agentic era. So I started looking.

OpenClaw. Hermes Agent Memory System. And then, finally, Pi.

One weekend. Bangalore rain outside, laptop open, one tab becoming five becoming twelve. By the time I surfaced it was 2 AM and I had thoughts I needed to write down, which is why you’re reading this now.


The Formula Nobody Was Saying Out Loud

Here’s the thing I kept arriving at from different directions, and that the Pi ecosystem finally gave me vocabulary for.

Agent = Model + Harness.

Mitchell Hashimoto — the creator of Terraform — named this in February 2026 in a blog post that apparently lit a corner of the internet on fire. The core discipline he described: every time an AI agent makes a mistake, don’t just re-prompt. Engineer the surrounding system so that mistake becomes structurally impossible to repeat. The harness grows. The mistakes don’t recur.

The implication that nobody says loudly enough: the model is a commodity. GPT-4o, Claude, Gemini: from the harness’s point of view, they are interchangeable reasoning engines. You can swap one for another without rewriting the harness. The harness is the competitive moat. It encodes your context, your rules, your domain knowledge, your verification logic. None of that lives in the model, so all of it carries over when you switch. The harness is where you actually live.
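To make the formula concrete, here is a minimal Python sketch. It rests on my own assumptions rather than on anything in Hashimoto's post or Pi's code: the names Harness, Model, Check and no_unsourced_numbers are hypothetical, and the two "models" are plain functions so the example runs without an API key. The point it illustrates is that the context and the checks live in the harness, so they stay put when the model changes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical types for illustration; not an API from Pi or any vendor SDK.
Model = Callable[[str], str]             # any reasoning engine: prompt in, text out
Check = Callable[[str], Optional[str]]   # returns an error message, or None on pass


@dataclass
class Harness:
    context: str                         # conventions, domain knowledge, rules
    checks: List[Check] = field(default_factory=list)

    def run(self, model: Model, task: str) -> str:
        output = model(f"{self.context}\n\nTask: {task}")
        for check in self.checks:
            error = check(output)
            if error is not None:
                # The mistake is caught by structure, not by re-prompting.
                raise ValueError(error)
        return output


def no_unsourced_numbers(output: str) -> Optional[str]:
    """A check added after an agent once quoted a figure without a source."""
    has_digit = any(ch.isdigit() for ch in output)
    if has_digit and "[source:" not in output:
        return "output contains a number but no [source: ...] tag"
    return None


# Two stand-in models with invented outputs.
def model_a(prompt: str) -> str:
    return "Churn fell 12% [source: q3-report.md]"


def model_b(prompt: str) -> str:
    return "Churn fell 12%"


harness = Harness(context="You are assisting on a telecom project.",
                  checks=[no_unsourced_numbers])

result_a = harness.run(model_a, "Summarise churn")   # passes the check
try:
    harness.run(model_b, "Summarise churn")          # same harness, different model
    rejected = False
except ValueError:
    rejected = True
```

Swapping model_a for model_b changes nothing in the harness. Each new mistake would add one more function to checks rather than one more sentence to a prompt.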

This reframing hit differently for me because I’d been unconsciously optimising the wrong thing. Prompt engineering makes you better at a single interaction. Every session, you start from zero. The harness approach is different — it optimises the environment. The system gets smarter across sessions. You stop re-explaining yourself.

Mario’s tragedy in three acts is essentially a story about what happens when you skip the harness and go straight to the agent. You get clankers. You get complexity no human can maintain. You get agents that are technically doing things but not actually doing your things — because there was no structure to tell them what yours looked like.


Knowledge Workers Have This Problem Too. Maybe More.

The reason harness thinking emerged from coding is that software has a natural atom — a function, a file, a test. Everything composes from there. The harness works because the unit of work is well-defined, the output is verifiable, and the feedback loop is tight. Did the tests pass? You know immediately.

Knowledge work doesn’t have this. Or rather — it has a falsely obvious atom. You might think it’s the slide, the report, the recommendation. But actually the atom is something harder to name: a claim with evidence and a so-what. Every consulting deliverable, every investor memo, every piece of good writing is just a hierarchy of these. The reason AI feels clunky for knowledge work is that nobody has set up their workspace to treat that as the fundamental unit.

What would it look like if you did?

Something like this. Instead of a folder called deliverables/, you have a folder called claims/. Each file is a claim — one sentence, declarative, opinionated. Attached to it: the evidence that supports it, the strongest objections you can think of, your confidence level, the so-what if it’s true. Now the harness has something to work with. It can check whether your evidence actually supports the claim. It can steelman the objection. It can flag when two claims in different files are in tension with each other. This is qualitatively different from “help me write this slide.”
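As a rough sketch of what one file in claims/ might hold, here is a hypothetical Python record plus a few structural checks that need no model at all. The field names, the thresholds, and the example claim are invented for illustration. Judging whether evidence truly supports a claim, or whether two claims are in tension, would still need a model or a human on top of this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    statement: str                      # one declarative, opinionated sentence
    so_what: str                        # what follows if the claim is true
    confidence: float                   # 0.0 to 1.0, the author's own estimate
    evidence: List[str] = field(default_factory=list)
    objections: List[str] = field(default_factory=list)


def review_flags(claim: Claim) -> List[str]:
    """Cheap structural checks a harness could run before any model is involved."""
    flags = []
    if not claim.evidence:
        flags.append("no evidence attached")
    if not claim.objections:
        flags.append("no objection considered")
    # The 0.8 and 2 thresholds are arbitrary placeholders.
    if claim.confidence >= 0.8 and len(claim.evidence) < 2:
        flags.append("high confidence resting on thin evidence")
    return flags


# An invented example claim, not taken from any real project.
claim = Claim(
    statement="The enterprise segment should be prioritised first.",
    so_what="Wave-one investment goes to enterprise sales tooling.",
    confidence=0.9,
    evidence=["Interview notes, 3 of 4 account managers"],
)

flags = review_flags(claim)
```

Here the claim has one piece of evidence and no objections, so it comes back with two flags: no objection considered, and high confidence resting on thin evidence.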

The slide is downstream of the claim. The claim is where the thinking lives.


The Belief State Problem

There’s another layer to this that I keep coming back to, which I think of as the belief state problem.

Most AI setups — mine included, until recently — store information but not epistemic state. There’s a difference. Information is: the client is a telecom operator, the project is a capability maturity assessment, here are the workstreams. Epistemic state is: I’m confident this segment is the highest priority, I’m uncertain whether the second wave of investments will get political buy-in, and if the churn data shows a different root cause than I currently think, the whole sequencing argument changes.

The second is infinitely more useful to an AI collaborator. It tells the model not just what’s true but what you’re working with — including the holes. A model given your information produces generic output calibrated to the domain. A model given your epistemic state produces output calibrated to you — your specific uncertainties, your specific open questions, the specific thing you actually need to figure out next.

Eric Ma, a data scientist who manages twelve people across two teams, documented his Obsidian and AI workflow earlier this year and reported cutting knowledge management overhead from 30-40% of his time to under 10%. The key wasn’t the tooling. It was that he’d built a system where everything is plain text, everything is a source, and his notes are always derived from that source material rather than from free-floating memory. The agent can sweep and update. Hallucinations are rare because claims have to be substantiated by quotes from the source. The belief state problem, solved mechanically.
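The "substantiated by quotes" rule is the kind of thing that can be checked mechanically. The sketch below is my own assumption about how such a check could look, not a description of Eric Ma's setup: the function names and the sample transcript are hypothetical. It simply reports any quote that does not appear verbatim in the source text.

```python
from typing import List


def normalise(text: str) -> str:
    """Collapse whitespace and case so line wrapping does not break matching."""
    return " ".join(text.lower().split())


def unsupported_quotes(note_quotes: List[str], source_text: str) -> List[str]:
    """Return the quotes that do not appear verbatim in the source."""
    haystack = normalise(source_text)
    return [q for q in note_quotes if normalise(q) not in haystack]


# Invented sample source; note the line break in the middle of the sentence.
source = """Meeting transcript.
The team agreed to ship the pilot
in the second quarter."""

quotes = [
    "agreed to ship the pilot in the second quarter",   # spans the line break
    "agreed to ship the pilot in the first quarter",    # not in the source
]

missing = unsupported_quotes(quotes, source)
```

Only the second quote is reported as missing. A check this simple cannot tell whether a real quote is being used fairly, but it does stop a note from citing words the source never contained.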

Plain text, it turns out, was the right bet all along. Not because it’s elegant — it’s not, really. But because when AI agents arrived, his vault was already in a format they could process natively. No migration, no conversion layer, no API integration. He got lucky through good taste, which is often how it works.


What Pi Is Actually Doing

Pi is minimal in a principled way, which is different from being minimal because someone didn’t finish building it.

Mario’s frustration with existing harnesses — the opaque context management, the loss of observability, the lack of extensibility — led him to build something that ships almost nothing by default and makes almost everything a choice. No sub-agents built in. No plan mode. No permission popups. No background processes you can’t see. The philosophy: features that other tools bake in, you build yourself. Extensions are TypeScript modules. Skills are capability packages loaded on-demand. Prompt templates are markdown files you invoke with a slash command.

The /tree command is the feature I find most interesting for non-coding work. Sessions are stored as trees. You can rewind to any prior message, branch from there, and keep going. Every branch lives in the same file. For consulting or writing or strategic thinking — any domain where you hit a fork and want to explore both paths before choosing — this is structurally closer to how thinking actually works than any linear chat interface. You’re not choosing between two framings of a recommendation and losing the other forever. You’re exploring both, in the same session, and coming back to the fork to decide.
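To show why a tree suits this, here is a small Python sketch of a branching session. It is an illustration of the idea only: Message, SessionTree and path_to are hypothetical names, and this is not Pi's actual storage format or API. Each message points at its parent, so branching is just adding a second child under an earlier message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    id: int
    parent: Optional[int]     # None for the first message in the session
    text: str


class SessionTree:
    def __init__(self) -> None:
        self.messages: Dict[int, Message] = {}

    def add(self, text: str, parent: Optional[int] = None) -> int:
        """Append a message under `parent`; adding under an old message creates a branch."""
        new_id = len(self.messages)
        self.messages[new_id] = Message(new_id, parent, text)
        return new_id

    def path_to(self, leaf: int) -> List[str]:
        """Walk parent links back to the root to rebuild one linear conversation."""
        texts = []
        current: Optional[int] = leaf
        while current is not None:
            msg = self.messages[current]
            texts.append(msg.text)
            current = msg.parent
        return list(reversed(texts))


tree = SessionTree()
root = tree.add("Draft a recommendation for the client")
fork = tree.add("Two framings are possible", parent=root)
cost = tree.add("Framing A: lead with cost", parent=fork)
growth = tree.add("Framing B: lead with growth", parent=fork)   # second branch, same fork

path_a = tree.path_to(cost)
path_b = tree.path_to(growth)
```

Both paths share the first two messages and differ only in the last one. Neither framing is lost, and both sit in the same structure.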

Lucas Meijer gets at something adjacent: the idea of building a repository such that knowledge flows through it easily, like a marble rolling down a groove. The marble metaphor is good. Most project folders are like piles of sand — you can see all the information, but nothing flows, nothing connects, nothing compounds. The marble wants a groove. The groove is the harness.

His HTML artifact and video reflection idea is the same instinct applied to outputs: don’t let the agent tell you it’s done. Make it watch what it did and tell you what it sees. The gap between what an agent thinks it accomplished and what it actually accomplished is where most of the slop lives. Closing that gap — making the system prove it understood — is the discipline that separates good agentic work from the clanker era Mario is mourning.


The Architecture I’m Building Toward

Here’s what I’m actually trying to construct for my own life, as a result of this rabbit hole.

A vault of plain text files, git-tracked, organized roughly along PARA lines — Projects, Areas, Resources, Archive — but treated not as a filing system but as a harness substrate. The folders aren’t for finding things. They’re for telling the agent where it is and what conventions apply here.

Each active project gets a CLAUDE.md: the conventions, the stakeholder context, the output format, the vocabulary that matters in this domain. An internship project and a personal writing project live in different directories and load different contexts. Each project also gets a STATE.md: not documentation, but a live belief state. What I’m confident about. What I’m uncertain about. What would change my mind. Open questions I haven’t answered. This gets refreshed on compaction, the harness equivalent of a weekly review, where raw notes get distilled into claims, claims get distilled into belief state, and the belief state gets written back into STATE.md.

Above all of this, a global AGENTS.md that loads in every session: who I am, how I think, what frameworks I work with, what my voice sounds like when I write. The context that doesn’t change when I switch projects.
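A minimal sketch of how those three layers could be stitched together at the start of a session. The file names come from this post. The load order, the build_context function, and the sample file contents are assumptions of mine, and real tools such as Pi or Claude Code apply their own loading rules, which this does not try to reproduce.

```python
import tempfile
from pathlib import Path
from typing import List


def build_context(vault: Path, project: Path) -> str:
    """Concatenate the global layer, then the project's conventions and belief state."""
    candidates: List[Path] = [
        vault / "AGENTS.md",      # who I am, how I think (global)
        project / "CLAUDE.md",    # conventions for this project
        project / "STATE.md",     # live belief state for this project
    ]
    parts = []
    for path in candidates:
        if path.is_file():        # a missing layer is skipped, not an error
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {path.name}\n{body}")
    return "\n\n".join(parts)


# Build a throwaway vault so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    project = vault / "Projects" / "dashboard"
    project.mkdir(parents=True)
    (vault / "AGENTS.md").write_text("Voice: direct, first person.\n", encoding="utf-8")
    (project / "STATE.md").write_text(
        "Confident: segment priority.\nUncertain: political buy-in for wave two.\n",
        encoding="utf-8",
    )
    context = build_context(vault, project)   # no CLAUDE.md here, so two layers load
```

Switching projects changes only the project argument. The global layer stays the same, which is the property described above.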

The PARA layer handles where knowledge lives. The harness layer handles how the agent shows up when I’m working. These are different problems and conflating them is why most “second brain” setups feel elaborate but don’t actually make your thinking faster.

Tiago Forte’s real insight in Building a Second Brain wasn’t PARA. It was progressive summarisation: notes get distilled over time, each pass keeping what’s most resonant, until what remains is the thing that actually mattered. In a harness, this maps to compaction. Raw notes become claims. Claims become belief state. Belief state becomes the thing the agent works from. The raw material is archived, not deleted — but the agent reads the distillate, not the source.


The Thing Mario Got Right That Nobody Is Saying

Act three of his talk — the plea to slow down — is the most important part and almost certainly the most ignored.

The argument is simple. Agents that write code you can’t read are not accelerating you. They’re creating a liability you don’t understand yet. Complexity compounds. Technical debt compounds faster when you didn’t write the code. The human must remain the bottleneck for decisions that matter.

For knowledge work, the equivalent: AI that drafts things you didn’t think is not thinking faster. It’s producing the appearance of thinking without the substance. A consultant who generates a recommendation via AI without understanding the claim hierarchy underneath it isn’t delivering a better recommendation — they’re delivering a more polished wrong one. Faster.

The harness is valuable precisely because it forces you to externalise your thinking before it can help you. You have to write the belief state. You have to structure the claims. You have to document what you’re uncertain about. The discomfort of writing those things is not friction in the system — it is the system working. You’re making your reasoning legible, first to yourself and then to the agent.

The knowledge workers who get the most leverage from AI in the next few years won’t be the ones with the best prompts. They’ll be the ones who’ve done the work to make their thinking legible enough to harness. The moat isn’t the model. The model is a commodity and getting cheaper by the quarter. The moat is the harness — your encoded judgment, your structured uncertainty, your composable prompts, your domain conventions. That’s what persists. That’s what compounds.

That’s what’s yours.


A Note on How This Post Was Written

I want to be direct about something, because it’s relevant to everything above.

This post was written with Claude. Not proofread by Claude, not “AI-assisted” in the vague way that phrase gets used to mean anything from spell-check to full ghostwriting — but actually written in conversation with it, over several hours, across a thread that started as me thinking out loud about Pi and ended here.

I want to explain why I think that’s fine, and actually why it’s the point.

The ideas in this post are mine. The 2 AM rabbit hole was mine. The belief state framing, the claims-as-atoms reframe, the frustration with context loss, the instinct about vendor lock-in — those came from my head, my background, my specific situation building something at a consulting firm while trying not to lose four years of engineering intuition to MBA-brain. Claude didn’t supply those. What Claude did was take a running transcript of my thinking — context it already had from weeks of working conversations, a pretty tight sense of how I reason and what I care about — and shape it into something readable.

That’s co-creation. The ideas are the hard part. The shaping is real craft too, but it’s different craft, and I don’t want to pretend otherwise in either direction — either by hiding Claude’s role or by underselling that the thinking was mine.

What this conversation was, structurally, is exactly what the post describes. I brought context. I brought a belief state — what I was confident about, what I was uncertain about, what I wanted to say. Claude brought the harness: existing knowledge of my voice, my frameworks, my history, shaped into a system for rendering my thinking as writing. The output is the product of both. The moat — the part that wouldn’t exist without me — is the thinking. The model rendered it. I directed the rendering.

I find this more interesting than either “I wrote this” or “AI wrote this,” because it’s a third thing that we don’t have clean language for yet. Co-authorship feels too symmetric. Ghost-writing feels too asymmetric. Maybe the right frame is just: this is what a tight harness looks like when applied to writing instead of code. I was the human in the loop. The bottleneck for every idea, every judgment call, every “no, that’s not quite right, try again.” The model was fast and capable and had good taste about my taste. We made something neither of us would have made alone.

If that makes you trust this post less, I think that’s worth sitting with. If it makes you think differently about what “writing” means when a tool this capable is available — same.

Either way, it seemed dishonest not to say it.


Built over one Sunday night in Bangalore, extended across a week of thinking that wouldn’t stop, written in collaboration with Claude. If this resonated — or if you’re building something similar — I’d genuinely love to hear about it.