Formula One Cars for Grocery Runs
On overbuilt AI workflows, productivity theatre, and why impressive machinery still needs somewhere meaningful to go
There is a moment in every AI workflow rabbit hole when the original task becomes faintly embarrassing.
You started with something small.
Summarize these notes.
Draft a short memo.
Compare two tools.
Turn this messy thought into a clean paragraph.
Normal knowledge work. The kind of thing a reasonably caffeinated person could do with one decent prompt and a little honesty about what they were trying to say.
Then, somehow, the room changes.
You are looking at agent memory systems. You are wondering whether the project needs a STATE.md. You are comparing NotebookLM with a local vault. You are thinking about retrieval. You are deciding whether the workflow should have a distillation step, a critic step, a handoff protocol, and a naming convention for claims. You are one browser tab away from convincing yourself that the task needs a custom harness.
The task, to be clear, was probably summarizing notes.
This is the point at which I have to pause and ask whether I am solving a problem or building a Formula One pit lane for a grocery run.
The funny thing is that the pit lane is not fake.
That is what makes this harder to dismiss.
The tools are genuinely interesting. Hermes-style memory systems, NotebookLM, local vaults, agent workflows, retrieval pipelines, custom prompts, state files, skills, evaluators, automated reflection loops — none of this is nonsense by default. A lot of it is directionally correct. Some of it is probably the future shape of serious knowledge work.
The problem is not that the machinery is bad.
The problem is mismatch.
Most AI tooling discourse quietly assumes a more serious problem than the user actually has. It imagines repeated workflows, large research corpora, high cost of error, many moving parts, and outputs that justify infrastructure. Sometimes that is real. Sometimes you are building a product, running a consulting project, analyzing a messy evidence base, or creating a system that will be used more than once.
But often you are not.
Often you are trying to think through one thing, one time.
And the machinery arrives before the question has earned it.
I understand the seduction of serious machinery.
Tools give a task emotional gravity. A well-structured workflow makes the work feel legitimate before the work has proved itself. A folder tree, a system prompt, a memory layer, and a few tasteful markdown files can make a half-formed idea feel like a research program.
This is not entirely bad. Scaffolding helps. The right structure can pull better thinking out of you. A good harness makes repeated work compound instead of resetting to zero. I have written multiple posts now arguing exactly that, so I am in no position to sneer at infrastructure.
But infrastructure has a narcotic quality.
It gives you the sensation of progress without forcing the risk of a claim.
You can spend an hour designing the workflow and never once answer the question. You can name the folders, write the instructions, choose the model, tune the prompt, and feel the satisfying hum of competence. The visible layer of work gets richer. The actual work remains untouched, sitting there like a mildly disappointed parent.
This is the productivity-bodybuilder problem in a different outfit.
At some point, you have to stop training the amplification layer and check whether there is anything worth amplifying.
The opportunity cost is sneaky because tooling produces artifacts.
Thinking often does not.
If I spend forty minutes clarifying the core question, the result may be one sentence. Maybe not even a good sentence. It can feel like nothing happened. If I spend forty minutes building a workflow, I can point to files, prompts, steps, diagrams, and a little system that looks like it might become useful someday.
The second thing feels more productive because it has more surface area.
This is dangerous for knowledge workers because our work already lacks visible texture. There is no sawdust on the floor. No finished table at the end of the day. No obvious pile of bricks. So we become vulnerable to anything that makes cognition look like construction.
Dashboards do this.
Notion systems do this.
AI workflows absolutely do this.
They make thinking visible. That can be useful. It can also become theatre.
The question is whether the structure is changing the quality of the thought or merely giving the thought a nicer uniform.
There is also a psychological trick happening.
Tooling lets you feel competent before the problem has judged you.
The actual work is scary because it can reveal that your idea is thin, your framing is vague, your evidence is weak, or your recommendation has no spine. The workflow is safer. It can be improved indefinitely. It can always become more elegant. Nobody can tell you the system failed if the system is still being built.
This is why overbuilt workflows are so tempting for ambitious people.
They convert uncertainty into engineering.
Instead of asking “what do I believe?” you ask “what should the pipeline be?”
Instead of asking “is this argument any good?” you ask “should I add a critic agent?”
Instead of asking “what would make this useful?” you ask “how should I structure the vault?”
These are not bad questions. They are just often second-order questions pretending to be first-order ones.
And second-order questions are a wonderful place to hide.
To be fair, sometimes the Formula One car is correct.
Some problems deserve infrastructure.
If the task repeats, build the workflow.
If the cost of error is high, add checks.
If the corpus is large, use retrieval.
If the work spans weeks, write the state file.
If multiple people or agents need to coordinate, define the protocol.
If the output will compound, invest in the harness.
This is where the anti-tooling posture becomes lazy. There is a difference between overbuilding and building for scale you can actually see coming. A consulting project with recurring research, stakeholder updates, source evidence, and evolving hypotheses probably does need structure. A personal writing system that produces essays over time probably does need state. An agentic coding workflow probably does need verification and memory boundaries.
The problem is not seriousness.
The problem is borrowed seriousness.
Borrowed seriousness is when the tooling assumes stakes the task has not earned.
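The if-then rules earlier in this section can be read as a lookup from what is visibly true about a task to the machinery it has earned. Here is a minimal Python sketch, assuming a task can be described by six yes/no properties. The names Task, RULES, and earned_machinery are hypothetical, invented here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Yes/no properties of a task (hypothetical field names)."""
    repeats: bool = False
    high_cost_of_error: bool = False
    large_corpus: bool = False
    spans_weeks: bool = False
    needs_coordination: bool = False
    output_compounds: bool = False


# One entry per rule in the list: (property, the machinery it justifies).
RULES = [
    ("repeats", "workflow"),
    ("high_cost_of_error", "checks"),
    ("large_corpus", "retrieval"),
    ("spans_weeks", "state file"),
    ("needs_coordination", "protocol"),
    ("output_compounds", "harness"),
]


def earned_machinery(task: Task) -> list[str]:
    """Return only the infrastructure the task's own properties justify."""
    return [machinery for prop, machinery in RULES if getattr(task, prop)]


if __name__ == "__main__":
    one_off_summary = Task()  # nothing about it is true yet
    consulting_project = Task(repeats=True, spans_weeks=True, needs_coordination=True)
    print(earned_machinery(one_off_summary))
    print(earned_machinery(consulting_project))
```

An empty list is the interesting result. Anything built on top of it is borrowed seriousness.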
This is where I think the grocery-run test helps.
Before building the pit lane, ask:
Will I do this task more than once?
Is the cost of being wrong meaningfully high?
Will the workflow survive contact with tomorrow, or am I building it for the pleasure of building it tonight?
Am I reducing friction or manufacturing seriousness?
Could one good prompt and a clear state note do the job?
That last question is humiliatingly effective.
Could one good prompt and a clear state note do the job?
Often, yes.
Not always. But often enough that the question should be asked before the agent architecture appears.
The simplest serious workflow is usually:
What is the question?
What context matters?
What output do I need?
What would make the answer wrong?
Where should the useful residue go after this is done?
That is not glamorous. It is also most of the work.
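Those five questions fit in a plain text note. Here is a rough Python sketch of such a note, assuming nothing beyond the standard library. StateNote and its field names are hypothetical, not a reference to any particular tool, and the example values are purely illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class StateNote:
    """One field per question in the list above (hypothetical names)."""
    question: str = ""
    context: str = ""
    output: str = ""
    failure_conditions: str = ""   # what would make the answer wrong
    residue_destination: str = ""  # where the useful leftovers go

    def unanswered(self) -> list[str]:
        """Names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Plain-text note, one 'name: value' line per question."""
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


if __name__ == "__main__":
    # Illustrative values only.
    note = StateNote(
        question="Which of these two tools should the team adopt?",
        output="A short memo with one recommendation",
    )
    print(note.render())
    print("Still unanswered:", note.unanswered())
```

If the list of unanswered questions still includes the question itself, no amount of pipeline design will help.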
The through-line across the last few posts is starting to feel embarrassingly consistent.
Bigger models can hide vague intent.
Bigger context windows can hide unsynthesized thought.
Bigger workflows can hide the absence of a real problem.
The pattern is not “big bad, small good.” That would be too easy, and also false. The pattern is that scale gives you room to avoid judgment. More capability, more context, more tooling — each can be useful, and each can become a place to hide from the act of choosing.
Choosing the question.
Choosing the evidence.
Choosing the frame.
Choosing the amount of machinery the work deserves.
This is why fit keeps coming back as the boring answer. The right tool is not the most impressive one. It is the one that forces the right kind of contact with the work.
Sometimes that is a full harness.
Sometimes it is a cheap model and three ugly retries.
Sometimes it is a smaller context window because the knife is overdue.
Sometimes it is a blank page and the mildly horrifying realization that the workflow was not the bottleneck. You were.
I still want the pit lane.
Obviously I do. I like systems. I like clean folders. I like the feeling of a workflow that knows where things go. I like the moment when a toolchain clicks and the marble starts rolling down the groove. There is real beauty in that.
But the grocery run has to be real.
There has to be somewhere meaningful to drive.
A tuned NSX is impressive. A Formula One pit lane is impressive. A personal AI workflow with memory, retrieval, agents, critics, distillation, state, and taste encoded into markdown is impressive.
None of that matters if the destination is imaginary.
The problem was not that I had built a fast machine.
The problem was that I had mistaken speed for destination.