The Goldilocks Conundrum for AI Enthusiasts
On model maximalism, capability theatre, and learning when not to call the CEO to decide lunch
There is a particular kind of AI person who cannot solve a normal problem without first turning it into a tooling decision.
I say this with tenderness, because I am describing a version of myself I meet more often than I would like.
The task is simple. Write an email. Summarize a document. Think through a point of view. Clean up a small script. Something ordinary. Something with the emotional stakes of choosing a sandwich.
And yet, ten minutes later, you are comparing model cards.
Is this a frontier-model task?
Do I need the larger context window?
Would this be better with an agent?
Should I run it locally?
Is the cheaper model enough, or am I about to produce spiritually inferior bullet points?
This is how a person loses a Tuesday to a problem that should have taken seventeen minutes. Not because the tools are bad. Because the existence of a more powerful tool quietly turns every task into a referendum on whether you are serious enough to use it.
The task was probably writing an email.
The AI internet has become very good at performing technical seriousness.
People benchmark like researchers and live like operators. We talk about SWE-bench, context windows, reasoning depth, tool latency, local inference, quantization, memory layers, agent harnesses, and whether some new model has finally crossed the invisible line between “interesting” and “I should reorganize my entire workflow around this by midnight.”
Then we use it to summarize a PDF.
This is not hypocrisy. It is status signaling under uncertainty.
Nobody really knows what the right setup is yet. The tools are changing too quickly, the workflows are too personal, and the success metrics are weirdly private. One person’s life-changing AI setup is another person’s elaborate procrastination machine with a dark mode.
So “more capable” becomes a safe proxy for “more serious.”
The largest model. The biggest context window. The most agentic stack. The longest system prompt. The workflow diagram that looks like it was designed by someone who has strong opinions about both productivity and anime.
Capability is legible. Usefulness is harder to prove.
If you tell people you solved a problem with the most powerful model available, they understand the performance. If you tell them you solved it with a small, fast, slightly dumb model and a clear prompt, the result may be better, but the story has less status in it.
This is the trap: we optimize for the setup that makes us look like we are operating near the frontier, not necessarily the setup that changes the work.
The previous post was about cheap intelligence: the way abundance changes behavior before raw capability does.
This is the companion problem. Once intelligence is available at many levels of power, you have to choose how much of it to bring to a task.
That choice is harder than it sounds.
AI enthusiasts tend to compare models at the ceiling. Which one can solve the hardest coding task? Which one survives the trickiest eval? Which one has the deepest reasoning traces? Which one can ingest a small nation’s worth of PDFs and emerge with a memo?
Ceilings matter. I do not want to be annoying about this. There are real differences between models. There are tasks where the smartest thing you can access is the right thing to use. If I am doing ambiguous strategy work, high-stakes writing, complex debugging, or a synthesis problem with too many moving pieces, I want the good stuff. I want the model that can hold the room.
But most daily work is not ceiling work.
Most daily work is routing, shaping, sorting, drafting, classifying, rephrasing, checking, brainstorming, and moving from “I have a vague thing” to “I have a slightly less vague thing.” The bottleneck is not always intelligence. Often the bottleneck is clarity, context, taste, or whether you have admitted what the task actually is.
Using the maximum available model for every task feels powerful. Sometimes it is just avoidance with a subscription.
There is a line from the backlog that I keep laughing at because it is stupid and correct:
You do not call the CEO to decide lunch.
The CEO may be very smart. The CEO may understand tradeoffs, incentives, organizational context, and the strategic importance of not ordering something that makes everyone sleepy before a 3 PM meeting.
Still. Do not call the CEO to decide lunch.
This is the model-selection problem in miniature. Some tasks need raw intelligence. Many need fit. A tiny local model may be enough for renaming files, classifying notes, or rewriting boilerplate. A cheap fast model may be better for messy ideation than a slower brilliant one because it keeps you in motion. A frontier model may be worth it when ambiguity is high and the cost of being subtly wrong is real.
The useful question is not:
What is the best model?
It is:
What is the smallest model that lets me stay in motion without lowering the ceiling of the work?
That second question is less glamorous. It also requires more self-knowledge.
You have to know what stage of thought you are in. Exploration? Use something cheap and fast. Drafting? Use something that matches the voice and can iterate cleanly. Deep synthesis? Bring in more power. Adversarial testing? Maybe the model matters less than the role you assign it. Mechanical cleanup? Please do not summon a digital archangel to alphabetize your notes.
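For what it is worth, that matching fits in a few lines. This is a minimal sketch, not a recommendation: the tier labels are invented for illustration and are not real model names. The "mid" choices for drafting and adversarial testing are my own guesses, since the text only says drafting needs voice and clean iteration, and that for adversarial work the assigned role may matter more than the model.

```python
from enum import Enum


class Stage(Enum):
    """Stages of thought named in the paragraph above."""
    EXPLORATION = "exploration"
    DRAFTING = "drafting"
    DEEP_SYNTHESIS = "deep_synthesis"
    ADVERSARIAL_TESTING = "adversarial_testing"
    MECHANICAL_CLEANUP = "mechanical_cleanup"


# Hypothetical tier labels (assumptions for illustration only).
TIER_FOR_STAGE = {
    Stage.EXPLORATION: "cheap-fast",      # stay in motion
    Stage.DRAFTING: "mid",                # assumption: voice and clean iteration
    Stage.DEEP_SYNTHESIS: "frontier",     # bring in more power
    Stage.ADVERSARIAL_TESTING: "mid",     # assumption: the role matters more than the tier
    Stage.MECHANICAL_CLEANUP: "tiny-local",  # no digital archangel required
}


def pick_tier(stage: Stage) -> str:
    """Return the illustrative tier label for a given stage of thought."""
    return TIER_FOR_STAGE[stage]


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.value}: {pick_tier(stage)}")
```

The table is the easy part. The hard part is the argument to `pick_tier`: you have to say, honestly, which stage you are in.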
The mature move is matching capability to the job.
The immature move is treating every job as a capability contest.
Why do we keep doing it?
Part of it is simple frontier anxiety. Nobody wants to be the person using the weaker tool after the stronger one has arrived. AI makes this worse because the frontier is not annual or even monthly anymore. It is ambient. Every week some model, wrapper, harness, agent, memory layer, or cursed acronym arrives and makes your current setup feel faintly obsolete.
Part of it is blame management. If an output is bad, it is comforting to believe the model was not powerful enough. That explanation keeps the blame away from your own input. The alternative is worse: maybe the question was vague. Maybe the context was sloppy. Maybe you wanted the model to discover a point of view you had not done the work to form.
Model maximalism can become a way to outsource responsibility for clarity.
Infinite capability often compensates for finite clarity.
I dislike that sentence because it implicates me too neatly.
There is also status in proximity to power. This is not unique to AI. People like driving fast cars slowly. They like owning cameras whose menus they do not understand. They like having tools that imply a version of themselves with more discipline, more taste, more interesting problems.
AI adds a new flavor because the tool is not merely expensive or complex. It feels intelligent. Using the bigger model lets you feel adjacent to the frontier of cognition itself, even if the task is making your meeting notes less embarrassing.
That is a very seductive little self-image.
The funny thing is that smaller tools often make you better.
Not always. Do not turn this into minimalism as moral theatre. But constraints do real work.
A smaller context window forces you to decide what matters. A cheaper model forces you to make the prompt clearer. A faster model encourages iteration. A local model makes privacy and ownership feel less abstract. A less capable model exposes when you were relying on raw horsepower to cover for a badly specified task.
This is why “good enough” is not a concession. It can be a discipline.
The danger of too much capability is that it lets you remain vague for longer. You can throw more context into the window. You can ask for more variants. You can let the model infer the structure you did not articulate. And because the output is fluent, it may take a while to notice that the underlying thought never got sharper.
The model did not fail. It protected you from the consequences of your own ambiguity.
That is a service, but not always a kindness.
Sometimes the smaller tool is better because it refuses to hide the mess.
This is where I think the Goldilocks idea matters.
Not maximalist.
Not ascetic.
Goldilocks.
Enough intelligence to be useful. Enough constraint to force clarity. Enough cost to respect the tool, but not so much that curiosity gets rationed. Enough context to avoid amnesia, but not so much that your working memory becomes a landfill. Enough automation to remove drudgery, but not so much that you lose contact with the judgment being automated.
The right amount of AI is not a philosophical constant. It changes by task, by mood, by stage, by stakes.
There are days when I want the strongest model I can get because the problem is genuinely hard and I can feel my own cognition running out of room. There are days when I want cheap intelligence because I need to stay playful. There are days when I want no AI because the friction is doing something. There are days when the best tool is a notebook, and the worst thing I could do is turn a half-formed thought into a polished paragraph too early.
The adult move is not choosing smaller tools to prove some artisanal point.
It is knowing which kind of help would actually help.
There is an economics version of this and a psychology version.
The economics version says: use cheaper models for cheaper tasks, expensive models for expensive tasks, and optimize the stack like a sensible person.
Fine. Useful. Boring.
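Boring, but easy to write down. Here is a minimal sketch of the economics rule, assuming each task can be given a rough "required capability" score. The tiers, the 1-to-3 scale, and the relative costs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str             # hypothetical label, not a real model
    capability: int       # made-up 1-to-3 scale
    cost_per_call: float  # made-up relative cost


TIERS = [
    Tier("tiny-local", capability=1, cost_per_call=0.0),
    Tier("cheap-fast", capability=2, cost_per_call=1.0),
    Tier("frontier", capability=3, cost_per_call=20.0),
]


def cheapest_sufficient(required_capability: int, tiers=TIERS) -> Tier:
    """Pick the lowest-cost tier whose capability meets the requirement."""
    candidates = [t for t in tiers if t.capability >= required_capability]
    if not candidates:
        raise ValueError("no tier meets the requirement")
    return min(candidates, key=lambda t: t.cost_per_call)


if __name__ == "__main__":
    print(cheapest_sufficient(1).name)  # renaming files
    print(cheapest_sufficient(3).name)  # ambiguous, high-stakes synthesis
```

Note what the function cannot do: it takes `required_capability` as given. Deciding that number is the part that needs self-knowledge.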
The psychology version is the one I care about more:
What does this tool make me become while using it?
Does it make me exploratory or precious?
Does it make me clearer or more dependent on inference?
Does it make me faster in a way that keeps me honest, or faster in a way that lets me skip the hard part?
Does it make me feel serious, or does it help me do serious work?
That last distinction is where a lot of AI tooling discourse quietly lives. Feeling serious is easy. Open the most capable model, write a giant prompt, attach five documents, invoke an agent, watch the terminal scroll, and enjoy the satisfying sensation that work is happening.
Maybe it is.
But maybe you called the CEO to decide lunch.
The optimal AI setup is rarely the most powerful one. It is the setup that changes your behavior in the direction the work needs.
Sometimes that means more power. Sometimes it means cheaper retries. Sometimes it means smaller context. Sometimes it means turning the whole thing off and typing the sentence yourself because the act of typing is where the thought becomes real.
The annoying answer is fit.
The useful answer is fit.
The Goldilocks answer is fit.
AI enthusiasts, myself very much included, do not love this because fit is less exciting than capability. It does not screenshot well. It does not produce a benchmark graph. It does not let you say you are using the best possible thing.
But it may be the thing that actually matters.
The adult move is not using the biggest model you can access.
It is knowing when not to.