Build in Public · 6 min read

Karpathy Named It. Here's What Agentic Engineering Actually Looks Like.

Key Takeaway

Agentic engineering is not about the AI — it is about the coordination layer, the spec design, and the human judgment that catches what the agents get wrong.

Karpathy Named a New Discipline

At Sequoia Ascent 2026, Andrej Karpathy drew a line between two kinds of AI-assisted work.

"Vibe coding raises the floor. Agentic engineering raises the ceiling."

He defined agentic engineering as the discipline of coordinating fallible, stochastic agents while preserving correctness, security, and taste. The programmer, in this model, is no longer just a code writer. The programmer is an orchestrator.

When I read that, I thought: this is the first accurate description of what I have been doing for six months.

I did not know it was a discipline when I started. I knew I needed a system. I built it. Now Karpathy has named it. This post is a field report from inside the role he described.

Read his full talk at [karpathy.bearblog.dev/sequoia-ascent-2026](https://karpathy.bearblog.dev/sequoia-ascent-2026/).

What "Coordinating Fallible Agents" Actually Means

Karpathy's framing has three parts: coordination, fallibility, and preservation of correctness. All three are real problems.

Coordination is the unglamorous part. The 37 agents at LeanAI Studio do not share a conversation. They share state files. The CEO agent writes to a task board. Downstream agents read it, execute, and write logs. The system works because the handoffs are defined precisely. Each agent has a spec that tells it what to read, what to produce, and where to write its outputs.

Without a coordination layer, you do not have a fleet. You have a collection of isolated chatbot sessions.
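If that sounds abstract, here is a minimal sketch of what a file-based handoff can look like. The file layout, field names, and Python shape are mine for illustration; the actual LeanAI Studio formats are not published here:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical file layout; illustrative, not the studio's actual format.
TASK_BOARD = Path("state/task_board.json")
ACTIVITY_LOG = Path("state/activity_log.jsonl")

def post_task(author: str, task_id: str, description: str, priority: int) -> None:
    """Upstream agent (e.g. the CEO agent) appends a task to the shared board."""
    TASK_BOARD.parent.mkdir(parents=True, exist_ok=True)
    board = json.loads(TASK_BOARD.read_text()) if TASK_BOARD.exists() else []
    board.append({
        "id": task_id,
        "author": author,
        "description": description,
        "priority": priority,
        "status": "pending",
    })
    TASK_BOARD.write_text(json.dumps(board, indent=2))

def claim_next_task(agent: str) -> dict | None:
    """Downstream agent claims the highest-priority pending task and logs it."""
    if not TASK_BOARD.exists():
        return None
    board = json.loads(TASK_BOARD.read_text())
    pending = [t for t in board if t["status"] == "pending"]
    if not pending:
        return None
    task = max(pending, key=lambda t: t["priority"])
    task["status"] = "in_progress"
    task["owner"] = agent
    TASK_BOARD.write_text(json.dumps(board, indent=2))
    # Every significant action also lands in the shared activity log.
    with ACTIVITY_LOG.open("a") as log:
        log.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "event": "claimed_task",
            "task_id": task["id"],
        }) + "\n")
    return task
```

Nothing clever is happening there. The value is entirely in both sides agreeing on the format.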

Fallibility is the part nobody talks about enough. AI agents are wrong regularly. They misread a JSON file. They confuse a task ID. They hallucinate a metric. Agentic engineering is mostly the discipline of catching these failures before they propagate.

The way we catch failures at LeanAI Studio: every agent logs to a shared activity file. Every significant action goes to a digest inbox I review daily. Quality gates sit between pipeline stages. The LP Tester grades landing pages before any ad campaign launches. The Funnel Diagnostics agent checks campaign data before any pause decision gets made. Nothing moves to the next stage without a verification step.
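A quality gate does not need to be exotic. It is a check that sits between two stages and refuses to pass work along until a grader signs off. A sketch, with a stand-in grader and a made-up threshold in place of the real LP Tester:

```python
# Minimal gate sketch. The grader heuristics and the 0.8 threshold are
# stand-ins; in the real pipeline the grading is done by the LP Tester agent.
PASS_THRESHOLD = 0.8

def grade_landing_page(lp_html: str) -> float:
    """Stand-in grader: score a page 0-1 on crude structural checks."""
    checks = [
        "<h1" in lp_html,          # has a headline
        "cta" in lp_html.lower(),  # has some call to action
        len(lp_html) > 500,        # is not an empty shell
    ]
    return sum(checks) / len(checks)

def gate_before_launch(lp_html: str) -> bool:
    """The gate: nothing reaches the campaign stage below threshold."""
    score = grade_landing_page(lp_html)
    print(f"LP grade: {score:.2f} (threshold {PASS_THRESHOLD})")
    return score >= PASS_THRESHOLD

# Pipeline code calls the gate, never the next stage directly:
#     if gate_before_launch(page_html):
#         launch_campaign(page_html)
```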

This is not "set it and forget it." It is closer to being an air traffic controller. The planes fly themselves. Your job is to watch for conflicts and intervene when the system produces an outcome you did not intend.

Correctness is the hardest part. Not code correctness. Judgment correctness. Did the agent draw the right conclusion from the data? Did it apply the right rule for this situation?

The answer is sometimes no. And because the agent does not know it is wrong, it will confidently proceed. The only thing that catches this is a human who understands the system well enough to recognize when an output is plausible but wrong.

This is what Karpathy meant when he said: "You can outsource your thinking, but you can't outsource your understanding."

What the Spec Layer Looks Like

Every LeanAI Studio agent has a written spec. The spec is not a prompt. It is a role definition: what the agent does, what it reads, what it produces, how it escalates, and what it should never do.

The Marketing Campaigns Manager spec runs roughly 2,000 words. It tells the agent which tools to use, what constitutes a valid reason to pause a campaign, how to detect impression share issues, and exactly three conditions under which it is allowed to escalate to me.
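The real specs are prose documents, not code, but the skeleton is easy to sketch. Here it is as a hypothetical Python structure; every field name and escalation condition below is illustrative, not quoted from the actual spec:

```python
from dataclasses import dataclass

@dataclass
class AgentSpec:
    """Skeleton of what a role spec pins down. Field names are mine;
    the real specs are ~2,000-word prose documents, not code."""
    role: str                         # what the agent does
    reads: list[str]                  # inputs: boards, reports, inboxes
    produces: list[str]               # outputs and where they get written
    escalation_conditions: list[str]  # the only cases where it may escalate
    never_do: list[str]               # hard prohibitions

# Illustrative instance; none of these strings quote the actual spec.
campaigns_manager = AgentSpec(
    role="Monitor paid campaigns; adjust bids; pause only with a valid reason",
    reads=["state/task_board.json", "reports/campaigns_daily.json"],
    produces=["logs/campaigns.jsonl", "digest/inbox.md"],
    escalation_conditions=[
        "spend anomaly outside the configured band",
        "pause decision on a campaign above the budget threshold",
        "conflicting directives on the task board",
    ],
    never_do=["launch against a landing page that has not cleared the LP gate"],
)
```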

Writing a spec is design work. You are deciding, in advance, how the agent handles every case you can anticipate. The cases you cannot anticipate become the failures you debug on the next run.

Getting this right takes iteration. Most agents at LeanAI Studio have had their specs rewritten at least three times. Each rewrite came from a failure: an agent that escalated too often or not often enough; one that interpreted a rule too broadly; one that produced correct-looking output from wrong assumptions.

The spec layer is what separates agentic engineering from prompt engineering. A prompt gets you a useful response. A spec gets you a repeatable process.

The Heartbeat Rhythm

Most agents at LeanAI Studio run on a heartbeat schedule. Daily, twice daily, or weekly, depending on the function. Each heartbeat follows a structured pattern: read directives, read the task board, execute the highest-priority pending work, log the outputs.
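A single heartbeat is simple enough to sketch. This reuses the hypothetical claim_next_task from the coordination sketch above; the execute step is stubbed because it is agent-specific:

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

def execute(task: dict, directives: str) -> str:
    """Agent-specific work happens here; stubbed for the sketch."""
    return f"completed {task['id']}"

def heartbeat(agent: str) -> None:
    """One cycle: read directives, claim work, execute, log the outcome."""
    directives = Path("state/directives.md").read_text()
    task = claim_next_task(agent)  # from the coordination sketch above
    outcome = "no pending work" if task is None else execute(task, directives)
    with Path("state/activity_log.jsonl").open("a") as log:
        log.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "event": "heartbeat",
            "outcome": outcome,
        }) + "\n")

# In practice the cadence lives outside the agent: cron, a scheduler,
# or a loop like this one for the daily agents.
while True:
    heartbeat("marketing-campaigns-manager")
    time.sleep(24 * 60 * 60)
```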

The CEO agent runs a morning heartbeat and an evening heartbeat. Between those, specialized agents run their own cycles. The system produces a stream of logs and digest entries that I review once a day.

The heartbeat model has one critical property: it makes the system auditable. I can look at any agent's log and understand exactly what it did, when, and why. When something goes wrong, the audit trail tells me which step failed and what the agent believed at the time.

This is the engineering part. Not the AI part. The coordination, the logging, the structured handoffs: these are systems design choices that have nothing to do with model capability. The better you design the structure, the more reliably the AI inside it performs.

What Stays Human

There are decisions I will not delegate, regardless of how capable the agents get. Kill decisions on bets. Approval of anything that goes out under my name for the first time. Any action that is irreversible and above a certain risk threshold.

Not because agents cannot perform those actions technically. Because the judgment required to make them well depends on understanding that only accumulates through doing the work. You have to have seen what a bad LP looks like, read enough cold outreach to know when a message is wrong, and spent enough time studying the market to recognize when a keyword signal is misleading.

Agents can execute the routine version of any of those judgments. They cannot hold the accumulated context that tells you when the routine is wrong.

The division of labor at LeanAI Studio: agents handle execution and routine judgment. I handle the decisions where context matters more than process.
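The mechanical half of that division is easy to encode; the judgment half is not. A sketch of the routing check, with made-up action fields, kinds, and a made-up risk threshold:

```python
# Hypothetical routing check: the fields, kinds, and the 0.5 threshold
# are illustrative, not the studio's actual policy.
RISK_THRESHOLD = 0.5
ALWAYS_HUMAN = {"kill_bet", "first_publication_under_my_name"}

def route_action(action: dict) -> str:
    """Return 'human' for anything on the reserved list, or anything both
    irreversible and above the risk threshold; otherwise 'agent'."""
    if action.get("kind") in ALWAYS_HUMAN:
        return "human"
    irreversible = not action.get("reversible", False)
    risky = action.get("risk_score", 0.0) >= RISK_THRESHOLD
    return "human" if irreversible and risky else "agent"

print(route_action({"kind": "pause_campaign", "reversible": True, "risk_score": 0.2}))    # agent
print(route_action({"kind": "delete_ad_account", "reversible": False, "risk_score": 0.9}))  # human
```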

Why This Matters Now

Karpathy named the discipline in April 2026. In six months, the vocabulary will be everywhere. Companies will claim to practice agentic engineering the way they claimed to "do agile" in the 2010s: mostly as a label applied to whatever they were already doing.

The real discipline is harder than the label. It is writing specs, debugging agent failures, designing coordination layers, building quality gates, and maintaining the understanding that lets you catch what the agents get wrong.

The founders and builders who learn to do this well will operate at a fundamentally different level than those who treat agents as smarter autocomplete.

That is what Karpathy was pointing at. And it is what I am building, one heartbeat at a time.


Karpathy's Sequoia Ascent 2026 talk is on his blog: [karpathy.bearblog.dev/sequoia-ascent-2026](https://karpathy.bearblog.dev/sequoia-ascent-2026/).