Agent Harness: Why the Wrapper Matters More Than the Foundation Model

A perspective on what's actually driving the AI revolution

When this whole AI thing kicked off in late 2022 — the time when chatgpt.com crushed the internet — this arena has been developing ever since. And it's been happening so fast that almost every month we keep hearing new terms. These terms actually depict that AI is not only about probability and non-deterministic stuff, but it definitely has knobs and steering which specify the speed and direction of this AI Ship or rather AGI Ship.

Well apart from this, one reason I think the development has been so fast — because it's a self-building tech. AI knows how to code and it uses that (basically humans use that) to increase its efficiency. It learns to code better and better — hence the development we have been seeing in this tech clearly hasn't happened ever at this pace.

What Is an Agent Harness?

Now coming back to the point of Agent Harness. In most simple words it's the set of algorithms, logics, infra that supports a foundational model — and makes it pretty much usable. In other words, the thing which we were calling an "AI WRAPPER" — the wrap around those foundational models — is called its harness, hence Agent Harness. And it's the wrapper itself which makes the model usable, approachable and integrates it in our daily system.

Claude Code: Harness in the Wild

Claude Code is a Harness. All have access to Anthropic's latest model (apart from Mythos), but nobody could have built a thing like Claude Code. Recently their source code (or what people are claiming it to be) got leaked, and showed all the feedback loops, retry methods, indexing pipeline, context engineering, tool calling which goes behind Claude Code.

OpenClaw and the OS Integration Play

In fact Clawdbot/OpenClaw is an agent harness. Peter didn't have any foundational model — he just made Clawdbot run through Anthropic APIs. He cracked the wrap around the model, which could help LLMs to deeply integrate within your system/OS — and that stepped the whole game up.

What Counts as Harness

Context Engineering, Prompt Engineering, RAG, MCP, Tools, Skills all come under Agent Harness. These all make a foundational model workable. In fact, all the models surfacing are more or less same on Benchmark — what differs is who cracks the Harness. Anthropic did it with Claude Code, OpenAI is trying it with Codex. Unless a model with a big leap in metrics — like Claude Mythos — comes out, Agent Harness will be the King.