Tool Calling Loop: How a Coding Harness Drives a Stateless Model
Coding harnesses execute a loop where the model emits a tool call, pauses, the harness runs the tool, appends the result to history, and re-invokes the model — making the model functionally stateless between steps.
The tool calling loop is the core mechanic of every AI coding harness. The harness sends the user's prompt plus a system prompt that lists available tools, their descriptions, and the expected output format. The model generates text and, when it wants to act, emits a tool call in a specified syntax (for example, `tool: read_file {path: X}`) and stops generating. The harness then parses the tool call, asks the user for permission if the action is destructive (writes, shell commands), executes the tool with ordinary code, and appends the tool result to the chat history. It then re-invokes the model with the updated history. The loop repeats until the model produces a final answer with no further tool calls. The critical consequence: the model is functionally stateless between iterations. A useful analogy is that the model's working memory gets reset every few seconds; it has to fix a bug, issue a search, the brain resets, the search returns, the brain reboots and sees only the chat transcript, and repeat. Everything the model knows about progress so far lives in the conversation history that the harness re-feeds each turn. This design implies that tool descriptions in the system prompt are load-bearing, that harness authors can lie to the model about what a tool does (see prompt injection via tool description rewrites), and that performance degrades as the transcript grows — see Context Rot in Long AI Coding Sessions: Why Agents Get Worse as Context Fills. The same mechanism is why compaction and tool clearing are first-class harness features.