Who Is Agent Trace For?
Cursor released Agent Trace, an open spec for tracking which code in a repo was written by an LLM. It records the model, the tool, the conversation, and the exact line ranges — all appended to a JSONL file in your project.
The pitch: “As agents write more code, it’s important to understand what came from AI versus humans.”
I spent time reading the spec and the reference implementation. The engineering is solid — clean schema, thoughtful extensibility, good partner list (Amp, Amplitude, Cloudflare, Cognition, Google, Vercel). But I kept circling back to one question: what do you do with this data?
What It Captures
Every time an LLM edits a file, a hook fires and records a trace: which model, which tool, which lines, which conversation. The reference implementation handles events from both Cursor and Claude Code:
// From the reference hook — events flow in via stdin
appendTrace(createTrace("ai", input.file_path!, {
  model: input.model,
  rangePositions,
  transcript: input.transcript_path,
  metadata: { conversation_id: input.conversation_id, generation_id: input.generation_id },
}));
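For orientation, here is a minimal sketch, mine rather than the reference implementation, of the outer loop such a hook implies: read one JSON event from stdin, check that it describes a file edit, and hand it to the trace-recording call above. The field names come from the HookInput surface shown later in this post.

import { stdin } from "node:process";

// Only the fields this sketch needs; the full HookInput interface appears below.
type HookEvent = {
  hook_event_name: string;
  file_path?: string;
  model?: string;
  edits?: unknown[];
};

async function main() {
  stdin.setEncoding("utf8");
  let raw = "";
  for await (const chunk of stdin) raw += chunk;   // one JSON event per hook invocation
  const input: HookEvent = JSON.parse(raw);

  if (input.file_path && input.edits?.length) {
    // derive the edited line ranges from input.edits, then record the trace via
    // appendTrace(createTrace("ai", input.file_path, { ... })) as in the snippet above
  }
}

main();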
The spec defines four contributor types — human, ai, mixed, unknown — and supports line-level attribution with content hashes for tracking code that moves around. It’s vendor-neutral, VCS-agnostic, and extensible via namespaced metadata.
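To make that concrete, here is a rough sketch of the shape a trace entry implies. The field names below are mine, not the spec’s exact schema; what’s grounded in the spec is the contributor classification, line ranges with content hashes, and namespaced metadata.

// Illustrative only; not the spec's actual schema.
type Contributor = "human" | "ai" | "mixed" | "unknown";

interface TraceEntrySketch {
  contributor: Contributor;
  file: string;
  ranges: Array<{
    startLine: number;
    endLine: number;
    contentHash?: string;               // lets tools re-find code that has moved
  }>;
  model?: string;                       // relevant when contributor is "ai" or "mixed"
  metadata?: Record<string, unknown>;   // namespaced, vendor-specific extensions
}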
As a data format, it’s well-designed. The question is what it enables.
The Attribution Challenge
The spec models authorship as a classification. But LLM-assisted coding is a conversation. You describe what you want. The LLM generates something. You edit half of it, reject a function, ask for a revision, accept the second attempt, then manually fix an edge case. Later, another LLM refactors the whole block.
Who wrote that code? The boundary between human and LLM authorship is blurry, and it’s getting blurrier. Most real code will end up as mixed, and if nearly everything is mixed, the classification isn’t telling you much.
Line-level attribution also has a shelf-life problem. A trace says “Claude wrote lines 10-50 at commit abc123.” Two commits later, someone reformats that block or extracts a function from it. The spec’s answer is to chain through git blame, and content hashes can help track code that moves around. But in workflows with rebases and squash-merges, the chain breaks. These are hard problems — the kind an early RFC should be surfacing.
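To see both the promise and the limit of content hashing, here is a sketch (my illustration, not the spec’s algorithm): hash the normalized text of the attributed lines, then scan the rewritten file for a window of lines with the same hash. A pure move is found again; any edit inside the block breaks the match, and the attribution chain with it.

import { createHash } from "node:crypto";

const hashLines = (lines: string[]) =>
  createHash("sha256").update(lines.map((l) => l.trim()).join("\n")).digest("hex");

// Try to re-anchor an attributed block after the file has been rewritten.
function relocate(oldLines: string[], newFile: string): { start: number; end: number } | null {
  const target = hashLines(oldLines);
  const lines = newFile.split("\n");
  for (let i = 0; i + oldLines.length <= lines.length; i++) {
    if (hashLines(lines.slice(i, i + oldLines.length)) === target) {
      return { start: i + 1, end: i + oldLines.length };  // 1-indexed, inclusive
    }
  }
  return null;  // edited, not just moved: the range can't be recovered from the hash alone
}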
The Missing Action
The spec explicitly disclaims the obvious uses: not for code ownership, not for quality assessment, not for training data provenance. It says “transparency.” But transparency is a means, not an end.
If the code passes review and tests, what changes because an LLM wrote it? If it’s buggy, you fix it regardless. The spec never connects attribution to a concrete action. Data goes in, but there’s no defined way to get answers out. That’s the gap — not the format, but the use case.
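To be concrete about what “getting answers out” would look like, today you would write an ad-hoc script like the sketch below. The file name and field names are assumptions on my part (they follow the entry sketch above), because the spec defines how traces are written, not how they are queried.

import { readFileSync } from "node:fs";

// Count lines attributed to each model. ".agent-trace.jsonl" is a placeholder path.
const linesPerModel = new Map<string, number>();

for (const line of readFileSync(".agent-trace.jsonl", "utf8").split("\n")) {
  if (!line.trim()) continue;
  const entry = JSON.parse(line);
  if (entry.contributor !== "ai") continue;
  for (const r of entry.ranges ?? []) {
    const model = entry.model ?? "unknown";
    linesPerModel.set(model, (linesPerModel.get(model) ?? 0) + (r.endLine - r.startLine + 1));
  }
}

console.log(linesPerModel);

And even with that in hand, the output is a count per model; nothing about it tells you what to change.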
Where It Gets Interesting
Here’s what I think Agent Trace is actually pointed at, even if the spec doesn’t say it yet.
Behind every LLM-generated function are reasoning tokens, wrong turns, retries, context switches, and tool invocations. An agent doesn’t just produce code — it goes through a process to get there. It reads files, misunderstands an interface, backtracks, tries a different approach, runs a test, fixes the failure, and lands on a solution. That process is invisible in the final diff.
The hook already captures more than just attribution. Look at the input surface:
interface HookInput {
  hook_event_name: string;
  model?: string;
  transcript_path?: string | null;
  conversation_id?: string;
  generation_id?: string;
  session_id?: string;
  file_path?: string;
  edits?: FileEdit[];
  command?: string;
  duration?: number;
  is_background_agent?: boolean;
  composer_mode?: string;
  tool_name?: string;
  tool_input?: { file_path?: string; new_string?: string; old_string?: string; command?: string };
  tool_use_id?: string;
}
Model, session, conversation, tool, command, duration, whether it’s a background agent, the composer mode, every tool invocation with its input. This isn’t just attribution data. This is process data. And process data is where the real value lives:
- Evaluations. Which models struggle with which patterns? A function that took one shot is different from one that took twelve retries — even if the output is identical. Not “who wrote it” but “how hard was it to produce.”
- Agent-native codebases. If agents consistently struggle with a module — lots of wrong turns, high retry rates, repeated context confusion — that’s a signal the code isn’t structured for how agents work. You can refactor for clarity, better interfaces, more explicit contracts. The trace data becomes a map of where your codebase is hostile to LLM collaboration.
- Process optimization. Which tool configurations produce better first-attempt code? Which prompting patterns reduce backtracking? You can’t answer these from line attribution. You need the journey, not the destination (sketched below).
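Here is a sketch of the process metrics these bullets point at. The aggregation is mine, not part of the spec, and it assumes you persist the raw hook events somewhere (the file name is a placeholder). It replays those events and tallies, per file, how many tool invocations and how much time it took to land the edits: a rough “how hard was this to produce” signal.

import { readFileSync } from "node:fs";

// Fields taken from the HookInput surface above; persisting raw events is my assumption.
interface HookEventRecord {
  file_path?: string;
  tool_name?: string;
  duration?: number;        // units unspecified in the hook; treated as milliseconds here
  conversation_id?: string;
}

const effort = new Map<string, { toolCalls: number; totalMs: number; conversations: Set<string> }>();

for (const line of readFileSync("hook-events.jsonl", "utf8").split("\n")) {
  if (!line.trim()) continue;
  const ev: HookEventRecord = JSON.parse(line);
  if (!ev.file_path) continue;
  const stats =
    effort.get(ev.file_path) ?? { toolCalls: 0, totalMs: 0, conversations: new Set<string>() };
  if (ev.tool_name) stats.toolCalls += 1;
  stats.totalMs += ev.duration ?? 0;
  if (ev.conversation_id) stats.conversations.add(ev.conversation_id);
  effort.set(ev.file_path, stats);
}

// Files that need many tool calls and many separate conversations to change are
// candidates for the refactoring the second bullet describes.
for (const [file, s] of effort) {
  console.log(file, s.toolCalls, `${Math.round(s.totalMs / 1000)}s`, s.conversations.size);
}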
The Right Question
Agent Trace is early. It’s an RFC with real challenges around attribution modeling, range durability, and query semantics. But the instinct is right — as LLMs write more code, we need structured data about how that happens.
The most useful version of this spec might not be “what did the LLM write?” but “how did the LLM get there?” The line-level ledger is a starting point. The process trace — the reasoning, the iterations, the cost of arriving at a solution — is where this becomes something teams actually use to write better code with LLMs.
That’s the spec I’d want to build on.