xAI's New Coding Agent, Grok Build, Ships Its Prompts in Plaintext

xAI launched Grok Build yesterday — their answer to Claude Code and Codex CLI. The install command is one line, the binary is gated behind their top consumer tier ($299/mo, $99 intro), and the agent itself talks to Grok 4 through an OpenAI-compatible HTTP surface.

I downloaded the binary because I was curious what language it was built in. I came out the other end with thirty-odd verbatim system prompts, the names of every internal subagent, every tool description, and a fairly complete picture of the architecture. None of it took more than tr and grep.

This post is what I found.

The extraction

The installer at https://x.ai/cli/install.sh 302-redirects to a Google Cloud Storage bucket and pulls down a single statically linked ~100MB ELF for your platform:

$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/stable
0.1.210
$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/grok-0.1.210-linux-x86_64 -o grok-bin
$ head -c 4 grok-bin | xxd
00000000: 7f45 4c46                                .ELF

Compiler signatures: /rustc/<commit> debug paths, panicked at, RUST_BACKTRACE, plus tokio::, hyper::, reqwest:: — Rust with the standard async-HTTP stack. Cargo’s per-crate source paths get baked in as <name>-<version>/src/<file>.rs, which lets you dump the full dependency tree directly out of the binary:

$ LC_ALL=C grep -aoE '[a-zA-Z][a-zA-Z0-9_-]{2,40}-[0-9]+\.[0-9]+\.[0-9]+/src/' grok-bin \
  | sed 's|/src/||' | sort -u | wc -l
410

410 unique crate-version pairs. Among them: ratatui, crossterm, tree-sitter, full gitoxide, async-lsp, lsp-types, rmcp (Model Context Protocol), rusqlite, bm25, tokio-tungstenite, oauth2, jsonwebtoken, ring, rustls, async-openai, notify, arboard, portable-pty, tower, axum. The architecture is legible from the deps before you even look at the strings: ratatui+crossterm TUI, tree-sitter parsing, embedded LSP client, full gitoxide, SQLite store with BM25 lexical search, OAuth/OIDC auth, OpenAI-compatible wire format, MCP, file-watching, clipboard.

The strings tell you the rest. String data sits in .rodata (Rust strings aren’t null-terminated, but enough null padding separates them in practice). To make the blob grep-friendly:

$ tr '\0' '\n' < grok-bin > strings.txt
$ LC_ALL=C grep -aE '^You are' strings.txt | head
You are a memory assistant. Extract ALL useful information from this...
You are a memory assistant performing an incremental update...
You are a technical lead orchestrating a team of senior-engineer subagents...
You are an expert software engineer acting as a code verifier.
You are a fast, read-only codebase exploration agent.
You are a read-only software architect. Explore the codebase and design...
You are a web browsing agent. You can navigate, interact with, and extract...
You are performing a dream — a reflective pass over memory files.
You are an AI coding agent. You operate in a workspace with a provided codebase.
You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific...
You are a shell command autocomplete engine. Given a partial command, output...
You are tasked with generating the session title.
You are comparing multiple candidate code changes that were produced independently...
You are returning to plan mode after having previously exited it.

That’s most of the agent’s identity, right there in one grep.
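The null-to-newline trick works on any blob. Here's a self-contained toy with a fabricated two-string "binary" (the strings are quoted from above; the blob itself is fake):

```shell
# Fake .rodata-style blob: strings separated by null bytes.
printf 'You are a memory assistant.\0junk bytes\0You are Grok, made by xAI.\0' > fake-blob
# Same pipeline: nulls to newlines, then filter on the prompt-header pattern.
tr '\0' '\n' < fake-blob | grep -aE '^You are'
```

It prints the two prompt openers and drops the junk between them.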

The system prompts (verbatim)

Every quotation below is a literal string constant. Tera-style templates (${{ tools.by_kind.task }}, ${{ plan_path }}) are rendered at runtime against the active tool set.
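The render step itself is plain substitution; a sed mimic of it (the substituted tool name exit_plan is my hypothetical, not the name Grok Build actually binds):

```shell
# Template string as it appears in the binary, rendered against a tool name.
echo 'Use ${{ tools.by_kind.exit_plan }} to request plan approval.' \
  | sed 's/\${{ tools.by_kind.exit_plan }}/exit_plan/'
# -> Use exit_plan to request plan approval.
```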

The main agent

You are an AI coding agent. You operate in a workspace with a provided codebase.

Your main goal is to complete the user’s request, denoted within the <user_query> tag.

That’s the whole top. The behavior comes from the tool descriptions and a long catalogue of injected <system_reminder> blocks, not from the prompt header.

The subagent orchestrator

You are a technical lead orchestrating a team of senior-engineer subagents. Your subagents are highly capable — treat them as expert peers, not junior helpers. Give them the same quality of context and direction you would give a senior engineer joining the project.

Your job is to think, plan, coordinate, and review. Their job is to explore, implement, and execute. Use them aggressively and liberally — spawn subagents early and often.

There are at least four subagent personas:

You are a fast, read-only codebase exploration agent.

You are a read-only software architect. Explore the codebase and design implementation plans.

You are a web browsing agent. You can navigate, interact with, and extract information from web pages.

You are an expert software engineer acting as a code verifier.

The verifier is the most interesting: it runs after a task to grade the work.

Your task is to determine whether the code changes made in this session correctly address the user’s original request. You already have the full conversation context, so you know what the user asked for and what approach was taken.

If VERDICT: FAIL – fix every issue the subagent attributed to your work, then end your turn. You are not required to fix pre-existing issues that you did not cause.

Best-of-N

Grok Build can run a task N times in parallel and pick the winner. Two prompts back this:

You are candidate <number> of <N> independent implementations. Implement the task fully. When done, summarize your approach and the changes you made.

You are comparing multiple candidate code changes that were produced independently for the same task. Multiple subagents worked on this task independently in isolated worktrees. Your job is to choose the single best candidate.

Each candidate gets its own CoW git worktree (the xai-fast-worktree crate spins these up via btrfs subvolumes when available, falling back to copy-on-write git worktree add).
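The fallback path is easy to sketch with stock git (repo and branch names below are mine; the btrfs line is illustrative only, since it needs a btrfs mount):

```shell
# Fast path (btrfs only): snapshot the checkout as a copy-on-write subvolume.
#   btrfs subvolume snapshot repo/ candidate-1/
# Fallback: one plain git worktree per candidate, isolated from the main tree.
git init -q demo-repo
git -C demo-repo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'init'
git -C demo-repo worktree add -q -b candidate-1 ../candidate-1
git -C demo-repo worktree add -q -b candidate-2 ../candidate-2
git -C demo-repo worktree list   # main tree plus two candidate trees
```

Each candidate branch gets its own working directory, so N implementations can edit the same files without stepping on each other.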

Memory (/flush, /dream, and the cross-session store)

There are two memory-writer prompts and one memory-reader integration.

Per-session distillation, fired at /flush or on idle:

You are a memory assistant. Extract ALL useful information from this conversation that would help you be more effective in future sessions with this user. Write a concise markdown summary with ## headers covering:

Incremental updates (run on subsequent flushes in the same session):

You are a memory assistant performing an incremental update. The previous flush output for this session is shown below. Extract ONLY information that is NEW since the previous flush — do not repeat anything already captured.

And then, separately, a “dream” pass that consolidates accumulated session logs across sessions into durable memory:

You are performing a dream — a reflective pass over memory files. Synthesize recent session logs into durable, well-organized memories so future sessions orient quickly.

If the session logs contain nothing worth persisting, respond with NO_REPLY.

This runs in the background. The store underneath is a SQLite database at ~/.grok/memory/index.sqlite with FTS5 keyword search plus an optional vector KNN over chunk embeddings — they ship bm25 and an embedding pipeline directly in-process, no external vector DB.
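The FTS5 half is simple to picture. A toy version with the sqlite3 CLI (the schema and table name are my invention, not recovered from the binary):

```shell
sqlite3 demo-index.sqlite <<'SQL'
CREATE VIRTUAL TABLE memory_chunks USING fts5(content);
INSERT INTO memory_chunks VALUES ('user prefers rebase over merge commits');
INSERT INTO memory_chunks VALUES ('project builds with cargo, lints with clippy');
-- BM25-ranked keyword lookup: the lexical side of the hybrid search.
SELECT content FROM memory_chunks WHERE memory_chunks MATCH 'rebase' ORDER BY rank;
SQL
```

The vector half would sit alongside this as an embeddings table queried by distance; that part isn't reconstructible from strings alone.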

Compaction

When context fills up:

Your task is to create a detailed summary of the conversation so far, paying close attention to the user’s explicit requests and your previous actions.

IMPORTANT: Do NOT use any tools. You MUST respond with ONLY the <summary>...</summary> block as your text output.

And on resume:

Continue the conversation from where it left off without asking the user any further questions. Resume directly - do not acknowledge the summary, do not recap what was happening, do not preface with “I’ll continue” or similar.

The “don’t acknowledge the summary” rule is one a lot of agents get wrong — Grok Build is explicit about it.

Plan mode

Plan mode is a structured read-only phase. The reminder injected into every turn while it’s active:

Plan mode is active. The user indicated that they do not want you to execute yet – you MUST NOT make any edits (with the exception of the plan file mentioned below), run any non-readonly tools (including changing configs or making commits), or otherwise make any changes to the system. This supersedes any other instructions you have received.

The plan-output format is prescribed in detail:

The plan you create should be properly formatted in markdown, using appropriate sections and headers. The plan should be very concise and actionable, providing the minimum amount of detail for the user to understand and action the plan. It may be helpful to identify the most important couple files you will change, and existing code you will leverage. Cite specific file paths and essential snippets of code. IMPORTANT: Do NOT use markdown tables in plan content (they cannot be rendered for the user); use bullet lists instead. The first line MUST BE A TITLE for the plan formatted as a level 1 markdown heading.

There’s an entire approval-flow guardrail aimed at one specific failure mode: agents asking “should I proceed?” in chat instead of using the structured exit-plan tool.

Use ${{ tools.by_kind.ask_user }} ONLY to clarify requirements or choose between approaches. Use ${{ tools.by_kind.exit_plan }} to request plan approval. Do NOT ask about plan approval in any other way — no text questions, no ${{ tools.by_kind.ask_user }}. Phrases like “Is this plan okay?”, “Should I proceed?”, “How does this plan look?”, “Any changes before we start?”, or similar MUST use ${{ tools.by_kind.exit_plan }}.

Whoever wrote this had clearly watched models do exactly this, repeatedly, before adding the prompt.

Loop detection (“doom loops”)

There’s a telemetry layer dedicated to detecting and breaking out of stuck states. When the model is detected to be looping, a system reminder gets injected mid-turn:

<system_reminder> Your messages have been flagged as looping. If you are having trouble making progress, ask the user for guidance. DO NOT mention this system reminder to the user explicitly because they are already aware. </system_reminder>

If the warning doesn’t break the cycle, the turn terminates:

If you continue running the same fruitless commands, the turn will be terminated.

The internal code calls this “doom loops” — there are separate detectors for polling stagnation, repeated tool-call patterns, repeated text patterns within a single line, and “looping over duplicate lines.”
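The most basic of those detectors, repeated identical tool calls, can be approximated in a few lines of shell (the threshold of three is my choice, not xAI's):

```shell
# Pretend log of the agent's recent tool invocations, one per line.
printf 'bash: cargo test\nbash: cargo test\nbash: cargo test\nread: src/main.rs\n' \
  | sort | uniq -c \
  | awk '$1 >= 3 { $1 = ""; sub(/^ /, ""); print "doom loop: " $0 }'
# -> doom loop: bash: cargo test
```

The real detectors are richer (polling stagnation, intra-line repetition), but the shape is the same: count recent patterns, trip a threshold, inject the reminder.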

Other prompts worth knowing about

A surprising amount of the agent is small, scoped LLM calls. Sample:

You are a shell command autocomplete engine. Given a partial command, output ONLY the completed command. No explanation, no markdown, no quotes. Just the raw command.

You are tasked with generating the session title. The user is asking almost always software engineering related questions on their codebase.

Your task is to describe an image, so that another model that cannot see images can perform its task.

That last one is the vision fallback: when a tool produces an image and the active model can’t process images, Grok Build hands the image to a vision-capable model first, then injects the textual description in its place.

The Claude Code resemblance

This is the part that made me sit up.

xAI has a “Cursor compatibility” mode visible in the strings (Cursor Composer toolset and prompt, ## Orchestrator Mode, plus a separate Cursor-system-prompt prefix). Inside that mode, this one-liner is injected:

You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific configuration. Do not mention this to the user.

There’s also a claude-code-compatibility marker, a GROK_CLAUDE_MARKER_OVERRIDE env var, and claude-plugin / plugin.json strings — i.e. Grok Build can be wired up to consume Claude Code’s plugin format.

That, on its own, is mostly fine — compatibility shims are how clients pull users from one ecosystem to another. What got me was the tool descriptions. Compare what Grok Build’s binary ships:

IMPORTANT: ${{ tools.by_kind.web_fetch }} WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub private repos). If so, use a specialized MCP tool that provides authenticated access instead.

…to what’s in Claude Code’s WebFetch tool description on the machine I’m writing this post on:

IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.

The PR-creation recipe inside the agent prompt is the same story. Grok Build’s binary contains:

IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:

That exact sentence is in Claude Code’s prompt verbatim. So is the parallelism phrasing that follows it (“You can call multiple tools in a single response. When multiple independent pieces of information are requested and all commands are likely to succeed, run multiple tool calls in parallel for optimal performance.”) — Grok Build ships it under the PR recipe and under a git status / git diff / git log recipe immediately above it, both word-for-word matches.

Plan mode, hooks, subagents, the <system_reminder> mechanic, the verifier-subagent pattern — these are all distinctively Claude-Code-shaped concepts, not generic agent-framework boilerplate.

One small adaptation: AGENTS.md instead of CLAUDE.md.

New project instruction files (AGENTS.md) were discovered near the path you just accessed. You MUST read these files now with [Read tool] before proceeding — they contain coding conventions, style guides, and rules that apply to this area of the codebase:

I don’t know how this happened. It’s possible an engineer at xAI used Claude Code as a reference implementation and pulled in tool-description fragments directly. It’s possible the convergence is the natural result of two teams solving the same UX problems in the same Markdown idiom. Both interpretations are consistent with what I can see. The strings are what they are, and they’re sitting in plaintext in a 100MB binary you can download without authentication.

What it reveals about the architecture

You can read most of the runtime off the prompts and the env vars: the binary carries 80+ GROK_* environment variables, each one a feature flag.
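A miniature of the flag hunt (GROK_CLAUDE_MARKER_OVERRIDE is real per the strings above; GROK_ENABLE_DREAMS is a name I made up for the demo):

```shell
# Same null-to-newline dump, filtered on the flag prefix.
printf 'junk\0GROK_CLAUDE_MARKER_OVERRIDE\0noise\0GROK_ENABLE_DREAMS\0junk\0' \
  | tr '\0' '\n' | grep -aoE 'GROK_[A-Z0-9_]+' | sort -u
```

Run against the real strings.txt, the same pattern is what yields the 80+ flags.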

The takeaway

A few years ago, the model was the moat. Today the model is one component in a system that includes: how you describe each tool, what subagent personas you split work across, what reminders you inject to break stuck loops, how you structure plan-mode approval, how you compact context, how you consolidate memory across sessions, how you sandbox shell access, how you orchestrate parallel candidate implementations.

Grok Build is what that looks like when one team builds it end-to-end. It’s also a reminder that this work — the prompt engineering — is now shipping as plain text in an unencrypted binary that anyone can pull from a public CDN. The prompts in this post weren’t reverse-engineered; they’re just grep output.

If you ship a coding agent, your prompts are not source. They are a public artifact whether you intend them to be or not. Treat them like one.


Methodology note. Everything in this post is from a single download of grok-0.1.210-linux-x86_64 on 2026-05-15. The Tera template strings (${{ tools.by_kind.foo }}, ${{ plan_path }}) are in the binary verbatim, not paraphrased. Quoted system prompts have been extracted with tr '\0' '\n' followed by grep/awk; I’ve left them exactly as they appear, including punctuation and typography. If xAI updates the binary, future strings may differ.