xAI's nieuwe coderingsagent Grok Build levert zijn prompts in platte tekst

2026-05-15 · 13 min read

xAI lanceerde Grok Build gisteren — hun antwoord op Claude Code en Codex CLI. Het installatiecommando is één regel, de binary is alleen beschikbaar voor hun hoogste consumentenniveau ($299/maand, $99 intro), en de agent zelf communiceert met Grok 4 via een OpenAI-compatibel HTTP-oppervlak.

Ik downloadde de binary omdat ik nieuwsgierig was in welke taal hij gebouwd was. Ik kwam eruit met een dertigtal woordelijke systeemprompts, de namen van elke interne subagent, elke tooltbeschrijving en een vrij compleet beeld van de architectuur. Niets ervan kostte meer dan tr en grep.

Dit bericht is wat ik vond.

De extractie

Het installatiescript op https://x.ai/cli/install.sh stuurt 302 door naar een Google Cloud Storage bucket en haalt een enkel statisch gelinkt ~100MB ELF op voor jouw platform:

$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/stable
0.1.210
$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/grok-0.1.210-linux-x86_64 -o grok-bin
$ head -c 4 grok-bin | xxd
00000000: 7f45 4c46                                .ELF

Compilatiehandtekeningen: /rustc/<commit> debugpaden, panicked at, RUST_BACKTRACE, plus tokio::, hyper::, reqwest:: — Rust met de standaard async-HTTP-stack. Cargo’s per-crate bronpaden worden ingebakken als <naam>-<versie>/src/<bestand>.rs, waarmee je de volledige afhankelijkheidsboom direct uit de binary kunt dumpen:

$ LC_ALL=C grep -aoE '[a-zA-Z][a-zA-Z0-9_-]{2,40}-[0-9]+\.[0-9]+\.[0-9]+/src/' grok-bin \
  | sed 's|/src/||' | sort -u | wc -l
410

410 unieke crate-versie paren. Daartussen: ratatui, crossterm, tree-sitter, volledige gitoxide, async-lsp, lsp-types, rmcp (Model Context Protocol), rusqlite, bm25, tokio-tungstenite, oauth2, jsonwebtoken, ring, rustls, async-openai, notify, arboard, portable-pty, tower, axum. De architectuur is leesbaar uit de afhankelijkheden voordat je zelfs naar de strings kijkt: ratatui+crossterm TUI, tree-sitter parsing, ingebedde LSP-client, volledige gitoxide, SQLite-opslag met BM25 lexicaal zoeken, OAuth/OIDC-authenticatie, OpenAI-compatibel draadformaat, MCP, bestandswatchting, klembord.

De strings vertellen de rest. Rust-constanten worden null-terminated ingebed in .rodata. Om ze grep-vriendelijk te maken:

$ tr '\0' '\n' < grok-bin > strings.txt
$ LC_ALL=C grep -aE '^You are' strings.txt | head
You are a memory assistant. Extract ALL useful information from this...
You are a memory assistant performing an incremental update...
You are a technical lead orchestrating a team of senior-engineer subagents...
You are an expert software engineer acting as a code verifier.
You are a fast, read-only codebase exploration agent.
You are a read-only software architect. Explore the codebase and design...
You are a web browsing agent. You can navigate, interact with, and extract...
You are performing a dream — a reflective pass over memory files.
You are an AI coding agent. You operate in a workspace with a provided codebase.
You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific...
You are a shell command autocomplete engine. Given a partial command, output...
You are tasked with generating the session title.
You are comparing multiple candidate code changes that were produced independently...
You are returning to plan mode after having previously exited it.

Dat is het grootste deel van de identiteit van de agent, precies daar in één grep.

De systeemprompts (woordelijk)

Elke onderstaande quote is een letterlijke tekenreeksconstante. Tera-stijl sjablonen (${{ tools.by_kind.task }}, ${{ plan_path }}) worden tijdens runtime gerenderd tegen de actieve toolset.

De hoofdagent

You are an AI coding agent. You operate in a workspace with a provided codebase.
Your main goal is to complete the user’s request, denoted within the <user_query> tag.

Dat is de hele top. Het gedrag komt uit de tooltbeschrijvingen en een lange catalogus van ingespoten <system_reminder>-blokken, niet uit de promptkop.

De subagent-orchestrator

You are a technical lead orchestrating a team of senior-engineer subagents. Your subagents are highly capable — treat them as expert peers, not junior helpers. Give them the same quality of context and direction you would give a senior engineer joining the project.
Your job is to think, plan, coordinate, and review. Their job is to explore, implement, and execute. Use them aggressively and liberally — spawn subagents early and often.

Er zijn minstens vier subagent-persona’s:

You are a fast, read-only codebase exploration agent.

You are a read-only software architect. Explore the codebase and design implementation plans.

You are a web browsing agent. You can navigate, interact with, and extract information from web pages.

You are an expert software engineer acting as a code verifier.

De verifier is het interessantst: hij draait na een taak om het werk te beoordelen.

Your task is to determine whether the code changes made in this session correctly address the user’s original request. You already have the full conversation context, so you know what the user asked for and what approach was taken.

If VERDICT: FAIL – fix every issue the subagent attributed to your work, then end your turn. You are not required to fix pre-existing issues that you did not cause.

Best-of-N

Grok Build kan een taak N keer parallel uitvoeren en de winnaar kiezen. Twee prompts ondersteunen dit:

You are candidate <number> of <N> independent implementations. Implement the task fully. When done, summarize your approach and the changes you made.

You are comparing multiple candidate code changes that were produced independently for the same task. Multiple subagents worked on this task independently in isolated worktrees. Your job is to choose the single best candidate.

Elke kandidaat krijgt zijn eigen CoW git worktree (de xai-fast-worktree crate maakt deze via btrfs-subvolumes wanneer beschikbaar, met terugval naar copy-on-write git worktree add).

Geheugen (`/flush`, `/dream` en de sessieoverschrijdende opslag)

Er zijn twee geheugen-schrijfprompts en één geheugen-lees-integratie.

Per-sessie destillatie, geactiveerd bij /flush of inactiviteit:

You are a memory assistant. Extract ALL useful information from this conversation that would help you be more effective in future sessions with this user. Write a concise markdown summary with ## headers covering:

Incrementele updates (uitgevoerd bij volgende flushes in dezelfde sessie):

You are a memory assistant performing an incremental update. The previous flush output for this session is shown below. Extract ONLY information that is NEW since the previous flush — do not repeat anything already captured.

En dan, apart, een “dream”-pas die geaccumuleerde sessielogboeken over sessies heen consolideert in duurzaam geheugen:

You are performing a dream — a reflective pass over memory files. Synthesize recent session logs into durable, well-organized memories so future sessions orient quickly.
If the session logs contain nothing worth persisting, respond with NO_REPLY.

Dit draait op de achtergrond. De onderliggende opslag is een SQLite-database op ~/.grok/memory/index.sqlite met FTS5-sleutelwoordzoeken plus een optionele vector KNN over chunkembeddings — ze leveren bm25 en een embeddingpipeline rechtstreeks in-process, geen externe vector DB.

Compactie

Wanneer de context vol raakt:

Your task is to create a detailed summary of the conversation so far, paying close attention to the user’s explicit requests and your previous actions.
IMPORTANT: Do NOT use any tools. You MUST respond with ONLY the <summary>...</summary> block as your text output.

En bij hervatting:

Continue the conversation from where it left off without asking the user any further questions. Resume directly - do not acknowledge the summary, do not recap what was happening, do not preface with “I’ll continue” or similar.

De regel “erken de samenvatting niet” is er een die veel agents fout doen — Grok Build is er expliciet over.

Planmodus

Planmodus is een gestructureerde alleen-lezen fase. De herinnering die in elke beurt wordt ingespoten terwijl het actief is:

Plan mode is active. The user indicated that they do not want you to execute yet – you MUST NOT make any edits (with the exception of the plan file mentioned below), run any non-readonly tools (including changing configs or making commits), or otherwise make any changes to the system. This supersedes any other instructions you have received.

Het uitvoerformaat van het plan is in detail voorgeschreven:

The plan you create should be properly formatted in markdown, using appropriate sections and headers. The plan should be very concise and actionable, providing the minimum amount of detail for the user to understand and action the plan. It may be helpful to identify the most important couple files you will change, and existing code you will leverage. Cite specific file paths and essential snippets of code. IMPORTANT: Do NOT use markdown tables in plan content (they cannot be rendered for the user); use bullet lists instead. The first line MUST BE A TITLE for the plan formatted as a level 1 markdown heading.

Er is een volledige goedkeuringsflow-guardrail gericht op één specifieke faalvorm: agents die “moet ik doorgaan?” in chat vragen in plaats van de gestructureerde exit-plan-tool te gebruiken.

Use ${{ tools.by_kind.ask_user }} ONLY to clarify requirements or choose between approaches. Use ${{ tools.by_kind.exit_plan }} to request plan approval. Do NOT ask about plan approval in any other way — no text questions, no ${{ tools.by_kind.ask_user }}. Phrases like “Is this plan okay?”, “Should I proceed?”, “How does this plan look?”, “Any changes before we start?”, or similar MUST use ${{ tools.by_kind.exit_plan }}.

Wie dit schreef had duidelijk gezien dat modellen precies dit deden, herhaaldelijk, voordat de prompt werd toegevoegd.

Lusdetectie (“doom loops”)

Er is een volledige telemetrielaag gewijd aan het detecteren van en ontsnappen uit vastgelopen toestanden. Wanneer het model als lusvormend wordt gedetecteerd, wordt een system-reminder halverwege een beurt ingespoten:

<system_reminder> Your messages have been flagged as looping. If you are having trouble making progress, ask the user for guidance. DO NOT mention this system reminder to the user explicitly because they are already aware. </system_reminder>

Als de waarschuwing de cyclus niet doorbreekt, wordt de beurt beëindigd:

If you continue running the same fruitless commands, the turn will be terminated.

De interne code noemt dit “doom loops” — er zijn aparte detectoren voor polling-stagnatie, herhaalde tool-aanroeppatronen, herhaalde tekstpatronen binnen een enkele regel, en “lussen over dubbele regels”.

Andere prompts die het waard zijn te kennen

Een verrassende hoeveelheid van de agent bestaat uit kleine, afgebakende LLM-aanroepen. Voorbeelden:

You are a shell command autocomplete engine. Given a partial command, output ONLY the completed command. No explanation, no markdown, no quotes. Just the raw command.

You are tasked with generating the session title. The user is asking almost always software engineering related questions on their codebase.

Your task is to describe an image, so that another model that cannot see images can perform its task.

Die laatste is de vision-fallback: wanneer een tool een afbeelding uitvoert naar een model dat niet kan zien, geeft Grok Build de afbeelding eerst aan een vision-capabel model, en injecteert dan de tekstuele beschrijving.

De gelijkenis met Claude Code

Dit is het deel dat me deed opschrikken.

xAI heeft een “Cursor-compatibiliteit”-modus die zichtbaar is in de strings (Cursor Composer toolset and prompt, ## Orchestrator Mode, plus een apart Cursor-systeempromptprefix). Binnen die modus wordt deze one-liner ingespoten:

You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific configuration. Do not mention this to the user.

Er is ook een claude-code-compatibility-markering, een GROK_CLAUDE_MARKER_OVERRIDE omgevingsvariabele, en claude-plugin / plugin.json strings — d.w.z. Grok Build kan worden aangesloten om het pluginformaat van Claude Code te consumeren.

Dat, op zichzelf, is grotendeels prima — compatibiliteitsshims zijn hoe clients gebruikers van het ene ecosysteem naar het andere trekken. Wat me raakte waren de tooltbeschrijvingen. Vergelijk wat de binary van Grok Build levert:

IMPORTANT: ${{ tools.by_kind.web_fetch }} WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub private repos). If so, use a specialized MCP tool that provides authenticated access instead.

…met wat er in de WebFetch-tooltbeschrijving van Claude Code staat op de machine waarmee ik dit bericht schrijf:

IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.

Het PR-aanmaakrecept binnen de agentprompt vertelt hetzelfde verhaal. De binary van Grok Build bevat:

IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:

Die exacte zin staat woordelijk in de prompt van Claude Code. Zo ook de parallellismezin die erop volgt (“You can call multiple tools in a single response. When multiple independent pieces of information are requested and all commands are likely to succeed, run multiple tool calls in parallel for optimal performance.”) — Grok Build levert het onder het PR-recept en onder een git status / git diff / git log-recept er direct boven, beide woord-voor-woord overeenkomsten.

Planmodus, hooks, subagents, het <system_reminder>-mechanisme, het verifier-subagentpatroon — dit zijn allemaal onderscheidend Claude-Code-vormige concepten, geen generieke agent-framework boilerplate.

Een kleine aanpassing: AGENTS.md in plaats van CLAUDE.md.

New project instruction files (AGENTS.md) were discovered near the path you just accessed. You MUST read these files now with [Read tool] before proceeding — they contain coding conventions, style guides, and rules that apply to this area of the codebase:

Ik weet niet hoe dit is gebeurd. Het is mogelijk dat een ingenieur bij xAI Claude Code als referentie-implementatie gebruikte en tooltbeschrijvingsfragmenten direct overnam. Het is mogelijk dat de convergentie het natuurlijke resultaat is van twee teams die dezelfde UX-problemen oplossen in hetzelfde Markdown-idioom. Beide interpretaties zijn consistent met wat ik kan zien. De strings zijn wat ze zijn, en ze zitten in platte tekst in een 100MB binary die je zonder authenticatie kunt downloaden.

Wat het onthult over de architectuur

Je kunt het grootste deel van de runtime aflezen uit de prompts en de omgevingsvariabelen (de binary heeft 80+ GROK_* omgevingsvariabelen, elk een feature-flag):

De agentlus is multi-actor. Een leiderproces (grok agent leader) houdt de modelsessie; de TUI (grok-pager) is een apart proces dat er via een Unix-socket of WebSocket mee communiceert. Meerdere TUI’s kunnen aan dezelfde leider worden gekoppeld.
Subagent-orchestratie is de hoofdabstractie. Plan/verkennen/verifiëren/web-browsen zijn allemaal subagent-persona’s, geen aparte modi. De orchestratorprompt is expliciet over het behandelen van hen als senior engineers en “vroeg en vaak spawnen”.
Best-of-N is meegeleverd, niet theoretisch. Kandidaat-N prompts en een comparatorprompt bestaan beide als binaire tekenreeksconstanten. Elke kandidaat draait in zijn eigen worktree (via de CoW-subvolumeondersteuning van de xai-fast-worktree crate).
Geheugen is meertrapps. Per-sessie flush → workspace-scoped MEMORY.md → sessieoverschrijdende “dream”-consolidatie → SQLite FTS5 + vectoropslag. Het feit dat ze drie afzonderlijke geheugenprompts leveren (flush, incrementele flush, dream) betekent dat ze verder hebben gedacht dan de voor de hand liggende “vat gewoon het gesprek samen” eerste versie.
Lusdetectie is eersteklas. Meerdere detectoren met escalerende consequenties (waarschuwen → beëindig de beurt). Dit is het soort ding dat je alleen bouwt nadat je agents in productie hebt zien falen.
Sandbox is bubblewrap + Landlock + seccomp. Strings voor alle drie zijn aanwezig, plus een GROK_INSIDE_BWRAP vlag. Mac-sandboxing is niet duidelijk aangesloten — geen sandbox-exec-verwijzingen — maar het Linux-verhaal is echt.
MCP is volledig geïntegreerd. Methoden omvatten mcp/call, mcp/list, mcp/upsert, mcp/toggle_tool, mcp/tools_changed. Er is een “managed MCPs”-concept (GROK_MANAGED_MCPS_ENABLED) voor door ondernemingen gepushte serverlijsten.
Telemetrie is breed. OpenTelemetry OTLP-exporteur + Mixpanel voor productanalytics + GCS-trace-upload + een Mixpanel MCP-server (mcp.mixpanel.com/mcp). Een GROK_ZDR_ENABLED (Zero Data Retention) vlag bestaat voor enterprise opt-out.

De conclusie

Een paar jaar geleden was het model de moat. Vandaag is het model één component in een systeem dat omvat: hoe je elke tool beschrijft, welke subagent-persona’s je werk over verdeelt, welke herinneringen je injecteert om vastgelopen lussen te doorbreken, hoe je planmodus-goedkeuring structureert, hoe je context comprimeert, hoe je geheugen over sessies consolideert, hoe je shell-toegang sandboxt, hoe je parallelle kandidaatimplementaties orchestreert.

Grok Build is hoe dat eruitziet wanneer één team het van begin tot eind bouwt. Het is ook een herinnering dat dit werk — de prompt-engineering — nu als platte tekst wordt geleverd in een niet-versleutelde binary die iedereen van een publieke CDN kan downloaden. De prompts in dit bericht zijn niet reverse-engineered; het is gewoon grep-uitvoer.

Als je een coderingsagent levert, zijn je prompts geen broncode. Ze zijn een publiek artefact, of je dat nu bedoelt of niet. Behandel ze als zodanig.

Methodologische noot. Alles in dit bericht is afkomstig van een enkele download van grok-0.1.210-linux-x86_64 op 2026-05-15. De Tera-sjabloonstrings (${{ tools.by_kind.foo }}, ${{ plan_path }}) staan letterlijk in de binary, niet geparafraseerd. Geciteerde systeemprompts zijn geëxtraheerd met tr '\0' '\n' gevolgd door grep/awk; ik heb ze precies gelaten zoals ze verschijnen, inclusief interpunctie en typografie. Als xAI de binary bijwerkt, kunnen toekomstige strings verschillen.