Il nuovo agente di codifica di xAI, Grok Build, include i suoi prompt in testo normale

2026-05-15 · 14 min read

xAI ha lanciato Grok Build ieri — la loro risposta a Claude Code e Codex CLI. Il comando di installazione è una singola riga, il binario è disponibile solo per il livello consumer più alto ($299/mese, $99 di introduzione), e l’agente stesso comunica con Grok 4 attraverso una superficie HTTP compatibile con OpenAI.

Ho scaricato il binario perché ero curioso di sapere in quale linguaggio fosse stato costruito. Sono uscito dall’esperienza con una trentina di system prompt letterali, i nomi di ogni subagente interno, ogni descrizione di strumento e un’immagine abbastanza completa dell’architettura. Niente di tutto ciò ha richiesto più di tr e grep.

Questo post è ciò che ho trovato.

L’estrazione

L’installer su https://x.ai/cli/install.sh reindirizza 302 a un bucket di Google Cloud Storage e scarica un singolo ELF collegato staticamente di ~100MB per la tua piattaforma:

$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/stable
0.1.210
$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/grok-0.1.210-linux-x86_64 -o grok-bin
$ head -c 4 grok-bin | xxd
00000000: 7f45 4c46                                .ELF

Firme del compilatore: percorsi di debug /rustc/<commit>, panicked at, RUST_BACKTRACE, più tokio::, hyper::, reqwest:: — Rust con lo stack HTTP asincrono standard. I percorsi sorgente per crate di Cargo vengono incorporati come <nome>-<versione>/src/<file>.rs, il che consente di estrarre l’intero albero delle dipendenze direttamente dal binario:

$ LC_ALL=C grep -aoE '[a-zA-Z][a-zA-Z0-9_-]{2,40}-[0-9]+\.[0-9]+\.[0-9]+/src/' grok-bin \
  | sed 's|/src/||' | sort -u | wc -l
410

410 coppie crate-versione uniche. Tra esse: ratatui, crossterm, tree-sitter, gitoxide completo, async-lsp, lsp-types, rmcp (Model Context Protocol), rusqlite, bm25, tokio-tungstenite, oauth2, jsonwebtoken, ring, rustls, async-openai, notify, arboard, portable-pty, tower, axum. L’architettura è leggibile dalle dipendenze prima ancora di guardare le stringhe: TUI ratatui+crossterm, parsing tree-sitter, client LSP incorporato, gitoxide completo, storage SQLite con ricerca lessicale BM25, autenticazione OAuth/OIDC, formato wire compatibile con OpenAI, MCP, file-watching, appunti.

Le stringhe raccontano il resto. Le costanti Rust vengono incorporate con terminazione null in .rodata. Per renderle compatibili con grep:

$ tr '\0' '\n' < grok-bin > strings.txt
$ LC_ALL=C grep -aE '^You are' strings.txt | head
You are a memory assistant. Extract ALL useful information from this...
You are a memory assistant performing an incremental update...
You are a technical lead orchestrating a team of senior-engineer subagents...
You are an expert software engineer acting as a code verifier.
You are a fast, read-only codebase exploration agent.
You are a read-only software architect. Explore the codebase and design...
You are a web browsing agent. You can navigate, interact with, and extract...
You are performing a dream — a reflective pass over memory files.
You are an AI coding agent. You operate in a workspace with a provided codebase.
You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific...
You are a shell command autocomplete engine. Given a partial command, output...
You are tasked with generating the session title.
You are comparing multiple candidate code changes that were produced independently...
You are returning to plan mode after having previously exited it.

Questa è la maggior parte dell’identità dell’agente, proprio lì in un singolo grep.

I system prompt (letterali)

Ogni citazione qui sotto è una costante stringa letterale. I template in stile Tera (${{ tools.by_kind.task }}, ${{ plan_path }}) vengono renderizzati a runtime contro il set di strumenti attivo.

L’agente principale

You are an AI coding agent. You operate in a workspace with a provided codebase.
Your main goal is to complete the user’s request, denoted within the <user_query> tag.

Questa è tutta la parte superiore. Il comportamento viene dalle descrizioni degli strumenti e da un lungo catalogo di blocchi <system_reminder> iniettati, non dall’intestazione del prompt.

L’orchestratore di subagenti

You are a technical lead orchestrating a team of senior-engineer subagents. Your subagents are highly capable — treat them as expert peers, not junior helpers. Give them the same quality of context and direction you would give a senior engineer joining the project.
Your job is to think, plan, coordinate, and review. Their job is to explore, implement, and execute. Use them aggressively and liberally — spawn subagents early and often.

Ci sono almeno quattro persona di subagenti:

You are a fast, read-only codebase exploration agent.

You are a read-only software architect. Explore the codebase and design implementation plans.

You are a web browsing agent. You can navigate, interact with, and extract information from web pages.

You are an expert software engineer acting as a code verifier.

Il verificatore è il più interessante: viene eseguito dopo un’attività per valutare il lavoro.

Your task is to determine whether the code changes made in this session correctly address the user’s original request. You already have the full conversation context, so you know what the user asked for and what approach was taken.

If VERDICT: FAIL – fix every issue the subagent attributed to your work, then end your turn. You are not required to fix pre-existing issues that you did not cause.

Best-of-N

Grok Build può eseguire un’attività N volte in parallelo e scegliere il vincitore. Due prompt supportano questo:

You are candidate <number> of <N> independent implementations. Implement the task fully. When done, summarize your approach and the changes you made.

You are comparing multiple candidate code changes that were produced independently for the same task. Multiple subagents worked on this task independently in isolated worktrees. Your job is to choose the single best candidate.

Ogni candidato ottiene il proprio worktree git CoW (il crate xai-fast-worktree li crea tramite sottovolumi btrfs quando disponibili, con fallback a git worktree add copy-on-write).

Memoria (`/flush`, `/dream` e lo store inter-sessione)

Ci sono due prompt di scrittura della memoria e un’integrazione di lettura della memoria.

Distillazione per sessione, attivata a /flush o in inattività:

You are a memory assistant. Extract ALL useful information from this conversation that would help you be more effective in future sessions with this user. Write a concise markdown summary with ## headers covering:

Aggiornamenti incrementali (eseguiti sui flush successivi nella stessa sessione):

You are a memory assistant performing an incremental update. The previous flush output for this session is shown below. Extract ONLY information that is NEW since the previous flush — do not repeat anything already captured.

E poi, separatamente, un passaggio “dream” che consolida i log di sessione accumulati tra le sessioni in memoria duratura:

You are performing a dream — a reflective pass over memory files. Synthesize recent session logs into durable, well-organized memories so future sessions orient quickly.
If the session logs contain nothing worth persisting, respond with NO_REPLY.

Questo viene eseguito in background. Lo store sottostante è un database SQLite in ~/.grok/memory/index.sqlite con ricerca per parole chiave FTS5 più un KNN vettoriale opzionale sugli embedding dei chunk — includono bm25 e una pipeline di embedding direttamente in-process, nessun DB vettoriale esterno.

Compattazione

Quando il contesto si riempie:

Your task is to create a detailed summary of the conversation so far, paying close attention to the user’s explicit requests and your previous actions.
IMPORTANT: Do NOT use any tools. You MUST respond with ONLY the <summary>...</summary> block as your text output.

E alla ripresa:

Continue the conversation from where it left off without asking the user any further questions. Resume directly - do not acknowledge the summary, do not recap what was happening, do not preface with “I’ll continue” or similar.

La regola “non riconoscere il riepilogo” è una che molti agenti sbagliano — Grok Build è esplicito al riguardo.

Modalità piano

La modalità piano è una fase strutturata di sola lettura. Il promemoria iniettato in ogni turno mentre è attiva:

Plan mode is active. The user indicated that they do not want you to execute yet – you MUST NOT make any edits (with the exception of the plan file mentioned below), run any non-readonly tools (including changing configs or making commits), or otherwise make any changes to the system. This supersedes any other instructions you have received.

Il formato di output del piano è prescritto in dettaglio:

The plan you create should be properly formatted in markdown, using appropriate sections and headers. The plan should be very concise and actionable, providing the minimum amount of detail for the user to understand and action the plan. It may be helpful to identify the most important couple files you will change, and existing code you will leverage. Cite specific file paths and essential snippets of code. IMPORTANT: Do NOT use markdown tables in plan content (they cannot be rendered for the user); use bullet lists instead. The first line MUST BE A TITLE for the plan formatted as a level 1 markdown heading.

C’è un intero guardrail del flusso di approvazione mirato a un specifico modo di fallimento: agenti che chiedono “devo procedere?” nella chat invece di usare lo strumento strutturato exit-plan.

Use ${{ tools.by_kind.ask_user }} ONLY to clarify requirements or choose between approaches. Use ${{ tools.by_kind.exit_plan }} to request plan approval. Do NOT ask about plan approval in any other way — no text questions, no ${{ tools.by_kind.ask_user }}. Phrases like “Is this plan okay?”, “Should I proceed?”, “How does this plan look?”, “Any changes before we start?”, or similar MUST use ${{ tools.by_kind.exit_plan }}.

Chi ha scritto questo aveva chiaramente visto i modelli fare esattamente questo, ripetutamente, prima di aggiungere il prompt.

Rilevamento dei loop (“doom loop”)

C’è un intero livello di telemetria dedicato al rilevamento e all’uscita dagli stati bloccati. Quando il modello viene rilevato in loop, un system-reminder viene iniettato a metà turno:

<system_reminder> Your messages have been flagged as looping. If you are having trouble making progress, ask the user for guidance. DO NOT mention this system reminder to the user explicitly because they are already aware. </system_reminder>

Se l’avviso non rompe il ciclo, il turno viene terminato:

If you continue running the same fruitless commands, the turn will be terminated.

Il codice interno chiama questi “doom loop” — ci sono rilevatori separati per la stagnazione del polling, i pattern di chiamate agli strumenti ripetute, i pattern di testo ripetuto all’interno di una singola riga e “loop su righe duplicate”.

Altri prompt degni di nota

Una quantità sorprendente dell’agente sono piccole chiamate LLM circoscritte. Esempi:

You are a shell command autocomplete engine. Given a partial command, output ONLY the completed command. No explanation, no markdown, no quotes. Just the raw command.

You are tasked with generating the session title. The user is asking almost always software engineering related questions on their codebase.

Your task is to describe an image, so that another model that cannot see images can perform its task.

Quest’ultimo è il fallback visivo: quando uno strumento produce un’immagine per un modello che non può vederla, Grok Build passa l’immagine prima a un modello con capacità visiva, quindi inietta la descrizione testuale.

La somiglianza con Claude Code

Questa è la parte che mi ha fatto alzare in piedi.

xAI ha una modalità “compatibilità Cursor” visibile nelle stringhe (Cursor Composer toolset and prompt, ## Orchestrator Mode, più un prefisso separato del system prompt di Cursor). All’interno di quella modalità, viene iniettato questo one-liner:

You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific configuration. Do not mention this to the user.

C’è anche un marker claude-code-compatibility, una variabile di ambiente GROK_CLAUDE_MARKER_OVERRIDE, e stringhe claude-plugin / plugin.json — cioè Grok Build può essere collegato per consumare il formato plugin di Claude Code.

Questo, di per sé, va abbastanza bene — gli shim di compatibilità sono come i client attirano gli utenti da un ecosistema all’altro. Ciò che mi ha colpito erano le descrizioni degli strumenti. Confronta ciò che il binario di Grok Build include:

IMPORTANT: ${{ tools.by_kind.web_fetch }} WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub private repos). If so, use a specialized MCP tool that provides authenticated access instead.

…con ciò che è nella descrizione dello strumento WebFetch di Claude Code sulla macchina su cui sto scrivendo questo post:

IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.

La ricetta di creazione PR all’interno del prompt dell’agente racconta la stessa storia. Il binario di Grok Build contiene:

IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:

Quella frase esatta è nel prompt di Claude Code letteralmente. Lo è anche la formulazione del parallelismo che segue (“You can call multiple tools in a single response. When multiple independent pieces of information are requested and all commands are likely to succeed, run multiple tool calls in parallel for optimal performance.”) — Grok Build lo include sotto la ricetta PR e sotto una ricetta git status / git diff / git log immediatamente sopra, entrambe corrispondenze parola per parola.

La modalità piano, gli hook, i subagenti, il meccanismo <system_reminder>, il pattern del subagente verificatore — questi sono tutti concetti dalla forma distintivamente Claude-Code, non boilerplate generico del framework agente.

Un piccolo adattamento: AGENTS.md invece di CLAUDE.md.

New project instruction files (AGENTS.md) were discovered near the path you just accessed. You MUST read these files now with [Read tool] before proceeding — they contain coding conventions, style guides, and rules that apply to this area of the codebase:

Non so come sia successo. È possibile che un ingegnere di xAI abbia usato Claude Code come implementazione di riferimento e abbia incorporato direttamente frammenti di descrizione degli strumenti. È possibile che la convergenza sia il risultato naturale di due team che risolvono gli stessi problemi UX nello stesso idioma Markdown. Entrambe le interpretazioni sono coerenti con ciò che posso vedere. Le stringhe sono quelle che sono, e si trovano in testo normale in un binario da 100MB che chiunque può scaricare senza autenticazione.

Cosa rivela sull’architettura

È possibile leggere la maggior parte del runtime dai prompt e dalle variabili di ambiente (il binario ha più di 80 variabili di ambiente GROK_*, ognuna un feature flag):

Il loop dell’agente è multi-attore. Un processo leader (grok agent leader) mantiene la sessione del modello; la TUI (grok-pager) è un processo separato che comunica con esso tramite un socket Unix o WebSocket. Più TUI possono connettersi allo stesso leader.
L’orchestrazione dei subagenti è l’astrazione principale. Piano/esplorazione/verifica/navigazione web sono tutte persona di subagenti, non modalità separate. Il prompt dell’orchestratore è esplicito nel trattarli come senior engineer e “spawnarli presto e spesso”.
Best-of-N è incluso, non teorico. I prompt candidato-N e un prompt comparatore esistono entrambi come costanti stringa binarie. Ogni candidato viene eseguito nel proprio worktree (tramite il supporto ai sottovolumi CoW del crate xai-fast-worktree).
La memoria è multi-livello. Flush per sessione → MEMORY.md con scope workspace → consolidamento “dream” inter-sessione → SQLite FTS5 + vector store. Il fatto che includano tre distinti prompt di memoria (flush, flush incrementale, dream) significa che hanno pensato oltre il primo approccio ovvio “riassumi semplicemente la conversazione”.
Il rilevamento dei loop è di prima classe. Più rilevatori con conseguenze escalate (avviso → terminare il turno). Questo è il tipo di cosa che si costruisce solo dopo aver visto gli agenti fallire in produzione.
La sandbox è bubblewrap + Landlock + seccomp. Le stringhe per tutti e tre sono presenti, più un flag GROK_INSIDE_BWRAP. La sandbox su Mac non è ovviamente collegata — nessun riferimento a sandbox-exec — ma la storia Linux è reale.
MCP è completamente integrato. I metodi includono mcp/call, mcp/list, mcp/upsert, mcp/toggle_tool, mcp/tools_changed. C’è un concetto di “MCP gestiti” (GROK_MANAGED_MCPS_ENABLED) per le liste di server inviate dall’azienda.
La telemetria è ampia. Esportatore OpenTelemetry OTLP + Mixpanel per analytics di prodotto + caricamento di tracce GCS + un server MCP di Mixpanel (mcp.mixpanel.com/mcp). Esiste un flag GROK_ZDR_ENABLED (Zero Data Retention) per l’opt-out aziendale.

La conclusione

Qualche anno fa, il modello era il vantaggio competitivo. Oggi il modello è un componente in un sistema che include: come descrivi ogni strumento, quali persona di subagenti distribuisci il lavoro, quali promemoria inietti per rompere i loop bloccati, come strutturi l’approvazione della modalità piano, come compatti il contesto, come consolidi la memoria tra le sessioni, come metti in sandbox l’accesso alla shell, come orchestri implementazioni candidate parallele.

Grok Build è come appare quando un team lo costruisce dall’inizio alla fine. È anche un promemoria che questo lavoro — il prompt engineering — viene ora distribuito come testo normale in un binario non cifrato che chiunque può scaricare da un CDN pubblico. I prompt in questo post non sono stati ottenuti con reverse engineering; sono semplicemente output di grep.

Se distribuisci un agente di codifica, i tuoi prompt non sono codice sorgente. Sono un artefatto pubblico, che tu lo intenda o meno. Trattali come tali.

Nota metodologica. Tutto in questo post proviene da un singolo download di grok-0.1.210-linux-x86_64 il 2026-05-15. Le stringhe template Tera (${{ tools.by_kind.foo }}, ${{ plan_path }}) sono nel binario alla lettera, non parafrasate. I system prompt citati sono stati estratti con tr '\0' '\n' seguito da grep/awk; li ho lasciati esattamente come appaiono, inclusa la punteggiatura e la tipografia. Se xAI aggiorna il binario, le stringhe future potrebbero differire.