xAI's new coding agent, Grok Build, ships its prompts in plaintext

2026-05-15 · 7 min read

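xAI released Grok Build yesterday: their answer to Claude Code and Codex CLI. Installation is a one-liner, the binary is gated to their top consumer tier ($299/month, with a $99 introductory price), and the agent itself talks to Grok 4 over an OpenAI-compatible HTTP interface.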

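I downloaded the binary because I was curious what language it was built in. I came away with thirty-plus verbatim system prompts, the name of every internal subagent, every tool description, and a fairly complete picture of the architecture. None of it required any tooling beyond tr and grep.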

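This post is what I found.

The extraction

The installer at https://x.ai/cli/install.sh 302-redirects to a Google Cloud Storage bucket and downloads a single statically linked ELF of roughly 100MB for your platform: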

$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/stable
0.1.210
$ curl -fsSL https://storage.googleapis.com/grok-build-public-artifacts/cli/grok-0.1.210-linux-x86_64 -o grok-bin
$ head -c 4 grok-bin | xxd
00000000: 7f45 4c46                                .ELF

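The compiler fingerprints: /rustc/<commit> debug paths, panicked at, RUST_BACKTRACE, plus tokio::, hyper::, and reqwest::. This is Rust on the standard async HTTP stack. Cargo bakes in each crate's source path in the form <name>-<version>/src/<file>.rs, which lets you dump the full dependency tree straight from the binary: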

$ LC_ALL=C grep -aoE '[a-zA-Z][a-zA-Z0-9_-]{2,40}-[0-9]+\.[0-9]+\.[0-9]+/src/' grok-bin \
  | sed 's|/src/||' | sort -u | wc -l
410

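410 unique crate-version pairs. Among them: ratatui, crossterm, tree-sitter, all of gitoxide, async-lsp, lsp-types, rmcp (Model Context Protocol), rusqlite, bm25, tokio-tungstenite, oauth2, jsonwebtoken, ring, rustls, async-openai, notify, arboard, portable-pty, tower, axum. The architecture is legible from the dependencies before you look at a single string: a ratatui+crossterm TUI, tree-sitter parsing, an embedded LSP client, full gitoxide, a SQLite store with BM25 lexical search, OAuth/OIDC auth, an OpenAI-compatible wire format, MCP, file watching, clipboard.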
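If you would rather have the crate list as data than as a line count, the same pattern works on raw bytes. This is a minimal sketch, not the tooling I used: the function name crate_versions and the sample bytes are made up for illustration, and the regex mirrors the grep pattern above with two capture groups added.

```python
import re

# Mirrors the grep -aoE pattern above, with capture groups for name and version.
CRATE_PATH = re.compile(
    rb"([a-zA-Z][a-zA-Z0-9_-]{2,40})-([0-9]+\.[0-9]+\.[0-9]+)/src/"
)

def crate_versions(blob: bytes) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (crate, version) pairs found in blob."""
    pairs = {
        (m.group(1).decode("ascii"), m.group(2).decode("ascii"))
        for m in CRATE_PATH.finditer(blob)
    }
    return sorted(pairs)

# Synthetic stand-in for the binary's contents (hypothetical paths).
sample = (
    b"\x00/registry/tokio-1.38.0/src/runtime/mod.rs\x00"
    b"/registry/ratatui-0.26.3/src/lib.rs\x00"
    b"/registry/tokio-1.38.0/src/task.rs\x00"
)

for name, version in crate_versions(sample):
    print(name, version)
```

Against the real file you would pass open("grok-bin", "rb").read() instead of the sample. Like the grep version, this is a heuristic: any path-like string of the form name-x.y.z/src/ counts as a crate, whether or not Cargo put it there.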

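The strings tell us the rest. The string constants sit in .rodata separated by NUL bytes (Rust itself does not NUL-terminate string literals, but splitting on NUL works here). To make them grep-friendly: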

$ tr '\0' '\n' < grok-bin > strings.txt
$ LC_ALL=C grep -aE '^You are' strings.txt | head
You are a memory assistant. Extract ALL useful information from this...
You are a memory assistant performing an incremental update...
You are a technical lead orchestrating a team of senior-engineer subagents...
You are an expert software engineer acting as a code verifier.
You are a fast, read-only codebase exploration agent.
You are a read-only software architect. Explore the codebase and design...
You are a web browsing agent. You can navigate, interact with, and extract...
You are performing a dream — a reflective pass over memory files.
You are an AI coding agent. You operate in a workspace with a provided codebase.
You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific...
You are a shell command autocomplete engine. Given a partial command, output...
You are tasked with generating the session title.
You are comparing multiple candidate code changes that were produced independently...
You are returning to plan mode after having previously exited it.

Most of the agent's identity is right there, one grep away.
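The tr-then-grep step is simple enough to restate in a few lines of Python, which is handy if you want the hits as a list for further processing. A minimal sketch: strings_with_prefix is a hypothetical helper name, and the sample bytes are a synthetic stand-in built from two of the prompts listed above.

```python
def strings_with_prefix(blob: bytes, prefix: str = "You are") -> list[str]:
    """Split on NUL bytes, then keep lines that start with prefix.

    Equivalent in spirit to: tr '\\0' '\\n' < file | grep '^You are'
    """
    hits = []
    for chunk in blob.split(b"\x00"):
        # Undecodable bytes become U+FFFD rather than raising.
        text = chunk.decode("utf-8", errors="replace")
        for line in text.split("\n"):
            if line.startswith(prefix):
                hits.append(line)
    return hits

# Synthetic stand-in for the binary (not real file contents).
sample = (
    b"\x7fELF\x02\x01\x01\x00"
    b"You are a fast, read-only codebase exploration agent.\x00"
    b"panicked at\x00"
    b"You are tasked with generating the session title.\x00"
)

for hit in strings_with_prefix(sample):
    print(hit)
```

As with grep '^You are', a prompt only shows up if it begins exactly at a NUL or newline boundary.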

The system prompts (verbatim)

Every quote below is a literal string constant. The Tera-style templates (${{ tools.by_kind.task }}, ${{ plan_path }}) are rendered at runtime against the active toolset.
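To make the rendering step concrete, here is a rough sketch of what substituting those placeholders could look like. It is not xAI's renderer: the render function, the regex, and the tool names in the context (AskUser, ExitPlan) are all my assumptions. Only the ${{ dotted.path }} placeholder shape and the template sentence come from the binary.

```python
import re

# Matches ${{ dotted.path }} placeholders as they appear in the binary's strings.
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_.]*)\s*\}\}")

def render(template: str, context: dict) -> str:
    """Replace each placeholder with the value found by walking context."""
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError on an unknown path
        return str(value)
    return PLACEHOLDER.sub(lookup, template)

# Hypothetical active toolset; the real tool names are not known to me.
context = {"tools": {"by_kind": {"ask_user": "AskUser", "exit_plan": "ExitPlan"}}}

template = "Use ${{ tools.by_kind.exit_plan }} to request plan approval."
print(render(template, context))  # prints: Use ExitPlan to request plan approval.
```

Failing loudly on an unknown path is a deliberate choice in this sketch: a prompt that silently ships with an unrendered placeholder is harder to notice than a crash.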

The main agent

You are an AI coding agent. You operate in a workspace with a provided codebase.
Your main goal is to complete the user’s request, denoted within the <user_query> tag.

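That is the entire top of the prompt. The behavior comes from the tool descriptions and a long list of injected <system_reminder> blocks, not from the prompt header.

The subagent orchestrator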

You are a technical lead orchestrating a team of senior-engineer subagents. Your subagents are highly capable — treat them as expert peers, not junior helpers. Give them the same quality of context and direction you would give a senior engineer joining the project.
Your job is to think, plan, coordinate, and review. Their job is to explore, implement, and execute. Use them aggressively and liberally — spawn subagents early and often.

There are at least four subagent roles:

You are a fast, read-only codebase exploration agent.

You are a read-only software architect. Explore the codebase and design implementation plans.

You are a web browsing agent. You can navigate, interact with, and extract information from web pages.

You are an expert software engineer acting as a code verifier.

The verifier is the most interesting: it runs after a task completes to grade the work.

Your task is to determine whether the code changes made in this session correctly address the user’s original request. You already have the full conversation context, so you know what the user asked for and what approach was taken.

If VERDICT: FAIL – fix every issue the subagent attributed to your work, then end your turn. You are not required to fix pre-existing issues that you did not cause.

Best-of-N

Grok Build can run a task N times in parallel and pick a winner. Two prompts support this:

You are candidate <number> of <N> independent implementations. Implement the task fully. When done, summarize your approach and the changes you made.

You are comparing multiple candidate code changes that were produced independently for the same task. Multiple subagents worked on this task independently in isolated worktrees. Your job is to choose the single best candidate.

Each candidate gets its own CoW git worktree (the xai-fast-worktree crate creates these via btrfs subvolumes when available, falling back to a copy-on-write git worktree add).

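Memory (`/flush`, `/dream`, and the cross-session store)

There are two memory-writing prompts and one memory-reading integration.

The per-session distillation, triggered on /flush or on idle: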

You are a memory assistant. Extract ALL useful information from this conversation that would help you be more effective in future sessions with this user. Write a concise markdown summary with ## headers covering:

The incremental update (runs on subsequent flushes within the same session):

You are a memory assistant performing an incremental update. The previous flush output for this session is shown below. Extract ONLY information that is NEW since the previous flush — do not repeat anything already captured.

Then, separately, there is a "dream" pass that consolidates the session logs accumulated across sessions into durable memory:

You are performing a dream — a reflective pass over memory files. Synthesize recent session logs into durable, well-organized memories so future sessions orient quickly.
If the session logs contain nothing worth persisting, respond with NO_REPLY.

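This runs in the background. The backing store is a SQLite database at ~/.grok/memory/index.sqlite, with FTS5 keyword search plus optional vector KNN over chunk embeddings. They ship bm25 and the embedding pipeline in-process, with no external vector database.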
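For readers who have not used FTS5, this is roughly what keyword search over memory chunks looks like. I have not seen the schema in index.sqlite, so the table name (chunks), its columns, and the sample rows are all hypothetical, and the vector KNN side is left out. The sketch also assumes your Python's SQLite was built with FTS5, which is common but not guaranteed.

```python
import sqlite3

# In-memory database; the real store lives at ~/.grok/memory/index.sqlite.
db = sqlite3.connect(":memory:")

# Hypothetical schema: one FTS5 table holding memory chunks.
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(session, body)")
db.executemany(
    "INSERT INTO chunks(session, body) VALUES (?, ?)",
    [
        ("s1", "User prefers cargo nextest over cargo test"),
        ("s2", "Project uses clippy with warnings denied"),
        ("s3", "CI runs clippy and rustfmt before merge"),
    ],
)

def search(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Keyword search ranked by FTS5's built-in bm25() (lower is better)."""
    return db.execute(
        "SELECT session, body FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()

for session, body in search("clippy"):
    print(session, body)
```

Everything here runs in-process, which matches the point above: lexical recall needs no separate search service.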

Context compaction

When the context fills up:

Your task is to create a detailed summary of the conversation so far, paying close attention to the user’s explicit requests and your previous actions.
IMPORTANT: Do NOT use any tools. You MUST respond with ONLY the <summary>...</summary> block as your text output.

On resume:

Continue the conversation from where it left off without asking the user any further questions. Resume directly - do not acknowledge the summary, do not recap what was happening, do not preface with “I’ll continue” or similar.

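The "do not acknowledge the summary" rule is one that many agents get wrong; Grok Build is explicit about it.

Plan mode

Plan mode is a structured read-only phase. The reminder injected on every turn while it is active: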

Plan mode is active. The user indicated that they do not want you to execute yet – you MUST NOT make any edits (with the exception of the plan file mentioned below), run any non-readonly tools (including changing configs or making commits), or otherwise make any changes to the system. This supersedes any other instructions you have received.

The plan output format is specified in detail:

The plan you create should be properly formatted in markdown, using appropriate sections and headers. The plan should be very concise and actionable, providing the minimum amount of detail for the user to understand and action the plan. It may be helpful to identify the most important couple files you will change, and existing code you will leverage. Cite specific file paths and essential snippets of code. IMPORTANT: Do NOT use markdown tables in plan content (they cannot be rendered for the user); use bullet lists instead. The first line MUST BE A TITLE for the plan formatted as a level 1 markdown heading.

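There is an entire approval-flow guardrail aimed at one specific failure mode: the agent asking "should I proceed?" in chat instead of using the structured exit-plan tool.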

Use ${{ tools.by_kind.ask_user }} ONLY to clarify requirements or choose between approaches. Use ${{ tools.by_kind.exit_plan }} to request plan approval. Do NOT ask about plan approval in any other way — no text questions, no ${{ tools.by_kind.ask_user }}. Phrases like “Is this plan okay?”, “Should I proceed?”, “How does this plan look?”, “Any changes before we start?”, or similar MUST use ${{ tools.by_kind.exit_plan }}.

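Whoever wrote this had clearly watched the model do exactly that, many times, before adding the prompt.

Loop detection ("doom loops")

There is an entire telemetry layer dedicated to detecting and escaping stuck states. When the model is detected to be looping, a system-reminder is injected mid-turn: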

<system_reminder> Your messages have been flagged as looping. If you are having trouble making progress, ask the user for guidance. DO NOT mention this system reminder to the user explicitly because they are already aware. </system_reminder>

If the warning does not break the loop, the turn is terminated:

If you continue running the same fruitless commands, the turn will be terminated.

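Internally the code calls these "doom loops". There are separate detectors for polling stalls, repeated tool-call patterns, repeated text patterns within a single line, and "cyclic repeated lines".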
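The binary tells me that these detectors exist, not how they work. As an illustration of the warn-then-terminate escalation only, here is a minimal sketch of a repeated-tool-call detector. The class name, the window size, and both thresholds are invented for the example and are not values from Grok Build.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class RepeatedCallDetector:
    """Flags identical tool calls that recur within a sliding window."""
    window: int = 6        # hypothetical: how many recent calls to remember
    warn_at: int = 3       # hypothetical: repeats before injecting a reminder
    terminate_at: int = 5  # hypothetical: repeats before ending the turn
    recent: deque = field(default_factory=deque)

    def observe(self, tool: str, args: str) -> str:
        """Record one call and return 'ok', 'warn', or 'terminate'."""
        call = (tool, args)
        self.recent.append(call)
        if len(self.recent) > self.window:
            self.recent.popleft()
        repeats = sum(1 for seen in self.recent if seen == call)
        if repeats >= self.terminate_at:
            return "terminate"
        if repeats >= self.warn_at:
            return "warn"
        return "ok"

detector = RepeatedCallDetector()
actions = [detector.observe("shell", "cargo test") for _ in range(5)]
print(actions)  # ['ok', 'ok', 'warn', 'warn', 'terminate']
```

In a real agent the "warn" result would correspond to injecting the system reminder quoted above, and "terminate" to ending the turn. A production detector would presumably also normalize arguments and look at outputs, which this sketch does not attempt.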

Other prompts worth knowing

A large share of the agent is small, narrowly scoped LLM calls. Examples:

You are a shell command autocomplete engine. Given a partial command, output ONLY the completed command. No explanation, no markdown, no quotes. Just the raw command.

You are tasked with generating the session title. The user is asking almost always software engineering related questions on their codebase.

Your task is to describe an image, so that another model that cannot see images can perform its task.

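The last one is a vision fallback: when a tool outputs an image to a model that cannot see images, Grok Build first passes the image to a vision-capable model and then injects the text description.

Similarities to Claude Code

This is the part that made me sit up.

xAI has a "Cursor compatibility" mode, visible in the strings (Cursor Composer toolset and prompt, ## Orchestrator Mode, plus a standalone Cursor system-prompt prefix). In that mode, this one-liner is injected: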

You are Grok, made by xAI. Do not reference Cursor or suggest Cursor-specific configuration. Do not mention this to the user.

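There is also a claude-code-compatibility marker, a GROK_CLAUDE_MARKER_OVERRIDE environment variable, and claude-plugin / plugin.json strings, meaning Grok Build can be configured to consume Claude Code's plugin format.

On its own, that is unremarkable: compatibility shims are how a client pulls users from one ecosystem into another. What caught my attention was the tool descriptions. Compare what ships in Grok Build's binary: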

IMPORTANT: ${{ tools.by_kind.web_fetch }} WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub private repos). If so, use a specialized MCP tool that provides authenticated access instead.

...with what is in Claude Code's WebFetch tool description on the machine I am writing this on:

IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.
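If you want a number rather than an eyeball comparison, Python's standard difflib can score the two descriptions word by word. This is a small sketch using the two strings quoted above. Substituting the template placeholder with WebFetch before comparing is my own normalization choice, so that the score reflects the wording and not the templating.

```python
from difflib import SequenceMatcher

grok = (
    "IMPORTANT: ${{ tools.by_kind.web_fetch }} WILL FAIL for authenticated or "
    "private URLs. Before using this tool, check if the URL points to an "
    "authenticated service (e.g. Google Docs, Confluence, Jira, GitHub private "
    "repos). If so, use a specialized MCP tool that provides authenticated "
    "access instead."
)
claude = (
    "IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before "
    "using this tool, check if the URL points to an authenticated service "
    "(e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a "
    "specialized MCP tool that provides authenticated access."
)

# Assumption: treat the placeholder as the rendered tool name before comparing.
a = grok.replace("${{ tools.by_kind.web_fetch }}", "WebFetch").split()
b = claude.split()

matcher = SequenceMatcher(None, a, b, autojunk=False)
print(f"word-level similarity: {matcher.ratio():.2f}")

# Show only the spans where the two texts differ.
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, a[i1:i2], "->", b[j1:j2])
```

The score measures textual overlap only. It says nothing about how the overlap came to be.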

The PR-creation steps in the agent prompt tell the same story. Grok Build's binary contains:

IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:

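That exact sentence appears verbatim in Claude Code's prompt. So does the parallelism wording that follows it ("You can call multiple tools in a single response. When multiple independent pieces of information are requested and all commands are likely to succeed, run multiple tool calls in parallel for optimal performance."). Grok Build includes it both under the PR steps and under the git status / git diff / git log steps immediately above, and both are verbatim matches.

Plan mode, hooks, subagents, the <system_reminder> mechanism, the verifier-subagent pattern: these are distinctly Claude Code-flavored concepts, not generic agent-framework boilerplate.

One small change: AGENTS.md instead of CLAUDE.md.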

New project instruction files (AGENTS.md) were discovered near the path you just accessed. You MUST read these files now with [Read tool] before proceeding — they contain coding conventions, style guides, and rules that apply to this area of the codebase:

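I do not know how this happened. It is possible that an engineer at xAI used Claude Code as a reference implementation and lifted tool-description fragments directly. It is also possible that the convergence is the natural result of two teams solving the same UX problems in the same Markdown idiom. Both explanations are consistent with what I can see. The strings are what they are, and they sit in plaintext in a 100MB binary that anyone can download without authentication.

What it reveals about the architecture

You can read most of the runtime off the prompts and the environment variables (the binary has 80+ GROK_* environment variables, each one a feature switch):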

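The agent loop is split across multiple roles. A leader process (grok agent leader) holds the model session; the TUI (grok-pager) is a separate process that talks to it over a Unix socket or WebSocket. Multiple TUIs can attach to the same leader.
Subagent orchestration is the primary abstraction. Plan, explore, verify, and web-browse are all subagent roles, not standalone modes. The orchestrator prompt explicitly says to treat them as senior engineers and to spawn them "early and often".
Best-of-N is implemented, not theoretical. Both the candidate-N prompt and the comparator prompt exist as string constants in the binary. Each candidate runs in its own worktree (backed by CoW subvolumes via the xai-fast-worktree crate).
Memory is multi-tiered. Per-session flush → workspace-scoped MEMORY.md → cross-session "dream" consolidation → SQLite FTS5 + vector store. The fact that they ship three distinct memory prompts (flush, incremental flush, dream) suggests they have moved past the obvious "just summarize the conversation" first step.
Loop detection is first-class. Multiple detectors, with escalating consequences (warning → turn termination). This is something you only build after watching agents fail in production.
The sandbox is bubblewrap + Landlock + seccomp. Strings for all three are present, plus a GROK_INSIDE_BWRAP flag. macOS sandboxing does not appear to be configured (there are no sandbox-exec references), but the Linux story is real.
MCP is fully integrated. Methods include mcp/call, mcp/list, mcp/upsert, mcp/toggle_tool, and mcp/tools_changed. There is a "managed MCP" concept (GROK_MANAGED_MCPS_ENABLED) for enterprise-pushed server lists.
Telemetry is extensive. An OpenTelemetry OTLP exporter + Mixpanel product analytics + GCS trace upload + a Mixpanel MCP server (mcp.mixpanel.com/mcp). A GROK_ZDR_ENABLED (zero data retention) flag exists for enterprise opt-out.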
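The environment variable inventory above can be pulled out the same way as the prompts. Here is a minimal sketch: the helper name grok_env_vars is made up, and the sample bytes are a synthetic stand-in that reuses three variable names mentioned in this post.

```python
import re

# Any run of uppercase letters, digits, and underscores after the GROK_ prefix.
ENV_VAR = re.compile(rb"GROK_[A-Z0-9_]+")

def grok_env_vars(blob: bytes) -> list[str]:
    """Return the sorted set of GROK_* names that appear anywhere in blob."""
    return sorted({m.group().decode("ascii") for m in ENV_VAR.finditer(blob)})

# Synthetic stand-in for the binary (not real file contents).
sample = (
    b"\x00GROK_INSIDE_BWRAP\x00"
    b"set GROK_ZDR_ENABLED=1 to opt out\x00"
    b"GROK_MANAGED_MCPS_ENABLED\x00"
    b"GROK_ZDR_ENABLED\x00"
)

for name in grok_env_vars(sample):
    print(name)
```

A match only tells you the name exists as a string. Whether each one is actually read at runtime, and what it toggles, still has to be inferred from the surrounding strings.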

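Conclusion

A few years ago, the model was the moat. Today the model is one component in a system that includes: how each tool is described, which subagent roles work is delegated to, which reminders are injected to break stuck loops, how plan-mode approval is structured, how context is compacted, how memory is consolidated across sessions, how shell access is sandboxed, and how parallel candidate implementations are orchestrated.

Grok Build shows what it looks like when one team builds all of that end to end. It is also a reminder that this work, the prompt engineering, now ships in plaintext inside unencrypted binaries that anyone can download from a public CDN. The prompts in this post were not obtained by reverse engineering; they are just the output of grep.

If you ship a coding agent, your prompts are not source code. Whether you intend it or not, they are public artifacts. Treat them as such.

Methodology note. Everything in this post comes from a single download of grok-0.1.210-linux-x86_64 on 2026-05-15. The Tera template strings (${{ tools.by_kind.foo }}, ${{ plan_path }}) appear verbatim in the binary and are not paraphrased. The quoted system prompts were extracted with tr '\0' '\n' followed by grep/awk; I have kept them as-is, including punctuation and typography. Strings may differ in future builds if xAI updates the binary.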