RussellLuo

面向长程任务的Skills设计技巧

发表于 2026-03-08 分类于 AI

Skills是当前Agent领域最热门的设计范式之一，风头一时无两。这也让很多人形成了一种直觉：

只要Skill写得足够详细，Agent就能搞定复杂任务。

在简单任务中，这种想法通常没什么问题。但一旦面对长程任务（Long-horizon tasks），效果往往大打折扣，甚至根本无法完成。

那么问题就来了——当处理长程任务时，Skills应该如何设计？

长程任务为什么困难

在Agent系统中，长程任务是指那些需要持续几十分钟、数小时甚至几天时间才能完成的任务，例如：

自动化软件开发流程
深度技术调研
长链路业务流程执行

这类任务通常具有三个特点：

步骤数量多
执行时间长
上下文持续增长

从LLM的角度来看，本质问题在于：

任务执行过程中产生的token会不断累积，进而导致上下文膨胀，甚至会超出LLM的上下文限制。

有了Skills为什么还需要上下文工程

针对上述问题，Anthropic在Effective context engineering for AI agents中进行了深入探讨，并从上下文工程（Context engineering）的角度出发，提出了3种解决方案：

压缩（Compaction）：在上下文接近上限时，通过高保真摘要压缩历史对话并重启上下文窗口，以维持长程任务的连贯性。
Subagents（Sub-agent architectures）：将复杂任务拆分给多个拥有独立上下文的专用Subagent处理，并由主Agent进行高层规划与结果整合，从而突破上下文限制并提升复杂任务的处理能力。
结构化笔记（Structured note-taking）：让Agent定期将重要信息记录到上下文之外的持久存储中，并在需要时拉回，以保持跨复杂任务的关键上下文和依赖。

细心的读者可能已经注意到，上述文章发表于Skills推出之前，那么Skills是否已经解决了长程任务的问题呢？答案是否定的。

关于Skills，我们在谈谈Agent Skills的底层原理中探讨过：它的三层加载技术（亦即Progressive Disclosure），本质上是一种Context Offloading策略。但有一点需要补充的是，Progressive Disclosure的设计是为了“让通用Agent具备处理各种特定任务的能力，但无需一次性加载所有特定任务的知识”，而不是为了更好地处理某个长程任务。

因此，即使有了Skills，我们仍然需要结合上下文工程的方法来应对长程任务。对于上述3种方法，压缩需要Agent系统本身支持（Claude Code等工具已经支持）。本文主要从使用者的角度出发，探讨如何利用Subagents和结构化笔记来优化Skills的设计。

一个简化案例：开发流程Agent

假设我们希望构建一个Agent，用于自动化软件开发流程。

实际的开发流程通常会涉及众多环节，这里为了简化讨论，我们只考虑以下三个步骤：

Issue → Code：根据Issue生成代码
Code → Tests：为代码生成测试
Code → Review：进行代码审查

针对这个流程，我们可以写一个“大Skill”：

---
name: dev-workflow
description: Complete development workflow for handling issues.
---

You help developers with three workflow tasks:

1. Issue to Code: Generate code from issue description.
2. Code to Tests: Generate tests for the generated code.
3. Code to Review: Perform code review on the generated code.

如果在Claude Code中使用这个Skill，执行过程中的Context大致如下：

[System] You are Claude Code.
[User] Complete issue.md
[Tool Call] Read(.claude/skills/dev-workflow/SKILL.md)
[Tool Output] <the skill instructions>
[Tool Call] Read(issue.md)
[Tool Output] <issue description>]
[Assistant] <generated code>
[Tool Call] Write(code.py)
[Tool Output] <success>
[Assistant] <generated tests>
[Tool Call] Write(tests.py)
[Tool Output] <success>
[Tool Call] Bash(python tests.py)
[Tool Output] <test results>
[Assistant] <generated review report>
[Tool Call] Write(review.md)
[Tool Output] <success>
[Assistant] <final output>

可以看到，3个任务共享同一个LLM上下文，随着任务的推进，上下文会不断膨胀。在真实场景中，如果Issue的需求比较复杂，上下文膨胀问题会更加严重，甚至可能导致任务失败。

技巧一：使用1个主Agent和3个Subagent

利用Subagents的思路，一种策略是创建3个独立的Subagent，分别负责开发、测试和审查（参考Create custom subagents）。

开发Subagent（位于.claude/agents/code-developer.md）：

---
name: code-developer
description: Generate code from issue description.
---

You are a code developer. Your task is to generate code based on the provided issue description.

测试Subagent（位于.claude/agents/code-tester.md）：

---
name: code-tester
description: Generate tests for the provided code.
---

You are a code tester. Your task is to generate tests for the provided code.

审查Subagent（位于.claude/agents/code-reviewer.md）：

---
name: code-reviewer
description: Perform code review on the provided code.
---

You are a code reviewer. Your task is to perform code review on the provided code.

有了这3个Subagent之后，我们再创建一个Skill（位于.claude/skills/effective-dev-workflow/SKILL.md），指导主Agent调用这3个Subagent：

---
name: effective-dev-workflow
description: Complete development workflow for handling issues.
---

You are a project manager. Your task is to manage the development workflow for handling issues.

Here are the steps you need to follow:

1. Issue to Code: Use the `code-developer` agent to generate code from the issue description.
2. Code to Tests: Use the `code-tester` agent to generate tests for the generated code.
3. Code to Review: Use the `code-reviewer` agent to perform code review on the generated code.

同样地，我们可以在Claude Code中使用这个Skill，但执行过程中会产生4个独立的Context。

主Agent的Context：

[System] You are Claude Code.
[User] Complete issue.md
[Tool Call] Read(.claude/skills/effective-dev-workflow/SKILL.md)
[Tool Output] <the skill instructions>
[Tool Call] Read(issue.md)
[Tool Output] <issue description>
[Tool Call] Task(code-developer, "Generate code and save it as code.py")
[Tool Output] <final output from code-developer>
[Tool Call] Task(code-tester, "Generate tests for the code in code.py and save them as tests.py")
[Tool Output] <final output from code-tester>
[Tool Call] Task(code-reviewer, "Perform code review on the code in code.py and save the report as review.md")
[Tool Output] <final output from code-reviewer>
[Assistant] <final conclusion>

开发Subagent的Context：

[System] You are Claude Code.\nYou are a code developer...
[User] Generate code and save it as code.py
[Assistant] <generated code>
[Tool Call] Write(code.py)
[Tool Output] <success>
[Assistant] <final output>

测试Subagent的Context：

[System] You are Claude Code.\nYou are a code tester...
[User] Generate tests for the code in code.py and save them as tests.py
[Tool Call] Read(code.py)
[Tool Output] <code content>
[Assistant] <generated tests>
[Tool Call] Write(tests.py)
[Tool Output] <success>
[Tool Call] Bash(python tests.py)
[Tool Output] <test results>
[Assistant] <final output>

审查Subagent的Context：

[System] You are Claude Code.\nYou are a code reviewer...
[User] Perform code review on the code in code.py and save the report as review.md
[Tool Call] Read(code.py)
[Tool Output] <code content>
[Assistant] <generated review report>
[Tool Call] Write(review.md)
[Tool Output] <success>
[Assistant] <final output>

由上可见：

主Agent只关注高层次的任务调度，每个Subagent只返回关键结果，大大减少了主Agent的上下文负担。
同时，每个Subagent也都有自己独立的上下文，不会相互干扰，从而有效避免了上下文膨胀的问题。

技巧二：使用3个独立的Skill

利用结构化笔记的思路，我们也可以创建3个独立的Skill，分别负责开发、测试和审查。

开发Skill（位于.claude/skills/code-developer/SKILL.md）：

---
name: code-developer
description: Generate code from issue description.
---

You are a code developer. Your task is to generate code based on the provided issue description.

There is a {project_root}/NOTES.md file that records the progress of the task:
- Please review this file before starting the task.
- After completing the task, record the key progress in this file.

测试Skill（位于.claude/skills/code-tester/SKILL.md）：

---
name: code-tester
description: Generate tests for the provided code.
---

You are a code tester. Your task is to generate tests for the provided code.

There is a {project_root}/NOTES.md file that records the progress of the task:
- Please review this file before starting the task.
- After completing the task, record the key progress in this file.

审查Skill（位于.claude/skills/code-reviewer/SKILL.md）：

---
name: code-reviewer
description: Perform code review on the provided code.
---

You are a code reviewer. Your task is to perform code review on the provided code.

There is a {project_root}/NOTES.md file that records the progress of the task:
- Please review this file before starting the task.
- After completing the task, record the key progress in this file.

然后，我们在3个不同的会话里分别调用这3个Skill，从而会产生3个独立的Context。

开发Skill的Context：

[System] You are Claude Code.
[User] Complete issue.md
[Tool Call] Read(.claude/skills/code-developer/SKILL.md)
[Tool Output] <the skill instructions>
[Tool Call] Read(issue.md)
[Tool Output] <issue description>
[Tool Call] Read(NOTES.md)
[Tool Output] <notes content>
[Assistant] <generated code>
[Tool Call] Write(code.py)
[Tool Output] <success>
[Tool Call] Write(NOTES.md, "# Task Progress\n## Completed Tasks\n### 2026-03-08: Generated code and created code.py")
[Tool Output] <success>
[Assistant] <final output>

此时，NOTES.md的内容如下：

1
2
3

# Task Progress
## Completed Tasks
### 2026-03-08: Generated code and created code.py

测试Skill的Context：

[System] You are Claude Code.
[User] Generate tests for code.py
[Tool Call] Read(.claude/skills/code-tester/SKILL.md)
[Tool Output] <the skill instructions>
[Tool Call] Read(code.py)
[Tool Output] <code content>
[Tool Call] Read(NOTES.md)
[Tool Output] <notes content>
[Assistant] <generated tests>
[Tool Call] Write(tests.py)
[Tool Output] <success>
[Tool Call] Bash(python tests.py)
[Tool Output] <test results>
[Tool Call] Edit(NOTES.md, "\n### 2026-03-08: Generated tests and created tests.py")
[Tool Output] <success>
[Assistant] <final output>

此时，NOTES.md的内容如下：

# Task Progress
## Completed Tasks
### 2026-03-08: Generated code and created code.py
### 2026-03-08: Generated tests and created tests.py

审查Skill的Context：

[System] You are Claude Code.
[User] Perform code review on code.py
[Tool Call] Read(.claude/skills/code-reviewer/SKILL.md)
[Tool Output] <the skill instructions>
[Tool Call] Read(code.py)
[Tool Output] <code content>
[Tool Call] Read(NOTES.md)
[Tool Output] <notes content>
[Assistant] <generated review report>
[Tool Call] Write(review.md)
[Tool Output] <success>
[Tool Call] Edit(NOTES.md, "\n### 2026-03-08: Performed code review and created review.md")
[Tool Output] <success>
[Assistant] <final output>

此时，NOTES.md的内容如下：

# Task Progress
## Completed Tasks
### 2026-03-08: Generated code and created code.py
### 2026-03-08: Generated tests and created tests.py
### 2026-03-08: Performed code review and created review.md

由上可见：

因为每个Skill都在不同的会话中被调用，所以它们之间不会共享上下文，从而避免了上下文膨胀的问题。
每个Skill都通过NOTES.md文件来读取和记录关键进度，从而实现了跨Skill（跨会话）的信息传递，保证了整体任务的连贯性。

值得说明的是，这个技巧的本质是基于独立会话的多Agent协作。这里不同的Skill，实际上只是给每个Agent添加了不同的知识，使其能够执行特定的任务。

关于多Agent协作完成复杂任务的更多内容，Anthropic在另一篇文章Effective harnesses for long-running agents作了详细介绍，感兴趣的读者可以进一步阅读。

结语

Skills是一种强大的能力封装方式，但面对长程任务时，它并不是解决一切问题的银弹——LLM的上下文限制始终是绕不过去的瓶颈。

更实用的思路是将Skills与上下文工程结合起来：

通过Subagents拆分上下文，让不同角色各自处理独立子任务；
通过结构化笔记在上下文之外保存关键状态，实现跨会话的信息传递。

Skills负责告诉Agent“怎么做”，上下文工程决定Agent“能走多远”。两者结合，才是应对长程任务的完整答案。

cMCP v0.4.0发布：用配置文件管理你的MCP服务器

发表于 2026-03-03 分类于 AI

大家好！今天给大家介绍cMCP v0.4.0的重要更新。

cMCP是什么？

cMCP是一个MCP服务器的命令行工具，可以理解为“MCP版的curl” —— 通过命令行就能快速调用和测试MCP服务器的功能。

基本用法：

# STDIO transport
cmcp 'python server.py' tools/list

# HTTP transport
cmcp http://localhost:8000/mcp tools/call name=add arguments:='{"a": 1, "b": 2}'

v0.4.0新特性：`mcp.json`配置支持

为什么引入mcp.json配置文件？

简化命令输入：以前每次调用都要输入完整的命令或URL以及各种配置参数，比较繁琐。
拥抱生态标准：MCP生态逐渐形成了MCP JSON配置标准，主流工具如Cursor、Claude Code都在使用。

有了配置文件，就可以统一管理所有MCP服务器了！

创建配置文件 .cmcp/mcp.json（或 ~/.cmcp/mcp.json）：

{
  "mcpServers": {
    "local-server": {
      "command": "python",
      "args": ["server.py"],
      "env": {"API_KEY": "your-key"}
    },
    "remote-server": {
      "url": "http://localhost:3000/mcp",
      "headers": {"Authorization": "Bearer token"}
    }
  }
}

使用起来超级简单：

# 列出工具
cmcp :local-server tools/list

# 调用工具
cmcp :remote-server tools/call name=add arguments:='{"a": 1, "b": 2}'

只需要用 :server-name 就能引用预定义的服务器（及配置参数），大大提升效率！

兼容性

mcp.json配置格式与Cursor、Claude Code完全兼容，可以直接复用现有配置：

# 使用 Cursor 的配置
cmcp --config .cursor/mcp.json :my-server tools/list

# 使用 Claude Code 的配置
cmcp --config .mcp.json :my-server tools/list

快速开始

安装：

1	pip install cmcp

项目地址：https://github.com/RussellLuo/cmcp

欢迎大家体验 cMCP v0.4.0 的新功能！如果有任何问题或建议，欢迎在GitHub仓库中提出。

AI必知必会：Function Calling

发表于 2026-02-08 分类于 AI

从ChatGPT的横空出世，到OpenClaw的一夜爆火，AI技术的发展可谓日新月异。如果说LLM是最强大脑，那么赋予它手和脚，使其从对话框中走出来的，正是Function Calling。

Function Calling（函数调用）有时也称为Tool Use（工具使用）。按照OpenAI的官方定义，Function Calling为LLM提供了一种强大且灵活的方式，使其能够与外部系统交互并获取训练数据之外的信息。

要理解Function Calling的真正价值，我们需要先来了解其背后的应用场景。

应用场景

现实生活中，我们常常需要与各种系统和服务进行交互，以获取所需的信息或完成特定的任务。例如，查询天气、预订机票、支付账单、控制智能家居设备等。

以天气查询为例：

What is the weather like in Chengdu?

为了处理这个任务，AI系统通常需要执行以下操作：

理解用户的自然语言输入，并将其转化为结构化的指令或意图（这里的意图是“查询天气”，参数是“成都”）。
根据指令调用相应的API或服务，得到结果（这里的结果可能包括温度、湿度等）。
将结果加以整理和格式化，再以自然语言的形式返回给用户（例如，“今天成都的气温是22度，湿度为60%”）。

由此可见，这类AI系统的本质在于：

自然语言 → 结构化语义 → 触发系统动作

其中，从自然语言到结构化语义的转换，是整个流程的核心。

NLU

在LLM出现之前，这类任务通常由传统的自然语言理解（Natural Language Understanding, NLU）来完成。作为NLP的一个子集，NLU技术主要包括意图识别（Intent Recognition）和槽位填充（Slot Filling）两个子任务。

上述天气查询的例子，经过NLU处理后，得到的结构化输出大致是这样的：


query	What	is	the	weather	like	in	Chengdu	?
slots	O	O	O	O	O	O	B-loc	O
intent	get_weather

然而，NLU技术存在一些明显的局限性：

所有意图必须预定义（分类）
所有槽位必须事先建模（序列标注）
泛化能力差，新场景需要重新训练模型
多步推理能力几乎为0

Prompt Engineering

LLM兴起以后，其强大的生成式语义推理、结构约束和泛化能力，完美地解决了NLU的诸多痛点。然而，早期的LLM并不具备Function Calling的能力（如DeepSeek-R1）。于是，人们主要通过Prompt Engineering（提示词工程）的方式，引导LLM生成符合预期的结构化数据。

例如，针对天气查询的例子，可以设计如下Prompt：

You are an assistant that can perform actions based on user requests. Your responses should be in JSON format with the following structure:
1
2
3
4
5
6
7
{
  "name": "action_name",
  "arguments": {
    "key1": "value1",
    "key2": "value2"
   }
}
Query: What is the weather like in Chengdu?

发给LLM后，就能得到如下JSON格式的文本输出：

{
  "name": "get_weather",
  "arguments": {
    "location": "Chengdu"
  }
}

Structured Output

上述方式虽然有效，但是稳定性不高。随着模型能力的演进，很多LLM开始原生地支持JSON mode，后来又进一步支持了Structured Output（结构化输出），生成结构化数据的能力得到了显著提升。

使用Structured Output，可以非常稳定地生成结构化数据。有些SDK（如OpenAI Python SDK），甚至还提供了与数据验证库（如Pydantic）的无缝集成，进一步增强了类型检查和数据验证的功能。例如：

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class Arguments(BaseModel):
    location: str

class Query(BaseModel):
    name: str = Field(..., description="Action to perform")
    arguments: Arguments = Field(..., description="Arguments for the action")

response = client.responses.parse(
    model="gpt-4o-2024-08-06",
    input=[
        {
            "role": "system",
            "content": "Extract the query information.",
        },
        {
            "role": "user",
            "content": "What is the weather like in Chengdu?",
        },
    ],
    text_format=Query,
)

print(response.output_parsed)

# Output:
# name='getWeather' arguments=Arguments(location='Chengdu')

Structured Output vs Function Calling

事实上，在OpenAI的生态中，Function Calling（于2023年6月推出）先于Structured Output（于2024年8月推出）出现。

早期的Function Calling有时会“幻觉”出不符合格式的JSON，而Structured Output则可以确保输出严格遵循Schema。因此，Structured Output也可以看作是Function Calling能力的底层升级（Strict mode）。

从形式上来看，虽然Structured Output和Function Calling都会让LLM生成结构化的数据（通常是JSON），但它们解决的问题维度并不相同：


	Structured Output	Function Calling
主要目的	严格保证每一条输出都符合格式	由LLM灵活决定是否调用工具、调用哪个工具
适用场景	数据提取、实体识别、表单生成等	智能体（Agent）、检索增强（RAG）等

Function Calling

由上述分析可见，Function Calling的核心在于由LLM灵活决定是否调用工具、调用哪个工具。仍然以天气查询为例，如果我们将思维模式从“结构化输出”转变为“工具调用”，整体的处理逻辑就会截然不同。

如图所示，使用Function Calling来处理天气查询，整体的流程大致如下：

向模型发送用户请求，并明确声明其可调用的工具列表（如get_weather(location)）。
模型根据请求，决定需要调用的工具名称（如get_weather）及相应的参数（如{"location": "chengdu"}）。
应用程序解析工具调用请求后，执行对应的代码，并获取结果（如{"temperature": 14}）。
应用程序携带工具调用的结果，再次向模型发起请求。
模型据此生成最终的回复（如It's currently 14°C in Chengdu.），或者再次调用其他工具。

对应于上述流程，下面给出一个可运行的Python示例：

import json
from openai import OpenAI

client = OpenAI()

# 1. Define a list of callable tools for the model
# (Note that `parameters` are defined in JSON Schema)
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Retrieves current weather for the given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City e.g. Beijng, Chegndu"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    },
]

def get_weather(location: str) -> str:
    # Here you would normally make an API call to a weather service
    return '{"temperature": 14}'

# Create a running input list we will add to over time
input_list = [
    {"role": "user", "content": "What is the weather like in Chengdu?"},
]

# 2. Prompt the model with tools defined
response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=input_list,
)

# Save function call outputs for subsequent requests
input_list += response.output

for item in response.output:
    if item.type == "function_call":
        if item.name == "get_weather":
            # 3. Execute the function logic for `get_weather`
            location = json.loads(item.arguments)["location"]
            output = get_weather(location)
            
            # 4. Provide function call results to the model
            input_list.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": output,
            })

response = client.responses.create(
    model="gpt-5",
    instructions="You are a helpful assistant.",
    tools=tools,
    input=input_list,
)

# 5. The model should be able to give a response!
print(response.output_text)

# Output:
# The current temperature in Chengdu is about 14°C.

框架封装

至此，我们已了解了Function Calling的基本原理与使用方式。但在实际应用中，若每个工具都需要手动编写JSON Schema，其开发复杂度势必大幅增加。因此，众多开发框架对此进行了封装，以提供更高层次的抽象。

例如，使用LangChain提供的@tool装饰器，便能轻松地将Python函数注册为LLM可调用的工具：

from langchain.chat_models import init_chat_model
from langchain.tools import tool

model = init_chat_model("gpt-5")

@tool
def get_weather(location: str) -> str:
    """Retrieves current weather for the given location."""
    return f"It's sunny in {location}."

# 1. Bind (potentially multiple) tools to the model
model_with_tools = model.bind_tools([get_weather])

# 2. Model generates tool calls
messages = [{"role": "user", "content": "What's the weather like in Chengdu?"}]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)

for tool_call in ai_msg.tool_calls:
    # 3. Execute the tool with the generated arguments
    tool_result = get_weather.invoke(tool_call)
    # 4. Pass results back to model
    messages.append(tool_result)

# 5. Model generates the final response
final_response = model_with_tools.invoke(messages)
print(final_response.text)

# Output:
# It's sunny in Chengdu right now.

如今，Agent已成为主流，上述工具执行循环（Tool Execution Loop）也演进为ReAct Agent的核心范式。为此，很多框架进一步对整个Agent模式进行了抽象，从而将Function Calling的复杂性封装在高层API之下，大幅降低了开发门槛。

例如，借助LangChain提供的create_agent函数，只需寥寥数行代码，就能构建出一个完整的天气查询Agent：

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool

model = init_chat_model("gpt-5")

@tool
def get_weather(location: str) -> str:
    """Retrieves current weather for the given location."""
    return f"It's sunny in {location}."

agent = create_agent(model, tools=[get_weather])
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What's the weather like in Chengdu?"}]}
)
print(result["messages"][-1].content)

# Output:
# It’s sunny in Chengdu right now.

结语

本文从实际应用出发，梳理了从传统NLU到Prompt Engineering，再到Structured Output和Function Calling的技术演进脉络，并对比了后两者的功能定位差异。同时，介绍了如何使用框架来简化Function Calling的开发流程。

在Claude Code、OpenClaw等智能体日益普及的当下，Function Calling作为LLM的核心能力，已成为驱动这些Agent的重要技术基石。随着这项技术的广泛应用，工具生态的开放性与标准化也愈发关键。在此趋势下，MCP（模型上下文协议）已逐渐成为行业的事实标准，我们将在后续的文章中继续探讨这一话题。

谈谈Agent Skills的底层原理

发表于 2026-01-18 分类于 AI

自去年10月推出以来，Agent Skills迅速成为Claude Agent产品的核心组件，并在社区中得到了广泛的采纳和应用。关于Skills的介绍，网上已经有很多文章。本文尝试从底层原理的角度，探讨Agent Skills是如何工作的。

按照官方定义，Skills是一种基于文件系统的资源，用于为Agent提供特定领域的专业知识，从而将其转变为专家。理解Skills的前提，在于先厘清Agent的基本原理。

ReAct Agent

现代Agent都是基于ReAct模式构建的。ReAct的核心思想是将大语言模型的推理能力（Reasoning）与行动执行（Acting）相结合，使Agent能够反复思考问题、使用工具，并根据观察结果采取行动，从而实现用户目标。

早期采用ReAct模式的Agent，效果并不理想。随着LLM能力的持续演进，特别是函数调用（Function Calling）的引入，ReAct模式的效果得到了显著改善，使得Agent能够更可靠且高效地完成任务。

以天气查询为例，从上下文（Context）的角度来看，ReAct Agent的运行过程大致如下：

System: You are a helpful weather assistant.
User: What is the weather like in Chengdu?
Assistant: ToolCall(name="get_weather", args={"location": "Chengdu"})
User: ToolOutput(result={"weather": "Sunny", "temperature": "22°C"})
Assistant: The weather in Chengdu is Sunny with a temperature of 22°C.

Claude Code

作为一个现代Agent系统，Claude Code也遵循了ReAct模式。我们在揭秘Claude Code：自主式编程中介绍过它的核心架构：

Claude Code自主式编程架构

可以看出，Claude Code与常规Agent（如天气查询助手）最大的不同之处在于：它工作在操作系统之上，几乎所有的工具都是围绕文件系统和Shell命令构建而成！

以“查看文件并创建一个Hello World函数”为例，Claude Code运行过程中的Context大致如下：

System: You are Claude Code, Anthropic's official CLI for Claude.
User: What files are there?
Assistant: ToolCall(name="Bash", args={"command": "ls"})
User: ToolOutput(result="[README.md]")
Assistant: There is only one file named README.md.
User: Create a hello world function in Python.
Assistant: ToolCall(name="Write", args={"file_path": "hello_world.py", "content": "def hello_world():\n    print('Hello, World!')\n\nif __name__ == '__main__':\n    hello_world()"})
User: ToolOutput(result="Created `hello_world.py` with a simple hello world function.")
Assistant: I've created a simple Python file with a "Hello, World!" function.

上下文管理

有了对ReAct Agent和Claude Code的基本认识，我们再来讨论一个关键话题——上下文管理。

了解大语言模型的读者可能知道，LLM的上下文有两个重要特征：

上下文窗口大小限制：LLM的上下文窗口大小是有限的（早期GPT 3仅有2048个token），虽然这个大小在持续增长（比如最新Claude Sonnet 4.5已支持百万token），但仍然是有上限的。
上下文过载导致性能下降：即使最先进的LLM支持长上下文（如百万token），但如果上下文内容过多，其性能也会显著下降。除了经典的Lost in the Middle，还会出现上下文污染（Context Poisoning）、上下文混淆（Context Confusion）等各种问题。感兴趣的读者可以进一步参考How Long Contexts Fail。

因此，如何有效地管理上下文，成为了Agent设计中的一个重要课题。常见的上下文管理策略包括检索增强（RAG）、上下文总结（Context Summarization）、上下文隔离（Context Quarantine）和上下文卸载（Context Offloading）等。本文的讨论重点关注Context Offloading。

关于Context Offloading，How to Fix Your Context一文给出了以下定义：

上下文卸载（Context Offloading）是指将信息存到LLM的上下文之外，通常借助能管理数据的工具来实现。

而该文引用的Anthropic原文The “think” tool中，则这样指出：

这个“think”工具特别适合用在那些仅凭用户提问、Claude信息不够没法直接回答的情况，还有那些需要处理外部信息（比如工具返回的结果）的场景。比起深度思考那种全面推演，Claude用“think”工具做的推理会更聚焦，主要围绕模型刚发现的新信息展开。

在Claude Code中实现Context Offloading

上述关于Context Offloading的描述稍显抽象。为了便于理解，我们来设想一个问题：如果要让Claude Code支持Context Offloading，应该如何实现？

结合前文的讨论，我们知道Claude Code的能力基本建立在文件系统和Shell命令之上。因此，要实现Context Offloading，从第一性原理出发，我们能够很自然地想到以下方案：

将信息存储在文件系统，而不是一开始就全部放到LLM的上下文中；
为了让LLM知道这些信息的存在，需要在LLM的上下文中（通常在系统提示词中）记录这些信息的位置；
用户提问信息不全时，Agent可以根据系统提示词的引导，按需从文件系统寻找信息；
寻找信息的过程，需要借助文件读写和Shell命令等工具；
将获取到的外部信息加载到LLM的上下文中，以辅助完成用户的目标。

以“生成博客前端页面”为例，为了不把所有的设计指南都塞到LLM的上下文中，可以将其保存到resources/frontend-design.md文件中：

# Frontend Aesthetics Guidelines

Focus on:

**Typography**: Choose fonts that are beautiful, unique, and interesting...

**Color & Theme**: Commit to a cohesive aesthetic...

...

同时，在Agent的系统提示词中添加对这个文件的引用：

You are Claude Code...

You have access to the following resources:
- `resources/frontend-design.md`: Guidelines for designing the frontend UI.

...

如此一来，Agent只会在用户请求生成前端页面时，才会去额外读取resources/frontend-design.md文件中的内容，从而避免了不必要的上下文膨胀。具体来说，运行过程中的Context可能会是这样的：

System: You are Claude Code...\n\nYou have access to the following resources:\n- `resources/frontend-design.md`: Guidelines for designing the frontend UI.
User: Generate a blog frontend UI.
Assistant: ToolCall(name="Read", args={"file_path": "resources/fontend-design.md"})
User: ToolOutput(result="Choose fonts that are beautiful, unique, and interesting...")
Assistant: ToolCall(name="Write", args={"file_path": "app/index.html", "content": "<html><head><style>...</style></head><body>...</body></html>"})
User: ToolOutput(result="Created file `app/index.html`.")
Assistant: ToolCall(name="Write", args={"file_path": "app/styles.css", "content": "..."})
User: ToolOutput(result="Created file `app/styles.css`.")
Assistant: I've generated a simple blog frontend UI based on the guidelines.

讨论到这里，使用过Skills的读者可能发现了，如果把上述例子中的resources/重命名为skills/，那么frontend-design.md本质上就是一个Skill（参考anthropics/skills/frontend-design/SKILL.md）。

Skills的三层加载技术

至此我们可以看出，Skills的核心思想，其实也遵循了Context Offloading的上下文管理策略。当然，上述例子只是最基础的实现。

Agent Skills上下文窗口

在Anthropic的设计中，又巧妙地引入了Skills的三层加载技术，以求最大化减少LLM上下文的负担：

元数据（Metadata）：可用Skills的名称、描述及其文件路径。这些信息会被预先放到上下文（系统提示词）中，以确保Agent知道有哪些Skills可以利用。
指令（Instructions）：每个Skill都有一个对应的SKILL.md文件，其中包含了Skill的详细描述、使用方法和示例等信息。当Agent需要某个Skill的帮助时，它会通过Read工具读取SKILL.md文件的内容，进而将其动态加载到上下文中。
资源（Resources）：除了SKILL.md文件，每个Skill还可以包含其他类型的资源文件，如配置文件、文档等。当Agent需要更具体的信息时，它会进一步读取这些资源文件的内容，从而将其加载到上下文中。

代码执行与虚拟机

除了前文讨论的内容，需要强调的是，Skills的完整能力还涉及代码执行和虚拟机：

代码执行（Code Execution）：某些Skills可能包含代码片段，甚至Agent为了处理任务还会动态生成代码，这些代码都需要执行。
虚拟机（Virtual Machine）：为了确保安全性，通常需要在一个隔离的沙盒环境（虚拟机）中管理文件系统、执行Shell命令和运行代码。

Agent Skills架构

由于篇幅所限，这里不再展开详细讨论，感兴趣的读者可以参考官方文档或者其他相关资料。

结语

通过本文的探讨，相信读者对Agent Skills有了更深入的理解。在Claude Agent产品中，Skills的实现基于Context Offloading这一上下文管理策略；而该策略的落地，则依托于ReAct模式的思想框架，以及文件系统、Shell命令等基础工具的支撑。

此外，代码执行和虚拟机也是非常重要的话题，本文限于篇幅只做了简要提及。实际上，它们不仅是Skills的关键技术，也代表着Agent未来的主流演进方向。在后续的文章中，我们将继续深入探讨这些话题，敬请期待！

Claude Code伴侣：可视化管理新体验

发表于 2025-09-23 分类于 AI

引言

Claude Code伴侣（即Claude Code Mate，以下简称CCM）最初的定位是一个极简的LLM代理工具，旨在帮助开发者快速用上Claude Code和切换各种大模型。所以CCM一开始选择了纯命令行交互，以及通过YAML文件进行配置。这种方式既简洁高效，也符合开发者的使用习惯。

距离CCM第一版发布（轻松解锁Claude Code：国内用户的多元模型新玩法）已经过去了近一个月，随着自己使用的增多，以及零星收到的一些用户反馈，我发现CCM存在以下问题：

模型管理不直观：用户如果不查阅LiteLLM文档，很难知道CCM支持哪些模型，以及如何配置这些模型（比如OpenAI兼容的模型，需要加上openai/前缀）。
没有用量统计：用户在CCM中无法查看自己的用量情况，包括各个模型的请求数、输入/输出Token数和费用消耗等信息。

为了解决这些问题，我决定为CCM引入一个可视化管理后台，以提升用户体验。

PostgreSQL小插曲

了解LiteLLM的朋友可能知道，LiteLLM Proxy其实原本就提供Admin UI。然而，它对数据库的选择有特定偏好：仅支持PostgreSQL，并且明确表示不考虑SQLite等其他轻量级选项。

这意味着，如果要启用Admin UI，用户必须要先安装和配置PostgreSQL。这显然与CCM的初衷——提供一个简单易用的工具——是相违背的。

幸运的是，经过一番调研，我找到了一个Python库pgserver，可以实现：

可嵌入：通过pip安装依赖库，即可自动下载PostgreSQL
零配置：无需用户手动设置数据库环境
跨平台：支持Windows、macOS和Linux
无Docker：不需要额外安装和配置Docker

于是在pgserver的帮助下，CCM成功引入了Admin UI，并做到了用户无感。

快速开始

为了保持轻量级，CCM默认不包含UI功能。如果要启用UI功能，可以通过以下命令安装：

# 使用uv（推荐）
uv pip install --system --python 3.12 "claude-code-mate[ui]"
# 或者使用pip
pip install "claude-code-mate[ui]"

安装后，使用以下命令启动UI：

ccm ui

打开Admin UI后，使用默认的用户名和密码（admin和sk-1234567890）登录，即可进入管理后台。

可视化管理后台

LiteLLM Proxy提供的Admin UI功能很强大，其中就包括CCM第一版缺失的模型管理和用量统计功能。

模型管理

模型管理界面

LiteLLM内置支持众多提供商的模型，包括但不限于：

知名官方模型（如Anthropic、OpenAI和DeepSeek等）
聚合平台的模型（如OpenRouter等）
OpenAI兼容模型
Ollama本地模型

用户可以通过界面：

轻松添加、编辑和删除模型，无需了解LiteLLM的特殊前缀规则。
设置输入/输出的Token价格，以便准确计算费用。
修改一些高级参数，如TPM/RPM、超时时间和max_tokens等。

用量统计

用量统计界面

通过用量统计功能，用户可以清晰地看到：

总（或按模型）的请求次数，以及成功和失败的次数。
总（或按模型）的Token数量，以及输入和输出的Token数量。
总（或按模型）的费用消耗情况等。

其他功能

除了上述两个功能外，Admin UI还提供了以下一些实用功能：

模型测试（Test Key）：快速测试模型的可用性和效果。
日志查看（Logs）：实时查看请求日志，便于调试和排查问题。

模型测试界面

日志界面

以上只列举了Admin UI的部分功能，对于其他功能感兴趣的读者，可以进一步参考LiteLLM Proxy文档。

结语

欢迎大家体验Claude Code Mate的新UI功能！需要说明的是，该功能只在macOS（我的开发环境）进行了测试，尚未在Windows和Linux上进行全面验证。如果你有任何问题或建议，欢迎随时在GitHub仓库中提出。

长程任务为什么困难

有了Skills为什么还需要上下文工程

一个简化案例：开发流程Agent

技巧一：使用1个主Agent和3个Subagent

技巧二：使用3个独立的Skill

结语

cMCP是什么？

v0.4.0新特性：mcp.json配置支持

兼容性

快速开始

应用场景

NLU

Prompt Engineering

Structured Output

Structured Output vs Function Calling

Function Calling

框架封装

结语

ReAct Agent

Claude Code

上下文管理

在Claude Code中实现Context Offloading

Skills的三层加载技术

代码执行与虚拟机

结语

引言

PostgreSQL小插曲

快速开始

可视化管理后台

模型管理

用量统计

其他功能

结语

v0.4.0新特性：`mcp.json`配置支持