Structured Generation with LLMs(2)：Function Calling，不止于agent

在structured generation第一期，笔者介绍了基于prompt的Kor。Kor是对LLM的一层封装（prompt+parse），适用于结构化抽取/生成场景。

本期文章，笔者介绍Function Calling，并将其用于structured generation。

Function Calling（又称为Tool Calling，后文统称为FC）是构建agent的基石，也是各大LLM厂商的标配功能。要做到好的FC，LLM要能做到：

理解任务与function/tool的关系，知道是否要调用、需调用哪些function/tool、是否缺必要参数；
返回结构化内容，包括function name、arguments（json格式）。

本文主要关注FC的第二个能力，即structured generation能力。

FC的原理

我们无法准确知道，闭源模型是如何实现FC的，但开源模型能为我们提供思路。

这里以mistral近期发布的12B模型 — Mistral-Nemo-Instruct-2407^[1]为例，初步研究其实现方式。

从一个简单的FC例子入手：

tool是经典的get_current_weather，schema如下

Function(

  name="get_current_weather",

  description="Get the current weather",

  parameters={

    "type": "object",

    "properties": {

      "location": {

          "type": "string",

          "description": "The city and state, e.g. San Francisco, CA"},

        "format": {

           "type": "string",

           "enum": ["celsius", "fahrenheit"],

           "description": "The temperature unit to use. Infer this from the users location."},

      },

     "required": ["location", "format"],

})

用户query：What’s the weather like today in Paris and Beijing? I prefer Celsius format.

打印出来prompt如下：

在input侧，mistral-nemo的做法是直接将用户提供的tool schema转为string，并包裹在特殊的tag（AVAILABLE_TOOLS）之中，然后插入到user query之前。

既然都是组装Prompt，我们拿它和Kor的Prompt做个对比：

可以发现，mistral-nemo的prompt更精简（不包含Your goal is …. 、All output must be in JSON format…. 等内容）。

这就是微调模型与通用模型的用法差异：

mistral-nemo在fine-tuning时，按照这样的格式进行训练，FC的“要求”已经被encode到模型的参数中去了；
Kor是第三方实现，无从得知模型的训练细节，只能依靠模型的通用In Context Learning能力，因此需要把“要求”写清楚，于是prompt细节较多。

接着，我们来看output侧。

mistral-nemo的输出结果如下：

看起来，这是一个普通的text generation过程，通过特殊标记（TOOL_CALLS）来表明，这是一个tool_call message，而非常见的text message；同时nemo支持同时call多个tools，每个call为一个字典，其中包含function name和arguments参数（json格式）。

总结一下，mistral-nemo这样实现FC：

将tools按照特定的template，组装到prompt中去；
LLM输出时，也遵循特定的template，call tool时加入特殊标记（TOOL_CALLS），并返回name和arguments。

通过分析mistral-nemo，可以猜测，各家LLM公司有自己的FC prompt template，既体现在input侧，也体现在output侧。

练习时刻

动手实践是学习的好方法，本期我们仍然选用第一期的2个练习（中文翻译器、评价解析）。

练习部分的所有代码，都已整理在下方git，建议读者实际运行代码来学习：

https://github.com/duanyu/structured_generation_with_llm/blob/main/Lecture2_Function_Calling.ipynb

考虑到排版，笔者直接将截图贴在下面。

练习1：中文翻译器

练习2：评价解析

总结

本文介绍了第二种进行Structured Generation的技术：Function Calling。FC是Agent的基石，structured generation则是“副产品”；读者在实际使用中，可以将FC与Kor（或者自己写的prompt）做对比，选择效果更好的方案。

需要提到的是，FC虽然经过了fine-tuning，输出结构的稳定性有一定保证，但若未使用constrain decoding技术，那么仍然不是100%鲁棒的；同时，笔者在练习中发现，当使用glm-4-flash/air/airx模型时，FC难以有效加入few-shot examples，但在第一期练习中，glm4-9b-chat + Kor对few-shot examples十分友好，这可能是FC的一个问题（但也可能是用法不对，欢迎有经验的读者指正）。

文章转自微信公众号@漫谈NLP