Evaluation - LangChain Framework
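Evaluation
To evaluate your agent's performance, you can use LangSmith evaluations. You first need to define an evaluator function that judges the agent's results, such as its final output or its trajectory. Depending on your evaluation technique, this may or may not involve a reference output: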
def evaluator(*, outputs: dict, reference_outputs: dict):
    # compare agent outputs against reference outputs
    output_messages = outputs["messages"]
    reference_messages = reference_outputs["messages"]
    # compare_messages is a placeholder for your own scoring logic
    score = compare_messages(output_messages, reference_messages)
    return {"key": "evaluator_score", "score": score}
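The `compare_messages` helper is left for you to define. As one possible illustration, the sketch below scores a run by how far through the reference sequence of tool calls the agent got, in order. It assumes messages are plain OpenAI-style dicts like those in the trajectory example in the next section (LangChain message objects returned by an agent would need converting first); `tool_call_names` and this particular metric are hypothetical, not part of LangSmith or AgentEvals.

```python
# Hypothetical helpers for illustration; not part of LangSmith or AgentEvals


def tool_call_names(messages: list[dict]) -> list[str]:
    """Collect tool-call names, in order, from OpenAI-style message dicts."""
    names = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


def compare_messages(output_messages: list[dict], reference_messages: list[dict]) -> float:
    """Fraction of the reference tool-call sequence that the output reproduces in order."""
    expected = tool_call_names(reference_messages)
    if not expected:
        return 1.0
    matched = 0
    # Scan the output once, advancing through the reference whenever the next
    # expected tool name appears (a greedy in-order match; extra calls are skipped)
    for name in tool_call_names(output_messages):
        if matched < len(expected) and name == expected[matched]:
            matched += 1
    return matched / len(expected)


def evaluator(*, outputs: dict, reference_outputs: dict) -> dict:
    score = compare_messages(outputs["messages"], reference_outputs["messages"])
    return {"key": "evaluator_score", "score": score}


outputs = {"messages": [{"role": "assistant", "tool_calls": [
    {"function": {"name": "get_weather", "arguments": "{}"}},
]}]}
reference_outputs = {"messages": [{"role": "assistant", "tool_calls": [
    {"function": {"name": "get_weather", "arguments": "{}"}},
    {"function": {"name": "get_directions", "arguments": "{}"}},
]}]}
print(evaluator(outputs=outputs, reference_outputs=reference_outputs))
# {'key': 'evaluator_score', 'score': 0.5}
```

A fractional score like this rewards partial progress; if you only care whether the whole trajectory matches, returning a boolean is simpler.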
To get started, you can use the prebuilt evaluators from the AgentEvals package:
pip install -U agentevals
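Create an evaluator
A common way to evaluate agent performance is to compare its trajectory (the order in which it calls its tools) against a reference trajectory: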
import json
from agentevals.trajectory.match import create_trajectory_match_evaluator

outputs = [
    {
        "role": "assistant",
        "tool_calls": [
            {
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "san francisco"}),
                }
            },
            {
                "function": {
                    "name": "get_directions",
                    "arguments": json.dumps({"destination": "presidio"}),
                }
            }
        ],
    }
]
reference_outputs = [
    {
        "role": "assistant",
        "tool_calls": [
            {
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "san francisco"}),
                }
            },
        ],
    }
]

# Create the evaluator
evaluator = create_trajectory_match_evaluator(
    # "superset": the output must contain every reference tool call; extra calls are allowed
    trajectory_match_mode="superset",
)

# Run the evaluator
result = evaluator(
    outputs=outputs, reference_outputs=reference_outputs
)
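To make the `trajectory_match_mode` options more concrete, here is a simplified, standalone sketch of how the four modes AgentEvals offers (`strict`, `unordered`, `subset`, `superset`) can be read. It is an illustration under assumptions, not the library's implementation: it flattens all tool calls into (name, arguments) pairs, ignores message content, and treats `strict` as "the same calls in the same order". The function names are hypothetical.

```python
import json
from collections import Counter


def call_keys(messages: list[dict]) -> list[tuple[str, str]]:
    """Flatten OpenAI-style assistant messages into hashable (name, arguments) pairs."""
    keys = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            fn = call["function"]
            # Canonicalize the JSON arguments so key order does not matter
            args = json.dumps(json.loads(fn["arguments"]), sort_keys=True)
            keys.append((fn["name"], args))
    return keys


def trajectory_match(outputs: list[dict], reference_outputs: list[dict], mode: str = "strict") -> bool:
    actual, expected = call_keys(outputs), call_keys(reference_outputs)
    if mode == "strict":
        return actual == expected
    actual_counts, expected_counts = Counter(actual), Counter(expected)
    if mode == "unordered":
        return actual_counts == expected_counts
    if mode == "superset":
        # Every reference call must appear in the output; extra calls are allowed
        return not (expected_counts - actual_counts)
    if mode == "subset":
        # The output must not contain calls that are missing from the reference
        return not (actual_counts - expected_counts)
    raise ValueError(f"unknown mode: {mode}")


weather = {"function": {"name": "get_weather", "arguments": json.dumps({"city": "san francisco"})}}
directions = {"function": {"name": "get_directions", "arguments": json.dumps({"destination": "presidio"})}}
outputs = [{"role": "assistant", "tool_calls": [weather, directions]}]
reference_outputs = [{"role": "assistant", "tool_calls": [weather]}]

for mode in ("strict", "unordered", "subset", "superset"):
    print(mode, trajectory_match(outputs, reference_outputs, mode))
# strict False
# unordered False
# subset False
# superset True
```

With the same data as the example above, only `superset` accepts the extra `get_directions` call.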
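Next, learn how to customize the trajectory match evaluator.
LLM-as-a-judge
You can use an LLM-as-a-judge evaluator, which uses an LLM to compare the trajectory against the reference outputs and produce a score: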
from agentevals.trajectory.llm import (
    create_trajectory_llm_as_judge,
    TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE
)

# The judge calls a hosted model, so credentials for that provider
# (here an OpenAI API key) must be available in your environment
evaluator = create_trajectory_llm_as_judge(
    prompt=TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE,
    model="openai:o3-mini"
)
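Conceptually, an LLM-as-a-judge evaluator serializes both trajectories into a grading prompt, sends it to a model, and parses a verdict from the reply. The sketch below shows that flow with the model call injected as a plain function so it runs offline. The prompt wording, the `call_llm` parameter, the `trajectory_judge` key, and the `true`/`false` reply format are assumptions for illustration; they are not what `TRAJECTORY_ACCURACY_PROMPT_WITH_REFERENCE` or `create_trajectory_llm_as_judge` actually use.

```python
import json
from typing import Callable

# Hypothetical grading prompt; the real AgentEvals prompt is different and more detailed
JUDGE_PROMPT = """You are grading an AI agent's trajectory.
Compare the agent trajectory with the reference trajectory and decide whether
the agent took a reasonable sequence of steps to accomplish the task.

<trajectory>
{outputs}
</trajectory>
<reference_trajectory>
{reference_outputs}
</reference_trajectory>

Answer with a single word: true or false."""


def llm_as_judge(
    outputs: list[dict],
    reference_outputs: list[dict],
    call_llm: Callable[[str], str],
) -> dict:
    """Fill the grading prompt, ask the model, and turn its reply into a score."""
    prompt = JUDGE_PROMPT.format(
        outputs=json.dumps(outputs, indent=2),
        reference_outputs=json.dumps(reference_outputs, indent=2),
    )
    reply = call_llm(prompt)
    # Treat anything other than an explicit "true" as a failing grade
    score = reply.strip().lower() == "true"
    return {"key": "trajectory_judge", "score": score, "comment": reply}


# Stand-in for a real model call so the sketch runs without network access
def fake_llm(prompt: str) -> str:
    return "true"


trajectory = [{"role": "assistant", "tool_calls": [
    {"function": {"name": "get_weather", "arguments": "{}"}},
]}]
print(llm_as_judge(trajectory, trajectory, fake_llm)["score"])  # True
```

Parsing a single constrained token keeps the sketch simple; real judges typically request structured output so they can also return reasoning.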
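Run the evaluator
To run an evaluator, you first need to create a LangSmith dataset. To use the prebuilt AgentEvals evaluators, you need a dataset with the following schema:
- input: {"messages": [...]}, the input messages used to invoke the agent.
- output: {"messages": [...]}, the expected message history in the agent's output. For trajectory evaluation, you can choose to keep only the assistant messages.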
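As a sketch of that schema, the helper below turns one user question and an expected conversation into an example with `inputs` and `outputs` keys, optionally keeping only assistant messages in the reference. The function name `make_trajectory_example` and the dict-based message format are assumptions for illustration. The resulting dicts could then be uploaded with the LangSmith client (for example via its `create_examples` method), but check the exact signature against your installed `langsmith` version.

```python
def make_trajectory_example(
    question: str,
    expected_messages: list[dict],
    assistant_only: bool = True,
) -> dict:
    """Build one dataset example in the {"messages": [...]} input/output schema."""
    reference = expected_messages
    if assistant_only:
        # For trajectory evaluation, the reference can keep only assistant messages
        reference = [m for m in expected_messages if m.get("role") == "assistant"]
    return {
        "inputs": {"messages": [{"role": "user", "content": question}]},
        "outputs": {"messages": reference},
    }


example = make_trajectory_example(
    "What's the weather in San Francisco?",
    [
        {"role": "assistant", "tool_calls": [
            {"function": {"name": "get_weather", "arguments": '{"city": "san francisco"}'}},
        ]},
        {"role": "tool", "content": "It's sunny."},
        {"role": "assistant", "content": "It's sunny in San Francisco."},
    ],
)
print(len(example["outputs"]["messages"]))  # 2
```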
API Reference: create_react_agent
from langsmith import Client
from langgraph.prebuilt import create_react_agent
from agentevals.trajectory.match import create_trajectory_match_evaluator

client = Client()
agent = create_react_agent(...)
evaluator = create_trajectory_match_evaluator(...)

experiment_results = client.evaluate(
    lambda inputs: agent.invoke(inputs),
    # replace with your dataset name
    data="<Name of your dataset>",
    evaluators=[evaluator]
)
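To clarify what `client.evaluate` does with these pieces, the sketch below reproduces the core loop locally: call the target on each example's inputs, pass the outputs and reference outputs to every evaluator, and collect the feedback. It is a simplified stand-in, not LangSmith's implementation (which also records traces and stores the results as an experiment in LangSmith); `run_local_evaluation`, the in-memory dataset, the stub agent, and the `last_message_matches` evaluator are all hypothetical.

```python
from typing import Callable


def run_local_evaluation(
    target: Callable[[dict], dict],
    dataset: list[dict],
    evaluators: list[Callable[..., dict]],
) -> list[dict]:
    """Run target on each example and score it with every evaluator."""
    results = []
    for example in dataset:
        outputs = target(example["inputs"])
        feedback = [
            evaluator(outputs=outputs, reference_outputs=example["outputs"])
            for evaluator in evaluators
        ]
        results.append({"inputs": example["inputs"], "outputs": outputs, "feedback": feedback})
    return results


# Stub agent: returns the input messages plus a canned assistant reply
def fake_agent(inputs: dict) -> dict:
    return {"messages": inputs["messages"] + [{"role": "assistant", "content": "sunny"}]}


def last_message_matches(*, outputs: dict, reference_outputs: dict) -> dict:
    # Hypothetical evaluator: exact match on the final message content
    score = outputs["messages"][-1]["content"] == reference_outputs["messages"][-1]["content"]
    return {"key": "last_message_matches", "score": score}


dataset = [
    {
        "inputs": {"messages": [{"role": "user", "content": "weather in sf?"}]},
        "outputs": {"messages": [{"role": "assistant", "content": "sunny"}]},
    },
]
results = run_local_evaluation(fake_agent, dataset, [last_message_matches])
print(results[0]["feedback"])  # [{'key': 'last_message_matches', 'score': True}]
```

Keeping the target and evaluators as plain callables mirrors the shape `client.evaluate` accepts in the example above, which makes it easy to test evaluators locally before running a full experiment.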
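- Original author: 知识铺
- Original link: https://index.zshipu.com/ai002/post/20251125/%E8%AF%84%E4%BC%B0-LangChain-%E6%A1%86%E6%9E%B6/
- Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Non-commercial reposts must credit the source (author and original link); for commercial reposts, contact the author for permission.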