Custom Targets
If you want to test an agent that is not natively supported, you can bring your own Target by defining a Python module that contains a subclass of BaseTarget. The module name must end with the suffix _target (e.g. my_custom_target), and the subclass should implement the invoke method to call your agent.
from agenteval.targets import BaseTarget
from agenteval import TargetResponse

from my_agent import MyAgent


class MyCustomTarget(BaseTarget):
    def __init__(self, **kwargs):
        self.agent = MyAgent(**kwargs)

    def invoke(self, prompt: str) -> TargetResponse:
        response = self.agent.invoke(prompt)
        return TargetResponse(response=response)
Once you have created your subclass, specify the module path to the Target.
target:
  type: path.to.my_custom_target.MyCustomTarget
  my_agent_parameter: "value" # (1)

1. This will be passed as kwargs when initializing the Target.
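To make the kwargs behaviour concrete, here is a minimal sketch of how a target section could be turned into a Target instance. It is an illustration only, not agenteval's actual loader: load_target is a hypothetical helper, BaseTarget is a local stand-in for agenteval.targets.BaseTarget, and the module is registered in memory so the snippet runs without any files on disk.

```python
import importlib
import sys
import types


class BaseTarget:
    """Local stand-in for agenteval.targets.BaseTarget (assumption, for illustration)."""

    def invoke(self, prompt: str):
        raise NotImplementedError


class MyCustomTarget(BaseTarget):
    def __init__(self, **kwargs):
        # Every key in the target section other than `type` arrives here.
        self.kwargs = kwargs

    def invoke(self, prompt: str) -> str:
        return f"{self.kwargs['my_agent_parameter']}: {prompt}"


# Register a throwaway in-memory module so the dotted path below is importable.
module = types.ModuleType("my_custom_target")
module.MyCustomTarget = MyCustomTarget
sys.modules["my_custom_target"] = module


def load_target(config: dict) -> BaseTarget:
    """Hypothetical loader: import the class named by `type`, pass the rest as kwargs."""
    params = dict(config)
    module_path, class_name = params.pop("type").rsplit(".", 1)
    target_cls = getattr(importlib.import_module(module_path), class_name)
    return target_cls(**params)


target = load_target(
    {"type": "my_custom_target.MyCustomTarget", "my_agent_parameter": "value"}
)
print(target.invoke("hello"))  # prints "value: hello"
```

The point of the sketch is the split between type, which selects the class, and all remaining keys, which become keyword arguments of __init__.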
Warning
During a run, an instance of the Target will be created for each test in the test plan. Avoid testing Targets that load large models or vector stores into memory, because creating one instance per test can lead to a memory error. Instead, consider deploying your agent and exposing it as a RESTful service.
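If deploying the agent is not an option, one possible mitigation is to load the expensive object once and share it between Target instances. The sketch below uses functools.lru_cache for that. It is an assumption-laden alternative rather than the recommended approach: it only helps if all Target instances are created in the same Python process, which is not stated here, and get_shared_agent and CachedTarget are hypothetical names.

```python
from functools import lru_cache

load_count = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def get_shared_agent() -> dict:
    """Stand-in for loading a large model or vector store (hypothetical)."""
    global load_count
    load_count += 1
    return {"name": "expensive-agent"}


class CachedTarget:
    def __init__(self, **kwargs):
        # Each instance reuses the cached object instead of loading its own copy.
        self.agent = get_shared_agent()

    def invoke(self, prompt: str) -> str:
        return f"{self.agent['name']} received: {prompt}"


# Simulate one Target instance per test in the test plan.
targets = [CachedTarget() for _ in range(3)]
print(load_count)  # prints 1
```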
Examples
REST API
We will implement a custom Target that invokes an agent exposed as a REST API.
my_api_target.py
import json

import requests

from agenteval.targets import BaseTarget
from agenteval import TargetResponse


class MyAPITarget(BaseTarget):
    def __init__(self, **kwargs):
        self.url = kwargs.get("url")

    def invoke(self, prompt: str) -> TargetResponse:
        data = {"message": prompt}
        response = requests.post(
            self.url, json=data, headers={"Content-Type": "application/json"}
        )
        return TargetResponse(response=json.loads(response.content)["agentResponse"])
Create a test plan that references MyAPITarget.
agenteval.yml
evaluator:
  model: claude-3
target:
  type: my_api_target.MyAPITarget
  url: https://api.example.com/invoke
tests:
  get_backlog_tickets:
    steps:
    - Ask agent how many tickets are left in the backlog
    expected_results:
    - Agent responds with 15 tickets
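MyAPITarget assumes the service accepts a JSON body with a message field and replies with a JSON body containing agentResponse. To exercise that contract without a deployed agent, the following sketch starts a local stub server and sends it the same kind of request. It uses only the standard library (urllib in place of requests) so that it runs without extra dependencies. StubAgentHandler and invoke_agent are hypothetical names, and the echo behaviour is invented for the demo.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the deployed agent service."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"agentResponse": f"echo: {payload['message']}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def invoke_agent(url: str, prompt: str) -> str:
    """POST the prompt as JSON and return the agentResponse field."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"message": prompt}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any system proxy so the loopback request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["agentResponse"]


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), StubAgentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = invoke_agent(
    f"http://127.0.0.1:{server.server_port}/invoke", "How many tickets are left?"
)
server.shutdown()
server.server_close()
print(reply)  # prints "echo: How many tickets are left?"
```

Pointing the url parameter of the test plan at such a stub is one way to check the wiring of a custom Target before testing a real agent.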
LangChain agent
We will create a simple LangChain agent which calculates the length of a given piece of text.
my_langchain_target.py
from langchain_community.llms import Bedrock
from langchain.agents import tool
from langchain import hub
from langchain.agents import AgentExecutor, create_xml_agent

from agenteval.targets import BaseTarget
from agenteval import TargetResponse

llm = Bedrock(model_id="anthropic.claude-v2:1")


@tool
def calculate_text_length(text: str) -> int:
    """Returns the length of a given text."""
    return len(text)


tools = [calculate_text_length]
prompt = hub.pull("hwchase17/xml-agent-convo")
agent = create_xml_agent(llm, tools, prompt)


class MyLangChainTarget(BaseTarget):
    def __init__(self, **kwargs):
        self.agent = AgentExecutor(agent=agent, tools=tools, verbose=False)

    def invoke(self, prompt: str) -> TargetResponse:
        response = self.agent.invoke({"input": prompt})["output"]
        return TargetResponse(response=response)
Create a test plan that references MyLangChainTarget.
agenteval.yml
evaluator:
  model: claude-3
target:
  type: my_langchain_target.MyLangChainTarget
tests:
  calculate_text_length:
    steps:
    - "Ask agent to calculate the length of this text: Hello world!"
    expected_results:
    - The agent responds that the length is 12.
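The expected result in this test plan can be sanity-checked without running the agent, since the tool's body is just Python's built-in len. A minimal check, with the LangChain @tool decorator left out so the snippet has no dependencies:

```python
def calculate_text_length(text: str) -> int:
    """Returns the length of a given text."""
    return len(text)


# 5 letters + 1 space + 5 letters + 1 exclamation mark
length = calculate_text_length("Hello world!")
print(length)  # prints 12
```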