Integration with CI/CD Pipelines
After validating the agent's functionality in the development account, you can commit the code to the repository and start deploying the virtual agent to the next stage. Agent Evaluation integrates with CI/CD pipelines so that integration tests run automatically and catch regressions introduced by new features or updates. This keeps virtual agents reliable and consistent as they progress through the software delivery lifecycle.
By incorporating Agent Evaluation into CI/CD workflows, organizations can automate the testing process, ensuring that every code change or update undergoes thorough evaluation before deployment. This proactive measure minimizes the risk of introducing bugs or inconsistencies that could compromise the virtual agent's performance and the overall user experience.
CI/CD workflow
The figure below shows what a standard agent CI/CD pipeline looks like:
- The source repository stores the agent configuration, including agent instructions, system prompts, model configuration, etc. You should always commit your changes to ensure quality and reproducibility.
- When you commit your changes, a build step is triggered. This is where unit tests should run and validate the changes, including typo and syntax checks.
- When the changes are deployed to the staging environment, Agent Evaluation should run with a series of test cases for runtime validation.
- The runtime validation on the staging environment will help build confidence to deploy the fully tested agent to production.
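The build-step checks described in the list above could start as small as the following sketch. It assumes the agent configuration is stored as a JSON document with instruction and model_id fields; the field names and the validate_agent_config helper are hypothetical and are not part of Agent Evaluation. It covers syntax and completeness only, not spelling.

```python
import json

# Hypothetical required fields; adjust to however your repository
# actually stores the agent configuration.
REQUIRED_FIELDS = ("instruction", "model_id")


def validate_agent_config(raw: str) -> list:
    """Return a list of problems found in a JSON agent configuration string."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as err:
        # A syntax error makes every other check meaningless, so stop here.
        return [f"invalid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]
    problems = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' must be a non-empty string")
    return problems
```

A build job could call this helper on each configuration file and fail when the returned list is not empty, so that malformed configurations never reach the staging environment.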
Step-by-step GitHub Actions setup
We have built an example with GitHub Actions; please take a look at the GitHub workflow. Here is the step-by-step setup guide:
- Write a series of test cases following the agent-evaluation test plan syntax. Store test plans in the git repository. For example, a test plan to test a Bedrock agent target is written as follows, with BEDROCK_AGENT_ALIAS_ID and BEDROCK_AGENT_ID as placeholders:

      evaluator:
        model: claude-3
      target:
        bedrock_agent_alias_id: BEDROCK_AGENT_ALIAS_ID
        bedrock_agent_id: BEDROCK_AGENT_ID
        type: bedrock-agent
      tests:
        InsuranceClaimQuestions:
          ...
- Create an IAM user with proper permissions:
  - The principal must have InvokeModel permission to the model specified in the configuration.
  - The principal must have the permissions to call the target agent. Depending on the target type, different permissions are required. Please visit the agent-evaluation Target docs for details.
- Store the IAM credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) in GitHub Actions secrets.
- Configure a GitHub workflow as follows:

      name: CI/CD example

      on:
        push:
          branches: [ "main" ]

      env:
        AWS_REGION: us-east-1 # set this to your preferred AWS region, e.g. us-west-1

      permissions:
        contents: read

      jobs:
        deploy:
          runs-on: ubuntu-latest
          steps:
            - name: Checkout
              uses: actions/checkout@v4

            - name: Configure AWS credentials
              uses: aws-actions/configure-aws-credentials@v4
              with:
                aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
                aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
                aws-region: ${{ env.AWS_REGION }}

            - name: Install agent-evaluation
              run: |
                pip install agent-evaluation
                agenteval --help

            - name: Test Bedrock Agent
              id: test-bedrock-agent
              env:
                BEDROCK_AGENT_ALIAS_ID: ${{ vars.BEDROCK_AGENT_ALIAS_ID }}
                BEDROCK_AGENT_ID: ${{ vars.BEDROCK_AGENT_ID }}
              run: |
                sed -e "s/BEDROCK_AGENT_ALIAS_ID/$BEDROCK_AGENT_ALIAS_ID/g" -e "s/BEDROCK_AGENT_ID/$BEDROCK_AGENT_ID/g" <path-to-the-test-plan-template-file> > agenteval.yml
                agenteval run

            - name: Test Summary
              if: always()
              id: test-summary
              run: |
                if [ -f agenteval_summary.md ]; then
                  cat agenteval_summary.md >> $GITHUB_STEP_SUMMARY
                fi
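The sed command in the workflow above fills in the two placeholders of the test plan template. If either variable is unset, sed substitutes an empty string and the failure only surfaces later, when agenteval runs. The sketch below shows one possible alternative that performs the same substitution in Python and fails early instead. The render_test_plan helper and the sample IDs are illustrative assumptions, not part of Agent Evaluation.

```python
import re

# Placeholder tokens used in the test plan template shown earlier.
PLACEHOLDERS = ("BEDROCK_AGENT_ALIAS_ID", "BEDROCK_AGENT_ID")
_PATTERN = re.compile("|".join(re.escape(name) for name in PLACEHOLDERS))


def render_test_plan(template: str, values: dict) -> str:
    """Substitute every placeholder in one pass; fail if a value is missing."""
    missing = [name for name in PLACEHOLDERS if not values.get(name)]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    # The match is case-sensitive, so the lowercase YAML keys are left alone.
    return _PATTERN.sub(lambda match: values[match.group(0)], template)


TEMPLATE = """\
evaluator:
  model: claude-3
target:
  bedrock_agent_alias_id: BEDROCK_AGENT_ALIAS_ID
  bedrock_agent_id: BEDROCK_AGENT_ID
  type: bedrock-agent
"""

# Made-up IDs for illustration only.
print(render_test_plan(TEMPLATE, {
    "BEDROCK_AGENT_ALIAS_ID": "ALIAS12345",
    "BEDROCK_AGENT_ID": "AGENT67890",
}))
```

In a workflow step, the values would come from the same environment variables that the sed command reads, and the rendered text would be written to agenteval.yml before running agenteval run.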