WWT's Quality Assurance Testing Blueprint: An Overview
Introduction
This post is designed for quality engineers and software developers looking to leverage AI in their testing processes. The integration of AI and large language models (LLMs) is transforming the way developers build and deploy software. But what does this mean for software quality? WWT built the Quality Assurance Testing Blueprint to equip quality engineers (QEs) with tools that use AI and LLMs to assist in creating automated user interface (UI) tests across various applications and industries.
Background: What is a story in application development?
A story can mean many things, but it typically represents a discrete chunk of development work done on an application. This could be fixing a bug, creating new features or enhancing existing functionality. Depending on the development team, these stories can be written in many different styles and may have sparse details, but it is the job of the developer(s) assigned to the story to turn the language of the story card into code.
A QE's job is then to ensure that the functionality the story describes has been accurately achieved and that no other functionality in the application has been broken. A QE's assessment can include:
- The behavior of a feature when it is used under ideal circumstances (Happy Path)
- Edge cases such as erroneous user input
- Security concerns, such as SQL injection
- Behavior of other features potentially impacted by the changes
- Behavior of a feature for different types of users, such as an administrator
- Visual appearance on various screen sizes, such as on mobile devices
The Quality Assurance Testing (QAT) Blueprint aims to aid QEs in this validation process by helping to convert a story into a series of automated UI tests.
Overview of technology used
The QAT Blueprint is written primarily in Python. The main packages used are Click for creating the command-line interface (CLI), Browser Use for allowing LLMs to interact with the browser, and pytest with Playwright for running the automated tests.
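As a rough illustration of how Click structures a CLI of this kind, the sketch below defines a command group with hypothetical `plan` and `learn` subcommands; the Blueprint's actual command names and options are not shown here and may differ.

```python
import click


@click.group()
def cli():
    """Entry point for a story-to-test workflow (illustrative only)."""


@cli.command()
@click.argument("story_file", type=click.Path(exists=True))
def plan(story_file):
    """Generate a test plan from a story description (hypothetical command)."""
    click.echo(f"Planning tests for {story_file}...")


@cli.command()
@click.option("--plan-file", type=click.Path(exists=True), required=True)
def learn(plan_file):
    """Explore the application in a browser to learn the planned steps (hypothetical command)."""
    click.echo(f"Learning browser actions for {plan_file}...")


if __name__ == "__main__":
    cli()
```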
In addition, LangChain is used for Browser Use and other LLM interactions, and Postgres databases with the pgvector extension are supported for embedding-based searches.
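For the embedding-based searches, a minimal sketch of a LangChain-plus-pgvector lookup is shown below, assuming the langchain_postgres and langchain_openai packages; the connection string, collection name, documents and query are placeholders rather than anything from the Blueprint itself.

```python
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

# Placeholder connection details; real configuration will differ.
connection = "postgresql+psycopg://user:password@localhost:5432/qat"

# Store code and documentation chunks as embeddings in Postgres (pgvector).
store = PGVector(
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="repo_chunks",
    connection=connection,
)
store.add_documents([
    Document(page_content="Login form requires email and password fields...",
             metadata={"source": "frontend/login.tsx"}),
])

# Retrieve the chunks most relevant to a planned test.
for doc in store.similarity_search("login form validation rules", k=4):
    print(doc.metadata.get("source"), doc.page_content[:80])
```

Keeping the embeddings in Postgres means retrieval can run on infrastructure many teams already operate, rather than requiring a separate vector database.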
The tool was written using Windsurf, an AI-powered code editor.
Using the QAT Blueprint
The first step is configuring the tool by providing details about the application being tested and the LLM that will perform the tasks. Other optional configurations include relevant code bases and files to search (which also requires configuring an embedding model and vector store) and test accounts or personas that the automated tests will utilize.
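To make the shape of that configuration concrete, here is a purely hypothetical sketch of the kinds of details involved; none of these field names or values come from the Blueprint itself.

```python
# Hypothetical illustration only; the Blueprint's real configuration format differs.
config = {
    "application": {
        "name": "Example Storefront",
        "base_url": "https://staging.example.com",
        "description": "E-commerce site with search, cart and checkout flows",
    },
    "llm": {"provider": "openai", "model": "gpt-4o"},
    # Optional: repositories and files to index for embedding-based search.
    "repositories": ["./frontend", "./api"],
    "embeddings": {
        "model": "text-embedding-3-small",
        "pg_connection": "postgresql://user:password@localhost:5432/qat",
    },
    # Optional: test accounts or personas the generated tests can sign in as.
    "personas": [
        {"name": "admin", "username": "admin@example.com", "password_env": "ADMIN_PASSWORD"},
        {"name": "shopper", "username": "shopper@example.com", "password_env": "SHOPPER_PASSWORD"},
    ],
}
```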
Once the project is configured, a set of automated tests can be created by first passing the story information to the planning agent. Once a plan is created, the learning agent uses Browser Use to discover how the planned tests can be completed as a series of browser actions such as navigation, clicks and typing. Using a generated trace of the learning agent's actions, the writing agent then converts these actions into Playwright tests written in Python. After the tests are written, they can be executed using the testing agent, which examines the output of the test runs and suggests changes as needed.
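Since the output is Playwright tests in Python run with pytest, the generated tests have roughly the shape of the sketch below; the URL, selectors and assertions here are placeholders, not actual Blueprint output.

```python
from playwright.sync_api import Page, expect


# Illustrative shape of a generated test; URL and selectors are placeholders.
def test_search_returns_results(page: Page):
    # Navigate to the application under test.
    page.goto("https://staging.example.com")

    # Replay the browser actions captured in the learning agent's trace.
    page.get_by_placeholder("Search").fill("wireless keyboard")
    page.get_by_role("button", name="Search").click()

    # Assert on the outcome described in the test plan.
    expect(page.get_by_role("heading", name="Results")).to_be_visible()
    expect(page.locator(".result-card")).not_to_have_count(0)
```

A test like this runs with the pytest-playwright plugin, for example `pytest --browser chromium`.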
At every step of the process, the QE using the tool is in full control and can audit or tweak any of the actions taken by the LLM. When creating a test plan, the LLM is encouraged to ask the QE clarifying questions or to request test data, keeping the essential human in the loop and helping to avoid assumptions or hallucinations. The testing agent is also not allowed to directly edit test files or trigger test runs without explicit consent from the QE operating the tool. We also allow the user to update and configure all of the prompts used.
Agent structure
Rather than using an existing agentic framework, we opted to create our own small agent class that we can easily adjust as needed for the tool's goals. For example, OpenAI does not currently support tool calls that return images, so we introduced an explicit return type for tools that translates a returned image into an acknowledgment of the tool call and a user message containing the image itself.
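A minimal sketch of that translation, assuming the OpenAI chat completions message format and an illustrative ImageResult wrapper type (not necessarily what the Blueprint uses), might look like this:

```python
import base64
from dataclasses import dataclass


@dataclass
class ImageResult:
    """Assumed wrapper type for a tool that returns a screenshot."""
    png_bytes: bytes


def tool_result_messages(tool_call_id: str, result) -> list[dict]:
    """Translate a tool result into OpenAI chat messages.

    Tool messages cannot carry images, so an image result becomes a short
    acknowledgment of the tool call plus a user message containing the image.
    """
    if isinstance(result, ImageResult):
        encoded = base64.b64encode(result.png_bytes).decode()
        return [
            # Acknowledge the tool call so the conversation stays well-formed.
            {"role": "tool", "tool_call_id": tool_call_id,
             "content": "Screenshot captured; see the attached image."},
            # Deliver the image itself as a user message with an image_url part.
            {"role": "user", "content": [
                {"type": "text", "text": "Result of the previous tool call:"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encoded}"}},
            ]},
        ]
    # Non-image results go back as an ordinary tool message.
    return [{"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}]
```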
Each agent is only allowed to make tool calls and is given a set of tools to choose from. By default, each agent has a Thought tool (allowing the agent to use extra tokens for reasoning) and a Done tool (allowing the agent to conclude its task either successfully or unsuccessfully).
Each agent also has a state that is used for formatting the initial prompt and keeping track of information outside the message context. This is helpful for operations whose raw results would introduce excessive tokens into the conversation when a simple confirmation of the tool's success is enough to notify the LLM of a state change.
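Putting these pieces together, a tool-call-only agent loop of this kind could be sketched as follows, using the OpenAI chat completions API; the Thought and Done tool definitions, model name, state handling and overall structure are illustrative assumptions rather than the Blueprint's actual implementation.

```python
import json
from openai import OpenAI

client = OpenAI()

# Default tools every agent gets: Thought (extra reasoning tokens) and Done
# (conclude the task successfully or unsuccessfully). Names are illustrative.
TOOLS = [
    {"type": "function", "function": {
        "name": "thought",
        "description": "Record intermediate reasoning before taking the next action.",
        "parameters": {"type": "object",
                       "properties": {"text": {"type": "string"}},
                       "required": ["text"]},
    }},
    {"type": "function", "function": {
        "name": "done",
        "description": "Finish the task and report success or failure.",
        "parameters": {"type": "object",
                       "properties": {"success": {"type": "boolean"},
                                      "summary": {"type": "string"}},
                       "required": ["success", "summary"]},
    }},
]


def run_agent(task: str, state: dict, max_steps: int = 20) -> dict:
    """Tool-call-only loop: state is rendered into the initial prompt, and tool
    results come back as short confirmations rather than raw output."""
    messages = [{"role": "system",
                 "content": f"Task: {task}\nKnown state: {json.dumps(state)}"}]
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",              # model name is an assumption
            messages=messages,
            tools=TOOLS,
            tool_choice="required",      # the agent may only respond with tool calls
        )
        message = response.choices[0].message
        messages.append(message)
        call = message.tool_calls[0]
        args = json.loads(call.function.arguments)
        if call.function.name == "done":
            return args
        # Track details in state and send back only a brief acknowledgment,
        # keeping the message context from filling up with raw tool output.
        state.setdefault("thoughts", []).append(args.get("text", ""))
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": "Acknowledged."})
    return {"success": False, "summary": "Step limit reached."}
```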
Future work
There are many enhancements we would like to experiment with on the QAT Blueprint. A few possibilities are:
- Common actions: Allow the planner and learner to identify actions common among a series of tests and extract those actions into reusable utilities. This would allow the learner to reuse previously learned behavior, such as page navigation or search, without having to relearn it from scratch. These utility functions could also live in the generated codebase and be used by the resulting tests themselves for more efficient updates and better readability.
- Full debug tracing: The current testing agent has several tools, such as reading the test code, searching available repositories, and even viewing the final screenshot of a failing test. Playwright also provides a full trace of the test flow to visualize how the page state changes over time, check network activity, or examine full DOM snapshots. The density of this information could easily overwhelm a human or an LLM, so providing an intuitive way to extract the needed context without opening a firehose is essential.
- Modularity: Adding support for more model providers (currently only OpenAI) should be simple enough. We could also enhance the project by allowing the user to swap out components, such as switching from Browser Use to the Playwright MCP server, or outputting tests in JavaScript or other languages instead of Python.
Conclusion
The Quality Assurance Testing Blueprint highlights the role AI and LLMs can play in enhancing software testing, serving as a tool for QEs to improve their testing process. By automating the conversion of development stories into automated tests, it empowers QEs to maintain control and oversight while ensuring coverage of user paths from ideal conditions to complex edge cases. As AI and LLMs continue to reshape software development, the Quality Assurance Testing Blueprint helps drive robust and reliable application testing while keeping the QE at the center of the process.