Overview
WALL·TE (Web Automation Large Language Test Engineer) is a browser automation framework that uses Large Language Models to interpret natural language test descriptions and execute them against web applications. Tests are written in Markdown, and AI handles the translation to browser actions.
The framework eliminates CSS selectors and XPath expressions by using vision-enabled LLMs to understand page structure semantically. This makes tests more resilient to UI changes and reduces maintenance overhead.
See It In Action
Watch WALL·TE execute real test scenarios. These recordings show the AI's decision-making process as it interprets natural language and interacts with live applications.
Synchronous Test Execution
Watch tests execute sequentially with real-time streaming output showing AI reasoning and browser actions.
Parallel Test Execution
Run multiple tests in parallel across different browsers and devices. Demonstrates the AI's ability to handle concurrent test execution.
Key Features
Markdown Test Format
Write tests in plain English using Markdown. No programming required.
Semantic Page Understanding
AI interprets page structure through accessibility trees and visual layout.
Multi-Device Support
Run tests across mobile, tablet, and desktop with a single test definition.
Parallel Execution
Run multiple tests concurrently for faster test suite completion.
Real-Time Streaming
Watch test execution with live progress updates and AI decision logs.
Structured Reports
JSON, YAML, or TOML output with screenshots, token usage, and cost breakdowns.
BYOK Model
Use your own OpenAI or Anthropic API keys. No vendor lock-in.
CI/CD Integration
Headless mode and structured output work with any CI/CD pipeline.
The Problem
Traditional browser automation relies on brittle selectors that break when markup changes. Every UI refactor requires updating test suites. Dynamic content requires explicit wait conditions and state management. Multi-device testing means maintaining separate test implementations.
- CSS selectors and XPath break when HTML structure changes
- Dynamic UIs require complex conditional logic for different states
- Test maintenance costs often exceed initial development time
- Different viewports require separate test implementations
- Tests become outdated, and teams stop maintaining them
Natural Language Testing
Tests are defined in Markdown files with natural language instructions. The AI interprets intent rather than following rigid selector-based instructions. This approach makes tests readable and maintainable.
Example: Authentication Flow
```markdown
---
title: Login Test
description: Verify user authentication flow
tags: [auth, smoke-test]
---

## Successful Login

Test that a user can log in with valid credentials.

### Steps

1. Navigate to https://app.example.com/login
2. Fill in the email field with "test@example.com"
3. Fill in the password field with "SecurePass123"
4. Click the "Sign In" button
5. Wait for the dashboard to load

### Expectations

- The user should be redirected to /dashboard
- A welcome message should appear
- The user's profile icon should be visible in the header
```

How It Works
Semantic Understanding
The AI analyzes the page's accessibility tree and visual layout to understand structure. Instead of looking for specific element IDs or classes, it identifies elements by their semantic role and visible text.
```javascript
// Traditional approach: brittle, selector-based
await page.click('#login-form button[type="submit"]')

// WALL·TE approach: natural language
"Click the Sign In button"
```

Dynamic Adaptation
Instructions like "Click any product with 'wireless' in the title" work naturally. The AI searches the page for matching elements and selects appropriately. No explicit loops or conditional logic required.
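In a test file, such an instruction is just another step. A hypothetical sketch (the store URL and product wording are illustrative, not from a real test suite):

```markdown
### Steps

1. Navigate to https://shop.example.com/search?q=headphones
2. Click any product with "wireless" in the title
3. Verify the product page heading contains "wireless"
```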
Context Retention
The AI maintains conversation history across test steps. It remembers which product was selected, what form fields were filled, and can verify that later steps match earlier actions.
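Context retention lets later steps refer back to earlier ones without naming concrete values. A hypothetical sketch of such a test:

```markdown
### Steps

1. Open the product listing and add the first product to the cart
2. Go to the cart page

### Expectations

- The cart shows the same product that was added in step 1
```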
Technical Architecture
WALL·TE is built on three core components: Model Context Protocol integration for browser control, multi-model support for AI providers, and cost tracking for token usage.
Model Context Protocol (MCP) Integration
We use Anthropic's Model Context Protocol to provide LLMs with structured access to Playwright browser automation. MCP defines a standard interface for AI tools to interact with external systems.
The MCP server exposes browser primitives as structured tool calls:
- Navigation, clicking, typing, and form interaction
- Page snapshots and accessibility tree queries
- Screenshot capture for visual verification
- Console log and network request monitoring
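Concretely, the model emits structured tool calls rather than raw code. The sketch below shows one plausible shape of such a call; the tool name and argument fields are assumptions for illustration, not WALL·TE's actual schema:

```typescript
// Illustrative shape of an MCP tool call the model might emit.
// The tool name and argument fields are assumptions, not a documented schema.
interface ToolCall {
  name: string; // e.g. "browser_click", "browser_navigate"
  arguments: Record<string, unknown>;
}

const call: ToolCall = {
  name: "browser_click",
  arguments: {
    // The target is described semantically, not by CSS selector
    element: 'button labeled "Sign In"',
  },
};

console.log(call.name); // "browser_click"
```

Because every primitive is a structured call, the framework can log, replay, and cost-account each action the AI takes.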
Multi-Model Support
WALL·TE supports multiple AI providers with a BYOK (Bring Your Own Key) model. No vendor lock-in—use your existing API keys.
OpenAI Models
- GPT-5 (default)
- GPT-4o
- o1-preview for complex reasoning
Anthropic Models
- Claude Sonnet 4.5
- Claude Opus 4
- Automatic prompt caching
Cost Tracking
Every test run tracks token usage and estimates costs. Reports include token counts, screenshots, and cost breakdowns.
Typical costs: ~$0.03-0.09 per test suite run. Prompt caching significantly reduces costs for repeated test executions.
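The estimate is simple arithmetic over token counts. A minimal sketch, with placeholder per-token prices (current provider rates vary by model and change over time):

```typescript
// Back-of-envelope cost estimate from token counts.
// Prices here are illustrative placeholders, not actual provider rates.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 }; // USD per 1M tokens

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_PER_MTOK.input +
    (outputTokens / 1_000_000) * PRICE_PER_MTOK.output
  );
}

// A run using 20k input tokens and 2k output tokens:
console.log(estimateCostUSD(20_000, 2_000).toFixed(3)); // "0.090"
```

Prompt caching lowers the effective input-token price on repeated runs, which is why re-executing an unchanged suite costs less than the first run.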