Initial project structure with Docker, agents, and test specs

- Docker base image with Xvfb + noVNC for headless GUI
- Agent base class for AI visual testing
- Test spec models and YAML parser
- CLI with run, test, build commands
- Example test spec for Development Hub

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rob 2026-01-07 03:40:36 -04:00
parent 3521d44071
commit db1bad473a
15 changed files with 805 additions and 8 deletions


@@ -6,6 +6,52 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
**GhostQA** - AI-powered visual GUI testing via natural language
GhostQA enables testing of desktop GUI applications (PyQt, GTK, etc.) using AI vision agents. Instead of writing brittle UI test scripts, describe expected behavior in natural language and let AI verify it visually.
### How It Works
```
┌─────────────────────────────────────┐
│ Docker Container │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ Target App │──▶│ Xvfb + noVNC │──┼──▶ Public URL (Cloudflare)
│ └─────────────┘ └──────────────┘ │
└─────────────────────────────────────┘
┌────────────▼────────────┐
│ AI Agent │
│ (ChatGPT, Claude, etc.) │
│ │
│ "Click Projects, │
│ verify 5 items shown" │
└─────────────────────────┘
```
### Core Components
1. **Docker Base Image** - Pre-configured with Xvfb, x11vnc, noVNC for headless GUI
2. **Test Runner** - Orchestrates containers and AI agent interactions
3. **Test Specs** - YAML/Markdown files describing tests in natural language
4. **Results Reporter** - Captures screenshots, AI observations, pass/fail status
### Example Test Spec
```yaml
name: Dashboard displays project list
app: development-hub
steps:
- action: wait_for_window
timeout: 10s
- action: verify
prompt: "Is there a list of projects visible on the left side?"
expected: true
- action: click
prompt: "Click on the first project in the list"
- action: verify
prompt: "Does the right panel show a dashboard with todos and goals?"
expected: true
```
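Specs like the one above are plain YAML, so the loading step is small. The following is a minimal sketch of that step, mirroring the logic in `src/ghostqa/specs/parser.py` (the inline `SPEC` string is a trimmed copy of the example; the `parse_timeout` helper name is illustrative, not part of the package API):

```python
"""Sketch: loading a GhostQA-style YAML spec (mirrors src/ghostqa/specs/parser.py)."""
import yaml

SPEC = """
name: Dashboard displays project list
app: development-hub
steps:
  - action: wait_for_window
    timeout: 10s
  - action: verify
    prompt: "Is there a list of projects visible on the left side?"
    expected: true
"""

def parse_timeout(value):
    # Accept either a bare number or a "10s"-style string.
    if isinstance(value, str) and value.endswith("s"):
        return float(value[:-1])
    return float(value)

data = yaml.safe_load(SPEC)
steps = [
    {
        "action": s.get("action", "verify"),
        "prompt": s.get("prompt", ""),
        "expected": s.get("expected"),
        "timeout": parse_timeout(s.get("timeout", 10.0)),
    }
    for s in data.get("steps", [])
]
print(data["name"], len(steps))
```

Each step stays a free-form prompt; the AI agent, not a selector engine, decides what "the first project in the list" means on screen.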
## Development Commands
```bash
@@ -15,24 +61,59 @@ pip install -e ".[dev]"
# Run tests
pytest

# Build the base Docker image
docker build -t ghostqa-base -f docker/Dockerfile.base .

# Run an app in test mode
ghostqa run --app development-hub --port 6080
```
## Architecture
```
src/ghostqa/
├── __init__.py
├── __main__.py # CLI entry point
├── docker/
│ ├── base.py # Base image builder
│ └── runner.py # Container orchestration
├── agents/
│ ├── base.py # Abstract AI agent interface
│ ├── chatgpt.py # ChatGPT agent mode (via browser automation)
│ └── claude.py # Claude computer-use API
├── specs/
│ ├── parser.py # Parse test spec YAML/MD
│ └── models.py # Test spec data models
├── runner.py # Test execution engine
└── reporter.py # Results and screenshots
```
### Key Modules
| Module | Purpose |
|--------|---------|
| `docker.runner` | Build and manage Docker containers with noVNC |
| `agents.base` | Abstract interface for AI vision agents |
| `specs.parser` | Parse natural language test specifications |
| `runner` | Execute tests, coordinate agents and containers |
| `reporter` | Generate test reports with screenshots |
### Key Paths
- **Source code**: `src/ghostqa/`
- **Tests**: `tests/`
- **Docker files**: `docker/`
- **Example specs**: `examples/`
- **Documentation**: `docs/` (symlink to project-docs)
## Supported AI Agents
| Agent | Method | Notes |
|-------|--------|-------|
| ChatGPT Agent Mode | Browser automation to chat.openai.com | Included in Plus/Pro subscription |
| Claude Computer Use | API with vision + actions | Per-token pricing |
| Open source (browser-use) | Local LLM or API | Flexible, self-hosted |
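All three agent options plug in behind the same abstract interface (`src/ghostqa/agents/base.py`). A stub implementation shows the shape; `StubAgent` is hypothetical — it returns canned answers instead of calling a vision model, which is useful for wiring up the runner before any real agent exists:

```python
"""Sketch: a hypothetical stub implementation of the Agent interface."""
from dataclasses import dataclass

@dataclass
class ActionResult:
    # Mirrors the ActionResult dataclass in agents/base.py (trimmed).
    success: bool
    message: str
    observation: str = ""

class StubAgent:
    """Canned-answer agent; a real one would screenshot noVNC and query a vision model."""

    def __init__(self, url: str):
        self.url = url  # where the app's noVNC is exposed

    def connect(self) -> bool:
        return True

    def verify(self, prompt: str) -> ActionResult:
        return ActionResult(True, "ok", observation="Yes, the list is visible")

    def click(self, prompt: str) -> ActionResult:
        return ActionResult(True, f"clicked: {prompt}")

agent = StubAgent("http://localhost:6080")
result = agent.verify("Is there a list of projects visible?")
print(result.success, result.observation)
```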
## Documentation
Documentation lives in `docs/` (symlink to centralized docs system).

@@ -42,7 +123,4 @@ Documentation lives in `docs/` (symlink to centralized docs system).
Quick reference:
- Edit files in `docs/` folder
- Use `public: true` frontmatter for public-facing docs
- Use `<!-- PRIVATE_START -->` / `<!-- PRIVATE_END -->` to hide sections
- Deploy: `~/PycharmProjects/project-docs/scripts/build-public-docs.sh ghostqa --deploy`
Do NOT create documentation files directly in this repository.

docker/Dockerfile.base (new file)

@@ -0,0 +1,62 @@
# GhostQA Base Image
# Pre-configured headless display with noVNC for AI visual testing
FROM python:3.11-slim
# Install display and VNC dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
xvfb \
x11vnc \
novnc \
websockify \
# Qt/GUI dependencies
libxcb-cursor0 \
libxcb-icccm4 \
libxcb-image0 \
libxcb-keysyms1 \
libxcb-randr0 \
libxcb-render-util0 \
libxcb-shape0 \
libxcb-xfixes0 \
libxcb-xinerama0 \
libxcb-xkb1 \
libxkbcommon-x11-0 \
libegl1 \
libgl1 \
libfontconfig1 \
libdbus-1-3 \
# Utilities
procps \
net-tools \
&& rm -rf /var/lib/apt/lists/*
# Set up virtual display
ENV DISPLAY=:99
ENV QT_QPA_PLATFORM=xcb
# Create startup script
RUN echo '#!/bin/bash\n\
set -e\n\
\n\
# Start virtual framebuffer\n\
Xvfb :99 -screen 0 ${SCREEN_WIDTH:-1280}x${SCREEN_HEIGHT:-720}x24 &\n\
sleep 1\n\
\n\
# Start VNC server\n\
x11vnc -display :99 -forever -shared -rfbport 5900 -nopw &\n\
sleep 1\n\
\n\
# Start noVNC (web-based VNC client)\n\
websockify --web=/usr/share/novnc/ ${NOVNC_PORT:-6080} localhost:5900 &\n\
\n\
echo "GhostQA display ready on port ${NOVNC_PORT:-6080}"\n\
echo "Connect via browser: http://localhost:${NOVNC_PORT:-6080}/vnc.html"\n\
\n\
# Run the target application\n\
exec "$@"\n\
' > /usr/local/bin/ghostqa-start && chmod +x /usr/local/bin/ghostqa-start
EXPOSE 6080
ENTRYPOINT ["/usr/local/bin/ghostqa-start"]
CMD ["bash"]

@@ -0,0 +1,36 @@
# Example GhostQA test spec for Development Hub
name: Development Hub Dashboard
app: development-hub
description: Verify the Development Hub GUI loads and displays projects correctly
tags: [smoke, gui]
steps:
- name: Wait for app window
action: wait_for_window
timeout: 15s
- name: Verify project list visible
action: verify
prompt: "Is there a list of projects visible on the left side of the window?"
expected: true
- name: Count projects
action: verify
prompt: "How many projects are shown in the list? Just give me the number."
- name: Click first project
action: click
prompt: "Click on the first project in the list on the left side"
- name: Verify dashboard loaded
action: verify
prompt: "Does the right panel now show a dashboard with sections like 'GOALS', 'MILESTONES', or 'TODOs'?"
expected: true
- name: Check for todos section
action: verify
prompt: "Is there a 'TODOs' section visible with priority levels like 'HIGH PRIORITY'?"
expected: true
- name: Take final screenshot
action: screenshot


@@ -8,13 +8,25 @@ version = "0.1.0"
description = "AI-powered visual GUI testing via natural language"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "pyyaml>=6.0",
    "httpx>=0.25",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
]
claude = [
    "anthropic>=0.18",
]
browser-use = [
    "browser-use>=0.1",
]

[project.scripts]
ghostqa = "ghostqa.__main__:main"

[tool.setuptools.packages.find]
where = ["src"]

src/ghostqa/__init__.py (new file)

@@ -0,0 +1,3 @@
"""GhostQA - AI-powered visual GUI testing via natural language."""
__version__ = "0.1.0"

src/ghostqa/__main__.py (new file)

@@ -0,0 +1,51 @@
"""CLI entry point for GhostQA."""
import argparse
import sys
def main():
"""Main CLI entry point."""
parser = argparse.ArgumentParser(
prog="ghostqa",
description="AI-powered visual GUI testing via natural language",
)
subparsers = parser.add_subparsers(dest="command", help="Commands")
# ghostqa run - Run an app in test mode
run_parser = subparsers.add_parser("run", help="Run an app in test container")
run_parser.add_argument("--app", required=True, help="App to run (name or path)")
run_parser.add_argument("--port", type=int, default=6080, help="noVNC port (default: 6080)")
run_parser.add_argument("--build", action="store_true", help="Rebuild container image")
# ghostqa test - Run test specs
test_parser = subparsers.add_parser("test", help="Run test specifications")
test_parser.add_argument("spec", nargs="?", help="Test spec file (default: all in specs/)")
test_parser.add_argument("--agent", choices=["chatgpt", "claude", "browser-use"], default="claude")
test_parser.add_argument("--url", help="URL where app is exposed")
# ghostqa build - Build base image
build_parser = subparsers.add_parser("build", help="Build GhostQA base Docker image")
build_parser.add_argument("--no-cache", action="store_true", help="Build without cache")
args = parser.parse_args()
if args.command is None:
parser.print_help()
sys.exit(1)
if args.command == "run":
from ghostqa.docker.runner import run_app
run_app(args.app, args.port, rebuild=args.build)
elif args.command == "test":
from ghostqa.runner import run_tests
run_tests(args.spec, agent=args.agent, url=args.url)
elif args.command == "build":
from ghostqa.docker.base import build_base_image
build_base_image(no_cache=args.no_cache)
if __name__ == "__main__":
main()
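The `run`/`test`/`build` subcommand layout above can be reproduced stand-alone. This is a trimmed sketch of the same argparse structure (not an import of the package), handy for checking defaults like the noVNC port:

```python
"""Sketch: the argparse subcommand layout from __main__.py, trimmed and self-contained."""
import argparse

parser = argparse.ArgumentParser(prog="ghostqa")
sub = parser.add_subparsers(dest="command")

# ghostqa run --app <name> [--port N]
run_p = sub.add_parser("run")
run_p.add_argument("--app", required=True)
run_p.add_argument("--port", type=int, default=6080)

# ghostqa test [spec] [--agent ...]
test_p = sub.add_parser("test")
test_p.add_argument("spec", nargs="?")
test_p.add_argument("--agent", choices=["chatgpt", "claude", "browser-use"], default="claude")

args = parser.parse_args(["run", "--app", "development-hub"])
print(args.command, args.app, args.port)
```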

@@ -0,0 +1,5 @@
"""AI agents for visual GUI testing."""
from ghostqa.agents.base import Agent
__all__ = ["Agent"]

src/ghostqa/agents/base.py (new file)

@@ -0,0 +1,100 @@
"""Base agent interface for AI visual testing."""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any
@dataclass
class ActionResult:
"""Result of an agent action."""
success: bool
message: str
screenshot: bytes | None = None
observation: str = ""
raw_response: Any = None
class Agent(ABC):
"""Abstract base class for AI agents that can see and interact with GUIs."""
def __init__(self, url: str):
"""Initialize agent with target URL.
Args:
url: URL where the app's noVNC is exposed
"""
self.url = url
@abstractmethod
def connect(self) -> bool:
"""Connect to the target application.
Returns:
True if connection successful
"""
pass
@abstractmethod
def screenshot(self) -> bytes:
"""Take a screenshot of the current state.
Returns:
PNG image bytes
"""
pass
@abstractmethod
def verify(self, prompt: str) -> ActionResult:
"""Verify something about the current screen.
Args:
prompt: Natural language description of what to verify
Returns:
ActionResult with observation
"""
pass
@abstractmethod
def click(self, prompt: str) -> ActionResult:
"""Click on an element described in natural language.
Args:
prompt: Description of what to click (e.g., "the Save button")
Returns:
ActionResult indicating success/failure
"""
pass
@abstractmethod
def type_text(self, text: str, prompt: str | None = None) -> ActionResult:
"""Type text, optionally into a specific field.
Args:
text: Text to type
prompt: Optional description of where to type
Returns:
ActionResult indicating success/failure
"""
pass
@abstractmethod
def wait_for(self, prompt: str, timeout: float = 10.0) -> ActionResult:
"""Wait for a condition to be true.
Args:
prompt: Description of what to wait for
timeout: Maximum time to wait in seconds
Returns:
ActionResult indicating if condition was met
"""
pass
def disconnect(self):
"""Disconnect from the target application."""
pass

@@ -0,0 +1 @@
"""Docker container management for GhostQA."""

@@ -0,0 +1,32 @@
"""Build GhostQA base Docker image."""
import subprocess
from pathlib import Path
def build_base_image(no_cache: bool = False):
"""Build the GhostQA base Docker image.
Args:
no_cache: Build without using cache
"""
dockerfile = Path(__file__).parent.parent.parent.parent / "docker" / "Dockerfile.base"
if not dockerfile.exists():
# Try package location
import ghostqa
package_dir = Path(ghostqa.__file__).parent
dockerfile = package_dir.parent.parent / "docker" / "Dockerfile.base"
if not dockerfile.exists():
print("Error: Dockerfile.base not found")
return
cmd = ["docker", "build", "-t", "ghostqa-base", "-f", str(dockerfile), str(dockerfile.parent)]
if no_cache:
cmd.insert(2, "--no-cache")
print("Building GhostQA base image...")
subprocess.run(cmd, check=True)
print("Base image built successfully: ghostqa-base")

@@ -0,0 +1,81 @@
"""Docker container runner for GUI apps."""
import subprocess
from pathlib import Path
def run_app(app: str, port: int = 6080, rebuild: bool = False):
"""Run an application in a GhostQA container.
Args:
app: Application name or path to project
port: Port to expose noVNC on
rebuild: Whether to rebuild the container image
"""
# Resolve app path
if "/" in app or Path(app).exists():
app_path = Path(app).resolve()
else:
# Assume it's a project in PycharmProjects
app_path = Path.home() / "PycharmProjects" / app
if not app_path.exists():
print(f"Error: App not found at {app_path}")
return
app_name = app_path.name.lower().replace("-", "_").replace(" ", "_")
image_name = f"ghostqa-{app_name}"
# Check for Dockerfile in app, otherwise use base
dockerfile = app_path / "Dockerfile.ghostqa"
if not dockerfile.exists():
dockerfile = app_path / "Dockerfile"
if dockerfile.exists():
# Build app-specific image
if rebuild or not _image_exists(image_name):
print(f"Building image: {image_name}")
subprocess.run(
["docker", "build", "-t", image_name, "-f", str(dockerfile), str(app_path)],
check=True,
)
else:
# Use base image with app mounted
image_name = "ghostqa-base"
if not _image_exists(image_name):
print("Base image not found. Building...")
from ghostqa.docker.base import build_base_image
build_base_image()
# Run container
print(f"Starting {app_name} on port {port}...")
print(f"Connect via browser: http://localhost:{port}/vnc.html")
cmd = [
"docker", "run", "-it", "--rm",
"-p", f"{port}:6080",
"-v", f"{app_path}:/app",
"-w", "/app",
"-e", "NOVNC_PORT=6080",
image_name,
]
# If using base image, add command to run the app
if not (app_path / "Dockerfile.ghostqa").exists() and not (app_path / "Dockerfile").exists():
# Try to detect how to run the app
if (app_path / "pyproject.toml").exists():
cmd.extend(["bash", "-c", "pip install -e . && python -m " + app_name])
else:
cmd.append("bash")
subprocess.run(cmd)
def _image_exists(name: str) -> bool:
"""Check if a Docker image exists."""
result = subprocess.run(
["docker", "images", "-q", name],
capture_output=True,
text=True,
)
return bool(result.stdout.strip())

src/ghostqa/runner.py (new file)

@@ -0,0 +1,217 @@
"""Test execution engine."""
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from ghostqa.specs.models import TestSpec, TestStep, ActionType
from ghostqa.specs.parser import parse_spec, find_specs
from ghostqa.agents.base import Agent, ActionResult
@dataclass
class StepResult:
"""Result of executing a single test step."""
step: TestStep
passed: bool
message: str
duration: float
screenshot: bytes | None = None
@dataclass
class TestResult:
"""Result of executing a complete test spec."""
spec: TestSpec
passed: bool
step_results: list[StepResult] = field(default_factory=list)
started_at: datetime = field(default_factory=datetime.now)
duration: float = 0.0
@property
def failed_steps(self) -> list[StepResult]:
"""Get list of failed steps."""
return [r for r in self.step_results if not r.passed]
class TestRunner:
"""Execute test specifications using an AI agent."""
def __init__(self, agent: Agent):
"""Initialize runner with an agent.
Args:
agent: AI agent to use for testing
"""
self.agent = agent
def run_spec(self, spec: TestSpec) -> TestResult:
"""Run a single test specification.
Args:
spec: Test specification to run
Returns:
TestResult with pass/fail status
"""
result = TestResult(spec=spec, passed=True)
start_time = datetime.now()
print(f"\nRunning: {spec.name}")
print(f" App: {spec.app}")
print(f" Steps: {spec.step_count}")
print()
for i, step in enumerate(spec.steps, 1):
step_name = step.name or f"Step {i}"
print(f" [{i}/{spec.step_count}] {step_name}...", end=" ", flush=True)
step_start = datetime.now()
step_result = self._run_step(step)
step_result.duration = (datetime.now() - step_start).total_seconds()
result.step_results.append(step_result)
if step_result.passed:
print(f"PASS ({step_result.duration:.1f}s)")
else:
print(f"FAIL ({step_result.duration:.1f}s)")
print(f" {step_result.message}")
result.passed = False
result.duration = (datetime.now() - start_time).total_seconds()
print()
if result.passed:
print(f" PASSED in {result.duration:.1f}s")
else:
print(f" FAILED ({len(result.failed_steps)} failures) in {result.duration:.1f}s")
return result
def _run_step(self, step: TestStep) -> StepResult:
"""Run a single test step.
Args:
step: Step to execute
Returns:
StepResult
"""
try:
if step.action == ActionType.WAIT_FOR_WINDOW:
action_result = self.agent.wait_for(
"The application window is fully loaded and visible",
timeout=step.timeout,
)
elif step.action == ActionType.VERIFY:
action_result = self.agent.verify(step.prompt)
# Check expected value if provided
if step.expected is not None:
if isinstance(step.expected, bool):
# Check if response indicates true/false
obs_lower = action_result.observation.lower()
is_positive = any(
word in obs_lower
for word in ["yes", "true", "correct", "visible", "shown", "present"]
)
is_negative = any(
word in obs_lower
for word in ["no", "false", "not", "cannot", "isn't", "hidden"]
)
if step.expected and not is_positive:
action_result.success = False
action_result.message = f"Expected true, got: {action_result.observation}"
elif not step.expected and not is_negative:
action_result.success = False
action_result.message = f"Expected false, got: {action_result.observation}"
elif step.action == ActionType.CLICK:
action_result = self.agent.click(step.prompt)
elif step.action == ActionType.TYPE:
action_result = self.agent.type_text(step.text, step.prompt)
elif step.action == ActionType.WAIT:
action_result = self.agent.wait_for(step.prompt, step.timeout)
elif step.action == ActionType.SCREENSHOT:
screenshot = self.agent.screenshot()
action_result = ActionResult(
success=True,
message="Screenshot captured",
screenshot=screenshot,
)
else:
action_result = ActionResult(
success=False,
message=f"Unknown action: {step.action}",
)
return StepResult(
step=step,
passed=action_result.success,
message=action_result.message,
duration=0.0,
screenshot=action_result.screenshot,
)
except Exception as e:
return StepResult(
step=step,
passed=False,
message=str(e),
duration=0.0,
)
def run_tests(spec_path: str | None, agent: str = "claude", url: str | None = None):
"""Run test specifications.
Args:
spec_path: Path to spec file or directory (None = examples/)
agent: Agent type to use
url: URL where app is exposed
"""
# Find specs
if spec_path is None:
spec_dir = Path(__file__).parent.parent.parent / "examples"
specs = find_specs(spec_dir)
elif Path(spec_path).is_dir():
specs = find_specs(spec_path)
else:
specs = [Path(spec_path)]
if not specs:
print("No test specs found")
return
print(f"Found {len(specs)} test spec(s)")
# Create agent
if url is None:
url = "http://localhost:6080"
if agent == "claude":
# TODO: Implement Claude agent
print("Claude agent not yet implemented")
return
elif agent == "chatgpt":
# TODO: Implement ChatGPT agent
print("ChatGPT agent not yet implemented")
return
else:
print(f"Unknown agent: {agent}")
return
# Run tests
# runner = TestRunner(agent_instance)
# for spec_file in specs:
# spec = parse_spec(spec_file)
# result = runner.run_spec(spec)
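The keyword heuristic `_run_step` applies to `verify` steps with a boolean `expected` can be exercised on its own. This sketch copies the word lists from the runner into a standalone function (the `matches_expected` name is illustrative, not part of the module):

```python
"""Sketch: the yes/no keyword heuristic used by TestRunner._run_step."""

POSITIVE = ["yes", "true", "correct", "visible", "shown", "present"]
NEGATIVE = ["no", "false", "not", "cannot", "isn't", "hidden"]

def matches_expected(observation: str, expected: bool) -> bool:
    # Mirrors the scan in _run_step: an expected-true step passes only when
    # the agent's observation contains an affirmative word, and vice versa.
    obs = observation.lower()
    words = POSITIVE if expected else NEGATIVE
    return any(word in obs for word in words)

print(matches_expected("Yes, the project list is visible", True))
print(matches_expected("The panel is hidden", False))
```

Note the substring matching is deliberately loose (e.g. "no" matches inside "cannot"); the real check in `_run_step` has the same trade-off.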

@@ -0,0 +1,6 @@
"""Test specification parsing and models."""
from ghostqa.specs.models import TestSpec, TestStep
from ghostqa.specs.parser import parse_spec
__all__ = ["TestSpec", "TestStep", "parse_spec"]

@@ -0,0 +1,44 @@
"""Test specification data models."""
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
class ActionType(Enum):
"""Types of test actions."""
VERIFY = "verify"
CLICK = "click"
TYPE = "type"
WAIT = "wait"
WAIT_FOR_WINDOW = "wait_for_window"
SCREENSHOT = "screenshot"
@dataclass
class TestStep:
"""A single step in a test specification."""
action: ActionType
prompt: str = ""
text: str = "" # For type action
expected: Any = None # For verify action
timeout: float = 10.0
name: str = ""
@dataclass
class TestSpec:
"""A complete test specification."""
name: str
app: str
steps: list[TestStep] = field(default_factory=list)
description: str = ""
tags: list[str] = field(default_factory=list)
@property
def step_count(self) -> int:
"""Number of steps in this spec."""
return len(self.steps)

@@ -0,0 +1,69 @@
"""Parse test specification files."""
from pathlib import Path
import yaml
from ghostqa.specs.models import TestSpec, TestStep, ActionType
def parse_spec(path: str | Path) -> TestSpec:
"""Parse a test specification file.
Args:
path: Path to YAML spec file
Returns:
TestSpec instance
"""
path = Path(path)
with open(path) as f:
data = yaml.safe_load(f)
steps = []
for step_data in data.get("steps", []):
action_str = step_data.get("action", "verify")
action = ActionType(action_str)
# Parse timeout (handle "10s" format)
timeout = step_data.get("timeout", 10.0)
if isinstance(timeout, str) and timeout.endswith("s"):
timeout = float(timeout[:-1])
step = TestStep(
action=action,
prompt=step_data.get("prompt", ""),
text=step_data.get("text", ""),
expected=step_data.get("expected"),
timeout=timeout,
name=step_data.get("name", ""),
)
steps.append(step)
return TestSpec(
name=data.get("name", path.stem),
app=data.get("app", ""),
steps=steps,
description=data.get("description", ""),
tags=data.get("tags", []),
)
def find_specs(directory: str | Path) -> list[Path]:
"""Find all test spec files in a directory.
Args:
directory: Directory to search
Returns:
List of spec file paths
"""
directory = Path(directory)
specs = []
for pattern in ["*.yaml", "*.yml"]:
specs.extend(directory.glob(pattern))
specs.extend(directory.glob(f"**/{pattern}"))
return sorted(set(specs))
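`find_specs` globs each pattern both at the top level and recursively, so top-level files match twice; the `set()` dedupes before sorting. A throwaway-directory demonstration (file names here are invented for illustration):

```python
"""Sketch: the glob-and-dedupe behavior of find_specs, on a temporary directory."""
import tempfile
from pathlib import Path

def find_specs(directory):
    # Copy of the function above, for a self-contained demo.
    directory = Path(directory)
    specs = []
    for pattern in ["*.yaml", "*.yml"]:
        specs.extend(directory.glob(pattern))          # top level
        specs.extend(directory.glob(f"**/{pattern}"))  # recursive (re-matches top level)
    return sorted(set(specs))                          # set() drops the duplicates

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "smoke.yaml").write_text("name: smoke")
    (root / "nested").mkdir()
    (root / "nested" / "deep.yml").write_text("name: deep")
    found = find_specs(root)
    names = [p.name for p in found]
    print(names)
```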