My QA Engineer is an LLM

Claude can click buttons.

That sounds trivial, but it changes how I build UIs. With Playwright MCP, Claude doesn’t just write code—it opens a browser, navigates to localhost, and verifies that things actually work. It catches bugs I’d miss in code review.

The Setup

Playwright MCP gives Claude browser automation. I run it with headless Chromium:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}
```

Now Claude can navigate, click, type, and screenshot. It sees what users see.

```mermaid
flowchart LR
    Claude([Claude Code]) -->|browser_snapshot| PW[Playwright MCP]
    PW --> Browser[Headless Chrome]
    Browser --> App[localhost:1313]
    Browser -.->|screenshot| Issue[(GitHub Issue)]
    Issue -.->|visual history| PR[Pull Request]
```

Flow: Claude Code → Playwright MCP → Browser → Screenshots → GitHub Issues → PR
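
Under the hood, each of those actions is an ordinary Playwright call. A rough sketch of a single pass scripted by hand (the nav-link selector is made up; in the real workflow the MCP server owns the browser and Claude just issues tool calls):

```js
import { chromium } from 'playwright';

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();

await page.goto('http://localhost:1313/');            // the local dev server
await page.click('nav >> text=Posts');                // hypothetical nav link
await page.screenshot({ path: 'after-click.png' });   // evidence for the issue

await browser.close();
```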

The Viewport Matrix

Every UI change gets tested at three breakpoints:

| Viewport | Size | Why |
|----------|----------|------------------|
| Desktop | 1440×900 | Standard laptop |
| Tablet | 1024×768 | iPad landscape |
| Mobile | 390×844 | iPhone 14 |

Claude resizes the browser, takes a screenshot, uploads it to the GitHub issue. Three screenshots per state change. If the mobile nav is broken, I see it immediately—not after deployment.
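
The same pass written as plain Playwright, for reference (a sketch; Claude does the equivalent through MCP resize and screenshot tools):

```js
import { chromium } from 'playwright';

const viewports = [
  { name: 'desktop', width: 1440, height: 900 },
  { name: 'tablet',  width: 1024, height: 768 },
  { name: 'mobile',  width: 390,  height: 844 },
];

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto('http://localhost:1313/');

for (const { name, width, height } of viewports) {
  await page.setViewportSize({ width, height });                   // emulate the device
  await page.screenshot({ path: `${name}.png`, fullPage: true });  // one image per breakpoint
}

await browser.close();
```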

Theme Testing

Same pattern for light and dark mode:

```js
// Claude toggles the theme and screenshots both states
await page.screenshot({ path: 'light.png' });
await page.click('[data-testid="theme-toggle"]');
await page.screenshot({ path: 'dark.png' });
```

CSS variable consistency matters here: a component that looks fine in light mode might have invisible text in dark mode. Claude catches this by actually looking at both.
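
Screenshots are usually enough, but the check can also be made explicit. A sketch of a programmatic guard (my addition, not part of the MCP workflow; `.post-card` is a placeholder selector, and it assumes a `page` like the snippet above):

```js
// Fail loudly if a component's text color matches its background in either theme.
for (const theme of ['light', 'dark']) {
  await page.emulateMedia({ colorScheme: theme });   // or click the site's own toggle
  const { color, backgroundColor } = await page.$eval('.post-card', (el) => {
    const s = getComputedStyle(el);
    return { color: s.color, backgroundColor: s.backgroundColor };
  });
  if (color === backgroundColor) {
    throw new Error(`Invisible text in ${theme} mode`);
  }
}
```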

Interactive Testing

This is where it gets interesting. Claude doesn’t just screenshot; it uses the UI: it opens the mobile drawer, follows navigation links, toggles the theme, and submits forms, checking the result of each action.
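
A sketch of one such check, written as a Playwright test (the selectors are hypothetical; in practice Claude performs the same steps through MCP tool calls and reads the resulting page snapshot):

```js
import { test, expect } from '@playwright/test';

test('drawer closes on backdrop click', async ({ page }) => {
  await page.setViewportSize({ width: 390, height: 844 });   // mobile viewport
  await page.goto('http://localhost:1313/');

  await page.click('[data-testid="menu-button"]');           // open the drawer
  await expect(page.locator('[data-testid="drawer"]')).toBeVisible();

  await page.click('[data-testid="drawer-backdrop"]');       // tap outside
  await expect(page.locator('[data-testid="drawer"]')).toBeHidden();
});
```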

When it finds a bug—say, the drawer doesn’t close when you tap outside—it reports it, fixes the code, and re-tests. The feedback loop is tight.

The GitHub Issue as Test Report

Every screenshot gets uploaded to the GitHub issue. The issue body is rebuilt at each milestone:

```markdown
## Current State

### Desktop (1440×900)
![desktop](https://user-images.githubusercontent.com/...)

### Tablet (1024×768)
![tablet](https://user-images.githubusercontent.com/...)

### Mobile (390×844)
![mobile](https://user-images.githubusercontent.com/...)

## Testing Checklist
- [x] Navigation links work
- [x] Mobile drawer opens/closes
- [x] Theme toggle persists
- [ ] Form validation shows errors
```

The issue becomes a living test report. Anyone reviewing the PR can see exactly what was tested.
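
The rebuild itself is plain gh CLI. A sketch, assuming the screenshots are already hosted somewhere GitHub can serve them; the issue number and file names are placeholders:

```sh
# Replace the issue body with the regenerated report
gh issue edit 42 --body-file test-report.md

# Or append a milestone update as a comment instead of rewriting the body
gh issue comment 42 --body-file milestone-update.md
```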

The Full Loop

A typical feature goes through this cycle:

  1. Issue created: “Add sidebar navigation”
  2. Claude implements: Writes the component
  3. Visual verification: Screenshots all viewports
  4. Interactive testing: Clicks through the UI
  5. Bug found: Drawer doesn’t close on backdrop click
  6. Fix applied: Updates the click handler
  7. Re-test: Fresh screenshots, all passing
  8. PR created: Links to issue with full visual history

No manual QA step. No “looks good to me” without actually looking. Claude verifies its own work.

What This Catches

Real bugs I’ve seen Claude find:

- A mobile drawer that wouldn’t close when you tapped the backdrop
- Text that went invisible against the dark-mode background
- A navigation bar that broke at the mobile breakpoint

These are the bugs that slip through code review because they only appear in the browser. Claude sees them because it actually runs the code.

Every PR now comes with proof that it works.