My QA Engineer is an LLM

Claude can click buttons.

That sounds trivial, but it changes how I build UIs. With Playwright MCP, Claude doesn’t just write code—it opens a browser, navigates to localhost, and verifies that things actually work. It catches bugs I’d miss in code review.

The Setup

Playwright MCP gives Claude browser automation. I run it with headless Chromium:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}
```

Now Claude can navigate, click, type, and screenshot. It sees what users see.

```mermaid
flowchart LR
    Claude([Claude Code]) -->|browser_snapshot| PW[Playwright MCP]
    PW --> Browser[Headless Chrome]
    Browser --> App[localhost:1313]
    Browser -.->|screenshot| Issue[(GitHub Issue)]
    Issue -.->|visual history| PR[Pull Request]
```

Flow: Claude Code → Playwright MCP → Browser → Screenshots → GitHub Issues → PR
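
Under the hood, each of those actions is an ordinary Playwright call. A rough sketch of a single pass scripted by hand (the nav-link selector is made up; in the real workflow the MCP server owns the browser and Claude just issues tool calls):

```js
import { chromium } from 'playwright';

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();

await page.goto('http://localhost:1313/');            // the local dev server
await page.click('nav >> text=Posts');                // hypothetical nav link
await page.screenshot({ path: 'after-click.png' });   // evidence for the issue

await browser.close();
```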

The Viewport Matrix

Every UI change gets tested at three breakpoints:

| Viewport | Size | Why |
|----------|----------|------------------|
| Desktop | 1440×900 | Standard laptop |
| Tablet | 1024×768 | iPad landscape |
| Mobile | 390×844 | iPhone 14 |

Claude resizes the browser, takes a screenshot, uploads it to the GitHub issue. Three screenshots per state change. If the mobile nav is broken, I see it immediately—not after deployment.
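
The same pass written as plain Playwright, for reference (a sketch; Claude does the equivalent through MCP resize and screenshot tools):

```js
import { chromium } from 'playwright';

const viewports = [
  { name: 'desktop', width: 1440, height: 900 },
  { name: 'tablet',  width: 1024, height: 768 },
  { name: 'mobile',  width: 390,  height: 844 },
];

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto('http://localhost:1313/');

for (const { name, width, height } of viewports) {
  await page.setViewportSize({ width, height });                   // emulate the device
  await page.screenshot({ path: `${name}.png`, fullPage: true });  // one image per breakpoint
}

await browser.close();
```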

Theme Testing

Same pattern for light and dark mode:

```js
// Claude toggles the theme and screenshots both states
await page.screenshot({ path: 'light.png' });
await page.click('[data-testid="theme-toggle"]');
await page.screenshot({ path: 'dark.png' });
```

CSS variable consistency matters here: a component that looks fine in light mode might have invisible text in dark mode. Claude catches this by actually looking at both.
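
Screenshots are usually enough, but the check can also be made explicit. A sketch of a programmatic guard (my addition, not part of the MCP workflow; `.post-card` is a placeholder selector, and it assumes a `page` like the snippet above):

```js
// Fail loudly if a component's text color matches its background in either theme.
for (const theme of ['light', 'dark']) {
  await page.emulateMedia({ colorScheme: theme });   // or click the site's own toggle
  const { color, backgroundColor } = await page.$eval('.post-card', (el) => {
    const s = getComputedStyle(el);
    return { color: s.color, backgroundColor: s.backgroundColor };
  });
  if (color === backgroundColor) {
    throw new Error(`Invisible text in ${theme} mode`);
  }
}
```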

Interactive Testing

This is where it gets interesting. Claude doesn’t just screenshot; it uses the UI: it opens the mobile drawer, follows navigation links, toggles the theme, and submits forms, checking the result of each action.
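
A sketch of one such check, written as a Playwright test (the selectors are hypothetical; in practice Claude performs the same steps through MCP tool calls and reads the resulting page snapshot):

```js
import { test, expect } from '@playwright/test';

test('drawer closes on backdrop click', async ({ page }) => {
  await page.setViewportSize({ width: 390, height: 844 });   // mobile viewport
  await page.goto('http://localhost:1313/');

  await page.click('[data-testid="menu-button"]');           // open the drawer
  await expect(page.locator('[data-testid="drawer"]')).toBeVisible();

  await page.click('[data-testid="drawer-backdrop"]');       // tap outside
  await expect(page.locator('[data-testid="drawer"]')).toBeHidden();
});
```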

When it finds a bug—say, the drawer doesn’t close when you tap outside—it reports it, fixes the code, and re-tests. The feedback loop is tight.

The GitHub Issue as Test Report

Every screenshot gets uploaded to the GitHub issue. The issue body is rebuilt at each milestone:

```markdown
## Current State

### Desktop (1440×900)
![desktop](https://user-images.githubusercontent.com/...)

### Tablet (1024×768)
![tablet](https://user-images.githubusercontent.com/...)

### Mobile (390×844)
![mobile](https://user-images.githubusercontent.com/...)

## Testing Checklist
- [x] Navigation links work
- [x] Mobile drawer opens/closes
- [x] Theme toggle persists
- [ ] Form validation shows errors
```

The issue becomes a living test report. Anyone reviewing the PR can see exactly what was tested.
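
The rebuild itself is plain gh CLI. A sketch, assuming the screenshots are already hosted somewhere GitHub can serve them; the issue number and file names are placeholders:

```sh
# Replace the issue body with the regenerated report
gh issue edit 42 --body-file test-report.md

# Or append a milestone update as a comment instead of rewriting the body
gh issue comment 42 --body-file milestone-update.md
```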

The Full Loop

A typical feature goes through this cycle:

  1. Issue created: “Add sidebar navigation”
  2. Claude implements: Writes the component
  3. Visual verification: Screenshots all viewports
  4. Interactive testing: Clicks through the UI
  5. Bug found: Drawer doesn’t close on backdrop click
  6. Fix applied: Updates the click handler
  7. Re-test: Fresh screenshots, all passing
  8. PR created: Links to issue with full visual history

No manual QA step. No “looks good to me” without actually looking. Claude verifies its own work.

What This Catches

Real bugs I’ve seen Claude find:

- A mobile drawer that wouldn’t close when you tapped the backdrop
- Text that went invisible against the dark-mode background
- A navigation bar that broke at the mobile breakpoint

These are the bugs that slip through code review because they only appear in the browser. Claude sees them because it actually runs the code.

Every PR now comes with proof that it works.