My QA Engineer is an LLM
Claude can click buttons.
That sounds trivial, but it changes how I build UIs. With Playwright MCP, Claude doesn’t just write code—it opens a browser, navigates to localhost, and verifies that things actually work. It catches bugs I’d miss in code review.
The Setup
Playwright MCP gives Claude browser automation. I run it with headless Chromium:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp", "--headless"]
    }
  }
}
Now Claude can navigate, click, type, and screenshot. It sees what users see.
flowchart LR
Claude([Claude Code]) -->|browser_snapshot| PW[Playwright MCP]
PW --> Browser[Headless Chrome]
Browser --> App[localhost:1313]
Browser -.->|screenshot| Issue[(GitHub Issue)]
Issue -.->|visual history| PR[Pull Request]
Flow: Claude Code → Playwright MCP → Browser → Screenshots → GitHub Issues → PR
The Viewport Matrix
Every UI change gets tested at three breakpoints:
| Viewport | Size | Why |
|---|---|---|
| Desktop | 1440×900 | Standard laptop |
| Tablet | 1024×768 | iPad landscape |
| Mobile | 390×844 | iPhone 14 |
Claude resizes the browser, takes a screenshot, and uploads it to the GitHub issue. Three screenshots per state change. If the mobile nav is broken, I see it immediately, not after deployment.
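Claude drives this through MCP tool calls rather than a script, but the same loop is easy to picture in plain Playwright. What follows is a minimal sketch, not my actual setup: the `screenshotPath` naming scheme and the `captureAllViewports` helper are made up for illustration, and `page` is assumed to be a Playwright `Page`.

```javascript
// The three breakpoints from the viewport table.
const VIEWPORTS = [
  { name: "desktop", width: 1440, height: 900 },
  { name: "tablet", width: 1024, height: 768 },
  { name: "mobile", width: 390, height: 844 },
];

// Hypothetical naming scheme: <state>-<viewport>-<width>x<height>.png
function screenshotPath(state, vp) {
  return `${state}-${vp.name}-${vp.width}x${vp.height}.png`;
}

// `page` is assumed to be a Playwright Page (or any object exposing the same
// setViewportSize and screenshot methods).
async function captureAllViewports(page, state) {
  const paths = [];
  for (const vp of VIEWPORTS) {
    // Resize first so the layout settles at the new breakpoint.
    await page.setViewportSize({ width: vp.width, height: vp.height });
    const path = screenshotPath(state, vp);
    await page.screenshot({ path, fullPage: true });
    paths.push(path);
  }
  return paths;
}
```

Calling `captureAllViewports(page, "nav-open")` would produce one file per breakpoint, which is the "three screenshots per state change" rule in code form.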
Theme Testing
Same pattern for light and dark mode:
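// Capture light mode first (assumes the page loads in light mode)
await page.screenshot({ path: 'theme-light.png' }); // hypothetical file name
// Claude toggles the theme
await page.click('[data-testid="theme-toggle"]');
// Capture dark mode for comparison
await page.screenshot({ path: 'theme-dark.png' });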
CSS variable consistency matters here: a component that looks good in light mode might have invisible text in dark mode. Claude catches this by actually looking at both.
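Claude judges this visually from the screenshots. A scripted approximation of "invisible text" is the WCAG contrast ratio between the text color and its background. The sketch below implements that formula; the two sample colors are invented to mimic a text color left bound to the light-theme variable.

```javascript
// Convert one sRGB channel (0-255) to its linear value, per the WCAG definition.
function linearize(channel) {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an [r, g, b] color.
function luminance([r, g, b]) {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// WCAG contrast ratio: 1 for identical colors, 21 for black on white.
function contrastRatio(fg, bg) {
  const a = luminance(fg);
  const b = luminance(bg);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Hypothetical dark-mode bug: near-black text on a near-black background.
const darkBackground = [18, 18, 18];
const staleTextColor = [34, 34, 34];

// 4.5 is the WCAG AA minimum for normal-size text.
console.log(contrastRatio(staleTextColor, darkBackground) < 4.5); // true
```

A check like this only covers flat colors. Screenshots still catch cases it cannot, such as text over images or gradients.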
Interactive Testing
This is where it gets interesting. Claude doesn’t just screenshot—it uses the UI:
- Clicks every nav link, verifies destination
- Opens the mobile drawer, checks it animates
- Fills out forms, submits them
- Hovers over interactive elements
- Closes modals by clicking the backdrop
When it finds a bug—say, the drawer doesn’t close when you tap outside—it reports it, fixes the code, and re-tests. The feedback loop is tight.
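The drawer example can be written as a scripted check. This is a rough sketch under several assumptions: the two `data-testid` selectors are hypothetical, the drawer slides in from the left on a 390px-wide viewport so the point (380, 400) lands on the backdrop, and `page` is a Playwright `Page`.

```javascript
// Hypothetical selectors; replace them with the app's real test ids.
const DRAWER = '[data-testid="mobile-drawer"]';
const DRAWER_TOGGLE = '[data-testid="drawer-toggle"]';

// Returns a list of failure messages; an empty list means the check passed.
async function checkDrawerClosesOnOutsideTap(page) {
  const failures = [];

  await page.click(DRAWER_TOGGLE);
  if (!(await page.locator(DRAWER).isVisible())) {
    failures.push("drawer did not open");
    return failures;
  }

  // Assumed backdrop position: right edge of a 390px-wide mobile viewport.
  await page.mouse.click(380, 400);

  try {
    // Allow time for the close animation before declaring a failure.
    await page.locator(DRAWER).waitFor({ state: "hidden", timeout: 2000 });
  } catch {
    failures.push("drawer stayed open after tapping outside");
  }
  return failures;
}
```

Returning messages instead of throwing keeps the result easy to paste into a bug report or the issue checklist.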
The GitHub Issue as Test Report
Every screenshot gets uploaded to the GitHub issue. The issue body is rebuilt at each milestone:
## Current State
### Desktop (1440×900)

### Tablet (1024×768)

### Mobile (390×844)

## Testing Checklist
- [x] Navigation links work
- [x] Mobile drawer opens/closes
- [x] Theme toggle persists
- [ ] Form validation shows errors
The issue becomes a living test report. Anyone reviewing the PR can see exactly what was tested.
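Rebuilding the body at each milestone amounts to regenerating that Markdown from the latest screenshots and checklist state. A minimal sketch of such a generator follows; `buildIssueBody` and the example URL are placeholders, not part of any real tooling.

```javascript
// Build the Markdown issue body from screenshot entries and checklist items.
function buildIssueBody(screenshots, checklist) {
  const lines = ["## Current State"];
  for (const shot of screenshots) {
    lines.push(`### ${shot.label} (${shot.width}×${shot.height})`);
    lines.push(`![${shot.label}](${shot.url})`);
  }
  lines.push("## Testing Checklist");
  for (const item of checklist) {
    // GitHub renders "- [x]" and "- [ ]" as task-list checkboxes.
    lines.push(`- [${item.done ? "x" : " "}] ${item.text}`);
  }
  return lines.join("\n");
}

const issueBody = buildIssueBody(
  [{ label: "Mobile", width: 390, height: 844, url: "https://example.com/mobile.png" }],
  [
    { text: "Navigation links work", done: true },
    { text: "Form validation shows errors", done: false },
  ]
);
console.log(issueBody);
```

Because the body is regenerated rather than appended to, the issue always shows the current state instead of a growing log.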
The Full Loop
A typical feature goes through this cycle:
- Issue created: “Add sidebar navigation”
- Claude implements: Writes the component
- Visual verification: Screenshots all viewports
- Interactive testing: Clicks through the UI
- Bug found: Drawer doesn’t close on backdrop click
- Fix applied: Updates the click handler
- Re-test: Fresh screenshots, all passing
- PR created: Links to issue with full visual history
No manual QA step. No “looks good to me” without actually looking. Claude verifies its own work.
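Stripped of the details, the cycle is a bounded verify-fix-retest loop. The sketch below is only an illustration of that shape: `runChecks` and `applyFix` are hypothetical callbacks standing in for Claude's testing and editing steps, and the three-round limit is an arbitrary choice.

```javascript
// runChecks: async () => string[]  (failure messages, empty when all pass)
// applyFix:  async (failures) => void  (attempts to fix the reported failures)
async function verifyUntilGreen(runChecks, applyFix, maxRounds = 3) {
  const history = [];
  for (let round = 1; round <= maxRounds; round++) {
    const failures = await runChecks();
    history.push({ round, failures });
    if (failures.length === 0) {
      return { passed: true, history };
    }
    // Only fix when another round remains to re-test the result.
    if (round < maxRounds) {
      await applyFix(failures);
    }
  }
  return { passed: false, history };
}
```

The `history` array plays the same role as the issue's visual history: a record of what failed in each round and when it went green.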
What This Catches
Real bugs I’ve seen Claude find:
- Mobile nav hidden behind content (z-index issue)
- Button text invisible in dark mode (wrong CSS variable)
- Form submitting twice on Enter key
- Dropdown menu clipped by overflow:hidden parent
- Hover state stuck on touch devices
These are the bugs that slip through code review because they only appear in the browser. Claude sees them because it actually runs the code.
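Some of these have a simple geometric signature once the page is rendered. As one example, here is a sketch of a clipping check for the dropdown case. It assumes the parent really does have `overflow: hidden` and that both boxes come from Playwright's `locator.boundingBox()`, which returns `{ x, y, width, height }`. The sample numbers are invented.

```javascript
// True when any edge of `child` extends past the edges of `parent`.
function isClippedBy(child, parent) {
  return (
    child.x < parent.x ||
    child.y < parent.y ||
    child.x + child.width > parent.x + parent.width ||
    child.y + child.height > parent.y + parent.height
  );
}

// Hypothetical layout: a 120px-tall dropdown opening 20px above the bottom
// edge of a 200px-tall card.
const card = { x: 0, y: 0, width: 300, height: 200 };
const dropdown = { x: 20, y: 180, width: 160, height: 120 };

console.log(isClippedBy(dropdown, card)); // true
```

None of this is visible in a diff, which is why the check has to run against a rendered page.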
Every PR now comes with proof that it works.