openclaw/docs/refactor/browser-control-simplificat...

59 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

---
summary: "Refactor: simplify browser control API + implementation"
read_when:
- Refactoring browser control routes, client, or CLI
- Auditing agent-facing browser tool surface
date: 2025-12-20
---
# Refactor: Browser control simplification
Goal: make the browser-control surface **small, stable, and agent-oriented**, and remove “implementation-shaped” APIs (Playwright/CDP specifics, one-off endpoints, and debugging helpers).
## Why
- The previous API accreted many narrow endpoints (`/click`, `/type`, `/press`, …) plus debug utilities.
- Some actions are inherently racy when modeled as “do X *when* the event is already visible” (file chooser, dialogs).
- We want a single, coherent contract that keeps “how its implemented” private.
## Target contract (vNext)
**Basics**
- `GET /` status
- `POST /start`, `POST /stop`
- `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
**Agent actions**
- `POST /navigate` `{ url, targetId? }`
- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
- `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }`
- `GET /snapshot` `?format=ai|aria&targetId?&limit?`
- `GET /console` `?level?&targetId?`
- `POST /pdf` `{ targetId? }`
**Hooks (pre-setup / arming)**
- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
Semantics:
- Hook endpoints **arm** the next matching event within `timeoutMs` (default 2 minutes, clamped to max 2 minutes).
- Last arm wins per page (new arm replaces previous).
## Work checklist
- [x] Replace action endpoints with `POST /act`
- [x] Remove legacy endpoints (`/click`, `/type`, `/wait`, …) and any CLI wrappers that no longer make sense
- [x] Remove `/back` and any history-specific routes
- [x] Convert `upload` + `dialog` to hook/arming endpoints
- [x] Unify screenshots behind `POST /screenshot` (no GET variant)
- [x] Trim inspect/debug endpoints (`/query`, `/dom`) unless explicitly needed
- [x] Update docs/browser.md to describe contract without implementation details
- [x] Update tests (server + client) to cover vNext contract
## Notes / decisions
- Keep Playwright as an internal implementation detail for now.
- Prefer ref-based interactions (`aria-ref`) over coordinate-based ones.
- Keep the code split “routes vs. engine” small and obvious; avoid scattering logic across too many files.