Modulated capture
The agent picks resolution, quality and color per region — a cheap grayscale glance across the whole screen, a crisp 1× color crop only where the decision actually lives.
Your AI agent's eyes, 89–93% cheaper. Verdesk sends agents modulated visual deltas instead of full screenshots — same result, a fraction of the vision tokens.
Grab the setup from GitHub Releases — 198 MB, single .exe, WebView2 bundled offline.
Run Verdesk_x.y.z_x64-setup.exe. No admin needed, no telemetry. The tray icon lands quietly in the system tray.
Tray icon → Settings → Acceso (Access). One button copies the MCP server config to your clipboard; paste it into your AI client.
Close and reopen your terminal (Claude Code, etc.) so it picks up the new MCP server. Done — your agent now sees deltas, not screenshots.
Today every vision agent — Computer Use, Copilot Vision, Browser Use — ships a full screenshot on every turn. The model pays, again and again, to re-read a navbar it has already seen forty times.
Verdesk watches the screen on a fixed grid and hands the agent only the cells that actually changed. One full capture, then just the deltas — that is where the saving comes from.
full frame × 4 turns — 100% vision cost
1 full, then only what changed — 89–93% fewer vision tokens on a typical session
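To see where a figure in that range can come from, here is a back-of-the-envelope sketch. It assumes vision cost scales with the number of grid cells sent, and the session shape (40 turns, 5 changed cells per turn) is an illustrative assumption, not a measured Verdesk benchmark. It also ignores the extra saving from lower-resolution or grayscale regions.

```python
# Rough cost model: one "unit" per grid cell sent to the model.
# The 12x8 grid comes from the page; everything else is an assumption.
GRID_COLS, GRID_ROWS = 12, 8
CELLS_PER_FRAME = GRID_COLS * GRID_ROWS  # 96 cells in a full capture


def delta_saving(changed_per_turn):
    """Fraction of cell-sends avoided versus a full frame on every turn.

    changed_per_turn: changed-cell counts for each turn AFTER the
    initial full capture.
    """
    turns = 1 + len(changed_per_turn)
    full_cost = turns * CELLS_PER_FRAME                   # screenshot every turn
    delta_cost = CELLS_PER_FRAME + sum(changed_per_turn)  # 1 full, then deltas
    return 1 - delta_cost / full_cost


# Hypothetical session: 40 turns, 5 cells change on each later turn.
example_saving = delta_saving([5] * 39)
print(f"{example_saving:.1%}")
```

The saving grows with session length: the single full capture is amortized over more turns, while each later turn pays only for the cells that moved.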
A fixed 12×8 grid + perceptual hashing tags every cell new, changed or unchanged. Only the deltas travel — no AI in the loop.
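A minimal sketch of that tagging step, assuming a grayscale frame given as a list of pixel rows and a simple average hash per cell. The page does not say which perceptual hash Verdesk uses, so `cell_hash`, `classify_cells` and the `max_distance` threshold are hypothetical names chosen for illustration.

```python
GRID_COLS, GRID_ROWS = 12, 8
HASH_SIDE = 8  # 8x8 samples per cell -> 64-bit hash (assumed size)


def cell_hash(frame, col, row):
    """Average hash of one grid cell: one bit per sample, set if above the mean."""
    height, width = len(frame), len(frame[0])
    cell_w, cell_h = width // GRID_COLS, height // GRID_ROWS
    x0, y0 = col * cell_w, row * cell_h
    samples = [
        frame[y0 + (sy * cell_h) // HASH_SIDE][x0 + (sx * cell_w) // HASH_SIDE]
        for sy in range(HASH_SIDE)
        for sx in range(HASH_SIDE)
    ]
    mean = sum(samples) / len(samples)
    bits = 0
    for value in samples:
        bits = (bits << 1) | int(value > mean)
    return bits


def classify_cells(frame, previous_hashes, max_distance=0):
    """Tag every cell 'new', 'changed' or 'unchanged' against the last hashes."""
    tags, hashes = {}, {}
    for row in range(GRID_ROWS):
        for col in range(GRID_COLS):
            key = (col, row)
            current = cell_hash(frame, col, row)
            hashes[key] = current
            old = previous_hashes.get(key)
            if old is None:
                tags[key] = "new"
            elif bin(current ^ old).count("1") > max_distance:  # Hamming distance
                tags[key] = "changed"
            else:
                tags[key] = "unchanged"
    return tags, hashes
```

Only the cells tagged `new` or `changed` would need to travel to the agent. Raising `max_distance` would tolerate small pixel noise at the risk of missing subtle changes.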
Verdesk operates any Windows app for the agent: click, type, scroll, fire key combos — Win32, WPF, UWP, Electron, browsers. It acts semantically through the accessibility tree, and falls back to coordinates and on-screen text when an app exposes nothing — even games and canvas UIs. Reactive, too: wait for a control to appear or change before the next step.
list_uia_elements, act_uia, click_text
click_at, send_keys, press_key
scroll, wait_for_uia
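As a rough illustration of how a client might drive these tools, the sketch below builds MCP `tools/call` requests (JSON-RPC 2.0) for a click that tries the accessibility tree first, then on-screen text, then raw coordinates. The tool names come from the list above. Every argument name (`name`, `action`, `text`, `x`, `y`) is a hypothetical placeholder, since the real parameter schemas are not shown here, and Verdesk may handle the fallback internally.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def click_with_fallbacks(label, x, y):
    """Ordered attempts: semantic first, then text, then coordinates.

    Argument names are assumptions made for illustration only.
    """
    return [
        tool_call("act_uia", {"name": label, "action": "invoke"}),
        tool_call("click_text", {"text": label}),
        tool_call("click_at", {"x": x, "y": y}),
    ]


requests = click_with_fallbacks("Save", 640, 480)
wire_message = json.dumps(requests[0])  # what would be sent to the server
```

A client would send the first request and move on to the next only if the call reports failure, then follow a successful click with `wait_for_uia` before taking the next step.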
Verdesk keeps a visual model for the agent — inspect it, refine a cell, query by hash. Every reset archives to a navigable history.
Around 40 versioned tools over the Model Context Protocol. Verdesk assumes nothing about the model that drives it — Claude, GPT, a local model, any MCP client.
capture, look, refine_cell
query_buffer, compare_cells, focus_zone
get_buffer_state, get_capabilities
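One way to picture the visual model behind these tools is a small in-memory structure that maps grid cells to hashes, answers lookups by hash, and archives itself on reset. This is a conceptual sketch only: `VisualBuffer` and its methods are hypothetical and do not describe Verdesk's actual internals.

```python
from dataclasses import dataclass, field


@dataclass
class VisualBuffer:
    """Toy model: current cell hashes plus an archive of past states."""

    cells: dict = field(default_factory=dict)    # (col, row) -> hash
    history: list = field(default_factory=list)  # archived snapshots

    def update(self, key, cell_hash):
        """Record the latest hash for one grid cell."""
        self.cells[key] = cell_hash

    def query_by_hash(self, cell_hash):
        """Return every cell whose current content matches the hash."""
        return sorted(k for k, h in self.cells.items() if h == cell_hash)

    def reset(self):
        """Archive the current state, then start from an empty buffer."""
        self.history.append(dict(self.cells))
        self.cells.clear()


buffer = VisualBuffer()
buffer.update((0, 0), 0xABCD)
buffer.update((1, 0), 0xABCD)   # same content in a second cell
buffer.update((2, 0), 0x1234)
matches = buffer.query_by_hash(0xABCD)
buffer.reset()
```

Keeping each snapshot on reset is what makes a navigable history possible: older states stay inspectable after the live buffer has been cleared.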
The agent pulls plain text straight from any region of the screen — no vision tokens spent just to read.
In Pro, Verdesk binds to LAN or runs through an SSH tunnel — keys you generate, no SaaS broker. Tailscale optional for CGNAT.
The complete Verdesk vision layer, free for personal and non-commercial use.
Yearly subscription, one developer. Everything in Free, plus what you need to ship.
run_command tool
Yearly subscription, up to 5 developers. Shared license keys and centralized billing.