Show HN: Cua-Bench – a benchmark for AI agents in GUI environments

created: Jan. 26, 2026, 5:46 p.m. | updated: Jan. 28, 2026, 7:48 p.m.

Build, benchmark, and deploy agents that use computers

Cua is an open-source platform for building, benchmarking, and deploying agents that can use any computer, with isolated, self-hostable sandboxes (Docker, QEMU, Apple Virtualization).

Choose Your Path

Cua - Agentic UI Automation & Code Execution

Build agents that see screens, click buttons, and complete tasks autonomously. Run isolated code execution environments for AI coding assistants like Claude Code, Codex CLI, or OpenCode.

    # Requires Python 3.12 or 3.13
    from computer import Computer
    from agent import ComputerAgent

    computer = Computer(os_type="linux", provider_type="cloud")
    agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929", computer=computer)
    async for result in agent. [...]

Third-party components have their own licenses:

- Kasm (MIT)
- OmniParser (CC-BY-4.0)
- Optional cua-agent[omni] includes ultralytics (AGPL-3.0)

Trademarks

Apple, macOS, Ubuntu, Canonical, and Microsoft are trademarks of their respective owners.
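The quick-start snippet in this excerpt is cut off at the `async for result in agent.` line, so the actual call is not shown here. As a rough illustration of the general pattern only (consuming step-by-step results from an async generator), here is a minimal sketch using a hypothetical `StubAgent` class. `StubAgent`, its `run` method, and the result dictionary keys are assumptions made up for this example and are not the Cua API.

```python
import asyncio
from typing import AsyncIterator


class StubAgent:
    """Hypothetical stand-in for illustration; NOT the Cua ComputerAgent API."""

    def __init__(self, steps: list[str]) -> None:
        self._steps = steps

    async def run(self, task: str) -> AsyncIterator[dict]:
        # Yield one result per simulated step, in order.
        for index, action in enumerate(self._steps, start=1):
            await asyncio.sleep(0)  # hand control back to the event loop
            yield {"task": task, "step": index, "action": action}


async def collect(agent: StubAgent, task: str) -> list[dict]:
    # `async for` pulls results one at a time as the generator produces them.
    results = []
    async for result in agent.run(task):
        results.append(result)
    return results


def run_demo() -> list[dict]:
    agent = StubAgent(["screenshot", "click", "type"])
    return asyncio.run(collect(agent, "open a text editor"))


if __name__ == "__main__":
    for r in run_demo():
        print(r["step"], r["action"])
```

Streaming results this way lets a caller log or inspect each step as it happens instead of waiting for the whole task to finish. For the real method names and result format, refer to the Cua documentation.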

2 days, 12 hours ago: Hacker News