Browser Use
Dex can interact with the browser directly, from simple tab management to full point-and-click automation via Chrome DevTools Protocol.
Overview
Dex has two layers of browser interaction. Basic tools handle common operations like opening URLs and reading page content. Full browser use mode enables point-and-click automation for anything that doesn't have a dedicated integration.
Basic Browser Tools
These tools are always available during chat:
- Read page content: extract text or capture a screenshot from any tab.
- Navigate: open URLs in new tabs or batch-open multiple links, optionally in tab groups.
- Search history: search your browsing history with queries and time filters.
- Tab management: focus, close, group, and ungroup tabs.
Full Browser Use
When Dex needs to interact with a page that doesn't have an API integration, it enters browser use mode, a dedicated environment powered by the Chrome DevTools Protocol (CDP).
In this mode, Dex can:
- Click: left, right, and middle click, plus double and triple click, at specific coordinates or on identified elements.
- Type: enter text into any focused input field.
- Scroll: scroll in any direction or scroll specific elements into view.
- Take screenshots: capture the page state to decide next actions.
- Keyboard shortcuts: press key combinations (Ctrl+A, Cmd+C, etc.).
- Drag and drop: move elements between positions.
- Navigate: go to URLs within the browser use session.
Browser use operates within a dedicated Chrome tab group. After every action, Dex captures a fresh screenshot to assess the result before deciding the next step.
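To make the act-then-screenshot cycle concrete, here is a minimal sketch of the kind of CDP messages such a cycle could produce. It only builds the JSON payloads and does not connect to a browser. `Input.dispatchMouseEvent` and `Page.captureScreenshot` are real CDP methods; the helper names (`cdp_command`, `click_commands`, `act_then_observe`) are hypothetical, and nothing here is confirmed to match how Dex issues commands internally.

```python
import json
from itertools import count

# Each CDP message carries a unique integer id so replies can be matched.
_ids = count(1)

VALID_BUTTONS = {"left", "right", "middle"}


def cdp_command(method, params):
    """Build one CDP message as a plain dict (hypothetical helper)."""
    return {"id": next(_ids), "method": method, "params": params}


def click_commands(x, y, button="left", click_count=1):
    """Return the press/release pairs for a single, double, or triple click.

    A multi-click is sent as successive press/release pairs whose
    clickCount increases from 1 up to click_count.
    """
    if button not in VALID_BUTTONS:
        raise ValueError(f"unsupported button: {button}")
    if click_count < 1:
        raise ValueError("click_count must be at least 1")
    commands = []
    for n in range(1, click_count + 1):
        base = {"x": x, "y": y, "button": button, "clickCount": n}
        commands.append(
            cdp_command("Input.dispatchMouseEvent", {"type": "mousePressed", **base})
        )
        commands.append(
            cdp_command("Input.dispatchMouseEvent", {"type": "mouseReleased", **base})
        )
    return commands


def act_then_observe(action_commands):
    """Append a screenshot request so every action is followed by a capture."""
    screenshot = cdp_command("Page.captureScreenshot", {"format": "png"})
    return list(action_commands) + [screenshot]


if __name__ == "__main__":
    # Double left click at (120, 340), then capture the resulting page state.
    for message in act_then_observe(click_commands(120, 340, click_count=2)):
        print(json.dumps(message))
```

In a real session these messages would be sent over the browser's DevTools WebSocket, and the screenshot reply would feed the decision about the next action.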
When To Use
Dex prefers integrations over browser clicking whenever possible because they're faster and more reliable. Browser use mode kicks in when:
- The target app doesn't have a Dex integration (e.g., a custom internal tool).
- The task requires visual interaction (e.g., filling a form, clicking through a multi-step flow).
- You explicitly ask Dex to interact with a page.
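The rules above can be read as a simple decision procedure. The sketch below is illustrative only: the `Task` fields and the `choose_mode` function are hypothetical names invented for this example, and Dex's actual selection logic is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request, for illustration only."""
    has_integration: bool
    needs_visual_interaction: bool = False
    user_requested_browser: bool = False


def choose_mode(task):
    """Return "browser_use" if any listed condition holds, else "integration"."""
    if task.user_requested_browser:
        return "browser_use"  # the user explicitly asked for page interaction
    if not task.has_integration:
        return "browser_use"  # e.g. a custom internal tool
    if task.needs_visual_interaction:
        return "browser_use"  # e.g. filling a form or a multi-step flow
    return "integration"      # preferred path: faster and more reliable


if __name__ == "__main__":
    print(choose_mode(Task(has_integration=True)))
    print(choose_mode(Task(has_integration=False)))
```

Integrations stay the default in this sketch; browser use is chosen only when one of the three conditions applies.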