Coordinate System
The browser viewport is 1024×600 pixels. The origin (0,0) is at the top-left. Add 155 pixels to your Y coordinate to account for browser chrome (address bar, tabs). Clickable area: 1024×445 pixels (after accounting for 155px chrome offset) If you’re displaying the stream at a different size, scale coordinates proportionally:Taking Control
takeOverControl
Pause the AI agent and enable manual mode:releaseControl
Return control to the AI agent:Interaction Actions
Click
Click at specific coordinates:Type
Type text into the currently focused element:humanLike: true to simulate realistic typing speed and patterns. This helps avoid detection on sites with anti-bot measures.
Key Press
Press a single key or key combination:Enter,Escape,Tab,Backspace,DeleteArrowUp,ArrowDown,ArrowLeft,ArrowRightPageUp,PageDown,Home,End- Single characters:
a,b,1,2, etc.
Common Patterns
Manual Login with 2FA
Dropdown Navigation
Responding to Guardrails
Complete Example with Video Stream
Troubleshooting
| Issue | Solution |
|---|---|
| Clicks not registering | Verify Y offset (155px) is added to coordinates |
| Clicking wrong location | Check coordinate scaling function |
| Typing not working | Click element first to ensure it’s focused |
| Actions happening too fast | Add delays between actions (500ms recommended) |
Video Streaming
Setup video streaming for visual feedback
Handling Guardrails
Respond when the agent needs help