Manual Interaction

Manual interaction lets you take direct control of the browser session. Use this when the AI agent gets stuck, when handling sensitive credentials, or when you need precise control over specific actions.

Coordinate System

The browser viewport is 1024×600 pixels, with the origin (0,0) at the top-left. Add 155 pixels to your Y coordinate to account for browser chrome (address bar, tabs). The clickable area is 1024×445 pixels after accounting for the 155px chrome offset.

If you display the stream at a different size, scale coordinates proportionally:

function scaleCoords(clientX, clientY, displayWidth, displayHeight) {
  const serverWidth = 1024;
  const serverHeight = 600;
  const yOffset = 155; // Browser chrome

  return {
    x: Math.round((clientX / displayWidth) * serverWidth),
    y: Math.round((clientY / displayHeight) * serverHeight) + yOffset
  };
}
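As a quick check of the arithmetic, the sketch below runs the same scaling for a stream displayed at half size (512×300). The function body mirrors scaleCoords; the display size and click positions are illustrative values, not part of the API.

```javascript
// Same scaling as scaleCoords, repeated here so the snippet runs on its own.
function scaleCoords(clientX, clientY, displayWidth, displayHeight) {
  const serverWidth = 1024;
  const serverHeight = 600;
  const yOffset = 155; // Browser chrome

  return {
    x: Math.round((clientX / displayWidth) * serverWidth),
    y: Math.round((clientY / displayHeight) * serverHeight) + yOffset
  };
}

// Illustrative case: stream rendered at 512×300, user clicks the center.
const center = scaleCoords(256, 150, 512, 300);
console.log(center); // { x: 512, y: 455 }

// The top-left corner of the display maps to the first row below the chrome.
const topLeft = scaleCoords(0, 0, 512, 300);
console.log(topLeft); // { x: 0, y: 155 }
```

Halving the display size doubles each coordinate before the offset is added, so the offset itself is never scaled.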

Taking Control

takeOverControl

Pause the AI agent and enable manual mode:

curl -X POST https://connect.webrun.ai/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "takeOverControl" }
    }
  }'

The current task pauses automatically. Perform manual actions, then release control to let the agent continue.

releaseControl

Return control to the AI agent:

curl -X POST https://connect.webrun.ai/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "releaseControl" }
    }
  }'
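If you call the HTTP endpoint from JavaScript rather than curl, the two requests differ only in the action type. A minimal sketch, assuming a runtime with a global fetch (Node 18+ or a browser). The helper names buildControlRequest and sendControl are hypothetical and not part of the WebRun API:

```javascript
const SEND_MESSAGE_URL = "https://connect.webrun.ai/start/send-message";

// Builds the fetch options for a takeOverControl or releaseControl request.
// Hypothetical helper: it only mirrors the curl payloads shown in this section.
function buildControlRequest(apiKey, sessionId, type) {
  if (type !== "takeOverControl" && type !== "releaseControl") {
    throw new Error(`Unsupported control action: ${type}`);
  }
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      sessionId,
      message: {
        actionType: "interaction",
        action: { type }
      }
    })
  };
}

// Sends the request. The response format is not documented in this section,
// so the caller receives the raw Response object.
async function sendControl(apiKey, sessionId, type) {
  return fetch(SEND_MESSAGE_URL, buildControlRequest(apiKey, sessionId, type));
}
```

A call such as sendControl(API_KEY, sessionId, "takeOverControl") would then pause the agent, and the same call with "releaseControl" would hand control back.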

Interaction Actions

Click

Click at specific coordinates:

{
  "actionType": "interaction",
  "action": {
    "type": "CLICK",
    "x": 500,      // 0-1024
    "y": 300       // 155-755 (with chrome offset)
  }
}

Type

Type text into the currently focused element:

{
  "actionType": "interaction",
  "action": {
    "type": "TYPE",
    "text": "Hello world",
    "humanLike": true  // Optional: simulate human typing speed
  }
}

Set humanLike: true to simulate realistic typing speed and patterns. This helps avoid detection on sites with anti-bot measures.

Key Press

Press a single key or key combination:

{
  "actionType": "interaction",
  "action": {
    "type": "KEY_PRESS",
    "key": "Enter"
  }
}

Common keys:

Enter, Escape, Tab, Backspace, Delete
ArrowUp, ArrowDown, ArrowLeft, ArrowRight
PageUp, PageDown, Home, End
Single characters: a, b, 1, 2, etc.
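The three action payloads share the same envelope, so small builder functions can keep call sites short. This is a sketch with hypothetical helper names (clickAction, typeAction, keyPressAction); the payload fields are the ones shown in the Click, Type, and Key Press examples, and omitting humanLike when it is not requested is an assumption based on the field being optional.

```javascript
// Hypothetical builders for the interaction payloads documented in this section.
function clickAction(x, y) {
  return { actionType: "interaction", action: { type: "CLICK", x, y } };
}

function typeAction(text, humanLike = false) {
  const action = { type: "TYPE", text };
  // humanLike is optional, so only include it when requested.
  if (humanLike) action.humanLike = true;
  return { actionType: "interaction", action };
}

function keyPressAction(key) {
  return { actionType: "interaction", action: { type: "KEY_PRESS", key } };
}

// Print one payload to inspect its shape.
console.log(JSON.stringify(typeAction("Hello world", true)));
```

With a connected socket, each payload could be sent as socket.emit("message", clickAction(500, 300)).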

Common Patterns

Manual Login with 2FA

// Take control
socket.emit("message", {
  actionType: "interaction",
  action: { type: "takeOverControl" }
});

// Click username field
socket.emit("message", {
  actionType: "interaction",
  action: { type: "CLICK", x: 400, y: 300 }
});

// Type username
socket.emit("message", {
  actionType: "interaction",
  action: { type: "TYPE", text: "user@example.com", humanLike: true }
});

// Tab to password field
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "Tab" }
});

// Type password
socket.emit("message", {
  actionType: "interaction",
  action: { type: "TYPE", text: "secretPassword123", humanLike: true }
});

// Submit
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "Enter" }
});

// Wait for 2FA prompt, user solves it manually via video stream

// Release control after 2FA is complete
socket.emit("message", {
  actionType: "interaction",
  action: { type: "releaseControl" }
});

Dropdown Navigation

// Open dropdown
socket.emit("message", {
  actionType: "interaction",
  action: { type: "CLICK", x: 500, y: 350 }
});

// Navigate with arrow keys
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "ArrowDown" }
});

socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "ArrowDown" }
});

// Select
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "Enter" }
});

Responding to Guardrails

socket.on("message", (data) => {
  if (data.type === "guardrail_trigger" &&
      data.data.value.includes("CAPTCHA")) {

    // Take control for manual solving
    socket.emit("message", {
      actionType: "interaction",
      action: { type: "takeOverControl" }
    });

    // User solves CAPTCHA via video stream
    // Once done, release control and resume
    document.getElementById("captcha-solved").onclick = () => {
      socket.emit("message", {
        actionType: "interaction",
        action: { type: "releaseControl" }
      });

      socket.emit("message", {
        actionType: "guardrail",
        taskDetails: "CAPTCHA solved, continue",
        newState: "resume"
      });
    };
  }
});
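The handler above assumes every guardrail_trigger message carries a string at data.data.value. A defensive sketch of that check, pulled into a hypothetical helper (isCaptchaGuardrail) so that messages with a missing or non-string value return false instead of throwing. The sample message text is illustrative only:

```javascript
// Hypothetical helper: returns true only for guardrail_trigger messages
// whose data.value is a string that mentions "CAPTCHA".
function isCaptchaGuardrail(message) {
  if (!message || message.type !== "guardrail_trigger") return false;
  const value = message.data && message.data.value;
  return typeof value === "string" && value.includes("CAPTCHA");
}

console.log(isCaptchaGuardrail({
  type: "guardrail_trigger",
  data: { value: "CAPTCHA detected" }
})); // true

// A guardrail message without a value no longer throws.
console.log(isCaptchaGuardrail({ type: "guardrail_trigger", data: {} })); // false
```

Inside the socket handler, the condition would then read if (isCaptchaGuardrail(data)) { ... }.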

Complete Example with Video Stream

import { io } from "socket.io-client";

// Create session
const session = await fetch("https://connect.webrun.ai/start/start-session", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`
  },
  body: JSON.stringify({
    taskDetails: "Go to login page",
    startingUrl: "https://example.com/login"
  })
}).then(r => r.json());

// Setup video stream
const iframe = document.createElement("iframe");
iframe.src = session.streaming.webViewURL;
iframe.width = 1024;
iframe.height = 600;
document.body.appendChild(iframe);

// Connect WebSocket
const socket = io("https://connect.webrun.ai", {
  auth: { sessionId: session.sessionId },
  transports: ["websocket"]
});

// Take control when ready
socket.on("connect", () => {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "takeOverControl" }
  });
});

// Setup click handler
const canvas = document.getElementById("clickable-overlay");
canvas.addEventListener("click", (e) => {
  const rect = canvas.getBoundingClientRect();
  const coords = scaleCoords(
    e.clientX - rect.left,
    e.clientY - rect.top,
    rect.width,
    rect.height
  );

  socket.emit("message", {
    actionType: "interaction",
    action: { type: "CLICK", x: coords.x, y: coords.y }
  });
});

// Release control when done
document.getElementById("done-button").addEventListener("click", () => {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "releaseControl" }
  });

  // Continue with next task
  socket.emit("message", {
    actionType: "newTask",
    newState: "start",
    taskDetails: "Complete the checkout process"
  });
});

function scaleCoords(clientX, clientY, displayWidth, displayHeight) {
  return {
    x: Math.round((clientX / displayWidth) * 1024),
    y: Math.round((clientY / displayHeight) * 600) + 155
  };
}

Troubleshooting

Issue	Solution
Clicks not registering	Verify Y offset (155px) is added to coordinates
Clicking wrong location	Check coordinate scaling function
Typing not working	Click element first to ensure it’s focused
Actions happening too fast	Add delays between actions (500ms recommended)
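For the last row, one way to space actions out is to await a short pause after each emit. A minimal sketch: sendWithDelays is a hypothetical helper, and socket is assumed to be any object with an emit(event, payload) method, such as a connected Socket.IO client.

```javascript
// Resolves after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Emits each message in order, pausing delayMs between consecutive messages.
// Hypothetical helper; the 500 ms default follows the recommendation above.
async function sendWithDelays(socket, messages, delayMs = 500) {
  for (let i = 0; i < messages.length; i++) {
    socket.emit("message", messages[i]);
    // No pause is needed after the final message.
    if (i < messages.length - 1) await sleep(delayMs);
  }
}
```

A login sequence could then be passed as one array of interaction messages, with the returned promise resolving once the last action has been emitted.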

