Browser Interaction

Taking Action

Use act() to tell the agent what to do.

Instructions passed to act can be high-level tasks:

await agent.act('log in to the app');

or low-level actions:

await agent.act('click the submit button');

Think of it like telling a coworker to do something: you don't need to break the task into tiny steps, but be specific enough that the objective is clear.

Chaining Acts

Combine multiple act calls to accomplish complex sequences of interactions:

await agent.act('go to tasks page');
await agent.act('assign all pending tasks to Bob');
await agent.act('move all pending tasks to "In Progress"');

You can also chain multiple steps together in the same act call for convenience:

await agent.act([
    'go to tasks page',
    'assign all pending tasks to Bob',
    'move all pending tasks to "In Progress"'
]);
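When a sequence needs per-step handling, for example stopping and reporting which step failed, the steps can also be issued one at a time from your own loop. The following is a minimal sketch, not part of the library: `ActLike` is a hypothetical interface that assumes only the `act(task)` call shown above, `runSteps` and `makeRecorder` are invented helpers so the example runs without a browser, and it assumes `act` rejects when a step cannot be completed, which this page does not confirm.

```typescript
// Hypothetical minimal agent shape: only the act(task) call from the examples above.
interface ActLike {
  act(task: string): Promise<void>;
}

// Run steps one at a time; report how many completed and which step failed, if any.
// Assumes act() rejects on failure (an assumption, not confirmed by this page).
async function runSteps(
  agent: ActLike,
  steps: string[],
): Promise<{ completed: number; failedStep?: string }> {
  let completed = 0;
  for (const step of steps) {
    try {
      await agent.act(step);
      completed += 1;
    } catch {
      // Stop at the first failing step and report it to the caller.
      return { completed, failedStep: step };
    }
  }
  return { completed };
}

// Stand-in agent for illustration: records each task instead of driving a browser.
function makeRecorder(): ActLike & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async act(task: string): Promise<void> {
      calls.push(task);
    },
  };
}
```

With a real agent, `runSteps(agent, ['go to tasks page', 'assign all pending tasks to Bob'])` would issue the same instructions as the chained calls above, one act per step.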

Providing Data

You can provide arbitrary data fields that the agent will use where appropriate during its actions:

await agent.act('create a new task', {
    data: {
        title: 'important task',
        description: 'some description'
    }
});
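The data option is convenient when the same task is repeated with different inputs. The following is a minimal sketch that assumes only the `act(task, { data })` form shown above; `DataAgent`, `TaskInput`, and `createTasks` are hypothetical names introduced for illustration and are not part of the library.

```typescript
// Hypothetical minimal agent shape: act(task, { data }) as in the example above.
interface DataAgent {
  act(task: string, options?: { data?: Record<string, string> }): Promise<void>;
}

// Illustrative record type; the field names mirror the example above.
interface TaskInput {
  title: string;
  description: string;
}

// Create one task per record, passing each record's fields as data.
async function createTasks(agent: DataAgent, tasks: TaskInput[]): Promise<void> {
  for (const task of tasks) {
    await agent.act('create a new task', {
      data: { title: task.title, description: task.description },
    });
  }
}
```

Keeping the instruction fixed and varying only the data separates what to do from the values to use, which also makes the inputs easy to load from a fixture file.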

Custom Prompting

Provide custom system prompt instructions as needed:

await agent.act('create a new task', {
    prompt: 'tasks should be written in spanish'
});

Navigating Directly

While the agent is capable of navigating to URLs on its own, you may sometimes want to navigate to a specific URL directly.

To do this, use nav:

await agent.nav('https://google.com');
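A common pattern is to jump straight to a known page with nav and then hand control to act. The following is a minimal sketch that assumes only the `nav(url)` and `act(task)` calls shown on this page; `NavAgent` and `openAndAct` are hypothetical names, not library API.

```typescript
// Hypothetical minimal agent shape: the nav(url) and act(task) calls shown on this page.
interface NavAgent {
  nav(url: string): Promise<void>;
  act(task: string): Promise<void>;
}

// Navigate directly to a known URL, then let the agent carry out the task from there.
async function openAndAct(agent: NavAgent, url: string, task: string): Promise<void> {
  await agent.nav(url); // deterministic starting point
  await agent.act(task); // agent-driven interaction from that page
}
```

Navigating directly keeps the starting point deterministic, so the instruction given to act only has to describe what happens on that page.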

Agent Capabilities

What can the agent do in act?

The agent is capable of mouse, keyboard, and browser-specific actions, including but not limited to:

Clicking with the mouse
Dragging with the mouse
Typing long blocks of content
Pressing specific keystrokes
Switching tabs
Navigating to URLs

What is the agent aware of?

The agent knows about and sees:

The current screenshot plus some past screenshots
The history of its own actions within the same act() call
All currently open tabs
Which tab is active
