Taking Action
Use act() to tell the agent what to do. Instructions provided to act() can be high-level tasks.
Chaining Acts
Combine multiple act() calls to accomplish complex sequences of interactions.
Providing Data
You can provide arbitrary data fields that the agent will use where appropriate during its actions.
Custom Prompting
Provide custom system prompt instructions as needed.
Navigating Directly
While the agent is capable of navigating to URLs on its own, you may sometimes want to navigate to a specific URL directly. To do this, use nav.
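The original code samples for this section are not available here, so the following is only a rough sketch of how direct navigation, chained act() calls, data fields, and a custom prompt might fit together. Only the names act and nav come from the text above. The `AgentLike` interface, the `ActOptions` shape (including the `data` and `prompt` field names), the `RecordingAgent` stand-in, and the example URL are all assumptions made for illustration, not the library's confirmed API.

```typescript
// Hypothetical option shape: the field names `data` and `prompt` are
// assumptions for illustration, not confirmed API.
interface ActOptions {
  data?: Record<string, string>;
  prompt?: string;
}

// Minimal surface assumed for this sketch: act() and nav(), as named in the text.
interface AgentLike {
  act(instruction: string, options?: ActOptions): Promise<void>;
  nav(url: string): Promise<void>;
}

// Stand-in that records calls instead of driving a real browser,
// so the sketch can run without any agent library installed.
class RecordingAgent implements AgentLike {
  readonly log: string[] = [];

  async act(instruction: string, options: ActOptions = {}): Promise<void> {
    const extras = Object.keys(options).sort().join(",");
    this.log.push(extras ? `act:${instruction} [${extras}]` : `act:${instruction}`);
  }

  async nav(url: string): Promise<void> {
    this.log.push(`nav:${url}`);
  }
}

async function createAndAssignTask(agent: AgentLike): Promise<void> {
  // Navigate directly instead of asking the agent to find the page.
  await agent.nav("https://example.com/tasks");

  // A high-level instruction; the agent decides the individual steps.
  await agent.act("Create a new task");

  // Chained act with data fields the agent can use where appropriate.
  await agent.act("Fill in the task form", {
    data: { title: "Write release notes", assignee: "Sam" },
  });

  // Chained act with extra prompt instructions for this step.
  await agent.act("Submit the form", {
    prompt: "Do not dismiss any confirmation dialogs.",
  });
}
```

Awaiting each call before issuing the next keeps the steps in order. With a real agent, the equivalent calls would go to that agent object rather than to `RecordingAgent`.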
Agent Capabilities
What can the agent do in act()?
The agent is capable of mouse, keyboard, and browser-specific actions, including but not limited to:
- Clicking with the mouse
- Dragging with the mouse
- Typing long blocks of content
- Pressing specific keystrokes
- Switching tabs
- Navigating to URLs
What is the agent aware of?
The agent knows about and sees:
- The current screenshot plus some past screenshots
- History of its own actions from the same act()
- All currently open tabs
- Which tab is active