Magnitude requires an LLM that is visually grounded, meaning it understands precise coordinates in an image and can use them to interact with the browser accurately.
Very few LLMs meet this criterion, which is why we recommend Claude Sonnet 4: it has strong reasoning abilities and is visually grounded.
To use Sonnet, simply set ANTHROPIC_API_KEY in your environment.
You can also choose a specific Claude model by configuring an Anthropic or Bedrock client.
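As a minimal sketch, a magnitude.config.ts pinning a specific Claude model might look like the example below. The llm/provider/options shape shown here is an assumption based on the LLM Providers page; verify the field names against the docs for your version:

```ts
// magnitude.config.ts
import { type MagnitudeConfig } from "magnitude-test";

export default {
    url: "http://localhost:5173", // the app under test

    // Configure an Anthropic client explicitly instead of relying on defaults.
    // Note: the exact config shape may vary by Magnitude version; see LLM Providers.
    llm: {
        provider: "anthropic",
        options: {
            model: "claude-sonnet-4-20250514",
            // Reads the same ANTHROPIC_API_KEY you set in your environment.
            apiKey: process.env.ANTHROPIC_API_KEY,
        },
    },
} satisfies MagnitudeConfig;
```

For Bedrock, the same idea applies with a Bedrock client and an AWS model ID in place of the Anthropic options; see LLM Providers for the exact fields.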
Most LLMs are NOT visually grounded; this includes models from OpenAI, Google's Gemini family, and Meta's Llama family.
For instructions on configuring LLMs with various providers, see LLM Providers.
Other visually grounded models in the 32B-72B parameter range may also work with Magnitude, depending on the model and the complexity of your test cases. Some of these include:
These models are mostly untested with Magnitude and may not be suitable for running tests. If any of these LLMs struggle to follow instructions or have accuracy issues, please try a recommended model instead.
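If you do experiment with one of these models, it is typically served through an OpenAI-compatible endpoint. The sketch below assumes an openai-generic provider with a baseUrl option as described on the LLM Providers page; the OpenRouter URL and Qwen model ID are illustrative only:

```ts
// magnitude.config.ts
import { type MagnitudeConfig } from "magnitude-test";

export default {
    url: "http://localhost:5173",

    // Assumption: an OpenAI-compatible provider; verify the provider name
    // and option fields against the LLM Providers page for your version.
    llm: {
        provider: "openai-generic",
        options: {
            baseUrl: "https://openrouter.ai/api/v1",
            model: "qwen/qwen2.5-vl-72b-instruct", // illustrative grounded model
            apiKey: process.env.OPENROUTER_API_KEY,
        },
    },
} satisfies MagnitudeConfig;
```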