Self-hosting Moondream
Use a locally running or self-deployed Moondream server
If you want to run Moondream yourself, you can either:
- Run locally on the same machine where you’re running Magnitude tests (great for development on high-end machines)
- Self-host models with GPUs on a cloud platform like Modal
Running Moondream locally
To run Moondream on your local machine, first go to https://moondream.ai/c/moondream-server and download the appropriate server (currently Moondream only provides macOS/Linux executables).
Follow the instructions on that page to run the executable.
Then you can configure `magnitude.config.ts` with the local base URL:
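Here’s a minimal sketch of what that configuration might look like. The exact option names (`executor`, `provider`, `baseUrl`), the import, and the default local port (2020) are assumptions based on a typical Magnitude setup, so verify them against your installed version:

```typescript
// Sketch of magnitude.config.ts pointing Magnitude at a locally running Moondream server.
// Option names and the default port (2020) are assumptions; check your Magnitude version's
// docs and the port your Moondream server prints on startup.
import { type MagnitudeConfig } from 'magnitude-test';

export default {
    // Other required options (e.g. your planner LLM) omitted for brevity.
    url: 'http://localhost:5173', // the app under test
    executor: {
        provider: 'moondream',
        options: {
            baseUrl: 'http://localhost:2020/v1' // local Moondream server
        }
    }
} satisfies MagnitudeConfig;
```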
That’s it! Now you can run tests with local Moondream.
Self-hosting Moondream on Modal
Probably the easiest way to self-host Moondream is on Modal, so we provide a deployment script to do this.
Set up Modal
- Create an account at modal.com
- Run `pip install modal` to install the `modal` Python package
- Run `modal setup` to authenticate (if this doesn’t work, try `python -m modal setup`)
More info: https://modal.com/docs/guide
Deploy Moondream
Clone the Magnitude repo and run the `moondream.py` deploy script (typically via `modal deploy`) to automatically download the Moondream server and model weights to a Modal volume and deploy the endpoint.
Your Moondream endpoint will then be available at `https://<your-modal-username>--moondream.modal.run/v1`.
Then you can configure `magnitude.config.ts` with this base URL and start running tests powered by your newly deployed Moondream server!
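As a sketch, assuming the same (unconfirmed) option names as the local example above, only the base URL changes:

```typescript
// Sketch: same assumed config shape as the local example, but pointing at the
// Modal deployment. Replace <your-modal-username> with your Modal username.
import { type MagnitudeConfig } from 'magnitude-test';

export default {
    url: 'http://localhost:5173', // the app under test
    executor: {
        provider: 'moondream',
        options: {
            baseUrl: 'https://<your-modal-username>--moondream.modal.run/v1'
        }
    }
} satisfies MagnitudeConfig;
```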
Keep in mind there may be some cold-start delay, since Modal automatically scales containers down when not in use.
Customizing Deployment
You may want to customize the Modal deployment depending on your needs; to do so, modify the `moondream.py` deployment script.
Common options you may want to change:
- `gpu`: GPU configuration. See Modal’s pricing page for details on the available GPUs and their cost, and see the comparison below.
- `scaledown_window`: Time in seconds a container waits after its last request before shutting down. A higher value lets you come back and run tests after a longer gap without the container needing to cold-start again.
- `min_containers`: Forces Modal to keep a set number of containers warm to handle requests. Modal will bill you for those containers the whole time, but this eliminates cold starts.
GPU Comparison
Here’s a breakdown of how quickly different GPUs available on Modal are able to handle typical requests that Magnitude makes to Moondream:
| GPU | Approximate inference time per action | Modal cost per hour |
|---|---|---|
| H100 | ~200ms | $3.95 |
| A100 (40GB) | ~300ms | $2.10 |
| A10G | ~500ms | $1.10 |
| T4 | ~800ms | $0.59 |
Since Magnitude needs to wait a bit for the page to stabilize anyway, something like the A10G is probably a good price/performance balance, but any of these work well!
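As a rough worked example from the table above (assuming a single container and ignoring cold starts and idle scaledown time): keeping one A10G up for an hour of test runs costs about $1.10, while an H100 costs about $3.95 for the same hour and only shaves roughly 300ms off each action.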