- You can update
RUNTIME_OLLAMA_WARMUP_MODELS
to match the specific Ollama models you want to enable for your Helix install, see available values. - Helix will download the weights for models specified in
RUNTIME_OLLAMA_WARMUP_MODELS
at startup if they are not baked into the image. This can be slow, especially if it runs in parallel across many runners, and can easily saturate your network connection. This is why using the images with pre-baked weights (-small
and -large
variants) is recommended. - Warning: the
-large
image is large (over 100GB), but it saves you re-downloading the weights every time the container restarts! We recommend using X.Y.Z-small
and setting the RUNTIME_OLLAMA_WARMUP_MODELS
value to llama3:instruct,phi3:instruct
to get started, so the download isn’t too big. If you want to use other models in the Helix UI and API, delete this -e RUNTIME_OLLAMA_WARMUP_MODELS
line from below, and it will use the defaults (all models). The default models will take a long time to download! - Update
<GPU_MEMORY>
to correspond to how much GPU memory you have, e.g. “80GB” or “24GB” - You can add
--gpus 1
before the image name to target a specific GPU on the system (starting at 0). If you want to use multiple GPUs on a node, you’ll need to run multiple runner containers (in that case, remember to give them different names) - Make sure to run the container with
--restart always
or equivalent in your container runtime, since the runner will exit if it detects an unrecoverable error and should be restarted automatically - If you want to run the runner on the same machine as the controlplane, either: (a) set
--network host
and set --api-host http://localhost:8080
so that the runner can connect on localhost via the exposed port, or (b) use --api-host http://172.17.0.1:8080
so that the runner can connect to the API server via the docker bridge IP. On Windows or Mac, you can use --api-host http://host.docker.internal:8080
- Helix will currently also download and run SDXL and Mistral-7B weights used for fine-tuning at startup. These weights are not currently pre-baked anywhere. This can be disabled with
RUNTIME_AXOLOTL_ENABLED=false
if desired. If running in a low-memory environment, this may cause CUDA OOM errors at startup, which can be ignored (at startup) since the scheduler will only fit models into available memory after the startup phase. - If you want to use text fine-tuning, you need to set the environment variable
HF_TOKEN
to a valid Huggingface token, then you now need to accept sharing your contact information with Mistral here and then fetch an access token from here and then specify it in this environment variable.