Installing Helix on Kubernetes with SUSE Rancher and RKE2
This guide describes how to install Helix on Kubernetes using SUSE Rancher and RKE2.
Requirements
Control Plane
The Helix Control Plane provides the API, web interface, and postgres database and requires:
- Linux, macOS or Windows
- Docker
- 4 CPUs, 8GB RAM and 50GB+ free disk space
Inference Provider requires ONE OF:
- An NVIDIA GPU if you want to use private Helix Runners (example), or
- Ollama running locally on macOS, Linux or Windows (example), or
- An OpenAI-compatible API provider, such as TogetherAI (example) - we like TogetherAI because you can run the same open source models via their API that you can run locally using Helix GPU Runners, but you can use any OpenAI-compatible API (e.g. vLLM, Azure OpenAI, Gemini etc)
Private Helix Runners require:
- As much system memory as you have GPU memory
- Min 8GB GPU for small models (Llama3-8B, Phi3-Mini), 24GB for Mixtral/SDXL, 40GB for Llama3-70B
- Min 24GB GPU for fine-tuning (text or image)
- Recommend 2x24GB GPUs for e.g. text & image inference in parallel
- NVIDIA 3090s, A6000s are typically good price/performance
- 150GB+ of free disk space
- A fast internet connection (small runner image is 23GB)
Prerequisites
- SUSE Rancher installed and configured
- RKE2 Kubernetes cluster deployed via SUSE Rancher (follow the official SUSE Rancher documentation to set up an RKE2 cluster)
- Helm CLI installed on your workstation
kubectl
configured to access your RKE2 cluster
Note: This guide has been validated with SUSE Rancher v2.7.5+ and RKE2 v1.25.x+. Using earlier versions may result in unexpected behavior.
Deploying the Control Plane
This section details how to install the Helix control plane on an RKE2 cluster.
1. Install Keycloak
Helix uses Keycloak for authentication. If you have one already, you can skip this step. Otherwise, install one through Helm:
helm upgrade --install keycloak oci://registry-1.docker.io/bitnamicharts/keycloak \
--version "24.3.1" \
--set image.tag="23.0.7" \
--set auth.adminUser=admin \
--set auth.adminPassword=oh-hallo-insecure-password \
--set httpRelativePath="/auth/"
Note: These are pinned versions that have been tested and are known to work. Newer versions may work, but they have not been tested with Helix.
You do not need to expose a service to access Keycloak from outside the cluster - it is used as an internal implementation detail of Helix (and Helix manages the helix Keycloak realm via admin access).
2. Configure NVIDIA GPU Operator (if using GPU nodes)
If you’re planning to use GPU nodes for Helix Runners, install the NVIDIA GPU Operator on your RKE2 cluster:
# Add the NVIDIA Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# Install the GPU Operator
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
3. Install the Helix Helm Repository
helm repo add helix https://charts.helix.ml
helm repo update
4. Apply the Helix Control Plane Chart
Download the example values file to configure the Helix control plane:
curl -o values-example.yaml https://raw.githubusercontent.com/helixml/helix/main/charts/helix-controlplane/values-example.yaml
Edit the values-example.yaml
file to configure providers and other settings for your RKE2 environment. At minimum, you must edit the provider configuration so that Helix can run.
Now install the control plane Helm chart with the latest images:
export LATEST_RELEASE=$(curl -s https://get.helix.ml/latest.txt)
helm upgrade --install my-helix-controlplane helix/helix-controlplane \
-f values-example.yaml \
--set image.tag="${LATEST_RELEASE}"
5. Configure Ingress with SUSE Rancher
RKE2 includes Traefik as the default ingress controller. To expose the Helix Control Plane using Traefik:
Add the following to your values-example.yaml
file:
ingress:
enabled: true
className: "traefik"
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web, websecure
hosts:
- host: helix.your-domain.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: helix-tls
hosts:
- helix.your-domain.com
Alternatively, you can use the SUSE Rancher UI to configure the ingress:
- Navigate to your cluster in the SUSE Rancher UI
- Go to Service Discovery > Ingresses
- Click Create
- Configure the ingress pointing to the
my-helix-controlplane
service on port 80 - Set up TLS if needed
Deploying a Runner
This section describes how to install a Helix runner on your RKE2 cluster.
1. Ensure your GPU nodes are properly labeled
If using GPU nodes for runners, ensure they have the correct labels:
kubectl label nodes <your-gpu-node> accelerator=nvidia
2. Install the Helix Runner Chart
export LATEST_RELEASE=$(curl -s https://get.helix.ml/latest.txt)
helm upgrade --install my-helix-runner helix/helix-runner \
--set runner.host="my-helix-controlplane" \
--set runner.token="oh-hallo-insecure-token" \
--set runner.memory=24GB \
--set replicaCount=1 \
--set image.tag="${LATEST_RELEASE}-small" \
--set nodeSelector.accelerator=nvidia
3. Verify Runner Connection
Check the runner logs to ensure it connects to the control plane:
kubectl logs -l app.kubernetes.io/name=helix-runner
Monitoring with SUSE Rancher
SUSE Rancher provides built-in monitoring tools that can be enabled for your RKE2 cluster:
- In the SUSE Rancher UI, navigate to your cluster
- Go to Apps
- Install the Monitoring app
- Once installed, you can access Grafana and Prometheus dashboards from the SUSE Rancher UI
Troubleshooting
Common Issues
GPU not detected by runner pods:
- Verify GPU operator installation:
kubectl get pods -n gpu-operator
- Check NVIDIA driver pods are running:
kubectl get pods -n gpu-operator-resources
- Validate device plugin:
kubectl describe node <gpu-node> | grep -i nvidia
- Verify GPU operator installation:
Control plane unable to reach Keycloak:
- Check Keycloak pods:
kubectl get pods | grep keycloak
- Examine Keycloak logs:
kubectl logs <keycloak-pod-name>
- Check Keycloak pods:
Runner can’t connect to control plane:
- Ensure the runner.host is correctly set to the control plane service name
- Verify the runner token matches what’s configured in the control plane
For more detailed assistance, check the Helix logs or contact Helix support.