Installing Helix on GKE with Helm

This guide shows how to deploy Helix on Google Kubernetes Engine (GKE) using Terraform and Helm.

Prerequisites

  • Install terraform
brew install terraform
  • Install google-cloud-sdk to get the gcloud CLI
brew install --cask google-cloud-sdk
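
Before continuing, it can help to confirm that the required CLIs are on your PATH. The sketch below uses a hypothetical helper named check_tools (not part of the repository); kubectl and helm are included in the example call only because later steps in this guide use them.

```shell
# Hypothetical helper: report every required CLI that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call:
#   check_tools terraform gcloud kubectl helm
```

Any tool reported as missing should be installed before you move on to the setup steps.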

Setup

  1. Clone this repository and cd into the directory:
git clone https://github.com/helixml/terraform-gke-helix.git
cd terraform-gke-helix
  2. Log into GCP:
gcloud init
gcloud auth application-default login
  3. Edit the configuration in the terraform.tfvars file to match your account.
  4. Initialize the Terraform workspace:
terraform init

Provision

Now deploy the infrastructure. Terraform prints the planned changes and asks for confirmation before creating anything.

terraform apply

Configure kubectl

gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --region $(terraform output -raw region) --project $(terraform output -raw project_id)

You may need to install gke-gcloud-auth-plugin to gain access to the cluster.
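
One possible way to handle the plugin is sketched below. The function name ensure_gke_auth_plugin is hypothetical, and the install path assumes your gcloud installation has its component manager enabled; if it was installed through a system package manager, the plugin may need to come from that package manager instead.

```shell
# Hypothetical helper: install the GKE auth plugin only when it is not on PATH.
# Assumes `gcloud components install` is available for your gcloud installation.
ensure_gke_auth_plugin() {
  if command -v gke-gcloud-auth-plugin >/dev/null 2>&1; then
    echo "gke-gcloud-auth-plugin found"
  else
    gcloud components install gke-gcloud-auth-plugin
  fi
}

# Example call:
#   ensure_gke_auth_plugin
#   kubectl get nodes   # should list the cluster nodes once access works
```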

Install Helix

Now you’re ready to install Helix.

1. Install Keycloak

Helix uses Keycloak for authentication. If you already have a Keycloak instance, you can skip this step. Otherwise, you can install one through Helm (chart info, repo).

For example:

helm upgrade --install keycloak oci://registry-1.docker.io/bitnamicharts/keycloak \
  --set auth.adminUser=admin \
  --set auth.adminPassword=oh-hallo-insecure-password \
  --set httpRelativePath="/auth/" \
  --set image.tag="23"

By default, the chart only creates a ClusterIP service. To expose Keycloak, you can either port-forward or, if you are on k3s or minikube, create a load balancer:

kubectl expose pod keycloak-0 --port 8888 --target-port 8080 --name keycloak-ext --type=LoadBalancer
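
The admin password in the Keycloak example above is a placeholder. A minimal sketch for generating a random value instead is shown here; gen_secret and KEYCLOAK_ADMIN_PASSWORD are hypothetical names chosen for illustration, and the approach assumes /dev/urandom and base64 are available on your machine.

```shell
# Hypothetical helper: print a random 32-character secret.
# 24 random bytes encode to exactly 32 base64 characters (no padding).
gen_secret() {
  head -c 24 /dev/urandom | base64 | tr -d '\n'
}

# Keep the value in a variable so it can be passed to Helm.
KEYCLOAK_ADMIN_PASSWORD="$(gen_secret)"
export KEYCLOAK_ADMIN_PASSWORD

# Then, in the helm command, replace the placeholder with:
#   --set auth.adminPassword="${KEYCLOAK_ADMIN_PASSWORD}"
```

Store the generated value somewhere safe, since you need it to log into the Keycloak admin console.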

2. Install the Helm Repository

helm repo add helix https://charts.helix.ml
helm repo update

3. Apply the Chart

Copy the values-example.yaml from the repository to configure the Helix control plane. See the configuration documentation to learn what each setting does.

curl -o values-example.yaml https://raw.githubusercontent.com/helixml/helix/main/charts/helix-controlplane/values-example.yaml

You must edit the provider configuration in this file so that Helix can run. Specifying a remote provider (e.g. openai or togetherai) is the easiest option, but it requires API keys. The helix provider keeps operation local, but then you must also add a runner.

Now you’re ready to install the control plane Helm chart with the latest images.

export LATEST_RELEASE=$(curl -s https://get.helix.ml/latest.txt)
helm upgrade --install my-helix-controlplane helix/helix-controlplane \
  -f values-example.yaml \
  --set image.tag="${LATEST_RELEASE}"
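
Because curl -s prints nothing when the request fails, LATEST_RELEASE can end up empty without any visible error. The following sketch guards against that; require_tag is a hypothetical helper name, not something provided by Helix.

```shell
# Hypothetical helper: refuse to continue when the release tag is empty.
require_tag() {
  if [ -z "$1" ]; then
    echo "release tag is empty; check connectivity to get.helix.ml" >&2
    return 1
  fi
  echo "using image tag: $1"
}

# Example call, chained so Helm only runs with a non-empty tag:
#   require_tag "$LATEST_RELEASE" && helm upgrade --install ...
```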

Ensure all the pods start. If they do not, inspect the logs.
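
As a rough aid for that check, the sketch below filters the output of kubectl get pods --no-headers and prints the pods that need attention. The helper name not_running is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS is neither Running nor Completed.
not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example calls:
#   kubectl get pods --no-headers | not_running
#   kubectl logs <pod-name>            # inspect the logs of a failing pod
#   kubectl describe pod <pod-name>    # see scheduling and image pull events
```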

Once they are all running, access the control plane via port-forwarding (default) or according to your configuration.

You can configure the Kubernetes deployment by overriding the settings in the values.yaml.

4. Deploying a Runner

Keep the following in mind when you configure the runner:

  • The default GPU type in terraform.tfvars is an L4 with 24GB of GPU RAM, so use --set runner.memory=24GB.
  • By default there is a single node with a GPU, so install everything on the same node (no selector) and use --set replicaCount=1.

For example:

export LATEST_RELEASE=$(curl -s https://get.helix.ml/latest.txt)
helm upgrade --install my-helix-runner helix/helix-runner \
  --set runner.host="http://my-helix-controlplane" \
  --set runner.token="oh-hallo-insecure-token" \
  --set runner.memory=24GB \
  --set replicaCount=1 \
  --set runner.axolotl="false" \
  --set image.tag="${LATEST_RELEASE}-small"

If you want to schedule the runner on specific nodes, set the nodeSelector. Change the label to match the value shown in the output of kubectl describe node.

  --set nodeSelector."nvidia\.com/gpu\.product"="NVIDIA-A100-SXM4-40GB"
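
The backslashes in the flag above are needed because Helm's --set treats an unescaped dot as a key separator. A small sketch for escaping a label key is shown here; escape_helm_key is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: backslash-escape the dots in a label key so that
# Helm's --set does not split the key into nested fields.
escape_helm_key() {
  printf '%s\n' "$1" | sed 's/\./\\./g'
}

# Example call:
#   key=$(escape_helm_key 'nvidia.com/gpu.product')
#   ... --set nodeSelector."${key}"="NVIDIA-A100-SXM4-40GB"
```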

Access Helix

The default Kubernetes installation is locked down. You can access Helix via port-forwarding from your machine.

kubectl port-forward svc/my-helix-controlplane 8080:80

Then visit: http://localhost:8080/

Take a look at the user documentation to learn how to use Helix.

Delete the Cluster

terraform destroy