Helix Architecture

The Helix architecture is an uncomplicated stack of high-quality components.

The code is available at github.com/helixml/helix.

Architecture components

Control Plane

See also: docker-compose.yaml.

Runners

Runners connect to the control plane via API/websocket to provide GPUs that run model instances. Because they make only outbound connections, they can run behind NAT. Each runner knows how much GPU memory it has and polls the API server for new work.
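The polling behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual runner code (the real runner is a Go service). The names `poll_for_work`, `fetch_job`, `run_job`, `interval_seconds` and `max_polls` are all hypothetical, and the transport is passed in as a function so that no real Helix endpoint is assumed.

```python
import time
from typing import Callable, Optional


def poll_for_work(
    fetch_job: Callable[[], Optional[dict]],
    run_job: Callable[[dict], None],
    interval_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> int:
    """Repeatedly ask the control plane for work; return the number of jobs run.

    fetch_job is a hypothetical callable that performs one outbound request
    and returns a job, or None when nothing is queued.
    """
    jobs_run = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()  # the runner initiates the request, so NAT is not a problem
        if job is None:
            time.sleep(interval_seconds)  # nothing queued; wait before asking again
            continue
        run_job(job)
        jobs_run += 1
    return jobs_run
```

Because every request originates from the runner, the control plane never needs to open a connection to it, which is what allows runners to sit behind NAT.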

A runner is a “fat” container image that contains both the runner Go service and the Python virtualenvs that correspond to the supported models.

Each poll request includes a set of filters that restrict the jobs the runner accepts to those that will fit in the amount of GPU memory it could free if it stopped all “stale” model instances.
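The memory filter can be illustrated with a minimal sketch. It assumes that each model instance and each job carries a known GPU memory requirement, and that the runner tracks a `stale` flag per instance. The types `ModelInstance` and `Job` and the functions `reclaimable_memory` and `acceptable_jobs` are hypothetical names chosen for illustration, not part of the Helix code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelInstance:
    model: str
    memory_bytes: int
    stale: bool  # hypothetical flag: the instance could be stopped to free memory


@dataclass
class Job:
    id: str
    model: str
    memory_bytes: int


def reclaimable_memory(total_bytes: int, instances: List[ModelInstance]) -> int:
    """GPU memory available if every stale instance were stopped."""
    in_use = sum(i.memory_bytes for i in instances if not i.stale)
    return total_bytes - in_use


def acceptable_jobs(jobs: List[Job], total_bytes: int,
                    instances: List[ModelInstance]) -> List[Job]:
    """Keep only the jobs that fit in the memory the runner could free."""
    limit = reclaimable_memory(total_bytes, instances)
    return [j for j in jobs if j.memory_bytes <= limit]
```

For example, a 24 GiB runner with one active 10 GiB instance and one stale 8 GiB instance could accept a job of up to 14 GiB, because the stale instance's memory counts as available.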

Model instances are Python processes that connect to the runner’s internal API and fetch the latest job to run. They then spawn inference or fine-tuning code via ollama, cog or axolotl.
