Deploy AI Agents on Your Own Infrastructure

HelixML is an enterprise-grade platform for coding agents and AI assistants. Run fleets of background coding agents on real GPU-accelerated Linux desktops. Built-in RAG, API integration, MCP servers, and multi-provider LLM support. Complete data security and control.

Coding Agents

Run fleets of background coding agents on real GPU-accelerated Linux desktops. Spec-first development with design review before implementation. Kanban board to orchestrate dozens of agents. Stream desktops, pair program, and intervene when needed.

Knowledge & RAG

Built-in document ingestion (PDFs, Word, text files), web scraping, and multiple RAG backends: Typesense, Haystack, PGVector, LlamaIndex. Upload corporate documents or point at website URLs to create instant customer support agents.

Multi-Provider LLMs

Support for OpenAI, Anthropic Claude, and local open-weight models. Our GPU scheduler efficiently packs models into available memory and dynamically loads and unloads them based on demand. Full tracing and observability for all LLM interactions.

Get started quickly

Install with our quickstart script:

curl -sL -O https://get.helixml.tech/install.sh
chmod +x install.sh
sudo ./install.sh

Then attach GPU runners or use any OpenAI-compatible LLM. For Kubernetes, see the private deployment docs.

Docs ⇨

OpenAI-Compatible API

Helix exposes an OpenAI-compatible chat completions API:

curl https://your-helix-server/v1/chat/completions \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3:instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Get your API key from the Account page in the app.
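Because the endpoint follows OpenAI's chat completions schema, the assistant's reply can be extracted from the JSON response with `jq`. The response body below is an abridged illustration of that standard schema (not captured from a live server); a real response also carries fields such as `id`, `model`, and `usage`:

```shell
# Abridged chat completions response in the standard OpenAI schema.
response='{"choices":[{"message":{"role":"assistant","content":"Hello! How can I help?"}}]}'

# The reply text lives at .choices[0].message.content.
echo "$response" | jq -r '.choices[0].message.content'
```

In a script, you can pipe the output of the `curl` command above straight into the same `jq` filter.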

Try our hosted Helix Cloud today to see what it can do, then come back here to learn how to deploy it yourself.

Full API reference