Deploy AI Agents on Your Own Infrastructure
HelixML is an enterprise-grade platform for coding agents and AI assistants. Run fleets of background coding agents on real GPU-accelerated desktops. Built-in RAG, API integration, MCP servers, and multi-provider LLM support. Complete data security and control.
- Coding Agents
Run fleets of background coding agents on real GPU-accelerated Linux desktops. Spec-first development with design review before implementation. Kanban board to orchestrate dozens of agents. Stream desktops, pair program, and intervene when needed.
- Knowledge & RAG
Built-in document ingestion (PDFs, Word, text files), web scraping, and multiple RAG backends: Typesense, Haystack, PGVector, LlamaIndex. Upload corporate documents or point at website URLs to create instant customer support agents.
- Multi-Provider LLMs
Support for OpenAI, Anthropic Claude, and local open-weight models. Our GPU scheduler efficiently packs models into available memory and dynamically loads/unloads based on demand. Full tracing and observability for all LLM interactions.
Get started quickly
Install with our quickstart script:
curl -sL -O https://get.helixml.tech/install.sh
chmod +x install.sh
sudo ./install.sh

Then attach GPU runners or use any OpenAI-compatible LLM. For Kubernetes, see the private deployment docs.
OpenAI-Compatible API
Helix exposes an OpenAI-compatible chat completions API:
curl https://your-helix-server/v1/chat/completions \
-H 'Authorization: Bearer <YOUR_API_KEY>' \
-H 'Content-Type: application/json' \
-d '{
"model": "llama3:instruct",
"messages": [{"role": "user", "content": "Hello!"}]
  }'

Get your API key from the Account page in the app.
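Because the endpoint follows the OpenAI chat completions schema, any HTTP client can call it. A minimal Python sketch using only the standard library; the server URL, API key, and model name are placeholders you would replace with your own:

```python
import json
import urllib.request

def build_request(base_url, api_key, model, content):
    """Build an OpenAI-style chat completion request for a Helix server.

    All arguments are caller-supplied placeholders; Helix itself only
    requires that the endpoint path and JSON body match the OpenAI schema.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(base_url, api_key, model, content):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(base_url, api_key, model, content)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same shape works from any language; only the base URL and bearer token differ from a stock OpenAI client.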
Try our hosted Helix Cloud today to see what it can do, then come back here to learn how to deploy it yourself.