Editions

Open source now. Enterprise-ready ahead.

The open-source core TUI plus the Personal and Professional editions are available now. Enterprise capabilities roll out via design partner tracks for regulated environments.

Community
Available now
Free
100% open source · Apache 2.0 license
Developers · hobbyists · AI builders

The full Rust TUI, today.

Everything needed to launch, monitor, and tune local llama.cpp workloads on real hardware. 100% open source with a business-friendly Apache 2.0 license. Free forever.

  • Auto-detecting hardware configuration
  • 5-tab config screen (34+ editable fields)
  • TOML-based model definitions
  • Interactive model selection with VRAM estimates
  • Real-time CPU, GPU, and VRAM monitoring (NVML)
  • Raw process log streaming with color-coded levels
  • Basic model/session stats
  • Session cost + energy tracking
  • HuggingFace model browser + GGUF downloads
  • Leaderboard access + benchmark submission
  • Linux full · macOS CPU · Windows partial
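Model definitions in Community are plain TOML files. A hypothetical definition might look like this; the field names are illustrative, not the actual llamamon schema:

```toml
# Hypothetical model definition (illustrative field names,
# not the actual llamamon schema).
[model]
name = "qwen3-coder"
path = "/models/Qwen3-Coder-Next-GGUF.gguf"
context_size = 131072
gpu_layers = 35
```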
Personal
Available now
$8/month
$96 per year
Solo developers · indie makers · homelab users

Your personal inference lab.

Everything a solo builder needs to tune, track, and compare models — parameter guidance, quant advisor, preset profiles, and a growing personal model library.

  • Everything in Community
  • Vulcanized log view with raw fallback toggle
  • Export raw and vulcanized logs to file
  • Parameter explanations in model manager
  • Smart quantization advisor (VRAM ↔ perplexity)
  • Launch presets / profiles — save named configs, switch with a keypress
  • Model favorites and custom tags
  • Per-model performance history (token/s sparklines)
  • Side-by-side model comparison mode
  • Model notes (attached to config, visible in picker)
  • Notify when model finishes loading
  • CSV export of session stats
  • Session snapshots and launch history timeline
  • Email support (best effort)
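The quant advisor's VRAM ↔ perplexity trade-off boils down to a budget check: lower bits-per-weight quants use less VRAM but lose more quality. A hedged sketch in Rust (the bits-per-weight figures are approximate GGUF conventions; the real advisor's heuristic may differ):

```rust
// Illustrative sketch of a VRAM-vs-quality quant advisor.
// Quant names follow GGUF conventions; the selection logic is
// an assumption, not llamamon's actual heuristic.

/// (quant name, approx. bits per weight) — lower bpw means
/// less VRAM but higher perplexity loss. Ordered best-first.
const QUANTS: &[(&str, f64)] = &[
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
];

/// Pick the highest-quality quant whose weights fit in the VRAM budget.
fn advise(param_count_b: f64, vram_gb: f64) -> Option<&'static str> {
    QUANTS
        .iter()
        .find(|&&(_, bpw)| param_count_b * bpw / 8.0 <= vram_gb)
        .map(|(name, _)| *name)
}

fn main() {
    // A 7B model on a 6 GB card: Q6_K weighs ~7 * 6.6 / 8 ≈ 5.8 GB, so it fits.
    println!("{:?}", advise(7.0, 6.0)); // → Some("Q6_K")
}
```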
Professional
Available now
$50/month
$600 per year
Teams of 5-50 · AI startups · platform teams

Operators and platform teams.

Real-time cloud price arbitrage, persistent session analytics, and operator leaderboard lanes — for teams of 5-50 that need to prove ROI, not just watch GPU temperatures.

  • Everything in Personal
  • Real-time cloud price arbitrage (50+ providers)
  • Persistent session history with SQLite analytics
  • Hardware Projection Engine — estimate cluster performance
  • Multi-GPU monitoring dashboard
  • LlamaMon structured event export (JSON/NDJSON) for Datadog and pipelines
  • Leaderboard-based recommended parameter values
  • Advanced model-edit stats (historical + comparative)
  • Session comparison & A/B efficiency testing
  • Operator + Procurement leaderboard lanes
  • vLLM backend support
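LlamaMon structured event export emits one JSON object per line (NDJSON), which log pipelines and tools like Datadog can ingest directly. A minimal sketch, with assumed field names rather than the actual event schema:

```rust
// Illustrative NDJSON event serialization — the field names here
// are assumptions, not the actual LlamaMon event schema.

/// Flatten one parsed llama.cpp event into a single NDJSON line.
fn to_ndjson(ts: &str, event: &str, ms: f64, tokens: u32) -> String {
    format!(
        r#"{{"ts":"{}","event":"{}","ms":{},"tokens":{}}}"#,
        ts, event, ms, tokens
    )
}

fn main() {
    // One JSON object per line keeps the stream greppable and
    // trivially ingestible by Datadog-style pipelines.
    println!("{}", to_ndjson("2025-01-01T00:00:00Z", "prompt_eval", 411.23, 22));
}
```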
Enterprise
Design Partner
Contact us
per node · custom pricing
CFOs · CISOs · compliance officers

Legal, finance, and compliance.

Full TCO engine, ESG carbon audit PDFs, security sentry, and governance tools — for regulated environments where 'local is cheaper' must be certified, not assumed.

  • Everything in Professional
  • Full Hardware Projection Engine (HPE)
  • TCO engine — CAPEX amortization + OPEX + overhead
  • ESG carbon audit PDF (ISO 14064, EU CSRD compliant)
  • Security Sentry — network exfiltration monitoring
  • Audit-ready logs (SHA-256 chain, tamper-evident)
  • Data sovereignty certification
  • Model license risk classification (VEMR registry)
  • Predictive maintenance alerts
  • Multi-node clustering + auto-scaling
  • Central reporting server (Design Partner) — aggregate data from all nodes
  • Executive dashboards and cost reports (Design Partner)
  • Compliance reporting and audit trails (Design Partner)
  • Utility and grid tariff API overlays (Enterprise)
  • Multi-tenant token accounting (department chargeback)
  • Air-gap deployment support
  • SOC2 / GDPR ready
Personal log workflow

Raw logs in Community. Interpreted logs in Personal+.

Community keeps the original llama.cpp stream visible in the terminal. Personal introduces a vulcanized operator view with raw fallback and dual-mode export (raw + vulcanized). Professional turns raw llama.cpp output into structured LlamaMon events (JSON/NDJSON) for machine processing and Datadog-style ingestion. The examples below are aligned with real parser transforms in the current core vulcanizer.
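The vulcanizing step is, at its core, a line-by-line transform from raw llama.cpp output to operator-readable text, with unknown lines passing through untouched (the raw fallback). A minimal sketch with illustrative rules, not the actual parser:

```rust
// Minimal sketch of a "vulcanizing" transform: raw llama.cpp lines
// in, operator-readable lines out. The rules below are illustrative,
// not the actual transforms in llamamon's core vulcanizer.

fn vulcanize(raw: &str) -> String {
    if let Some(addr) = raw.strip_prefix("HTTP server listening on ") {
        return format!("🌐 Server listening on {}", addr);
    }
    if raw == "all slots are idle, waiting for incoming requests" {
        return "⏸️ All slots idle - waiting for requests".to_string();
    }
    if let Some(n) = raw.strip_prefix("n_layer = ") {
        return format!("🧱 Layers: {}", n);
    }
    // Unknown lines fall through unchanged — the raw-fallback behavior.
    raw.to_string()
}

fn main() {
    println!("{}", vulcanize("n_layer = 36")); // → 🧱 Layers: 36
}
```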

Before
Community default · raw log tail
Exact source output
llamamon UI · Logs panel (raw)_
tty0
llama.cpp 2620 built with cc
HTTP server listening on 127.0.0.1:8080
all slots are idle, waiting for incoming requests
........................................
loading model from /models/Qwen3-Coder-Next-GGUF.gguf
n_ctx_train = 131072
n_layer = 36
arch = llama
offloaded 35/35 layers to GPU
After
Personal+ default · vulcanized view
Operator-readable
llamamon UI · Logs panel (vulcanized)_
tty0
🚀 LLaMA.cpp v2620 loaded
🌐 Server listening on 127.0.0.1:8080
⏸️ All slots idle - waiting for requests
📥 Loading model ⠙
📦 Loading model...
📖 Context size: 131k
🧱 Layers: 36
🏗️ Architecture: llama
⚡ GPU: offloaded 35/35 layers
Timing-focused vulcanization

Performance timing becomes immediately readable.

These examples show the timing and progress parsing path in action: prompt eval, eval, total timing, prompt-processing progress normalization, and stop-processing token extraction.

Before
Raw timing/progress lines
Low-level metrics
llamamon UI · Timing view (raw)_
tty0
print_timing: prompt eval time = 411.23 ms / 22 tokens
print_timing: eval time = 845.50 ms / 128 tokens
print_timing: total time = 1256.73 ms / 150 tokens
prompt processing progress = 0.625
stop processing: n_tokens = 150
After
Vulcanized timing/progress view
Actionable timing
llamamon UI · Timing view (vulcanized)_
tty0
⏱️ Prompt eval: 411.23 ms / 22 tokens
⏱️ Eval: 845.50 ms / 128 tokens
⏱️ Total: 1256.73 ms / 150 tokens
📝 Prompt processing: 62.5%
⏹️ Stop processing: 150 tokens
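The timing lines above share a regular `<value> ms / <N> tokens` shape, so extracting them reduces to a pair of string splits. A hedged sketch (the real llamamon parser may differ):

```rust
// Hedged sketch of the timing-line parse shown above; llamamon's
// actual parser may differ. Extracts "<ms> ms / <N> tokens" pairs.

fn parse_timing(line: &str) -> Option<(f64, u32)> {
    // e.g. "print_timing: eval time = 845.50 ms / 128 tokens"
    let (_, rhs) = line.split_once(" = ")?;
    let (ms_part, tok_part) = rhs.split_once(" ms / ")?;
    let ms = ms_part.trim().parse().ok()?;
    let tokens = tok_part.trim().trim_end_matches(" tokens").parse().ok()?;
    Some((ms, tokens))
}

fn main() {
    if let Some((ms, toks)) = parse_timing("print_timing: eval time = 845.50 ms / 128 tokens") {
        // Derived throughput: tokens / (ms / 1000)
        println!("⏱️ Eval: {} ms / {} tokens ({:.1} tok/s)", ms, toks, toks as f64 / (ms / 1000.0));
    }
}
```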
The math

Local LLMs pay for themselves in months.

Teams replacing $500k/yr cloud API bills with local inference can recoup their hardware spend in under two months. Personal and Professional tiers unlock progressively deeper tooling to prove and optimize these savings.

Typical enterprise cloud API spend: $500,000 / year
Local inference hardware + software: $50,000 / year
Annual cost savings: $450,000 / year
ROI timeframe: 1.3 months

Based on typical enterprise cloud API spend. Local inference provides immediate cost savings while maintaining data privacy and performance.
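The payback figure follows directly from the two spend lines: savings are cloud spend minus local spend, and the ROI timeframe is the local outlay divided by annual savings, converted to months. A worked version of the illustrative numbers above:

```rust
// Worked version of the ROI numbers above; the spend figures are the
// page's illustrative example, not measured data.

/// Months until annual savings cover the local-inference outlay.
fn payback_months(cloud_per_year: f64, local_per_year: f64) -> f64 {
    let annual_savings = cloud_per_year - local_per_year; // $450k here
    local_per_year / annual_savings * 12.0
}

fn main() {
    // $500k cloud vs $50k local → $450k/yr saved, ~1.3-month payback.
    println!("{:.1} months", payback_months(500_000.0, 50_000.0));
}
```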

How we compare
llamamon vs Ollama, nvidia-smi, and Datadog

  • LLM-specific monitoring
  • Financial intelligence (ROI, TCO) · Pro
  • Cloud price arbitrage · Pro
  • ESG carbon reporting · Ent
  • Security sentry · Ent
  • 5-min setup
  • SSH-native TUI
  • Binary size: llamamon 12 MB · Ollama 1.2 GB+ · nvidia-smi ships with the driver · Datadog requires an agent
Will Community always be free?

Yes. llamamon's core TUI is open source under Apache 2.0. Hardware detection, config, monitoring, and launch modes stay free forever — no paywalls on the monitoring surface.

Is there a plan for solo developers?

Yes. The Personal tier is designed for solo builders who want interpreted logs, raw/vulcanized export modes, and lightweight history without jumping straight to the full Professional operator stack.

Which log export formats are available?

Personal exports raw and vulcanized logs for human debugging. Professional adds LlamaMon structured event export (JSON/NDJSON), transforming raw llama.cpp output into cleaner machine-readable events for log pipelines and observability tools like Datadog.

What makes Professional different?

Professional adds cloud arbitrage, persistent analytics, advanced model-edit stats with CSV export, benchmark workflows, and operator/procurement leaderboard lanes — the stack teams use to optimize spend and throughput at scale.

What makes Enterprise different?

ESG carbon reporting, security sentry, audit-ready logs, and procurement support — the compliance and governance layers that legal, finance, and CISOs need. Advanced reporting and utility tariff overlays are available through enterprise design partner engagements.

When do Pro and Enterprise ship?

Professional is available now. Enterprise is delivered through design partner engagements and expands based on customer demand. Reach out to discuss fit and rollout timing.