Editions

Open source now. Enterprise-ready ahead.

The open-source core TUI plus the Personal and Professional editions are available now. Enterprise capabilities roll out via design partner tracks for regulated environments.

Community
Available now
Free
100% open source · Apache 2.0 license
Developers · hobbyists · AI builders

The full Rust TUI, today.

Everything needed to launch, monitor, and tune local llama.cpp workloads on real hardware. 100% open source with a business-friendly Apache 2.0 license. Free forever.

  • Auto-detecting hardware configuration
  • 5-tab config screen (34+ editable fields)
  • TOML-based model definitions
  • Interactive model selection with VRAM estimates
  • Real-time CPU, GPU, and VRAM monitoring (NVML)
  • Raw process log streaming with color-coded levels
  • Basic model/session stats
  • Session cost + energy tracking
  • HuggingFace model browser + GGUF downloads
  • Leaderboard access + benchmark submission
  • Linux full · macOS CPU · Windows partial
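Model definitions in Community are plain TOML files. A hypothetical definition might look like this; the field names are illustrative, not the actual llamamon schema:

```toml
# Hypothetical model definition (illustrative field names,
# not the actual llamamon schema).
[model]
name = "qwen3-coder"
path = "/models/Qwen3-Coder-Next-GGUF.gguf"
context_size = 131072
gpu_layers = 35
```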
Personal
Available now
$8/month
$96 per year
Solo developers · indie makers · homelab users

Your personal inference lab.

Everything a solo builder needs to tune, track, and compare models — parameter guidance, quant advisor, preset profiles, and a growing personal model library.

  • Everything in Community
  • Vulcanized log view with raw fallback toggle
  • Export raw and vulcanized logs to file
  • Parameter explanations in model manager
  • Smart quantization advisor (VRAM ↔ perplexity)
  • Launch presets / profiles — save named configs, switch with a keypress
  • Model favorites and custom tags
  • Per-model performance history (token/s sparklines)
  • Side-by-side model comparison mode
  • Model notes (attached to config, visible in picker)
  • Notify when model finishes loading
  • CSV export of session stats
  • Session snapshots and launch history timeline
  • Email support (best effort)
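The quant advisor's VRAM ↔ perplexity trade-off boils down to a budget check: lower bits-per-weight quants use less VRAM but lose more quality. A hedged sketch in Rust (the bits-per-weight figures are approximate GGUF conventions; the real advisor's heuristic may differ):

```rust
// Illustrative sketch of a VRAM-vs-quality quant advisor.
// Quant names follow GGUF conventions; the selection logic is
// an assumption, not llamamon's actual heuristic.

/// (quant name, approx. bits per weight) — lower bpw means
/// less VRAM but higher perplexity loss. Ordered best-first.
const QUANTS: &[(&str, f64)] = &[
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
];

/// Pick the highest-quality quant whose weights fit in the VRAM budget.
fn advise(param_count_b: f64, vram_gb: f64) -> Option<&'static str> {
    QUANTS
        .iter()
        .find(|&&(_, bpw)| param_count_b * bpw / 8.0 <= vram_gb)
        .map(|(name, _)| *name)
}

fn main() {
    // A 7B model on a 6 GB card: Q6_K weighs ~7 * 6.6 / 8 ≈ 5.8 GB, so it fits.
    println!("{:?}", advise(7.0, 6.0)); // → Some("Q6_K")
}
```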
Professional
Available now
$50/month
$600 per year
Teams of 5-50 · AI startups · platform teams

Operators and platform teams.

Real-time cloud price arbitrage, persistent session analytics, and operator leaderboard lanes — for teams of 5-50 that need to prove ROI, not just watch GPU temperatures.

  • Everything in Personal
  • Real-time cloud price arbitrage (50+ providers)
  • Persistent session history with SQLite analytics
  • Hardware Projection Engine — estimate cluster performance
  • Multi-GPU monitoring dashboard
  • LlamaMon structured event export (JSON/NDJSON) for Datadog and pipelines
  • Leaderboard-based recommended parameter values
  • Advanced model-edit stats (historical + comparative)
  • Session comparison & A/B efficiency testing
  • Operator + Procurement leaderboard lanes
  • vLLM backend support
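LlamaMon structured event export emits one JSON object per line (NDJSON), which log pipelines and tools like Datadog can ingest directly. A minimal sketch, with assumed field names rather than the actual event schema:

```rust
// Illustrative NDJSON event serialization — the field names here
// are assumptions, not the actual LlamaMon event schema.

/// Flatten one parsed llama.cpp event into a single NDJSON line.
fn to_ndjson(ts: &str, event: &str, ms: f64, tokens: u32) -> String {
    format!(
        r#"{{"ts":"{}","event":"{}","ms":{},"tokens":{}}}"#,
        ts, event, ms, tokens
    )
}

fn main() {
    // One JSON object per line keeps the stream greppable and
    // trivially ingestible by Datadog-style pipelines.
    println!("{}", to_ndjson("2025-01-01T00:00:00Z", "prompt_eval", 411.23, 22));
}
```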
Enterprise
Design Partner
Contact us
per node · custom pricing
CFOs · CISOs · compliance officers

Legal, finance, and compliance.

Full TCO engine, ESG carbon audit PDFs, security sentry, and governance tools — for regulated environments where 'local is cheaper' must be certified, not assumed.

  • Everything in Professional
  • Full Hardware Projection Engine (HPE)
  • TCO engine — CAPEX amortization + OPEX + overhead
  • ESG carbon audit PDF (ISO 14064, EU CSRD compliant)
  • Security Sentry — network exfiltration monitoring
  • Audit-ready logs (SHA-256 chain, tamper-evident)
  • Data sovereignty certification
  • Model license risk classification (VEMR registry)
  • Predictive maintenance alerts
  • Multi-node clustering + auto-scaling
  • Central reporting server (Design Partner) — aggregate data from all nodes
  • Executive dashboards and cost reports (Design Partner)
  • Compliance reporting and audit trails (Design Partner)
  • Utility and grid tariff API overlays (Enterprise)
  • Multi-tenant token accounting (department chargeback)
  • Air-gap deployment support
  • SOC2 / GDPR ready
Personal log workflow

Raw logs in Community. Interpreted logs in Personal+.

Community keeps the original llama.cpp stream visible in the terminal. Personal introduces a vulcanized operator view with raw fallback and dual-mode export (raw + vulcanized). Professional turns raw llama.cpp output into structured LlamaMon events (JSON/NDJSON) for machine processing and Datadog-style ingestion. The examples below are aligned with real parser transforms in the current core vulcanizer.
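The vulcanizing step is, at its core, a line-by-line transform from raw llama.cpp output to operator-readable text, with unknown lines passing through untouched (the raw fallback). A minimal sketch with illustrative rules, not the actual parser:

```rust
// Minimal sketch of a "vulcanizing" transform: raw llama.cpp lines
// in, operator-readable lines out. The rules below are illustrative,
// not the actual transforms in llamamon's core vulcanizer.

fn vulcanize(raw: &str) -> String {
    if let Some(addr) = raw.strip_prefix("HTTP server listening on ") {
        return format!("🌐 Server listening on {}", addr);
    }
    if raw == "all slots are idle, waiting for incoming requests" {
        return "⏸️ All slots idle - waiting for requests".to_string();
    }
    if let Some(n) = raw.strip_prefix("n_layer = ") {
        return format!("🧱 Layers: {}", n);
    }
    // Unknown lines fall through unchanged — the raw-fallback behavior.
    raw.to_string()
}

fn main() {
    println!("{}", vulcanize("n_layer = 36")); // → 🧱 Layers: 36
}
```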

Before
Community default · raw log tail
Exact source output
llamamon UI · Logs panel (raw)_
tty0
llama.cpp 2620 built with cc
HTTP server listening on 127.0.0.1:8080
all slots are idle, waiting for incoming requests
........................................
loading model from /models/Qwen3-Coder-Next-GGUF.gguf
n_ctx_train = 131072
n_layer = 36
arch = llama
offloaded 35/35 layers to GPU
After
Personal+ default · vulcanized view
Operator-readable
llamamon UI · Logs panel (vulcanized)_
tty0
🚀 LLaMA.cpp v2620 loaded
🌐 Server listening on 127.0.0.1:8080
⏸️ All slots idle - waiting for requests
📥 Loading model ⠙
📦 Loading model...
📖 Context size: 131k
🧱 Layers: 36
🏗️ Architecture: llama
⚡ GPU: offloaded 35/35 layers
Timing-focused vulcanization

Performance timing becomes immediately readable.

These examples show the timing and progress parsing path in action: prompt eval, eval, total timing, prompt-processing progress normalization, and stop-processing token extraction.

Before
Raw timing/progress lines
Low-level metrics
llamamon UI · Timing view (raw)_
tty0
print_timing: prompt eval time = 411.23 ms / 22 tokens
print_timing: eval time = 845.50 ms / 128 tokens
print_timing: total time = 1256.73 ms / 150 tokens
prompt processing progress = 0.625
stop processing: n_tokens = 150
After
Vulcanized timing/progress view
Actionable timing
llamamon UI · Timing view (vulcanized)_
tty0
⏱️ Prompt eval: 411.23 ms / 22 tokens
⏱️ Eval: 845.50 ms / 128 tokens
⏱️ Total: 1256.73 ms / 150 tokens
📝 Prompt processing: 62.5%
⏹️ Stop processing: 150 tokens
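The timing lines above share a regular `<value> ms / <N> tokens` shape, so extracting them reduces to a pair of string splits. A hedged sketch (the real llamamon parser may differ):

```rust
// Hedged sketch of the timing-line parse shown above; llamamon's
// actual parser may differ. Extracts "<ms> ms / <N> tokens" pairs.

fn parse_timing(line: &str) -> Option<(f64, u32)> {
    // e.g. "print_timing: eval time = 845.50 ms / 128 tokens"
    let (_, rhs) = line.split_once(" = ")?;
    let (ms_part, tok_part) = rhs.split_once(" ms / ")?;
    let ms = ms_part.trim().parse().ok()?;
    let tokens = tok_part.trim().trim_end_matches(" tokens").parse().ok()?;
    Some((ms, tokens))
}

fn main() {
    if let Some((ms, toks)) = parse_timing("print_timing: eval time = 845.50 ms / 128 tokens") {
        // Derived throughput: tokens / (ms / 1000)
        println!("⏱️ Eval: {} ms / {} tokens ({:.1} tok/s)", ms, toks, toks as f64 / (ms / 1000.0));
    }
}
```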
The math

Local LLMs pay for themselves in months.

Teams replacing $500k/yr cloud API bills with local inference can recoup their hardware spend in under two months. Personal and Professional tiers unlock progressively deeper tooling to prove and optimize these savings.

Typical enterprise cloud API spend: $500,000 / year
Local inference hardware + software: $50,000 / year
Annual cost savings: $450,000 / year
ROI timeframe: 1.3 months

Based on typical enterprise cloud API spend. Local inference provides immediate cost savings while maintaining data privacy and performance.
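The payback figure follows directly from the two spend lines: savings are cloud spend minus local spend, and the ROI timeframe is the local outlay divided by annual savings, converted to months. A worked version of the illustrative numbers above:

```rust
// Worked version of the ROI numbers above; the spend figures are the
// page's illustrative example, not measured data.

/// Months until annual savings cover the local-inference outlay.
fn payback_months(cloud_per_year: f64, local_per_year: f64) -> f64 {
    let annual_savings = cloud_per_year - local_per_year; // $450k here
    local_per_year / annual_savings * 12.0
}

fn main() {
    // $500k cloud vs $50k local → $450k/yr saved, ~1.3-month payback.
    println!("{:.1} months", payback_months(500_000.0, 50_000.0));
}
```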

How we compare
llamamon vs Ollama, nvidia-smi, and Datadog

  • LLM-specific monitoring
  • Financial intelligence (ROI, TCO) · Pro
  • Cloud price arbitrage · Pro
  • ESG carbon reporting · Ent
  • Security sentry · Ent
  • 5-min setup
  • SSH-native TUI
  • Binary size: llamamon 12 MB · Ollama 1.2 GB+ · nvidia-smi ships with the driver · Datadog requires an agent
Will Community always be free?

Yes. llamamon's core TUI is open source under Apache 2.0. Hardware detection, config, monitoring, and launch modes stay free forever — no paywalls on the monitoring surface.

Is there a plan for solo developers?

Yes. The Personal tier is designed for solo builders who want interpreted logs, raw/vulcanized export modes, and lightweight history without jumping straight to the full Professional operator stack.

Which log export formats are available?

Personal exports raw and vulcanized logs for human debugging. Professional adds LlamaMon structured event export (JSON/NDJSON), transforming raw llama.cpp output into cleaner machine-readable events for log pipelines and observability tools like Datadog.

What makes Professional different?

Professional adds cloud arbitrage, persistent analytics, advanced model-edit stats with CSV export, benchmark workflows, and operator/procurement leaderboard lanes — the stack teams use to optimize spend and throughput at scale.

What makes Enterprise different?

ESG carbon reporting, security sentry, audit-ready logs, and procurement support — the compliance and governance layers that legal, finance, and CISOs need. Advanced reporting and utility tariff overlays are available through enterprise design partner engagements.

When do Pro and Enterprise ship?

Professional is available now. Enterprise is delivered through design partner engagements and expands based on customer demand. Reach out to discuss fit and rollout timing.