Open source now. Enterprise-ready ahead.
The core TUI and Professional edition are available now. Enterprise capabilities roll out via design partner tracks for regulated environments.
The full Rust TUI, today.
Everything needed to launch, monitor, and tune local llama.cpp workloads on real hardware. 100% open source with a business-friendly Apache 2.0 license. Free forever.
- Auto-detecting hardware configuration
- 5-tab config screen (34+ editable fields)
- TOML-based model definitions
- Interactive model selection with VRAM estimates
- Real-time CPU, GPU, and VRAM monitoring (NVML)
- Raw process log streaming with color-coded levels
- Basic model/session stats
- Session cost + energy tracking
- HuggingFace model browser + GGUF downloads
- Leaderboard access + benchmark submission
- Linux (full) · macOS (CPU only) · Windows (partial)
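Since models are defined in TOML files, a definition might look like the sketch below. All field names here are illustrative assumptions, not the actual llamamon schema:

```toml
# Hypothetical model definition — field names are illustrative,
# not the real llamamon schema.
[model]
name = "llama-3.1-8b-instruct"
path = "~/models/llama-3.1-8b-instruct-Q4_K_M.gguf"
quant = "Q4_K_M"

[launch]
ctx_size = 8192
n_gpu_layers = 33
threads = 8
```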
Your personal inference lab.
Everything a solo builder needs to tune, track, and compare models — parameter guidance, quant advisor, preset profiles, and a growing personal model library.
- Everything in Community
- Vulcanized log view with raw fallback toggle
- Export raw and vulcanized logs to file
- Parameter explanations in model manager
- Smart quantization advisor (VRAM ↔ perplexity)
- Launch presets / profiles — save named configs, switch with a keypress
- Model favorites and custom tags
- Per-model performance history (token/s sparklines)
- Side-by-side model comparison mode
- Model notes (attached to config, visible in picker)
- Notify when model finishes loading
- CSV export of session stats
- Session snapshots and launch history timeline
- Email support (best effort)
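The quantization advisor weighs VRAM headroom against perplexity loss. As a hedged sketch of the kind of estimate involved, here is a back-of-the-envelope VRAM model (weights plus KV cache); this is a simplified assumption, not the advisor's actual formula:

```rust
// Rough VRAM estimate for a quantized model: weight memory plus KV cache.
// A simplified back-of-the-envelope model, not llamamon's actual
// advisor formula.
fn estimate_vram_gb(params_b: f64, bits_per_weight: f64,
                    ctx: u64, n_layers: u64, hidden: u64) -> f64 {
    // Weight memory: parameter count × bits per weight, in GB.
    let weights_gb = params_b * 1e9 * bits_per_weight / 8.0 / 1e9;
    // KV cache (fp16): 2 tensors × 2 bytes × ctx × layers × hidden dim.
    let kv_gb = (2 * 2 * ctx * n_layers * hidden) as f64 / 1e9;
    weights_gb + kv_gb
}

fn main() {
    // An 8B model at Q4_K_M (~4.5 bits/weight) with an 8k context.
    let gb = estimate_vram_gb(8.0, 4.5, 8192, 32, 4096);
    println!("~{gb:.1} GB"); // prints "~8.8 GB"
}
```

The same shape of estimate drives the VRAM figures shown in the model picker.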
Operators and platform teams.
Real-time cloud price arbitrage, persistent session analytics, and operator leaderboard lanes — for teams of 5-50 that need to prove ROI, not just watch GPU temperatures.
- Everything in Personal
- Real-time cloud price arbitrage (50+ providers)
- Persistent session history with SQLite analytics
- Hardware Projection Engine — estimate cluster performance
- Multi-GPU monitoring dashboard
- LlamaMon structured event export (JSON/NDJSON) for Datadog and pipelines
- Leaderboard-based recommended parameter values
- Advanced model-edit stats (historical + comparative)
- Session comparison & A/B efficiency testing
- Operator + Procurement leaderboard lanes
- vLLM backend support
Legal, finance, and compliance.
Full TCO engine, ESG carbon audit PDFs, security sentry, and governance tools — for regulated environments where 'local is cheaper' must be certified, not assumed.
- Everything in Professional
- Full Hardware Projection Engine (HPE)
- TCO engine — CAPEX amortization + OPEX + overhead
- ESG carbon audit PDF (ISO 14064, EU CSRD compliant)
- Security Sentry — network exfiltration monitoring
- Audit-ready logs (SHA-256 chain, tamper-evident)
- Data sovereignty certification
- Model license risk classification (VEMR registry)
- Predictive maintenance alerts
- Multi-node clustering + auto-scaling
- Central reporting server (Design Partner) — aggregate data from all nodes
- Executive dashboards and cost reports (Design Partner)
- Compliance reporting and audit trails (Design Partner)
- Utility and grid tariff API overlays (Enterprise)
- Multi-tenant token accounting (department chargeback)
- Air-gap deployment support
- SOC2 / GDPR ready
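The TCO engine folds CAPEX amortization into a per-month figure alongside OPEX. A minimal sketch of that arithmetic, with illustrative numbers; the real engine also models overhead, cooling, and utilization, which this omits:

```rust
// Simplified monthly TCO: straight-line CAPEX amortization + power OPEX.
// Illustrative only — the real TCO engine adds overhead factors
// this sketch leaves out.
fn monthly_tco(capex_usd: f64, amort_months: f64,
               watts: f64, hours_per_month: f64, usd_per_kwh: f64) -> f64 {
    let capex_monthly = capex_usd / amort_months;
    let energy_monthly = watts / 1000.0 * hours_per_month * usd_per_kwh;
    capex_monthly + energy_monthly
}

fn main() {
    // $12,000 server amortized over 36 months, drawing 600 W
    // for 720 h/month at $0.15/kWh.
    let tco = monthly_tco(12_000.0, 36.0, 600.0, 720.0, 0.15);
    println!("${tco:.2}/month"); // prints "$398.13/month"
}
```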
Raw logs in Community. Interpreted logs in Personal+.
Community keeps the original llama.cpp stream visible in the terminal. Personal introduces a vulcanized operator view with raw fallback and dual-mode export (raw + vulcanized). Professional turns raw llama.cpp output into structured LlamaMon events (JSON/NDJSON) for machine processing and Datadog-style ingestion. The examples below are aligned with real parser transforms in the current core vulcanizer.
Performance timing becomes immediately readable.
These examples show the timing and progress parsing path in action: prompt eval, eval, total timing, prompt-processing progress normalization, and stop-processing token extraction.
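As a rough illustration of this kind of transform, the sketch below turns a llama.cpp-style timing line into a flat JSON event. The input format is an assumption modeled on typical `llama_print_timings` output, and the event shape is hypothetical, not the actual LlamaMon schema:

```rust
// Parse a llama.cpp-style timing line into a flat JSON event.
// Both the line format and the event shape are assumptions, not
// the actual vulcanizer transforms or LlamaMon schema.
fn parse_timing(line: &str) -> Option<String> {
    // e.g. "prompt eval time =   123.45 ms /    10 tokens"
    let (label, rest) = line.split_once(" time =")?;
    let ms: f64 = rest.split_whitespace().next()?.parse().ok()?;
    // The token count is optional ("total time" lines have none).
    let tokens = rest
        .split('/')
        .nth(1)
        .and_then(|s| s.split_whitespace().next())
        .and_then(|s| s.parse::<u64>().ok());
    Some(match tokens {
        Some(t) => format!(
            r#"{{"event":"timing","phase":"{}","ms":{},"tokens":{}}}"#,
            label.trim(), ms, t
        ),
        None => format!(
            r#"{{"event":"timing","phase":"{}","ms":{}}}"#,
            label.trim(), ms
        ),
    })
}

fn main() {
    let line = "prompt eval time =   123.45 ms /    10 tokens";
    println!("{}", parse_timing(line).unwrap());
    // prints {"event":"timing","phase":"prompt eval","ms":123.45,"tokens":10}
}
```

One such JSON object per line is exactly the NDJSON shape that Datadog-style pipelines ingest.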
Local LLMs pay for themselves in months.
Teams replacing $500k/yr cloud API bills with local inference can recoup their hardware investment within months. Personal and Professional tiers unlock progressively deeper tooling to prove and optimize those savings.
Based on typical enterprise cloud API spend. Local inference provides immediate cost savings while maintaining data privacy and performance.
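The payback arithmetic behind that claim is simple. A minimal sketch with illustrative numbers (the dollar figures below are assumptions, not vendor data):

```rust
// Payback period for replacing a cloud API bill with local hardware.
// All dollar figures are illustrative assumptions.
fn payback_months(hardware_usd: f64, cloud_monthly_usd: f64,
                  local_monthly_usd: f64) -> f64 {
    hardware_usd / (cloud_monthly_usd - local_monthly_usd)
}

fn main() {
    // $120k of GPUs vs a $500k/yr (~$41.7k/mo) API bill, with
    // ~$3k/mo of local power and maintenance.
    let months = payback_months(120_000.0, 500_000.0 / 12.0, 3_000.0);
    println!("payback in {months:.1} months"); // prints "payback in 3.1 months"
}
```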
Yes. llamamon's core TUI is open source under Apache 2.0. Hardware detection, config, monitoring, and launch modes stay free forever — no paywalls on the monitoring surface.
Yes. The Personal tier is designed for solo builders who want interpreted logs, raw/vulcanized export modes, and lightweight history without jumping straight to the full Professional operator stack.
Personal exports raw and vulcanized logs for human debugging. Professional adds LlamaMon structured event export (JSON/NDJSON), transforming raw llama.cpp output into cleaner machine-readable events for log pipelines and observability tools like Datadog.
Professional adds cloud arbitrage, persistent analytics, advanced model-edit stats with CSV export, benchmark workflows, and operator/procurement leaderboard lanes — the stack teams use to optimize spend and throughput at scale.
ESG carbon reporting, security sentry, audit-ready logs, and procurement support — the compliance and governance layers that legal, finance, and CISOs need. Advanced reporting and utility tariff overlays are available through enterprise design partner engagements.
Professional is available now. Enterprise is delivered through design partner engagements and expands based on customer demand. Reach out to discuss fit and rollout timing.