A terminal workflow built for real inference.
llamamon is a production-grade Rust TUI for running and monitoring local llama.cpp workloads. Not a wrapper script — a full operational surface with hardware awareness, cost intelligence, and compliance metrics.
What's in the TUI today — and what's coming
5-tab configuration screen
Hardware, API, Paths, UI, and Billing in one place — with validation, first-run guidance, and hardware-detected defaults.
Model-driven launching
TOML model definitions with estimated VRAM. Select and launch without reconstructing flags by hand or memorizing options.
Runtime observability
Process output, token count, GPU/CPU utilization, temperatures, and session stats — all in one terminal surface. Community keeps the raw log view; Personal and above add a clearer vulcanized view and raw/vulcanized log export.
Cost and energy tracking
Electricity rate config, session cost calculation, and energy consumption per model run — live, in the TUI.
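The arithmetic behind a session cost figure is simple: energy is average power multiplied by time, and cost is energy multiplied by the electricity rate. The sketch below shows that calculation; the function names and the idea of using a single average power draw are assumptions for illustration, not a description of how llamamon computes its numbers.

```python
def energy_kwh(avg_watts, seconds):
    """Energy in kWh for a given average power draw and duration."""
    # watts * seconds = joules; 1 kWh = 3,600,000 joules
    return avg_watts * seconds / 3_600_000


def session_cost(avg_watts, seconds, rate_per_kwh):
    """Cost of a session at a flat electricity rate (currency per kWh)."""
    return energy_kwh(avg_watts, seconds) * rate_per_kwh
```

For example, a run that averages 300 W for 30 minutes (1800 seconds) uses 0.15 kWh, which costs 0.03 at a rate of 0.20 per kWh.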
NVIDIA GPU monitoring
Real NVML integration on Linux — temperature, utilization, and VRAM with per-second refresh. No guessing.
HuggingFace model browser
Search, filter, and download GGUF model variants directly from within the TUI — no browser needed.
Financial intelligence (Pro)
Cloud price arbitrage, persistent session analytics, hardware projection, ROI calculations, and llamamon structured event export (JSON/NDJSON) derived from raw llama.cpp logs, all in the Professional edition.
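For readers unfamiliar with NDJSON, it is one JSON object per line, which makes exports easy to stream and to process with line-oriented tools. The sketch below serializes a list of events in that shape. The event names and fields are hypothetical and do not reflect the actual export schema.

```python
import json


def to_ndjson(events):
    """Serialize events as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(event, separators=(",", ":"), sort_keys=True) + "\n"
        for event in events
    )


# Hypothetical events; field names are illustrative only.
EVENTS = [
    {"event": "session_start", "model": "example-7b-q4"},
    {"event": "generation", "tokens": 64, "tokens_per_second": 51.8},
]
```

Each line of `to_ndjson(EVENTS)` can be parsed on its own with `json.loads`, so a consumer never needs to hold the whole export in memory.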
Compliance overlays (Enterprise)
ESG carbon audit PDFs, security sentry, audit-ready logs, and data sovereignty certification — in the Enterprise edition.
See the real TUI
Every screen you'll encounter — from model selection to live monitoring.
34+ fields across 5 tabs
Every setting lives in one screen — hardware-detected defaults get you started, every field is editable.
Two ways to run
Interactive TUI for exploration, direct model launch for a faster path.
Interactive TUI: full model selection and monitoring surface. Best for active exploration and daily operations.
Direct model launch: open the TUI pre-loaded on a specific model. Fast path for hardware-specific handoffs.