Product overview
by Inoyu

A terminal workflow built for real inference.

llamamon is a production-grade Rust TUI for running and monitoring local llama.cpp workloads. Not a wrapper script — a full operational surface with hardware awareness, cost intelligence, and compliance metrics.

Features

What's in the TUI today — and what's coming

5-tab configuration screen

Hardware, API, Paths, UI, and Billing in one place — with validation, first-run guidance, and hardware-detected defaults.

Model-driven launching

TOML model definitions with estimated VRAM. Select and launch without reconstructing flags by hand or memorizing options.
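
For a sense of the shape, a model definition might look like the TOML below. Every field name here is an assumption made for this sketch, not llamamon's documented schema:

[model]
name = "glm-4.7"
path = "~/.cache/huggingface/glm-4.7-Q4_K_M.gguf"
quantization = "Q4_K_M"
context_size = 8192
estimated_vram_mb = 6144
server_flags = ["--flash-attn"]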

Runtime observability

Process output, token count, GPU/CPU utilization, temperatures, and session stats — all in one terminal surface. Community keeps the raw log view; Personal and above add a clearer vulcanized view and raw/vulcanized log export.

Cost and energy tracking

Electricity rate config, session cost calculation, and energy consumption per model run — live, in the TUI.
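
The underlying arithmetic, as a rough sketch (llamamon's own accounting may sample power per second rather than use a flat average): energy is draw times runtime, and cost is energy times your configured rate.

// Illustrative only: session energy and cost from an average power draw.
fn session_cost(avg_watts: f64, seconds: f64, rate_per_kwh: f64) -> (f64, f64) {
    let kwh = avg_watts * seconds / 3_600_000.0; // watt-seconds -> kWh
    (kwh, kwh * rate_per_kwh)
}

fn main() {
    // 250 W average over a 30-minute session at 0.30/kWh
    let (kwh, cost) = session_cost(250.0, 1_800.0, 0.30);
    println!("{kwh:.3} kWh, cost {cost:.4}"); // 0.125 kWh, cost 0.0375
}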

NVIDIA GPU monitoring

Real NVML integration on Linux — temperature, utilization, and VRAM with per-second refresh. No guessing.
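
A minimal sketch of per-second NVML polling in Rust, using the nvml-wrapper crate; llamamon's actual integration may be wired differently:

use nvml_wrapper::Nvml;
use nvml_wrapper::enum_wrappers::device::TemperatureSensor;
use std::{thread, time::Duration};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let nvml = Nvml::init()?;
    let gpu = nvml.device_by_index(0)?;
    loop {
        let temp = gpu.temperature(TemperatureSensor::Gpu)?; // degrees C
        let util = gpu.utilization_rates()?;                 // percent
        let mem = gpu.memory_info()?;                        // bytes
        println!(
            "{temp}°C  gpu {}%  vram {}/{} MiB",
            util.gpu,
            mem.used / 1_048_576,
            mem.total / 1_048_576
        );
        thread::sleep(Duration::from_secs(1));
    }
}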

HuggingFace model browser

Search, filter, and download GGUF model variants directly from within the TUI — no browser needed.

Financial intelligence (Pro)

Cloud price arbitrage, persistent session analytics, hardware projection, ROI calculations, and LlamaMon structured event export (JSON/NDJSON) derived from raw llama.cpp logs — in the Professional edition.
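
To illustrate the NDJSON export idea (one event per line; the field names here are hypothetical, not the documented export schema):

{"ts":"2025-01-15T10:42:07Z","event":"token_throughput","model":"glm-4.7","tokens_per_sec":42.3}
{"ts":"2025-01-15T10:42:08Z","event":"gpu_sample","temp_c":61,"util_pct":87,"vram_used_mb":6020}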

Compliance overlays (Enterprise)

ESG carbon audit PDFs, security sentry, audit-ready logs, and data sovereignty certification — in the Enterprise edition.

Screenshots

See the real TUI

Every screen you'll encounter — from model selection to live monitoring.

Model selection
TOML-defined models with per-context VRAM estimates.
Parameter editing
Edit server parameters and quantization options directly in the TUI.
Flags editing
Toggle and edit llama-server flags per model without touching the command line.
Runtime monitoring
Live GPU utilization, temperature, token throughput, cost, and energy.
Launch confirmation
Review flags before spawning llama-server — no surprises.
HF model search
Browse and filter HuggingFace models from within the TUI.
HF model details
Full model metadata, quantization variants, and GGUF downloads.
Configuration

34+ fields across 5 tabs

Every setting lives in one screen: hardware-detected defaults get you started, and every field is editable. An illustrative config sketch follows the tab list below.

01 Hardware: RAM, GPU, VRAM, CPU cores, quantization preferences, context size
02 API: HuggingFace token, llama-server API token, endpoint configuration
03 Paths: llama-server binary, HF cache directory, model TOML directory
04 UI: refresh rate, GPU temp display, memory display units, theming
05 Billing: electricity rate, currency symbol, cost display, amortization
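
To make that concrete, a config covering the five tabs might look like the sketch below. Section and key names are illustrative assumptions, not llamamon's actual file format:

[hardware]
ram_gb = 64
gpu = "NVIDIA GeForce RTX 4090"
vram_gb = 24
cpu_cores = 16
preferred_quantization = "Q4_K_M"
default_context_size = 8192

[api]
hf_token = "hf_..."
llama_server_token = "..."
endpoint = "http://127.0.0.1:8080"

[paths]
llama_server_binary = "/usr/local/bin/llama-server"
hf_cache_dir = "~/.cache/huggingface"
model_toml_dir = "~/.config/llamamon/models"

[ui]
refresh_rate_ms = 1000
show_gpu_temp = true
memory_units = "GiB"
theme = "dark"

[billing]
electricity_rate_per_kwh = 0.30
currency_symbol = "$"
show_cost = true
amortization_months = 36
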
Launch modes

Two ways to run

Interactive TUI for exploration, direct model launch for a faster path.

Interactive TUI
./llamamon

Full model selection and monitoring surface. Best for active exploration and daily operations.

Direct model launch
./llamamon glm-4.7

Open the TUI pre-loaded on a specific model. Fast path for hardware-specific handoffs.

Platforms

Cross-platform support

Linux (full support): NVIDIA GPU via NVML · CPU · memory
macOS (CPU mode): shared memory with system · no GPU access
Windows (partial): WMI placeholder · GPU support planned

Language: Rust 2021
TUI framework: Ratatui 0.30
Async runtime: Tokio 1.40
GPU monitoring: NVML (Linux)
Config format: TOML + Serde
Passing tests: 626
Binary size: ~12 MB
Build time: ~6 seconds