Product overview
by Inoyu

A terminal workflow built for real inference.

llamamon is a production-grade Rust TUI for running and monitoring local llama.cpp workloads. Not a wrapper script — a full operational surface with hardware awareness, cost intelligence, and compliance metrics.

Features

What's in the TUI today — and what's coming

5-tab configuration screen

Hardware, API, Paths, UI, and Billing in one place — with validation, first-run guidance, and hardware-detected defaults.

Model-driven launching

TOML model definitions with estimated VRAM. Select and launch without reconstructing flags by hand or memorizing options.
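
For a sense of the shape, a model definition might look like the TOML below. Every field name here is an assumption made for this sketch, not llamamon's documented schema:

[model]
name = "glm-4.7"
path = "~/.cache/huggingface/glm-4.7-Q4_K_M.gguf"
quantization = "Q4_K_M"
context_size = 8192
estimated_vram_mb = 6144
server_flags = ["--flash-attn"]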

Runtime observability

Process output, token count, GPU/CPU utilization, temperatures, and session stats — all in one terminal surface. Community keeps the raw log view; Personal and above add a clearer vulcanized view and raw/vulcanized log export.

Cost and energy tracking

Electricity rate config, session cost calculation, and energy consumption per model run — live, in the TUI.
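
The underlying arithmetic, as a rough sketch (llamamon's own accounting may sample power per second rather than use a flat average): energy is draw times runtime, and cost is energy times your configured rate.

// Illustrative only: session energy and cost from an average power draw.
fn session_cost(avg_watts: f64, seconds: f64, rate_per_kwh: f64) -> (f64, f64) {
    let kwh = avg_watts * seconds / 3_600_000.0; // watt-seconds -> kWh
    (kwh, kwh * rate_per_kwh)
}

fn main() {
    // 250 W average over a 30-minute session at 0.30/kWh
    let (kwh, cost) = session_cost(250.0, 1_800.0, 0.30);
    println!("{kwh:.3} kWh, cost {cost:.4}"); // 0.125 kWh, cost 0.0375
}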

NVIDIA GPU monitoring

Real NVML integration on Linux — temperature, utilization, and VRAM with per-second refresh. No guessing.
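
A minimal sketch of per-second NVML polling in Rust, using the nvml-wrapper crate; llamamon's actual integration may be wired differently:

use nvml_wrapper::Nvml;
use nvml_wrapper::enum_wrappers::device::TemperatureSensor;
use std::{thread, time::Duration};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let nvml = Nvml::init()?;
    let gpu = nvml.device_by_index(0)?;
    loop {
        let temp = gpu.temperature(TemperatureSensor::Gpu)?; // degrees C
        let util = gpu.utilization_rates()?;                 // percent
        let mem = gpu.memory_info()?;                        // bytes
        println!(
            "{temp}°C  gpu {}%  vram {}/{} MiB",
            util.gpu,
            mem.used / 1_048_576,
            mem.total / 1_048_576
        );
        thread::sleep(Duration::from_secs(1));
    }
}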

HuggingFace model browser

Search, filter, and download GGUF model variants directly from within the TUI — no browser needed.

Financial intelligence (Pro)

Cloud price arbitrage, persistent session analytics, hardware projection, ROI calculations, and LlamaMon structured event export (JSON/NDJSON) derived from raw llama.cpp logs — in the Professional edition.
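
To illustrate the NDJSON export idea (one event per line; the field names here are hypothetical, not the documented export schema):

{"ts":"2025-01-15T10:42:07Z","event":"token_throughput","model":"glm-4.7","tokens_per_sec":42.3}
{"ts":"2025-01-15T10:42:08Z","event":"gpu_sample","temp_c":61,"util_pct":87,"vram_used_mb":6020}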

Compliance overlays (Enterprise)

ESG carbon audit PDFs, security sentry, audit-ready logs, and data sovereignty certification — in the Enterprise edition.

Screenshots

See the real TUI

Every screen you'll encounter — from model selection to live monitoring.

Model selection
TOML-defined models with per-context VRAM estimates.
Parameter editing
Edit server parameters and quantization options directly in the TUI.
Flags editing
Toggle and edit llama-server flags per model without touching the command line.
Runtime monitoring
Live GPU utilization, temperature, token throughput, cost, and energy.
Launch confirmation
Review flags before spawning llama-server — no surprises.
HF model search
Browse and filter HuggingFace models from within the TUI.
HF model details
Full model metadata, quantization variants, and GGUF downloads.
Configuration

34+ fields across 5 tabs

Every setting lives in one screen: hardware-detected defaults get you started, and every field is editable. An illustrative config sketch follows the tab list below.

01 Hardware: RAM, GPU, VRAM, CPU cores, quantization preferences, context size
02 API: HuggingFace token, llama-server API token, endpoint configuration
03 Paths: llama-server binary, HF cache directory, model TOML directory
04 UI: refresh rate, GPU temp display, memory display units, theming
05 Billing: electricity rate, currency symbol, cost display, amortization
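
To make that concrete, a config covering the five tabs might look like the sketch below. Section and key names are illustrative assumptions, not llamamon's actual file format:

[hardware]
ram_gb = 64
gpu = "NVIDIA GeForce RTX 4090"
vram_gb = 24
cpu_cores = 16
preferred_quantization = "Q4_K_M"
default_context_size = 8192

[api]
hf_token = "hf_..."
llama_server_token = "..."
endpoint = "http://127.0.0.1:8080"

[paths]
llama_server_binary = "/usr/local/bin/llama-server"
hf_cache_dir = "~/.cache/huggingface"
model_toml_dir = "~/.config/llamamon/models"

[ui]
refresh_rate_ms = 1000
show_gpu_temp = true
memory_units = "GiB"
theme = "dark"

[billing]
electricity_rate_per_kwh = 0.30
currency_symbol = "$"
show_cost = true
amortization_months = 36
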
Launch modes

Two ways to run

Interactive TUI for exploration, direct model launch for a faster path.

Interactive TUI
./llamamon

Full model selection and monitoring surface. Best for active exploration and daily operations.

Direct model launch
./llamamon glm-4.7

Open the TUI pre-loaded on a specific model. Fast path for hardware-specific handoffs.

Platforms

Cross-platform support

Linux (full support): NVIDIA GPU via NVML · CPU · memory
macOS (CPU mode): shared memory with system · no GPU access
Windows (partial): WMI placeholder · GPU support planned

Language: Rust 2021
TUI framework: Ratatui 0.30
Async runtime: Tokio 1.40
GPU monitoring: NVML (Linux)
Config format: TOML + Serde
Passing tests: 626
Binary size: ~12 MB
Build time: ~6 seconds